AutoFE: Automating Feature Engineering to Improve Machine Learning Performance | By Nusudanai Wangpratham | May 2023


Nusudanai Wang Prasam
data driven investor

Feature engineering is a key step in the machine learning process. By extracting relevant information, normalizing data, and incorporating domain expertise, it helps models capture underlying patterns and make more accurate predictions, which improves their overall effectiveness.

However, feature engineering is hard to do well. The process is iterative and involves feature selection, feature extraction, and trade-offs between competing choices. Creating meaningful features that capture relevant information while avoiding overfitting requires a deep understanding of the data, domain expertise, and careful decision-making. This article looks at a tool that automates much of that work: AutoFE.

The AutoFE system automates feature engineering. It generates a large set of new, interpretable features by combining information from the original features, then uses evolutionary algorithms to discover a set of features that significantly improves the performance of traditional classification algorithms. The system is described as effective and robust, achieving an average 25.24% improvement in predictive performance across all classification algorithms, compared with the baseline obtained using only the original features. The advantage is twofold: AutoFE saves the time and effort of engineering features by hand, and it can significantly improve the performance of machine learning models.
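To make "combining information from the original features" concrete, here is a minimal sketch of a generator that builds pairwise arithmetic combinations and names each one after its formula, so the new features stay readable. This is not AutoFE's code: the function name `generate_features`, the choice of the four operators, and the sample row are all assumptions made for illustration.

```python
from itertools import combinations


def generate_features(row):
    """Return the original features plus pairwise combinations.

    Hypothetical helper: each new feature is keyed by its formula
    (for example "length*width"), which keeps it interpretable.
    """
    out = dict(row)
    for a, b in combinations(sorted(row), 2):
        out[f"{a}+{b}"] = row[a] + row[b]
        out[f"{a}-{b}"] = row[a] - row[b]
        out[f"{a}*{b}"] = row[a] * row[b]
        # A ratio is undefined when the divisor is zero; mark it as None.
        out[f"{a}/{b}"] = row[a] / row[b] if row[b] != 0 else None
    return out


# Illustrative input: three made-up numeric features for one record.
row = {"length": 2.0, "width": 4.0, "offset": 0.0}
expanded = generate_features(row)

print(len(expanded))             # 3 originals + 3 pairs * 4 operators = 15
print(expanded["length*width"])  # 8.0
print(expanded["length/offset"]) # None, because offset is zero
```

Even this small scheme grows quickly: the number of pairs is quadratic in the number of original features, which is why a generated feature set needs a selection step afterwards.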

Put differently, AutoFE targets the feature engineering step that highly successful machine learning models depend on. The problem it solves has two parts: combine the information in the original features into a large set of interpretable new features, then use evolutionary algorithms to discover the feature sets that significantly improve the performance of traditional classifiers.
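The evolutionary search over feature sets can be sketched as a small genetic algorithm. The article does not describe AutoFE's actual operators, so everything below is an assumption: feature subsets are encoded as 0/1 masks, fitness is the validation accuracy of a simple nearest-centroid classifier, and the function names (`centroid_accuracy`, `evolve`, `make_data`) and the synthetic data are hypothetical.

```python
import random


def centroid_accuracy(train, valid, mask):
    """Validation accuracy of a nearest-centroid classifier on the masked features."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(x[c] for x in rows) / len(rows) for c in cols]

    def predict(x):
        return min(
            centroids,
            key=lambda k: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[k])),
        )

    return sum(predict(x) == y for x, y in valid) / len(valid)


def evolve(train, valid, n_features, generations=20, pop_size=12, seed=0):
    """Search for a 0/1 feature mask with one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)

    def fitness(mask):
        return centroid_accuracy(train, valid, mask)

    # Seed the population with the all-features baseline plus random masks.
    population = [[1] * n_features]
    population += [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: the top half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)    # mutate a single bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def make_data(n, rng):
    """Synthetic rows: feature 0 carries the label, features 1 to 3 are noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        features = [y + rng.gauss(0, 0.3)] + [rng.gauss(0, 5.0) for _ in range(3)]
        data.append((features, y))
    return data


rng = random.Random(42)
train, valid = make_data(80, rng), make_data(40, rng)
best = evolve(train, valid, n_features=4)
print("selected mask:", best)
print("validation accuracy:", centroid_accuracy(train, valid, best))
print("baseline accuracy:", centroid_accuracy(train, valid, [1, 1, 1, 1]))
```

Because the all-features mask is in the initial population and the best individuals always survive, the returned mask never scores below the baseline on the validation set. That guarantee applies only to the data used for fitness, so a separate held-out test set is still needed to check that the selected features generalize.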

The AutoFE architecture consists of a feature generator, a splitter, a distributed system of feature selectors, and an evaluator. The feature generator uses the current feature set to generate a large new feature set. The splitter divides the dataset into a training set and a validation set. The distributed feature selectors each select a subset of features from the generated set. The evaluator uses classification algorithms to assess the performance of the selected features. The architecture is designed to be scalable and efficient, taking advantage of parallelism to speed up the feature selection process.
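The four components can be wired together in a few dozen lines. The sketch below is a loose illustration of that data flow, not AutoFE's implementation: the function names, the pairwise-product generator, the class-gap selection score, the nearest-centroid evaluator, and the use of a local thread pool in place of a distributed system are all assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def generate(X, names):
    """Feature generator: append pairwise products of the current features."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_X = [row + [row[i] * row[j] for i, j in pairs] for row in X]
    return new_X, new_names


def split(X, y, valid_fraction=0.3, seed=0):
    """Splitter: shuffle row indices, then cut into training and validation sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - valid_fraction))
    tr, va = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr]), ([X[i] for i in va], [y[i] for i in va])


def class_gap(X, y, col):
    """Selector score: distance between the two class means, scaled by the column range."""
    ones = [r[col] for r, t in zip(X, y) if t == 1]
    zeros = [r[col] for r, t in zip(X, y) if t == 0]
    if not ones or not zeros:
        return 0.0
    values = ones + zeros
    spread = (max(values) - min(values)) or 1.0
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)) / spread


def select(X, y, cols, keep):
    """One feature selector: keep the best-scoring columns from its own shard."""
    return sorted(cols, key=lambda c: class_gap(X, y, c), reverse=True)[:keep]


def evaluate(train, valid, cols):
    """Evaluator: validation accuracy of a nearest-centroid classifier on the chosen columns."""
    (Xtr, ytr), (Xva, yva) = train, valid
    centroids = {}
    for label in set(ytr):
        rows = [r for r, t in zip(Xtr, ytr) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(row):
        return min(
            centroids,
            key=lambda k: sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[k])),
        )

    return sum(predict(r) == t for r, t in zip(Xva, yva)) / len(yva)


def run_pipeline(X, y, names, n_workers=3, keep=1):
    X2, names2 = generate(X, names)
    train, valid = split(X2, y)
    cols = list(range(len(names2)))
    shards = [cols[i::n_workers] for i in range(n_workers)]  # one shard per selector
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        picked = pool.map(lambda shard: select(train[0], train[1], shard, keep), shards)
        chosen = sorted(c for part in picked for c in part)
    return [names2[c] for c in chosen], evaluate(train, valid, chosen)


# Synthetic task: the label depends on the sign of a*b, which no single original feature reveals.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
y = [1 if row[0] * row[1] > 0 else 0 for row in X]
chosen, accuracy = run_pipeline(X, y, ["a", "b", "c"])
print(chosen, accuracy)
```

Each selector sees only its own shard of columns, which is what makes the step easy to parallelize. The threads here only stand in for distributed workers; on real data the shards would more plausibly go to separate processes or machines.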

In summary, AutoFE is an automatic feature engineering tool that combines information from the original features to generate a large set of interpretable features, then uses evolutionary algorithms to discover the ones that significantly improve the performance of traditional classification algorithms. The system is described as effective and robust, with an average 25.24% improvement in predictive performance over the baseline. By automating feature engineering, it saves time and effort while improving model performance, and its architecture relies on parallelism to keep the feature selection process scalable and efficient.

Example in Python: https://github.com/AxsPlayer/Tool_Auto-FE/blob/master/auto-FE/auto_fe_test.ipynb


