Feature engineering is a critical but time-consuming task in machine learning. In particular, in cases where raw features can be transformed and combined into new features, the search space is exponentially large. Existing feature selection methods try to identify the best representations. However, the selected feature representations are often very complex, hard to understand, and might suffer from overfitting. Therefore, we propose a system that leverages feature set complexity to prune the huge feature search space. Preliminary experiments show that our system generates representations that are less complex, yield higher classification accuracy, and generalize better to unseen data than current state-of-the-art feature selection and construction methods.
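To illustrate how a complexity bound can prune an exponentially large search space, here is a minimal sketch. It is not the system's actual implementation: the operator sets, the complexity measure (number of raw features plus operators in an expression), and the function name `enumerate_features` are all assumptions made for illustration.

```python
# Hypothetical operator sets, chosen only for illustration.
UNARY = ("log", "sqrt")
BINARY = ("+", "*")


def enumerate_features(raw_features, max_complexity):
    """Enumerate candidate feature expressions grouped by complexity.

    Complexity is assumed to be the number of raw features plus the
    number of operators in an expression. Expressions whose complexity
    would exceed max_complexity are never generated, which is the pruning.
    """
    levels = {1: list(raw_features)}
    for c in range(2, max_complexity + 1):
        current = []
        # A unary operator adds 1 to the complexity of its operand.
        for expr in levels[c - 1]:
            for op in UNARY:
                current.append(f"{op}({expr})")
        # A binary operator yields complexity left + right + 1.
        for left_c in range(1, c - 1):
            right_c = c - 1 - left_c
            if right_c < left_c:
                continue  # both operators commute: skip mirrored splits
            for left in levels[left_c]:
                for right in levels[right_c]:
                    if left_c == right_c and left >= right:
                        continue  # skip mirrored pairs and self pairs
                    for op in BINARY:
                        current.append(f"({left} {op} {right})")
        levels[c] = current
    return levels


levels = enumerate_features(["a", "b"], max_complexity=3)
for complexity, expressions in levels.items():
    print(complexity, len(expressions))
```

Raising `max_complexity` grows the number of candidates quickly, so a low bound keeps both the search and the resulting representations small.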
To run the experiments, first set the paths in a configuration file named after your machine. Examples can be found in ~/new_project/fastsklearnfeature/configuration/resources
We provide a small Jupyter notebook as an example: Example Notebook
cd new_project/
python3.7 -m pip install .
We have already applied our system to the following datasets: Blood Transfusion Service Center, Banknote Authentication, Ecoli, Statlog (Heart), German Credit, and House Prices: