DIFER: Differentiable Automated Feature Engineering
Proceedings of the First International Conference on Automated Machine Learning, PMLR 188:17/1-17, 2022.
Abstract
Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve model performance. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, all existing methods treat AutoFE as an optimization problem over a discrete feature space, and the huge search space leads to significant computational overhead. Unlike previous work, we perform AutoFE in a continuous vector space and propose a differentiable method called DIFER in this paper. We first introduce a feature optimizer based on the encoder-predictor-decoder framework, which maps features into the continuous vector space via the encoder, optimizes the embedding along the gradient direction induced by the predictor, and recovers better features from the optimized embedding via the decoder. Based on the feature optimizer, we employ a feature evolution method to search for better features iteratively. Extensive experiments on classification and regression datasets demonstrate that DIFER significantly outperforms state-of-the-art AutoFE methods in terms of both model performance and computational efficiency. The implementation of DIFER is available at \url{https://anonymous.4open.science/r/DIFER-3FBC/}.
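The core idea summarized above can be illustrated with a minimal sketch: encode a feature into a continuous embedding, ascend the gradient of a performance predictor with respect to that embedding, and hand the optimized embedding to a decoder. The linear predictor below, and the names `predictor` and `grad_step`, are hypothetical stand-ins for illustration, not the paper's actual LSTM-based encoder-predictor-decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                       # embedding dimension (assumed for the sketch)
w = rng.normal(size=d)      # linear surrogate predictor: score(e) = w @ e

def predictor(e):
    # predicted downstream performance of the feature embedding e
    return w @ e

def grad_step(e, eta=0.1):
    # move the embedding along the predictor's gradient
    # (for a linear predictor the gradient is simply w)
    return e + eta * w

e = rng.normal(size=d)      # embedding of some feature, as if from the encoder
e_new = grad_step(e)        # optimized embedding, to be passed to the decoder
assert predictor(e_new) > predictor(e)  # score improves along the gradient
```

In the full method the predictor is a learned neural network, so the gradient is obtained by backpropagation through it rather than read off a weight vector, but the update on the embedding has the same form.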