DIFER: Differentiable Automated Feature Engineering

Guanghui Zhu, Zhuoer Xu, Chunfeng Yuan, Yihua Huang
Proceedings of the First International Conference on Automated Machine Learning, PMLR 188:17/1-17, 2022.

Abstract

Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve model performance. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, all existing methods treat AutoFE as an optimization problem over a discrete feature space, and the huge search space leads to significant computational overhead. Unlike previous work, we perform AutoFE in a continuous vector space and propose a differentiable method called DIFER in this paper. We first introduce a feature optimizer based on the encoder-predictor-decoder framework, which maps features into the continuous vector space via the encoder, optimizes the embedding along the gradient direction induced by the predictor, and recovers better features from the optimized embedding by the decoder. Based on the feature optimizer, we employ a feature evolution method to search for better features iteratively. Extensive experiments on classification and regression datasets demonstrate that DIFER can significantly outperform the state-of-the-art AutoFE methods in terms of both model performance and computational efficiency. The implementation of DIFER is available at \url{https://anonymous.4open.science/r/DIFER-3FBC/}.
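The encoder-predictor-decoder loop described in the abstract can be illustrated with a toy sketch. All names here (`encode`, `predict`, `decode`, the embedding table, the linear predictor) are hypothetical stand-ins, not the paper's actual implementation: a discrete feature (a token sequence) is mapped to a continuous embedding, the embedding is moved along the predictor's gradient, and a feature is recovered by nearest-neighbor decoding.

```python
import numpy as np

# Toy illustration of a DIFER-style feature optimizer (assumed names, not
# the authors' code): encode a discrete feature into a continuous vector,
# ascend the gradient of a performance predictor in embedding space, then
# decode the optimized embedding back to a discrete feature.

rng = np.random.default_rng(0)
VOCAB, DIM = 8, 4

E = rng.normal(size=(VOCAB, DIM))   # encoder: embedding table for feature tokens
w = rng.normal(size=DIM)            # linear surrogate predictor's weights

def encode(tokens):
    # mean-pool token embeddings into one continuous feature vector
    return E[tokens].mean(axis=0)

def predict(z):
    # surrogate score for downstream model performance
    return float(w @ z)

def decode(z):
    # nearest-neighbor decoding back to a discrete token
    return int(np.argmin(np.linalg.norm(E - z, axis=1)))

tokens = np.array([1, 3, 5])
z = encode(tokens)
score_before = predict(z)

# For a linear predictor, the gradient w.r.t. z is simply w,
# so each step moves the embedding in the ascent direction.
for _ in range(10):
    z = z + 0.1 * w

score_after = predict(z)
assert score_after > score_before   # the embedding improved under the predictor
best_token = decode(z)              # recover a (hopefully better) feature
```

In the real method the encoder, predictor, and decoder are learned networks trained jointly, and the optimized features feed into an iterative evolution loop; the sketch only shows the gradient-in-embedding-space idea.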

Cite this Paper


BibTeX
@InProceedings{pmlr-v188-zhu22a,
  title     = {DIFER: Differentiable Automated Feature Engineering},
  author    = {Zhu, Guanghui and Xu, Zhuoer and Yuan, Chunfeng and Huang, Yihua},
  booktitle = {Proceedings of the First International Conference on Automated Machine Learning},
  pages     = {17/1--17},
  year      = {2022},
  editor    = {Guyon, Isabelle and Lindauer, Marius and van der Schaar, Mihaela and Hutter, Frank and Garnett, Roman},
  volume    = {188},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v188/zhu22a/zhu22a.pdf},
  url       = {https://proceedings.mlr.press/v188/zhu22a.html},
  abstract  = {Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve model performance. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, all existing methods treat AutoFE as an optimization problem over a discrete feature space, and the huge search space leads to significant computational overhead. Unlike previous work, we perform AutoFE in a continuous vector space and propose a differentiable method called DIFER in this paper. We first introduce a feature optimizer based on the encoder-predictor-decoder framework, which maps features into the continuous vector space via the encoder, optimizes the embedding along the gradient direction induced by the predictor, and recovers better features from the optimized embedding by the decoder. Based on the feature optimizer, we employ a feature evolution method to search for better features iteratively. Extensive experiments on classification and regression datasets demonstrate that DIFER can significantly outperform the state-of-the-art AutoFE methods in terms of both model performance and computational efficiency. The implementation of DIFER is available at \url{https://anonymous.4open.science/r/DIFER-3FBC/}.}
}
Endnote
%0 Conference Paper
%T DIFER: Differentiable Automated Feature Engineering
%A Guanghui Zhu
%A Zhuoer Xu
%A Chunfeng Yuan
%A Yihua Huang
%B Proceedings of the First International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Isabelle Guyon
%E Marius Lindauer
%E Mihaela van der Schaar
%E Frank Hutter
%E Roman Garnett
%F pmlr-v188-zhu22a
%I PMLR
%P 17/1--17
%U https://proceedings.mlr.press/v188/zhu22a.html
%V 188
%X Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve model performance. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, all existing methods treat AutoFE as an optimization problem over a discrete feature space, and the huge search space leads to significant computational overhead. Unlike previous work, we perform AutoFE in a continuous vector space and propose a differentiable method called DIFER in this paper. We first introduce a feature optimizer based on the encoder-predictor-decoder framework, which maps features into the continuous vector space via the encoder, optimizes the embedding along the gradient direction induced by the predictor, and recovers better features from the optimized embedding by the decoder. Based on the feature optimizer, we employ a feature evolution method to search for better features iteratively. Extensive experiments on classification and regression datasets demonstrate that DIFER can significantly outperform the state-of-the-art AutoFE methods in terms of both model performance and computational efficiency. The implementation of DIFER is available at \url{https://anonymous.4open.science/r/DIFER-3FBC/}.
APA
Zhu, G., Xu, Z., Yuan, C., & Huang, Y. (2022). DIFER: Differentiable Automated Feature Engineering. Proceedings of the First International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 188:17/1-17. Available from https://proceedings.mlr.press/v188/zhu22a.html.