OpenFE: Automated Feature Generation with Expert-level Performance

Tianping Zhang; Zheyu Aqa Zhang; Zhiyuan Fan; Haoyan Luo; Fengyuan Liu; Qian Liu; Wei Cao; Li Jian

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:41880-41901, 2023.

Abstract

The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving learning performance on tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that achieves results competitive with those of machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model beat 99.3% and 99.6% of data science teams, respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting.
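The two components named in the abstract can be illustrated with a toy sketch. The abstract does not spell out either component, so everything below is an assumption made for illustration and not the OpenFE implementation: a one-variable least-squares fit stands in for the learner, "incremental performance" is read as the loss reduction a candidate achieves on the residuals of a fixed base model, and "coarse-to-fine" is read as scoring all candidates on a small row subset before rescoring the survivors on all rows. All function and variable names are hypothetical.

```python
import random

def fit_line(x, r):
    """Least-squares fit of r ~ slope * x + intercept for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(r) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0.0:
        return 0.0, mean_r
    cov = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, r))
    slope = cov / var_x
    return slope, mean_r - slope * mean_x

def mse(values):
    """Mean of squared values (the loss when `values` are residuals)."""
    return sum(v * v for v in values) / len(values)

def incremental_gain(x, residuals):
    """Loss reduction from adding one candidate on top of fixed base predictions.

    Assumption: the candidate is fitted to the base model's residuals and
    scored in-sample; a held-out split would be the more careful choice.
    """
    slope, intercept = fit_line(x, residuals)
    remaining = [ri - (slope * xi + intercept) for xi, ri in zip(x, residuals)]
    return mse(residuals) - mse(remaining)

def two_stage_prune(candidates, residuals, coarse_rows, keep_coarse, keep_final):
    """Coarse pass on a row subset, then a fine pass on all rows (assumed reading)."""
    sub_res = [residuals[i] for i in coarse_rows]
    coarse = {
        name: incremental_gain([col[i] for i in coarse_rows], sub_res)
        for name, col in candidates.items()
    }
    survivors = sorted(coarse, key=coarse.get, reverse=True)[:keep_coarse]
    fine = {name: incremental_gain(candidates[name], residuals) for name in survivors}
    return sorted(fine, key=fine.get, reverse=True)[:keep_final]

# Synthetic data: the target is an interaction that the base feature alone misses.
rng = random.Random(0)
n = 400
x1 = [rng.uniform(-1, 1) for _ in range(n)]
x2 = [rng.uniform(-1, 1) for _ in range(n)]
y = [a * b for a, b in zip(x1, x2)]

# Stand-in base model: a line fitted on x1 only; its residuals are what is left to explain.
slope, intercept = fit_line(x1, y)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x1, y)]

# Hypothetical candidate pool built from simple operators on the base features.
candidates = {
    "x1*x2": [a * b for a, b in zip(x1, x2)],
    "x1+x2": [a + b for a, b in zip(x1, x2)],
    "x1-x2": [a - b for a, b in zip(x1, x2)],
    "x1**2": [a * a for a in x1],
}

coarse_rows = rng.sample(range(n), 80)
selected = two_stage_prune(candidates, residuals, coarse_rows, keep_coarse=2, keep_final=1)
print(selected)
```

Scoring against the base model's residuals means each candidate is judged by what it adds beyond the existing features, not by its standalone correlation with the target. The coarse pass keeps the cost of the full-data pass proportional to the number of survivors instead of the size of the candidate pool.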

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-zhang23ay,
  title = 	 {{O}pen{FE}: Automated Feature Generation with Expert-level Performance},
  author =       {Zhang, Tianping and Zhang, Zheyu Aqa and Fan, Zhiyuan and Luo, Haoyan and Liu, Fengyuan and Liu, Qian and Cao, Wei and Jian, Li},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {41880--41901},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/zhang23ay/zhang23ay.pdf},
  url = 	 {https://proceedings.mlr.press/v202/zhang23ay.html},
  abstract = 	 {The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving the learning performance of tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model can beat 99.3% and 99.6% data science teams respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting.}
}

Endnote

%0 Conference Paper
%T OpenFE: Automated Feature Generation with Expert-level Performance
%A Tianping Zhang
%A Zheyu Aqa Zhang
%A Zhiyuan Fan
%A Haoyan Luo
%A Fengyuan Liu
%A Qian Liu
%A Wei Cao
%A Li Jian
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhang23ay
%I PMLR
%P 41880--41901
%U https://proceedings.mlr.press/v202/zhang23ay.html
%V 202
%X The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving the learning performance of tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model can beat 99.3% and 99.6% data science teams respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting.

APA


Zhang, T., Zhang, Z. A., Fan, Z., Luo, H., Liu, F., Liu, Q., Cao, W., & Jian, L. (2023). OpenFE: Automated Feature Generation with Expert-level Performance. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:41880-41901. Available from https://proceedings.mlr.press/v202/zhang23ay.html.
