End-to-end Feature Selection Approach for Learning Skinny Trees

Shibal Ibrahim, Kayhan Behdin, Rahul Mazumder
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2863-2871, 2024.

Abstract

We propose a new optimization-based approach for feature selection in tree ensembles, an important problem in statistics and machine learning. Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests, support feature selection post-training via feature importance scores; while widely used, these scores are known to have drawbacks. We propose Skinny Trees: an end-to-end toolkit for feature selection in tree ensembles that trains a tree ensemble while controlling the number of selected features. Our optimization-based approach learns an ensemble of differentiable trees and simultaneously performs feature selection using a grouped $\ell_0$-regularizer. We use first-order methods for optimization and present convergence guarantees for our approach. We use a dense-to-sparse regularization scheduling scheme that can lead to more expressive and sparser tree ensembles. On 15 synthetic and real-world datasets, Skinny Trees achieve $1.5{\times}$–$620{\times}$ feature compression rates, leading to up to $10{\times}$ faster inference over dense trees, without any loss in performance. Skinny Trees deliver better feature selection than many existing toolkits: e.g., in terms of AUC at a 25% feature budget, Skinny Trees outperform LightGBM by 10.2% (up to 37.7%) and Random Forests by 3% (up to 12.5%).
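To make the two main ingredients of the abstract concrete, here is a minimal NumPy sketch of (i) a group-$\ell_0$ proximal (hard-thresholding) step applied to per-feature groups of split weights and (ii) a dense-to-sparse regularization schedule. This is an illustrative sketch under our own assumptions: the grouping of weights into rows, the linear ramp, the toy gradient, and all constants are placeholders, not the authors' released implementation (which trains differentiable tree ensembles with first-order methods).

```python
import numpy as np

def prox_group_l0(W, lam):
    """Proximal operator of lam * (# nonzero rows of W):
    a row (all split weights tied to one input feature) is kept as-is
    if 0.5 * ||row||^2 > lam, and zeroed out (feature dropped) otherwise."""
    W = W.copy()
    keep_cost = 0.5 * np.sum(W ** 2, axis=1)   # price of NOT dropping each feature group
    W[keep_cost <= lam] = 0.0                  # hard-threshold whole groups
    return W

def dense_to_sparse_schedule(lam_max, num_steps, warmup_frac=0.5):
    """lam stays 0 during a dense warm-up phase, then ramps linearly to lam_max."""
    lams = np.zeros(num_steps)
    start = int(warmup_frac * num_steps)
    lams[start:] = np.linspace(0.0, lam_max, num_steps - start)
    return lams

# Toy proximal-gradient loop on a random matrix standing in for the
# hyperplane split weights of a differentiable tree ensemble
# (rows = input features, columns = split nodes across all trees).
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 32)) * np.linspace(0.1, 2.0, 100)[:, None]
for lam in dense_to_sparse_schedule(lam_max=5.0, num_steps=200):
    grad = 0.01 * W                            # placeholder for the real loss gradient
    W = prox_group_l0(W - 0.1 * grad, lam)     # gradient step, then group-l0 prox
print("features selected:", int((np.abs(W).sum(axis=1) > 0).sum()))
```

The dense warm-up phase lets the ensemble fit with all features before the penalty ramp starts pruning whole feature groups, which is the intuition behind the dense-to-sparse scheduling described in the abstract.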

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-ibrahim24a,
  title     = {End-to-end Feature Selection Approach for Learning Skinny Trees},
  author    = {Ibrahim, Shibal and Behdin, Kayhan and Mazumder, Rahul},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2863--2871},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/ibrahim24a/ibrahim24a.pdf},
  url       = {https://proceedings.mlr.press/v238/ibrahim24a.html}
}
Endnote
%0 Conference Paper
%T End-to-end Feature Selection Approach for Learning Skinny Trees
%A Shibal Ibrahim
%A Kayhan Behdin
%A Rahul Mazumder
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-ibrahim24a
%I PMLR
%P 2863--2871
%U https://proceedings.mlr.press/v238/ibrahim24a.html
%V 238
APA
Ibrahim, S., Behdin, K. & Mazumder, R. (2024). End-to-end Feature Selection Approach for Learning Skinny Trees. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2863-2871. Available from https://proceedings.mlr.press/v238/ibrahim24a.html.
