Random Subspace with Trees for Feature Selection Under Memory Constraints

Antonio Sutera, Célia Châtel, Gilles Louppe, Louis Wehenkel, Pierre Geurts
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:929-937, 2018.

Abstract

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables mixing both variables already identified as relevant by previous models and variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
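The selection loop sketched in the abstract can be illustrated with a toy implementation. This is not the authors' code: all function and parameter names below are illustrative, depth-1 stumps (via Gini impurity reduction) stand in for the paper's randomized trees, and a fixed threshold `tol` stands in for the paper's importance-based relevance test. Only `memory` features are ever scored at once, mixing already-selected features with fresh random candidates.

```python
import numpy as np

def stump_importance(x, y):
    """Gini impurity reduction of the best single-threshold split on x
    (a depth-1 tree used here as a crude importance proxy); y is 0/1."""
    order = np.argsort(x)
    ys = y[order].astype(float)
    n = len(ys)
    total1 = ys.sum()

    def gini(n1, m):
        p = n1 / m
        return 1.0 - p * p - (1.0 - p) * (1.0 - p)

    parent = gini(total1, n)
    left_ones = np.cumsum(ys)
    best = 0.0
    for i in range(1, n):
        if x[order[i]] == x[order[i - 1]]:
            continue  # no valid threshold between equal values
        nl, nr = i, n - i
        gl = gini(left_ones[i - 1], nl)
        gr = gini(total1 - left_ones[i - 1], nr)
        best = max(best, parent - (nl * gl + nr * gr) / n)
    return best

def sequential_random_subspace(X, y, memory=5, n_iter=30, tol=0.05, seed=0):
    """Toy version of the abstract's idea: at each iteration, score a
    memory-sized subset of features that mixes variables already flagged
    as relevant with variables drawn at random from the rest."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    relevant = []
    for _ in range(n_iter):
        kept = relevant[:memory]
        pool = [j for j in range(p) if j not in relevant]
        n_fresh = min(memory - len(kept), len(pool))
        fresh = rng.choice(pool, size=n_fresh, replace=False) if n_fresh else []
        subset = kept + [int(j) for j in fresh]
        for j in subset:
            if j not in relevant and stump_importance(X[:, j], y) > tol:
                relevant.append(j)
    return sorted(relevant)
```

With, say, 20 features of which only the first two drive the label, the loop accumulates those two across iterations while the threshold filters out the noise features; the paper's analysis concerns when and how fast such accumulation converges to the full relevant set.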

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-sutera18a,
  title     = {Random Subspace with Trees for Feature Selection Under Memory Constraints},
  author    = {Sutera, Antonio and Ch{\^a}tel, C{\'e}lia and Louppe, Gilles and Wehenkel, Louis and Geurts, Pierre},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {929--937},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/sutera18a/sutera18a.pdf},
  url       = {https://proceedings.mlr.press/v84/sutera18a.html},
  abstract  = {Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables mixing both variables already identified as relevant by previous models and variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.}
}
Endnote
%0 Conference Paper
%T Random Subspace with Trees for Feature Selection Under Memory Constraints
%A Antonio Sutera
%A Célia Châtel
%A Gilles Louppe
%A Louis Wehenkel
%A Pierre Geurts
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-sutera18a
%I PMLR
%P 929--937
%U https://proceedings.mlr.press/v84/sutera18a.html
%V 84
%X Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables mixing both variables already identified as relevant by previous models and variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
APA
Sutera, A., Châtel, C., Louppe, G., Wehenkel, L. & Geurts, P. (2018). Random Subspace with Trees for Feature Selection Under Memory Constraints. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:929-937. Available from https://proceedings.mlr.press/v84/sutera18a.html.