Exploiting tree-based variable importances to selectively identify relevant variables

Vân Anh Huynh-Thu, Louis Wehenkel, Pierre Geurts
Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, PMLR 4:60-73, 2008.

Abstract

This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It also shows that the classical permutation-test approach for estimating false discovery rates of univariate variable scoring procedures does not carry over well when extended directly to multivariate tree-based importance measures.
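
For orientation, the classical univariate baseline that the abstract refers to can be sketched as follows. This is a minimal illustration and not the procedure proposed in the paper: the score (absolute Pearson correlation), the function names, the synthetic data, and the number of permutations are all assumptions made for this example. It computes only per-variable permutation p-values and stops short of any false discovery rate estimate.

```python
import random


def abs_correlation(xs, ys):
    """Univariate relevance score: absolute Pearson correlation (assumed score)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def permutation_p_values(columns, y, n_permutations=200, seed=0):
    """Per-variable p-values obtained by permuting the output vector.

    Hypothetical helper: for each variable, count how often its score on a
    permuted output is at least as large as its score on the true output.
    """
    rng = random.Random(seed)
    observed = [abs_correlation(col, y) for col in columns]
    exceed = [0] * len(columns)
    y_perm = list(y)
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # breaks any input-output relationship
        for j, col in enumerate(columns):
            if abs_correlation(col, y_perm) >= observed[j]:
                exceed[j] += 1
    # Add-one correction so that no p-value is exactly zero.
    return [(c + 1) / (n_permutations + 1) for c in exceed]


# Synthetic data (assumption): one relevant variable and two noise variables.
data_rng = random.Random(1)
n_samples = 100
x_relevant = [data_rng.gauss(0, 1) for _ in range(n_samples)]
x_noise_a = [data_rng.gauss(0, 1) for _ in range(n_samples)]
x_noise_b = [data_rng.gauss(0, 1) for _ in range(n_samples)]
y = [x + 0.1 * data_rng.gauss(0, 1) for x in x_relevant]

p_values = permutation_p_values([x_relevant, x_noise_a, x_noise_b], y)
print(p_values)
```

Each variable is scored independently of the others here, which is what makes the scheme univariate. A tree-based importance is instead computed jointly over all variables, and the abstract reports that this kind of permutation scheme does not transfer well to that setting.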

Cite this Paper


BibTeX
@InProceedings{pmlr-v4-huynhthu08a,
  title = {Exploiting tree-based variable importances to selectively identify relevant variables},
  author = {Huynh-Thu, Vân Anh and Wehenkel, Louis and Geurts, Pierre},
  booktitle = {Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008},
  pages = {60--73},
  year = {2008},
  editor = {Saeys, Yvan and Liu, Huan and Inza, Iñaki and Wehenkel, Louis and Van de Peer, Yves},
  volume = {4},
  series = {Proceedings of Machine Learning Research},
  address = {Antwerp, Belgium},
  month = {15 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v4/huynhthu08a/huynhthu08a.pdf},
  url = {https://proceedings.mlr.press/v4/huynhthu08a.html},
  abstract = {This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.}
}
Endnote
%0 Conference Paper
%T Exploiting tree-based variable importances to selectively identify relevant variables
%A Vân Anh Huynh-Thu
%A Louis Wehenkel
%A Pierre Geurts
%B Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008
%C Proceedings of Machine Learning Research
%D 2008
%E Yvan Saeys
%E Huan Liu
%E Iñaki Inza
%E Louis Wehenkel
%E Yves Van de Peer
%F pmlr-v4-huynhthu08a
%I PMLR
%P 60--73
%U https://proceedings.mlr.press/v4/huynhthu08a.html
%V 4
%X This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.
RIS
TY  - CPAPER
TI  - Exploiting tree-based variable importances to selectively identify relevant variables
AU  - Vân Anh Huynh-Thu
AU  - Louis Wehenkel
AU  - Pierre Geurts
BT  - Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008
DA  - 2008/09/11
ED  - Yvan Saeys
ED  - Huan Liu
ED  - Iñaki Inza
ED  - Louis Wehenkel
ED  - Yves Van de Peer
ID  - pmlr-v4-huynhthu08a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 4
SP  - 60
EP  - 73
L1  - http://proceedings.mlr.press/v4/huynhthu08a/huynhthu08a.pdf
UR  - https://proceedings.mlr.press/v4/huynhthu08a.html
AB  - This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.
ER  -
APA
Huynh-Thu, V.A., Wehenkel, L. & Geurts, P. (2008). Exploiting tree-based variable importances to selectively identify relevant variables. Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, in Proceedings of Machine Learning Research 4:60-73. Available from https://proceedings.mlr.press/v4/huynhthu08a.html.
