Exploiting tree-based variable importances to selectively identify relevant variables

Vân Anh Huynh-Thu; Louis Wehenkel; Pierre Geurts

Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, PMLR 4:60-73, 2008.

Abstract

This paper proposes a novel statistical procedure, based on permutation tests, for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It also shows that the classical permutation-test approach for estimating false discovery rates of univariate variable scoring procedures does not carry over well when applied directly to multivariate tree-based importance measures.
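The sketch below illustrates the classical univariate baseline that the abstract refers to. It is not the procedure proposed in the paper. Several parts are illustrative assumptions: the score (absolute difference of class means), the function names (`mean_diff_score`, `score_all`, `permutation_fdr`, `make_toy_data`), the binary 0/1 output, and the toy data. Permuting the output destroys any association between inputs and output. The scores recomputed on permuted data therefore estimate how many variables would pass a threshold t by chance, and dividing that count by the number of observed scores above t gives an FDR estimate.

```python
import random
from typing import List, Sequence, Tuple


def mean_diff_score(x: Sequence[float], y: Sequence[int]) -> float:
    """Illustrative univariate score: |mean(x | y=1) - mean(x | y=0)|."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def score_all(X: List[List[float]], y: Sequence[int]) -> List[float]:
    """Score every variable (column of X) independently of the others."""
    n_vars = len(X[0])
    return [mean_diff_score([row[j] for row in X], y) for j in range(n_vars)]


def permutation_fdr(X: List[List[float]], y: Sequence[int], threshold: float,
                    n_perm: int = 100, seed: int = 0) -> float:
    """Estimate the FDR of selecting every variable whose score >= threshold.

    FDR_hat(t) = (mean number of null scores >= t over permutations)
                 / (number of observed scores >= t), capped at 1.
    """
    rng = random.Random(seed)
    observed = score_all(X, y)
    n_selected = sum(s >= threshold for s in observed)
    if n_selected == 0:
        return 0.0  # nothing selected, so no false discoveries
    false_counts = 0
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the input-output association
        null_scores = score_all(X, y_perm)
        false_counts += sum(s >= threshold for s in null_scores)
    return min(1.0, (false_counts / n_perm) / n_selected)


def make_toy_data(n: int = 40, n_noise: int = 9,
                  seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy data: variable 0 depends on y, the others are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[5.0 * yi + rng.random()] + [rng.random() for _ in range(n_noise)]
         for yi in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(permutation_fdr(X, y, threshold=2.5))
```

The abstract reports that plugging multivariate tree-based importances into this recipe in place of `score_all` does not work well. The paper develops an alternative permutation-based procedure for that setting.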

Cite this Paper

BibTeX


@InProceedings{pmlr-v4-huynhthu08a,
  title = 	 {Exploiting tree-based variable importances to selectively identify relevant variables},
  author = 	 {Huynh-Thu, Vân Anh and Wehenkel, Louis and Geurts, Pierre},
  booktitle = 	 {Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008},
  pages = 	 {60--73},
  year = 	 {2008},
  editor = 	 {Saeys, Yvan and Liu, Huan and Inza, Iñaki and Wehenkel, Louis and Van de Peer, Yves},
  volume = 	 {4},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Antwerp, Belgium},
  month = 	 {15 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v4/huynhthu08a/huynhthu08a.pdf},
  url = 	 {https://proceedings.mlr.press/v4/huynhthu08a.html},
  abstract = 	 {This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.}
}

Endnote

%0 Conference Paper
%T Exploiting tree-based variable importances to selectively identify relevant variables
%A Vân Anh Huynh-Thu
%A Louis Wehenkel
%A Pierre Geurts
%B Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008
%C Proceedings of Machine Learning Research
%D 2008
%E Yvan Saeys
%E Huan Liu
%E Iñaki Inza
%E Louis Wehenkel
%E Yves Van de Peer
%F pmlr-v4-huynhthu08a
%I PMLR
%P 60--73
%U https://proceedings.mlr.press/v4/huynhthu08a.html
%V 4
%X This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.

RIS


TY  - CPAPER
TI  - Exploiting tree-based variable importances to selectively identify relevant variables
AU  - Vân Anh Huynh-Thu
AU  - Louis Wehenkel
AU  - Pierre Geurts
BT  - Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008
DA  - 2008/09/11
ED  - Yvan Saeys
ED  - Huan Liu
ED  - Iñaki Inza
ED  - Louis Wehenkel
ED  - Yves Van de Peer
ID  - pmlr-v4-huynhthu08a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 4
SP  - 60
EP  - 73
L1  - http://proceedings.mlr.press/v4/huynhthu08a/huynhthu08a.pdf
UR  - https://proceedings.mlr.press/v4/huynhthu08a.html
AB  - This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.
ER  -

APA


Huynh-Thu, V.A., Wehenkel, L. & Geurts, P. (2008). Exploiting tree-based variable importances to selectively identify relevant variables. Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, in Proceedings of Machine Learning Research 4:60-73. Available from https://proceedings.mlr.press/v4/huynhthu08a.html.
