Feature selection using e-values

Subhabrata Majumdar; Snigdhansu Chatterjee

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:14753-14773, 2022.

Abstract

In the context of supervised learning, we introduce the concept of an e-value. An e-value is a scalar quantity that measures how close the sampling distribution of the parameter estimates in a model trained on a subset of features is to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. For a p-dimensional feature space, this requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. The e-values framework is applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure, and we provide consistency results for it. Through experiments across several model settings on synthetic and real datasets, we show that e-values can be a promising general alternative to existing model-specific methods of feature selection.
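The abstract compresses the selection rule into a few sentences. The Python sketch below illustrates only the "fit once, evaluate p + 1 models" idea, and every modelling choice in it is our assumption rather than the authors' reference implementation: a linear model fitted by least squares, a plain pairs bootstrap standing in for the paper's fast resampling scheme, an illustrative factor `tau` that widens the bootstrap cloud, Mahalanobis depth as the data depth, and each "drop feature j" model evaluated without refitting by zeroing coordinate j of every full-model bootstrap estimate. The e-value of a model is taken to be the mean depth of its bootstrap estimates relative to the full-model bootstrap distribution, and a feature is kept when dropping it lowers that value below the full model's. The names `mahalanobis_depth` and `evalue_select` are hypothetical.

```python
import numpy as np


def mahalanobis_depth(points: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Mahalanobis depth 1 / (1 + d^2) of each row of `points` relative to the
    empirical distribution of the rows of `reference` (our choice of depth)."""
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(reference, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)


def evalue_select(X, y, n_boot=200, tau=5.0, seed=0):
    """Hypothetical e-value style feature selection for a least-squares model.

    Returns (e_full, e_drop, selected): the full-model e-value, the e-value of
    each "drop feature j" model, and the indices of the selected features.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # full-model estimate

    # Pairs bootstrap of the full-model estimate (assumption: a stand-in for
    # the paper's fast resampling scheme, not a reproduction of it).
    raw = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        raw[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

    # Illustrative inflation by `tau` so that sampling noise in truly null
    # coefficients does not dominate the depth comparison.
    boots = beta_hat + tau * (raw - beta_hat)

    # Model 1 of p + 1: the full model, scored against its own bootstrap cloud.
    e_full = mahalanobis_depth(boots, boots).mean()

    # Models 2..p + 1: drop feature j by zeroing coordinate j, with no refitting.
    e_drop = np.empty(p)
    for j in range(p):
        dropped = boots.copy()
        dropped[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(dropped, boots).mean()

    # Keep feature j if dropping it lowers the e-value below the full model's.
    selected = np.flatnonzero(e_drop < e_full)
    return e_full, e_drop, selected


if __name__ == "__main__":
    # Toy problem: only the first two of five features carry signal.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 5))
    beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
    y = X @ beta + rng.standard_normal(300)
    e_full, e_drop, selected = evalue_select(X, y)
    print("full-model e-value:", round(float(e_full), 3))
    print("drop-one e-values:", np.round(e_drop, 3))
    print("selected features:", selected)
```

Because the drop-one models reuse the full-model bootstrap draws, the cost is one least-squares fit per resample rather than one fit per candidate subset, which is the saving over 2^p that the abstract describes. The value of `tau` here is arbitrary; how the resampling spread should actually be tuned is a question for the paper itself.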

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-majumdar22a,
  title     = {Feature selection using e-values},
  author    = {Majumdar, Subhabrata and Chatterjee, Snigdhansu},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {14753--14773},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesv{\'a}ri, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/majumdar22a/majumdar22a.pdf},
  url       = {https://proceedings.mlr.press/v162/majumdar22a.html},
  abstract  = {In the context of supervised learning, we introduce the concept of e-value. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e. the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. For a p-dimensional feature space, this requires fitting only the full model and evaluating p+1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. The above e-values framework is applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure, providing consistency results. Through experiments across several model settings and synthetic and real datasets, we establish that the e-values can be a promising general alternative to existing model-specific methods of feature selection.}
}

EndNote

%0 Conference Paper
%T Feature selection using e-values
%A Subhabrata Majumdar
%A Snigdhansu Chatterjee
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvári
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-majumdar22a
%I PMLR
%P 14753-14773
%U https://proceedings.mlr.press/v162/majumdar22a.html
%V 162
%X In the context of supervised learning, we introduce the concept of e-value. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e. the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. For a p-dimensional feature space, this requires fitting only the full model and evaluating p+1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. The above e-values framework is applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure, providing consistency results. Through experiments across several model settings and synthetic and real datasets, we establish that the e-values can be a promising general alternative to existing model-specific methods of feature selection.

APA


Majumdar, S., & Chatterjee, S. (2022). Feature selection using e-values. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:14753-14773. Available from https://proceedings.mlr.press/v162/majumdar22a.html.
