Feature Selection in High-Dimensional Classification

Mladen Kolar, Han Liu
; Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):329-337, 2013.

Abstract

High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established, even though model interpretation is of importance in many scientific domains. In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010). Our results provide novel theoretical insights for the ROAD estimator. Sufficient conditions are complemented by the necessary information theoretic limits on variable selection in high-dimensional discriminant analysis. This complementary result also establishes optimality of the ROAD estimator for a certain family of problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-kolar13, title = {Feature Selection in High-Dimensional Classification}, author = {Mladen Kolar and Han Liu}, booktitle = {Proceedings of the 30th International Conference on Machine Learning}, pages = {329--337}, year = {2013}, editor = {Sanjoy Dasgupta and David McAllester}, volume = {28}, number = {1}, series = {Proceedings of Machine Learning Research}, address = {Atlanta, Georgia, USA}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v28/kolar13.pdf}, url = {http://proceedings.mlr.press/v28/kolar13.html}, abstract = {High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established, even though model interpretation is of importance in many scientific domains. In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010). Our results provide novel theoretical insights for the ROAD estimator. Sufficient conditions are complemented by the necessary information theoretic limits on variable selection in high-dimensional discriminant analysis. This complementary result also establishes optimality of the ROAD estimator for a certain family of problems. } }
Endnote
%0 Conference Paper %T Feature Selection in High-Dimensional Classification %A Mladen Kolar %A Han Liu %B Proceedings of the 30th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2013 %E Sanjoy Dasgupta %E David McAllester %F pmlr-v28-kolar13 %I PMLR %J Proceedings of Machine Learning Research %P 329--337 %U http://proceedings.mlr.press %V 28 %N 1 %W PMLR %X High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established, even though model interpretation is of importance in many scientific domains. In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010). Our results provide novel theoretical insights for the ROAD estimator. Sufficient conditions are complemented by the necessary information theoretic limits on variable selection in high-dimensional discriminant analysis. This complementary result also establishes optimality of the ROAD estimator for a certain family of problems.
RIS
TY - CPAPER TI - Feature Selection in High-Dimensional Classification AU - Mladen Kolar AU - Han Liu BT - Proceedings of the 30th International Conference on Machine Learning PY - 2013/02/13 DA - 2013/02/13 ED - Sanjoy Dasgupta ED - David McAllester ID - pmlr-v28-kolar13 PB - PMLR SP - 329 DP - PMLR EP - 337 L1 - http://proceedings.mlr.press/v28/kolar13.pdf UR - http://proceedings.mlr.press/v28/kolar13.html AB - High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established, even though model interpretation is of importance in many scientific domains. In this paper, we bridge this gap by providing sharp sufficient conditions for consistent variable selection using the ROAD estimator (Fan et al., 2010). Our results provide novel theoretical insights for the ROAD estimator. Sufficient conditions are complemented by the necessary information theoretic limits on variable selection in high-dimensional discriminant analysis. This complementary result also establishes optimality of the ROAD estimator for a certain family of problems. ER -
APA
Kolar, M. & Liu, H.. (2013). Feature Selection in High-Dimensional Classification. Proceedings of the 30th International Conference on Machine Learning, in PMLR 28(1):329-337

Related Material