Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach

Mladen Kolar; Eric Xing

Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach

Mladen Kolar, Eric Xing

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:413-420, 2010.

Abstract

We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of the irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables. We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP has been demonstrated on extensive simulation studies.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-kolar10a,
  title = 	 {Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach},
  author = 	 {Kolar, Mladen and Xing, Eric},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {413--420},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/kolar10a/kolar10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/kolar10a.html},
  abstract = 	 {We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of the irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables.  We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP has been demonstrated on extensive simulation studies.}
}

Endnote

%0 Conference Paper
%T Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach
%A Mladen Kolar
%A Eric Xing
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington	
%F pmlr-v9-kolar10a
%I PMLR
%P 413--420
%U https://proceedings.mlr.press/v9/kolar10a.html
%V 9
%X We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of the irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables.  We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP has been demonstrated on extensive simulation studies.

RIS


TY  - CPAPER
TI  - Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach
AU  - Mladen Kolar
AU  - Eric Xing
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington	
ID  - pmlr-v9-kolar10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 413
EP  - 420
L1  - http://proceedings.mlr.press/v9/kolar10a/kolar10a.pdf
UR  - https://proceedings.mlr.press/v9/kolar10a.html
AB  - We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure to perform variable selection in ultra-high dimensional multiple output regression problems, which is the first attempt to utilize multiple outputs to perform fast removal of the irrelevant variables. As our main theoretical contribution, we show that the S-OMP can be used to reduce an ultra-high number of variables to below the sample size, without losing relevant variables.  We also provide formal evidence that the modified Bayesian information criterion (BIC) can be used to efficiently select the number of iterations in the S-OMP. Once the number of variables has been reduced to a manageable size, we show that a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. We further provide evidence on the benefit of variable selection using the regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of the S-OMP has been demonstrated on extensive simulation studies.
ER  -

APA


Kolar, M. & Xing, E.. (2010). Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:413-420 Available from https://proceedings.mlr.press/v9/kolar10a.html.

Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: Screening Approach

Abstract

Cite this Paper

Related Material