Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension

Hamed Firouzi, Bala Rajaratnam, Alfred Hero III
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:274-288, 2013.

Abstract

We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. Predictive Correlation Screening (PCS) implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for Familywise Error Rate (FWER), and resultant mean square error of a linear predictor on the selected variables. We apply Predictive Correlation Screening to the following two-stage predictor design problem. An experimenter wants to learn a multivariate predictor of gene expressions based on successive biological samples assayed on mRNA arrays. She assays the whole genome on a few samples and from these assays she selects a small number of variables using Predictive Correlation Screening. To reduce assay cost, she subsequently assays only the selected variables on the remaining samples, to learn the predictor coefficients. We show superiority of Predictive Correlation Screening relative to LASSO and correlation learning (sometimes popularly referred to in the literature as marginal regression or simple thresholding) in terms of performance and computational complexity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v31-firouzi13a, title = {Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension}, author = {Firouzi, Hamed and Rajaratnam, Bala and Hero III, Alfred}, booktitle = {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics}, pages = {274--288}, year = {2013}, editor = {Carvalho, Carlos M. and Ravikumar, Pradeep}, volume = {31}, series = {Proceedings of Machine Learning Research}, address = {Scottsdale, Arizona, USA}, month = {29 Apr--01 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v31/firouzi13a.pdf}, url = {https://proceedings.mlr.press/v31/firouzi13a.html}, abstract = {We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. Predictive Correlation Screening (PCS) implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for Familywise Error Rate (FWER), and resultant mean square error of a linear predictor on the selected variables. We apply Predictive Correlation Screening to the following two-stage predictor design problem. An experimenter wants to learn a multivariate predictor of gene expressions based on successive biological samples assayed on mRNA arrays. She assays the whole genome on a few samples and from these assays she selects a small number of variables using Predictive Correlation Screening. To reduce assay cost, she subsequently assays only the selected variables on the remaining samples, to learn the predictor coefficients. We show superiority of Predictive Correlation Screening relative to LASSO and correlation learning (sometimes popularly referred to in the literature as marginal regression or simple thresholding) in terms of performance and computational complexity.} }
Endnote
%0 Conference Paper %T Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension %A Hamed Firouzi %A Bala Rajaratnam %A Alfred Hero III %B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2013 %E Carlos M. Carvalho %E Pradeep Ravikumar %F pmlr-v31-firouzi13a %I PMLR %P 274--288 %U https://proceedings.mlr.press/v31/firouzi13a.html %V 31 %X We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. Predictive Correlation Screening (PCS) implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for Familywise Error Rate (FWER), and resultant mean square error of a linear predictor on the selected variables. We apply Predictive Correlation Screening to the following two-stage predictor design problem. An experimenter wants to learn a multivariate predictor of gene expressions based on successive biological samples assayed on mRNA arrays. She assays the whole genome on a few samples and from these assays she selects a small number of variables using Predictive Correlation Screening. To reduce assay cost, she subsequently assays only the selected variables on the remaining samples, to learn the predictor coefficients. We show superiority of Predictive Correlation Screening relative to LASSO and correlation learning (sometimes popularly referred to in the literature as marginal regression or simple thresholding) in terms of performance and computational complexity.
RIS
TY - CPAPER TI - Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension AU - Hamed Firouzi AU - Bala Rajaratnam AU - Alfred Hero III BT - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics DA - 2013/04/29 ED - Carlos M. Carvalho ED - Pradeep Ravikumar ID - pmlr-v31-firouzi13a PB - PMLR DP - Proceedings of Machine Learning Research VL - 31 SP - 274 EP - 288 L1 - http://proceedings.mlr.press/v31/firouzi13a.pdf UR - https://proceedings.mlr.press/v31/firouzi13a.html AB - We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. Predictive Correlation Screening (PCS) implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for Familywise Error Rate (FWER), and resultant mean square error of a linear predictor on the selected variables. We apply Predictive Correlation Screening to the following two-stage predictor design problem. An experimenter wants to learn a multivariate predictor of gene expressions based on successive biological samples assayed on mRNA arrays. She assays the whole genome on a few samples and from these assays she selects a small number of variables using Predictive Correlation Screening. To reduce assay cost, she subsequently assays only the selected variables on the remaining samples, to learn the predictor coefficients. We show superiority of Predictive Correlation Screening relative to LASSO and correlation learning (sometimes popularly referred to in the literature as marginal regression or simple thresholding) in terms of performance and computational complexity. ER -
APA
Firouzi, H., Rajaratnam, B. & Hero III, A.. (2013). Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:274-288 Available from https://proceedings.mlr.press/v31/firouzi13a.html.

Related Material