Principal Component Regression with Semirandom Observations via Matrix Completion

Aditya Bhaskara, Aravinda Kanchana Ruwanpathirana, Maheshakya Wijewardena
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2665-2673, 2021.

Abstract

Principal Component Regression (PCR) is a popular method for prediction from data, and is one way to address the so-called multi-collinearity problem in regression. It was shown recently that algorithms for PCR such as hard singular value thresholding (HSVT) are also quite robust, in that they can handle data that has missing or noisy covariates. However, such spectral approaches require strong distributional assumptions on which entries are observed. Specifically, every covariate is assumed to be observed with probability (exactly) $p$, for some value of $p$. Our goal in this work is to weaken this requirement, and as a step towards this, we study a “semi-random” model. In this model, every covariate is revealed with probability $p$, and then an adversary comes in and reveals additional covariates. While the model seems intuitively easier, it is well known that algorithms such as HSVT perform poorly. Our approach is based on studying the closely related problem of Noisy Matrix Completion in a semi-random setting. By considering a new semidefinite programming relaxation, we develop new guarantees for matrix completion, which is our core technical contribution.

Cite this Paper

BibTeX
@InProceedings{pmlr-v130-bhaskara21a,
  title     = {Principal Component Regression with Semirandom Observations via Matrix Completion},
  author    = {Bhaskara, Aditya and Kanchana Ruwanpathirana, Aravinda and Wijewardena, Maheshakya},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2665--2673},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/bhaskara21a/bhaskara21a.pdf},
  url       = {https://proceedings.mlr.press/v130/bhaskara21a.html},
  abstract  = {Principal Component Regression (PCR) is a popular method for prediction from data, and is one way to address the so-called multi-collinearity problem in regression. It was shown recently that algorithms for PCR such as hard singular value thresholding (HSVT) are also quite robust, in that they can handle data that has missing or noisy covariates. However, such spectral approaches require strong distributional assumptions on which entries are observed. Specifically, every covariate is assumed to be observed with probability (exactly) $p$, for some value of $p$. Our goal in this work is to weaken this requirement, and as a step towards this, we study a ``semi-random'' model. In this model, every covariate is revealed with probability $p$, and then an adversary comes in and reveals additional covariates. While the model seems intuitively easier, it is well known that algorithms such as HSVT perform poorly. Our approach is based on studying the closely related problem of Noisy Matrix Completion in a semi-random setting. By considering a new semidefinite programming relaxation, we develop new guarantees for matrix completion, which is our core technical contribution.}
}
Endnote
%0 Conference Paper
%T Principal Component Regression with Semirandom Observations via Matrix Completion
%A Aditya Bhaskara
%A Aravinda Kanchana Ruwanpathirana
%A Maheshakya Wijewardena
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-bhaskara21a
%I PMLR
%P 2665--2673
%U https://proceedings.mlr.press/v130/bhaskara21a.html
%V 130
%X Principal Component Regression (PCR) is a popular method for prediction from data, and is one way to address the so-called multi-collinearity problem in regression. It was shown recently that algorithms for PCR such as hard singular value thresholding (HSVT) are also quite robust, in that they can handle data that has missing or noisy covariates. However, such spectral approaches require strong distributional assumptions on which entries are observed. Specifically, every covariate is assumed to be observed with probability (exactly) $p$, for some value of $p$. Our goal in this work is to weaken this requirement, and as a step towards this, we study a “semi-random” model. In this model, every covariate is revealed with probability $p$, and then an adversary comes in and reveals additional covariates. While the model seems intuitively easier, it is well known that algorithms such as HSVT perform poorly. Our approach is based on studying the closely related problem of Noisy Matrix Completion in a semi-random setting. By considering a new semidefinite programming relaxation, we develop new guarantees for matrix completion, which is our core technical contribution.
APA
Bhaskara, A., Kanchana Ruwanpathirana, A. & Wijewardena, M. (2021). Principal Component Regression with Semirandom Observations via Matrix Completion. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2665-2673. Available from https://proceedings.mlr.press/v130/bhaskara21a.html.