Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery

Yudong Chen, Constantine Caramanis
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):383-391, 2013.

Abstract

Models for sparse regression typically assume that the covariates are known completely, and without noise. Particularly in high-dimensional applications, this is often not the case. Worse yet, even estimating statistics of the noise (the noise covariance) can be a central challenge. In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting. We show that without knowledge of the noise covariance, our algorithm recovers the support, and we provide matching lower bounds showing that our algorithm performs at the minimax optimal rate. While simple, this is the first algorithm that (provably) recovers support in a noise-distribution-oblivious manner. When knowledge of the noise covariance is available, our algorithm matches the best-known \ell^2-recovery bounds available. We show that these too are minimax optimal. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.
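For context, the baseline algorithm the paper builds on is standard orthogonal matching pursuit: greedily add the column most correlated with the residual, then refit by least squares on the selected support. The sketch below is this standard OMP, not the paper's distribution-oblivious variant; the function name and interface are illustrative assumptions.

```python
import numpy as np

def omp(X, y, k):
    """Standard orthogonal matching pursuit (baseline, not the paper's
    variant). Greedily selects k columns of X by correlation with the
    residual, refitting by least squares after each selection."""
    n, p = X.shape
    support = []
    residual = y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual,
        # excluding columns already in the support.
        correlations = np.abs(X.T @ residual)
        correlations[support] = -np.inf
        j = int(np.argmax(correlations))
        support.append(j)
        # Least-squares refit on the selected columns.
        beta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s
    return beta, sorted(support)
```

In the paper's setting the covariates themselves are observed with noise or missing entries, so the correlation and refitting steps must be corrected; the contribution is that this correction can be done without knowing the noise covariance.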

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-chen13d,
  title     = {Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery},
  author    = {Chen, Yudong and Caramanis, Constantine},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {383--391},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {1},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/chen13d.pdf},
  url       = {https://proceedings.mlr.press/v28/chen13d.html},
  abstract  = {Many models for sparse regression typically assume that the covariates are known completely, and without noise. Particularly in high-dimensional applications, this is often not the case. Worse yet, even estimating statistics of the noise (the noise covariance) can be a central challenge. In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting. We show that without knowledge of the noise covariance, our algorithm recovers the support, and we provide matching lower bounds that show that our algorithm performs at the minimax optimal rate. While simple, this is the first algorithm that (provably) recovers support in a noise-distribution-oblivious manner. When knowledge of the noise-covariance is available, our algorithm matches the best-known \ell^2-recovery bounds available. We show that these too are min-max optimal. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.}
}
Endnote
%0 Conference Paper
%T Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery
%A Yudong Chen
%A Constantine Caramanis
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-chen13d
%I PMLR
%P 383--391
%U https://proceedings.mlr.press/v28/chen13d.html
%V 28
%N 1
%X Many models for sparse regression typically assume that the covariates are known completely, and without noise. Particularly in high-dimensional applications, this is often not the case. Worse yet, even estimating statistics of the noise (the noise covariance) can be a central challenge. In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting. We show that without knowledge of the noise covariance, our algorithm recovers the support, and we provide matching lower bounds that show that our algorithm performs at the minimax optimal rate. While simple, this is the first algorithm that (provably) recovers support in a noise-distribution-oblivious manner. When knowledge of the noise-covariance is available, our algorithm matches the best-known \ell^2-recovery bounds available. We show that these too are min-max optimal. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.
RIS
TY  - CPAPER
TI  - Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery
AU  - Yudong Chen
AU  - Constantine Caramanis
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/02/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-chen13d
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 1
SP  - 383
EP  - 391
L1  - http://proceedings.mlr.press/v28/chen13d.pdf
UR  - https://proceedings.mlr.press/v28/chen13d.html
AB  - Many models for sparse regression typically assume that the covariates are known completely, and without noise. Particularly in high-dimensional applications, this is often not the case. Worse yet, even estimating statistics of the noise (the noise covariance) can be a central challenge. In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting. We show that without knowledge of the noise covariance, our algorithm recovers the support, and we provide matching lower bounds that show that our algorithm performs at the minimax optimal rate. While simple, this is the first algorithm that (provably) recovers support in a noise-distribution-oblivious manner. When knowledge of the noise-covariance is available, our algorithm matches the best-known \ell^2-recovery bounds available. We show that these too are min-max optimal. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.
ER  -
APA
Chen, Y. & Caramanis, C. (2013). Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(1):383-391. Available from https://proceedings.mlr.press/v28/chen13d.html.