Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps

Jonathan Kelner; Frederic Koehler; Raghu Meka; Dhruv Rohatgi

Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps

Jonathan Kelner, Frederic Koehler, Raghu Meka, Dhruv Rohatgi

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2840-2886, 2024.

Abstract

It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general. In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a “smart scaling.” The sample complexity of the resulting “rescaled Lasso” algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a \emph{near-critical negative spike}. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v247-kelner24a,
  title = 	 {Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps},
  author =       {Kelner, Jonathan and Koehler, Frederic and Meka, Raghu and Rohatgi, Dhruv},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {2840--2886},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/kelner24a/kelner24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/kelner24a.html},
  abstract = 	 {It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general. In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a “smart scaling.” The sample complexity of the resulting “rescaled Lasso” algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a \emph{near-critical negative spike}. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.}
}

Endnote

%0 Conference Paper
%T Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps
%A Jonathan Kelner
%A Frederic Koehler
%A Raghu Meka
%A Dhruv Rohatgi
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-kelner24a
%I PMLR
%P 2840--2886
%U https://proceedings.mlr.press/v247/kelner24a.html
%V 247
%X It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general. In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a “smart scaling.” The sample complexity of the resulting “rescaled Lasso” algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a \emph{near-critical negative spike}. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.

APA

Kelner, J., Koehler, F., Meka, R. & Rohatgi, D.. (2024). Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:2840-2886 Available from https://proceedings.mlr.press/v247/kelner24a.html.

Related Material

Download PDF