Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression

Gabriel Arpino, Ramji Venkataramanan
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:921-986, 2023.

Abstract

We consider the problem of mixed sparse linear regression with two components, where two k-sparse signals β_1, β_2 ∈ R^p are to be recovered from n unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension (k = o(p)), and the additive noise is assumed to be independent Gaussian with variance σ^2. Prior work has shown that the problem suffers from a k/SNR^2-to-k^2/SNR^2 statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation (Brennan and Bresler, 2020b); here SNR := ∥β_1∥^2/σ^2 = ∥β_2∥^2/σ^2 is the signal-to-noise ratio. We establish the existence of a more extensive k/SNR^2-to-k^2(SNR+1)^2/SNR^2 computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity n and runtime exp(Θ(k^2(SNR+1)^2/(n SNR^2))) for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for a computational barrier to exact support recovery in sparse phase retrieval with sample complexity n = o(k^2). Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in O(np) time and matches the k(SNR+1)/SNR · log p sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms for exact signed support recovery in sparse linear regression. To the best of our knowledge, this is the first thorough study of the interplay between mixture symmetry, signal sparsity, and their joint impact on the computational hardness of mixed sparse linear regression.
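To make the setting concrete, the following is a minimal Python sketch of the two-component model and of a coordinate-wise thresholding detector of the kind the abstract alludes to. It is illustrative only: the symmetric choice β_2 = -β_1, the i.i.d. standard Gaussian design, the squared-correlation statistic, and the heuristic threshold are our assumptions, not necessarily the exact test analyzed in the paper.

    import numpy as np

    def simulate_mixed_regression(n, p, k, snr, rng, sigma2=1.0):
        """Draw n samples from a symmetric two-component mixed sparse linear model.

        Each response is y_i = <x_i, beta_{L_i}> + eps_i with a hidden label
        L_i in {1, 2}, design x_i ~ N(0, I_p), and noise eps_i ~ N(0, sigma2).
        We take beta_2 = -beta_1 (the symmetric mixture, which the paper
        identifies as the hard regime), with ||beta_1||^2 / sigma2 = snr.
        """
        support = rng.choice(p, size=k, replace=False)
        beta1 = np.zeros(p)
        beta1[support] = np.sqrt(snr * sigma2 / k)   # ||beta1||^2 = snr * sigma2
        X = rng.standard_normal((n, p))
        labels = rng.integers(0, 2, size=n)          # hidden mixture labels
        signal = np.where(labels == 0, X @ beta1, X @ (-beta1))
        y = signal + np.sqrt(sigma2) * rng.standard_normal(n)
        return X, y, support

    def threshold_detect(X, y, k, c=3.0):
        """Coordinate-wise thresholding detector; one pass over X, so O(np) time.

        In a symmetric mixture the first-order correlations X[:, j] @ y average
        to zero even on the support, so we use the squared statistic
            T_j = (1/n) * sum_i (x_ij * y_i)^2,
        which concentrates around E[y^2] off the support and is inflated on it.
        Declare "mixture present" if at least k coordinates exceed a threshold
        set c null standard deviations above the empirical null level
        (a heuristic calibration for this sketch).
        """
        n, p = X.shape
        T = np.mean((X * y[:, None]) ** 2, axis=0)   # shape (p,)
        null_level = np.mean(y ** 2)                 # E[x^2] = 1 under N(0, 1) design
        tau = null_level * (1.0 + c * np.sqrt(2.0 / n))
        return int(np.sum(T > tau) >= k)

    rng = np.random.default_rng(0)
    X, y, support = simulate_mixed_regression(n=4000, p=2000, k=10, snr=5.0, rng=rng)
    print(threshold_detect(X, y, k=10))              # 1 if the mixture is detected

The squaring step is what makes the symmetric regime delicate: on the support, T_j is inflated only by about 2·SNR·σ^2/k above the null level, consistent with the narrow hard regime and the k^2(SNR+1)^2/SNR^2 barrier described above.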

Cite this Paper

BibTeX
@InProceedings{pmlr-v195-arpino23a,
  title     = {Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression},
  author    = {Arpino, Gabriel and Venkataramanan, Ramji},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {921--986},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/arpino23a/arpino23a.pdf},
  url       = {https://proceedings.mlr.press/v195/arpino23a.html}
}
Endnote
%0 Conference Paper
%T Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression
%A Gabriel Arpino
%A Ramji Venkataramanan
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-arpino23a
%I PMLR
%P 921--986
%U https://proceedings.mlr.press/v195/arpino23a.html
%V 195
APA
Arpino, G. & Venkataramanan, R. (2023). Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:921-986. Available from https://proceedings.mlr.press/v195/arpino23a.html.
