The All-or-Nothing Phenomenon in Sparse Linear Regression

Galen Reeves, Jiaming Xu, Ilias Zadik
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2652-2663, 2019.

Abstract

We study the problem of recovering a hidden binary $k$-sparse $p$-dimensional vector $\beta$ from $n$ noisy linear observations $Y=X\beta+W$, where the $X_{ij}$ are i.i.d. $\mathcal{N}(0,1)$ and the $W_i$ are i.i.d. $\mathcal{N}(0,\sigma^2)$. A closely related hypothesis-testing problem is to distinguish the pair $(X,Y)$ generated from this structured model from a corresponding null model in which $(X,Y)$ consists of purely independent Gaussian entries. In the low-sparsity $k=o(p)$ and high signal-to-noise ratio $k/\sigma^2=\Omega(1)$ regime, we establish an “All-or-Nothing” information-theoretic phase transition at a critical sample size $n^*=2k\log(p/k)/\log(1+k/\sigma^2)$, resolving a conjecture of [GamarnikZadik17]. Specifically, we show that if $\liminf_{p\rightarrow \infty} n/n^*>1$, then the maximum likelihood estimator almost perfectly recovers the hidden vector with high probability, and moreover the true hypothesis can be detected with vanishing error probability. Conversely, if $\limsup_{p\rightarrow \infty} n/n^*<1$, then it becomes information-theoretically impossible even to recover an arbitrarily small but fixed fraction of the support of the hidden vector, or to test the hypotheses strictly better than random guessing. Our proof of the impossibility result builds on two key techniques, which may be of independent interest. First, we use a conditional second moment method to upper bound the Kullback-Leibler (KL) divergence between the structured model and the null model. Second, inspired by the celebrated area theorem, we establish a lower bound on the minimum mean-squared error of estimating the hidden vector in terms of the KL divergence between the two models.
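
To make the setup concrete, the following minimal Python sketch (ours, not from the paper) draws $(X, Y)$ from the structured model $Y = X\beta + W$, draws a pair from an independent-Gaussian null, and evaluates the critical sample size $n^* = 2k\log(p/k)/\log(1+k/\sigma^2)$. The parameter values and the choice to match the null model's marginal variance of $Y$ to $k+\sigma^2$ are illustrative assumptions on our part; the abstract only specifies independent Gaussian entries.

# Illustrative sketch of the sparse linear regression setup (not from the paper).
import numpy as np

def critical_sample_size(p, k, sigma2):
    """Critical sample size n* = 2 k log(p/k) / log(1 + k/sigma^2)."""
    return 2 * k * np.log(p / k) / np.log(1 + k / sigma2)

def sample_structured(p, k, sigma2, n, rng):
    """Structured model: X has i.i.d. N(0,1) entries, beta is binary k-sparse,
    W has i.i.d. N(0, sigma^2) entries, and Y = X beta + W."""
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = 1.0
    X = rng.standard_normal((n, p))
    W = np.sqrt(sigma2) * rng.standard_normal(n)
    return X, X @ beta + W, beta

def sample_null(p, k, sigma2, n, rng):
    """Null model: X and Y are independent Gaussians; here Y's variance is set
    to k + sigma^2 to match the structured model's marginal (our assumption)."""
    X = rng.standard_normal((n, p))
    Y = np.sqrt(k + sigma2) * rng.standard_normal(n)
    return X, Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k, sigma2 = 10_000, 10, 1.0   # illustrative values only
    n_star = critical_sample_size(p, k, sigma2)
    print(f"critical sample size n* ~ {n_star:.1f}")
    # n slightly above n*: the "all" regime; slightly below: the "nothing" regime.
    X, Y, beta = sample_structured(p, k, sigma2, int(1.2 * n_star), rng)
    print("structured sample shapes:", X.shape, Y.shape, int(beta.sum()))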

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-reeves19a,
  title     = {The All-or-Nothing Phenomenon in Sparse Linear Regression},
  author    = {Reeves, Galen and Xu, Jiaming and Zadik, Ilias},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {2652--2663},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/reeves19a/reeves19a.pdf},
  url       = {https://proceedings.mlr.press/v99/reeves19a.html}
}
APA
Reeves, G., Xu, J. & Zadik, I. (2019). The All-or-Nothing Phenomenon in Sparse Linear Regression. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2652-2663. Available from https://proceedings.mlr.press/v99/reeves19a.html.
