Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract)

Alex Damian, Loucas Pillaud-Vivien, Jason Lee, Joan Bruna
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1262-1262, 2024.

Abstract

Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template for studying statistical and computational trade-offs in the high-dimensional regime. While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension $d$, we show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples, where $k^\star$ is a “generative” exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-statistical gap (under both the SQ and LDP classes) whenever $k^\star>2$. To complete the study, we construct smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents $k^\star$.
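
For orientation, the display below is a minimal sketch of the setup and of the generative exponent, reconstructed from the abstract's description; the notation ($w$, $z$, $T$, $\zeta_k$, $\mathrm{He}_k$, $\widehat{T}_n$) is ours for illustration, and the paper's formal definitions take precedence.

% Gaussian single-index model: the label depends on x only through a hidden direction w.
\[
x \sim \mathcal{N}(0, I_d), \qquad y \mid x \;\sim\; \mathbb{P}(\cdot \mid z), \qquad z = \langle w, x\rangle, \quad \|w\| = 1.
\]
% Hermite coefficients of a transformed label (He_k: k-th probabilists' Hermite polynomial):
\[
\zeta_k(T) \;=\; \mathbb{E}\bigl[\,T(y)\,\mathrm{He}_k(z)\,\bigr].
\]
% Generative exponent: the smallest Hermite index that some label transformation can expose.
\[
k^\star \;=\; \min_{T}\,\min\bigl\{\,k \ge 1 \;:\; \zeta_k(T) \neq 0\,\bigr\}.
\]
% Schematic view of the matching upper bound: with n samples, form the empirical
% order-k* Hermite tensor and contract index pairs ("partial trace") down to a
% d x d matrix whose top eigenvector estimates w; per the abstract, n = Theta(d^{k*/2})
% samples suffice for this to concentrate.
\[
\widehat{T}_n \;=\; \frac{1}{n}\sum_{i=1}^{n} T(y_i)\,\mathrm{He}_{k^\star}(x_i) \;\in\; (\mathbb{R}^d)^{\otimes k^\star}.
\]

These displays are a reading aid only; in particular, the minimum over transformations $T$ ranges over a suitable class of square-integrable functions of the label, as made precise in the paper.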

Cite this Paper

BibTeX
@InProceedings{pmlr-v247-damian24a,
  title     = {Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract)},
  author    = {Damian, Alex and Pillaud-Vivien, Loucas and Lee, Jason and Bruna, Joan},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1262--1262},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/damian24a/damian24a.pdf},
  url       = {https://proceedings.mlr.press/v247/damian24a.html}
}
Endnote
%0 Conference Paper
%T Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract)
%A Alex Damian
%A Loucas Pillaud-Vivien
%A Jason Lee
%A Joan Bruna
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-damian24a
%I PMLR
%P 1262--1262
%U https://proceedings.mlr.press/v247/damian24a.html
%V 247
APA
Damian, A., Pillaud-Vivien, L., Lee, J. & Bruna, J. (2024). Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract). Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1262-1262. Available from https://proceedings.mlr.press/v247/damian24a.html.
