Statistical Query Lower Bounds for Learning Truncated Gaussians

Ilias Diakonikolas; Daniel M. Kane; Thanasis Pittas; Nikos Zarifis

Statistical Query Lower Bounds for Learning Truncated Gaussians

Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1336-1363, 2024.

Abstract

We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family

$\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set

$S \subseteq \mathbb{R}^d$ , we are given access to samples from the distribution

$\mathcal{N}(\bm{\mu}, \vec{I})$ truncated to the set

$S$ . The goal is to estimate

$\bm{\mu}$ within accuracy

$\epsilon>0$ in

$\ell_2$ -norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is

$d^{\mathrm{poly}(1/\epsilon)}$ , even when the class

$\mathcal{C}$ is simple so that

$\mathrm{poly}(d/\epsilon)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when

$\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian surface are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-diakonikolas24b,
  title = 	 {Statistical Query Lower Bounds for Learning Truncated Gaussians},
  author =       {Diakonikolas, Ilias and Kane, Daniel M. and Pittas, Thanasis and Zarifis, Nikos},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {1336--1363},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/diakonikolas24b/diakonikolas24b.pdf},
  url = 	 {https://proceedings.mlr.press/v247/diakonikolas24b.html},
  abstract = 	 {We study the problem of estimating the mean of an identity covariance Gaussian in the  truncated setting, in the regime when the truncation set comes from a low-complexity  family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set  $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution  $\mathcal{N}(\bm{\mu}, \vec{I})$ truncated to the set $S$. The goal is to estimate $\bm{\mu}$  within accuracy $\epsilon>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ)  lower bound suggesting a super-polynomial information-computation gap for this task. In  more detail, we show that the complexity of any SQ algorithm for this problem is  $d^{\mathrm{poly}(1/\epsilon)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/\epsilon)$  samples information-theoretically suffice. Concretely, our SQ lower bound applies when  $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian  surface are small. As a corollary of our construction, it also  follows that the complexity of the previously known algorithm for this task is qualitatively best possible.}
}

Endnote

%0 Conference Paper
%T Statistical Query Lower Bounds for Learning Truncated Gaussians
%A Ilias Diakonikolas
%A Daniel M. Kane
%A Thanasis Pittas
%A Nikos Zarifis
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-diakonikolas24b
%I PMLR
%P 1336--1363
%U https://proceedings.mlr.press/v247/diakonikolas24b.html
%V 247
%X We study the problem of estimating the mean of an identity covariance Gaussian in the  truncated setting, in the regime when the truncation set comes from a low-complexity  family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set  $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution  $\mathcal{N}(\bm{\mu}, \vec{I})$ truncated to the set $S$. The goal is to estimate $\bm{\mu}$  within accuracy $\epsilon>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ)  lower bound suggesting a super-polynomial information-computation gap for this task. In  more detail, we show that the complexity of any SQ algorithm for this problem is  $d^{\mathrm{poly}(1/\epsilon)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/\epsilon)$  samples information-theoretically suffice. Concretely, our SQ lower bound applies when  $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian  surface are small. As a corollary of our construction, it also  follows that the complexity of the previously known algorithm for this task is qualitatively best possible.

APA


Diakonikolas, I., Kane, D.M., Pittas, T. & Zarifis, N.. (2024). Statistical Query Lower Bounds for Learning Truncated Gaussians. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1336-1363 Available from https://proceedings.mlr.press/v247/diakonikolas24b.html.

Related Material

Download PDF