Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems

Stefan Tiegel
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3029-3064, 2023.

Abstract

We show hardness of improperly learning halfspaces in the agnostic model, in both the distribution-independent and the distribution-specific setting, based on the assumption that worst-case lattice problems, e.g., approximating shortest vectors within polynomial factors, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac{1}{2} - \gamma$ even if the optimal misclassification error is as small as $\delta$. Here, $\gamma$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ can be as small as $\exp(-\Omega(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, then for any $\beta > 0$ learning halfspaces up to error $\mathrm{OPT}_{\mathrm{LTF}} + \varepsilon$ takes time at least $d^{\tilde{\Omega}(1/\varepsilon^{2-\beta})}$ under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $\mathrm{OPT}_{\mathrm{PTF}_\ell} + \varepsilon$ takes time at least $d^{\tilde{\Omega}(\ell^{2-\beta}/\varepsilon^{4-2\beta})}$. Here, $\mathrm{OPT}_{\mathrm{LTF}}$ and $\mathrm{OPT}_{\mathrm{PTF}_\ell}$ denote the best error achievable by any halfspace or degree-$\ell$ polynomial threshold function, respectively. Our lower bounds qualitatively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [D16, DKPZ21] were based on average-case complexity assumptions, specifically variants of Feige's random 3SAT hypothesis, or were restricted to the statistical query model. Our work gives the first hardness results that base these fundamental learning problems on well-understood worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.
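For orientation, the following display is a brief sketch, in standard notation, of the agnostic-learning objective the abstract refers to and a restatement of its Gaussian-marginal lower bounds; the symbols $\mathrm{err}_D$, $T_{\mathrm{LTF}}$, and $T_{\mathrm{PTF}_\ell}$ are introduced here only for illustration and are not taken from the paper itself.

\[
  \mathrm{err}_D(h) \;=\; \Pr_{(x,y)\sim D}\bigl[h(x) \neq y\bigr],
  \qquad
  \mathrm{OPT}_{\mathrm{LTF}} \;=\; \min_{f \in \mathrm{LTF}} \mathrm{err}_D(f),
  \qquad
  \mathrm{OPT}_{\mathrm{PTF}_\ell} \;=\; \min_{f \in \mathrm{PTF}_\ell} \mathrm{err}_D(f).
\]

An agnostic learner receives i.i.d. samples from $D$ and must output a hypothesis $h$, not necessarily from the target class, with $\mathrm{err}_D(h) \le \mathrm{OPT} + \varepsilon$. In this notation, the abstract's lower bounds for standard Gaussian marginals state that for any constant $\beta > 0$,

\[
  T_{\mathrm{LTF}}(d,\varepsilon) \;\ge\; d^{\tilde{\Omega}(1/\varepsilon^{2-\beta})},
  \qquad
  T_{\mathrm{PTF}_\ell}(d,\varepsilon) \;\ge\; d^{\tilde{\Omega}(\ell^{2-\beta}/\varepsilon^{4-2\beta})},
\]

where $T_{\mathrm{LTF}}$ and $T_{\mathrm{PTF}_\ell}$ denote the running time of any algorithm achieving error $\mathrm{OPT}_{\mathrm{LTF}} + \varepsilon$ and $\mathrm{OPT}_{\mathrm{PTF}_\ell} + \varepsilon$, respectively, assuming worst-case lattice problems are hard.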

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-tiegel23a,
  title     = {Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems},
  author    = {Tiegel, Stefan},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {3029--3064},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/tiegel23a/tiegel23a.pdf},
  url       = {https://proceedings.mlr.press/v195/tiegel23a.html}
}
Endnote
%0 Conference Paper
%T Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems
%A Stefan Tiegel
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-tiegel23a
%I PMLR
%P 3029--3064
%U https://proceedings.mlr.press/v195/tiegel23a.html
%V 195
APA
Tiegel, S. (2023). Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:3029-3064. Available from https://proceedings.mlr.press/v195/tiegel23a.html.
