Exponential Weights Algorithms for Selective Learning

Mingda Qiao; Gregory Valiant

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3833-3858, 2021.

Abstract

We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The \emph{excess risk} incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points. We give an improved algorithm, termed the \emph{hybrid exponential weights} algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm. We also study a more restrictive family of learning algorithms that are \emph{bounded-recall} in the sense that when a prediction window of length $w$ is chosen, the learner’s decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.
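The excess risk defined in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only evaluates the definition on a fixed prediction window. The 0-1 loss, the `(feature, label)` data format, and the names `average_loss` and `excess_risk` are assumptions made for illustration, since the abstract does not fix a loss function or data representation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data format: each point is a (feature, label) pair.
Point = Tuple[float, int]
Model = Callable[[float], int]


def average_loss(model: Model, window: Sequence[Point]) -> float:
    """Average 0-1 loss of `model` over the window (the loss is an assumption)."""
    return sum(model(x) != y for x, y in window) / len(window)


def excess_risk(chosen: Model, model_class: Sequence[Model],
                window: Sequence[Point]) -> float:
    """Average loss of the chosen model minus the best average loss
    achieved by any model in the class on the same window."""
    best = min(average_loss(m, window) for m in model_class)
    return average_loss(chosen, window) - best


# Toy model class: two constant predictors and one threshold predictor.
model_class = [lambda x: 0, lambda x: 1, lambda x: int(x > 0.5)]

# The w = 4 data points that the learner has committed to labeling.
window = [(0.1, 0), (0.9, 1), (0.7, 1), (0.2, 0)]

# Committing to the constant-0 model errs on half the window, while the
# threshold model makes no errors, so the excess risk is 0.5.
risk = excess_risk(model_class[0], model_class, window)
print(risk)
```

In the selective learning setting the learner must commit to both the window length and the model before seeing the window, so this quantity is only known in hindsight.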

Cite this Paper

BibTeX


@InProceedings{pmlr-v134-qiao21a,
  title = 	 {Exponential Weights Algorithms for Selective Learning},
  author =       {Qiao, Mingda and Valiant, Gregory},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {3833--3858},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/qiao21a/qiao21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/qiao21a.html},
  abstract = 	 {We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The \emph{excess risk} incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points.  We give an improved algorithm, termed the \emph{hybrid exponential weights} algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm.  We also study a more restrictive family of learning algorithms that are \emph{bounded-recall} in the sense that when a prediction window of length $w$ is chosen, the learner’s decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.}
}

Endnote

%0 Conference Paper
%T Exponential Weights Algorithms for Selective Learning
%A Mingda Qiao
%A Gregory Valiant
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-qiao21a
%I PMLR
%P 3833--3858
%U https://proceedings.mlr.press/v134/qiao21a.html
%V 134
%X We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The \emph{excess risk} incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points.  We give an improved algorithm, termed the \emph{hybrid exponential weights} algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm.  We also study a more restrictive family of learning algorithms that are \emph{bounded-recall} in the sense that when a prediction window of length $w$ is chosen, the learner’s decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.

APA


Qiao, M. & Valiant, G. (2021). Exponential Weights Algorithms for Selective Learning. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:3833-3858. Available from https://proceedings.mlr.press/v134/qiao21a.html.
