Ambiguous Online Learning

Vanessa Kosoy
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:4229-4266, 2026.

Abstract

We propose a new variant of online learning that we call “ambiguous online learning”. In this setting, the learner is allowed to produce multiple predicted labels. Such an “ambiguous prediction” is considered correct when at least one of the labels is correct, and none of the labels are “predictably wrong”. The definition of “predictably wrong” comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is “predictably wrong” if it’s not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called “apple tasting”. We show that in this setting, the asymptotic minimax mistake bound is controlled by a combination of the classical Littlestone dimension $\mathrm{L}$ and a new parameter that we call “ambiguous Littlestone dimension” (denoted $\mathrm{AL}$). There is a trichotomy of behaviors: up to logarithmic factors, any hypothesis class has a mistake bound of either $O(1)$ (when both $\mathrm{AL}$ and $\mathrm{L}$ are finite), $\tilde{\Theta}(\sqrt{N})$ (when $\mathrm{AL}$ is infinite but $\mathrm{L}$ is finite) or $\Theta(N)$ (when both are infinite).
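
The correctness criterion stated in the abstract can be illustrated with a minimal sketch. It is one reading of the abstract's wording, not the paper's formal definition: it assumes the true hypothesis assigns each instance a set of allowed labels, and that a predicted label is "predictably wrong" exactly when it falls outside that set. The function name `is_correct` and its parameters are hypothetical and chosen only for illustration.

```python
def is_correct(prediction, true_label, allowed):
    """Check one ambiguous prediction under the assumed reading of the abstract.

    prediction: the set of labels output by the learner (hypothetical name).
    true_label: the label that was actually observed.
    allowed:    the set of labels the true hypothesis permits for this
                instance (an assumption about how multi-valued hypotheses
                are represented).
    """
    prediction = set(prediction)
    allowed = set(allowed)
    # Condition 1: at least one predicted label is the correct one.
    contains_truth = true_label in prediction
    # Condition 2: no predicted label is "predictably wrong", i.e. every
    # predicted label is permitted by the true hypothesis.
    nothing_predictably_wrong = prediction <= allowed
    return contains_truth and nothing_predictably_wrong
```

Under this reading, predicting `{"a", "b"}` when the observed label is `"a"` and the allowed set is `{"a", "b", "c"}` counts as correct, while adding a label outside the allowed set, or omitting the observed label, counts as a mistake.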

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-kosoy26a,
  title = {Ambiguous Online Learning},
  author = {Kosoy, Vanessa},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = {4229--4266},
  year = {2026},
  editor = {Hanneke, Steve and Lattimore, Tor},
  volume = {336},
  series = {Proceedings of Machine Learning Research},
  month = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/kosoy26a/kosoy26a.pdf},
  url = {https://proceedings.mlr.press/v336/kosoy26a.html},
  abstract = {We propose a new variant of online learning that we call “ambiguous online learning”. In this setting, the learner is allowed to produce multiple predicted labels. Such an “ambiguous prediction” is considered correct when at least one of the labels is correct, and none of the labels are “predictably wrong”. The definition of “predictably wrong” comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is “predictably wrong” if it’s not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called “apple tasting”. We show that in this setting, the asymptotic minimax mistake bound is controlled by a combination of the classical Littlestone dimension $\mathrm{L}$ and a new parameter that we call “ambiguous Littlestone dimension” (denoted $\mathrm{AL}$). There is a trichotomy of behaviors: up to logarithmic factors, any hypothesis class has a mistake bound of either $O(1)$ (when both $\mathrm{AL}$ and $\mathrm{L}$ are finite), $\tilde{\Theta}(\sqrt{N})$ (when $\mathrm{AL}$ is infinite but $\mathrm{L}$ is finite) or $\Theta(N)$ (when both are infinite).}
}
Endnote
%0 Conference Paper
%T Ambiguous Online Learning
%A Vanessa Kosoy
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-kosoy26a
%I PMLR
%P 4229--4266
%U https://proceedings.mlr.press/v336/kosoy26a.html
%V 336
%X We propose a new variant of online learning that we call “ambiguous online learning”. In this setting, the learner is allowed to produce multiple predicted labels. Such an “ambiguous prediction” is considered correct when at least one of the labels is correct, and none of the labels are “predictably wrong”. The definition of “predictably wrong” comes from a hypothesis class in which hypotheses are also multi-valued. Thus, a prediction is “predictably wrong” if it’s not allowed by the (unknown) true hypothesis. In particular, this setting is natural in the context of multivalued dynamical systems, recommendation algorithms and lossless compression. It is also strongly related to so-called “apple tasting”. We show that in this setting, the asymptotic minimax mistake bound is controlled by a combination of the classical Littlestone dimension $\mathrm{L}$ and a new parameter that we call “ambiguous Littlestone dimension” (denoted $\mathrm{AL}$). There is a trichotomy of behaviors: up to logarithmic factors, any hypothesis class has a mistake bound of either $O(1)$ (when both $\mathrm{AL}$ and $\mathrm{L}$ are finite), $\tilde{\Theta}(\sqrt{N})$ (when $\mathrm{AL}$ is infinite but $\mathrm{L}$ is finite) or $\Theta(N)$ (when both are infinite).
APA
Kosoy, V. (2026). Ambiguous Online Learning. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:4229-4266. Available from https://proceedings.mlr.press/v336/kosoy26a.html.
