Universal Rates for Multiclass Learning with Bandit Feedback

Steve Hanneke, Amirreza Shaeiri, Qian Zhang
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:2704-2756, 2025.

Abstract

The seminal work of Daniely et al. (COLT 2011) introduced the problem of multiclass learning under bandit feedback and provided a combinatorial characterization of its learnability within the framework of PAC learning. In multiclass learning under bandit feedback, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass learning, but the learner does not directly observe the correct labels of the i.i.d. training examples. Instead, in each round, the learner receives an example, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal remains the same as in classical multiclass learning. In the present work, we study multiclass learning under bandit feedback within the framework of \emph{universal learning} (Bousquet et al., STOC 2021), which makes it possible to study the behavior of learning curves. In the \emph{uniform learning} framework, no concept class $\mathcal{C}$ is learnable when the effective label space is unbounded. In contrast, we demonstrate that, surprisingly, universal learnability of concept classes $\mathcal{C}$ gives rise to a rich theory even when the effective label space is unbounded. More concretely, our primary contribution is a theory revealing an inherent trichotomy governing instance-optimal learning curves in the realizable setting: the best achievable universal learning rate for any given concept class decays at either an \emph{exponential}, a \emph{linear}, or an \emph{arbitrarily slow} rate. The trichotomy is characterized combinatorially by the absence of an infinite multiclass Littlestone tree and the combination of an infinite Natarajan Littlestone tree and an infinite progressive Littlestone tree. Furthermore, we introduce novel learning algorithms achieving these instance-optimal universal rates.
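To make the feedback model concrete, the following toy sketch (in Python) plays out the protocol described above: on each round the learner receives an example, commits to a predicted label, and observes only a correct/incorrect bit, never the true label. The learner shown here (simple per-instance bookkeeping of ruled-out labels) is a hypothetical illustration of the interaction, not one of the paper's tree-based algorithms.

import random

class EliminationBanditLearner:
    """Toy learner: for each instance, remember which labels negative
    feedback has ruled out, and guess among the remaining candidates."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.ruled_out = {}  # instance -> set of labels known to be wrong
        self.known = {}      # instance -> label confirmed correct

    def predict(self, x):
        if x in self.known:
            return self.known[x]
        bad = self.ruled_out.setdefault(x, set())
        candidates = [y for y in self.labels if y not in bad]
        return random.choice(candidates)

    def update(self, x, y_hat, correct):
        if correct:
            self.known[x] = y_hat
        else:
            self.ruled_out.setdefault(x, set()).add(y_hat)

def run_protocol(learner, examples):
    """One pass of the bandit-feedback protocol: the learner never sees
    the true label, only whether its own prediction was correct."""
    mistakes = 0
    for x, y_true in examples:        # (x, y_true) drawn i.i.d. in the model
        y_hat = learner.predict(x)    # learner commits to a prediction
        correct = (y_hat == y_true)   # environment reveals only this bit
        learner.update(x, y_hat, correct)
        mistakes += not correct
    return mistakes

if __name__ == "__main__":
    data = [(x, x % 3) for x in random.choices(range(5), k=200)]
    learner = EliminationBanditLearner(labels=range(3))
    print("mistakes:", run_protocol(learner, data))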
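For readers outside the universal-rates literature, the three regimes of the trichotomy are statements about the expected error $\mathrm{er}(\hat{h}_n)$ of the learner's hypothesis after $n$ examples, with distribution-dependent constants. The LaTeX snippet below restates the standard definitions from Bousquet et al. (STOC 2021) as a sketch; it is not the paper's exact theorem statement.

% Sketch of the three rate regimes; C, c > 0 may depend on the
% (realizable) distribution P but not on n.
\begin{itemize}
  \item \emph{Exponential rate:} for every realizable $P$ there are $C, c > 0$
        such that $\mathbb{E}[\mathrm{er}(\hat{h}_n)] \le C\, e^{-c n}$ for all $n$.
  \item \emph{Linear rate:} for every realizable $P$ there is $C > 0$ such that
        $\mathbb{E}[\mathrm{er}(\hat{h}_n)] \le C / n$ for all $n$,
        and no algorithm achieves an exponential rate for every such $P$.
  \item \emph{Arbitrarily slow rate:} for every vanishing rate $R(n) \to 0$ there is a
        realizable $P$ on which every algorithm satisfies
        $\mathbb{E}[\mathrm{er}(\hat{h}_n)] \ge R(n)$ infinitely often.
\end{itemize}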

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-hanneke25c,
  title     = {Universal Rates for Multiclass Learning with Bandit Feedback},
  author    = {Hanneke, Steve and Shaeiri, Amirreza and Zhang, Qian},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {2704--2756},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/hanneke25c/hanneke25c.pdf},
  url       = {https://proceedings.mlr.press/v291/hanneke25c.html}
}
Endnote
%0 Conference Paper
%T Universal Rates for Multiclass Learning with Bandit Feedback
%A Steve Hanneke
%A Amirreza Shaeiri
%A Qian Zhang
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-hanneke25c
%I PMLR
%P 2704--2756
%U https://proceedings.mlr.press/v291/hanneke25c.html
%V 291
APA
Hanneke, S., Shaeiri, A., & Zhang, Q. (2025). Universal Rates for Multiclass Learning with Bandit Feedback. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:2704-2756. Available from https://proceedings.mlr.press/v291/hanneke25c.html.
