The Real Price of Bandit Information in Multiclass Classification

Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1573-1598, 2024.

Abstract

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
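For quick reference, the minimax rate from the abstract can be typeset as a display equation; the regime gloss in the comments is our reading of when each term dominates, not a quote from the paper:

\[
  \operatorname{Reg}(T)
    \;=\; \widetilde{\Theta}\!\left(
      \min\left\{\, |\mathcal{H}| + \sqrt{T},\;
                    \sqrt{KT \log |\mathcal{H}|} \,\right\}
    \right)
\]
% First term: achieved by the paper's new algorithm with regret O~(|H| + sqrt(T));
% it is the better bound for moderately-sized hypothesis classes, roughly when
% |H| <= sqrt(KT log|H|).
% Second term: the classical sqrt(KT log|H|) rate, which remains the better
% bound when the hypothesis class is very large relative to K and T.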

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-erez24a,
  title     = {The Real Price of Bandit Information in Multiclass Classification},
  author    = {Erez, Liad and Cohen, Alon and Koren, Tomer and Mansour, Yishay and Moran, Shay},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1573--1598},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/erez24a/erez24a.pdf},
  url       = {https://proceedings.mlr.press/v247/erez24a.html},
  abstract  = {We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.}
}
Endnote
%0 Conference Paper
%T The Real Price of Bandit Information in Multiclass Classification
%A Liad Erez
%A Alon Cohen
%A Tomer Koren
%A Yishay Mansour
%A Shay Moran
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-erez24a
%I PMLR
%P 1573--1598
%U https://proceedings.mlr.press/v247/erez24a.html
%V 247
%X We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
APA
Erez, L., Cohen, A., Koren, T., Mansour, Y. & Moran, S. (2024). The Real Price of Bandit Information in Multiclass Classification. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1573-1598. Available from https://proceedings.mlr.press/v247/erez24a.html.
