The Real Price of Bandit Information in Multiclass Classification

Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1573-1598, 2024.

Abstract

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
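The paper's algorithms and lower bounds are more involved, but the protocol itself is simple to simulate. Below is a minimal illustrative sketch (not the paper's algorithm): an Exp3-style exponential-weights learner over a finite hypothesis class, which is the kind of classical approach that attains roughly the $\sqrt{KT \log |\mathcal{H}|}$ rate discussed in the abstract. All names, the toy hypothesis class, and the parameter choices below are illustrative assumptions.

```python
import math
import random

random.seed(0)

T, K, N_CONTEXTS = 3000, 5, 10

# Finite hypothesis class: each hypothesis maps a context id to a label.
# Hypothesis 0 is the hidden "true" labeling rule; the rest are random maps.
true_h = [x % K for x in range(N_CONTEXTS)]
H = [true_h] + [[random.randrange(K) for _ in range(N_CONTEXTS)]
                for _ in range(7)]
M = len(H)

gamma = math.sqrt(K * math.log(M) / T)   # uniform exploration rate
eta = gamma / K                          # learning rate
w = [1.0] * M                            # one weight per hypothesis

mistakes = 0
for t in range(T):
    x = t % N_CONTEXTS                   # context revealed this round
    preds = [h[x] for h in H]            # each hypothesis's predicted label
    total = sum(w)
    # Distribution over labels: weighted vote of hypotheses, mixed with
    # uniform exploration so every label has sampling probability >= gamma/K.
    p = [(1 - gamma) * sum(wi for wi, yi in zip(w, preds) if yi == y) / total
         + gamma / K for y in range(K)]
    y_hat = random.choices(range(K), weights=p)[0]
    loss = 1.0 if y_hat != true_h[x] else 0.0   # bandit feedback only
    mistakes += int(loss)
    # Importance-weighted loss estimate, charged only to the hypotheses
    # that predicted the label actually played.
    est = loss / p[y_hat]
    for i in range(M):
        if preds[i] == y_hat:
            w[i] *= math.exp(-eta * est)
```

Because the true hypothesis never incurs a positive loss estimate, its weight stays at 1 while inconsistent hypotheses decay, so the learner's mistake count grows sublinearly after the initial learning phase. The paper's point is that for moderately-sized $\mathcal{H}$ this $\sqrt{K}$-type dependence is avoidable, at the price of an additive $|\mathcal{H}|$ term.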

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-erez24a,
  title     = {The Real Price of Bandit Information in Multiclass Classification},
  author    = {Erez, Liad and Cohen, Alon and Koren, Tomer and Mansour, Yishay and Moran, Shay},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1573--1598},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/erez24a/erez24a.pdf},
  url       = {https://proceedings.mlr.press/v247/erez24a.html},
  abstract  = {We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.}
}
Endnote
%0 Conference Paper
%T The Real Price of Bandit Information in Multiclass Classification
%A Liad Erez
%A Alon Cohen
%A Tomer Koren
%A Yishay Mansour
%A Shay Moran
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-erez24a
%I PMLR
%P 1573--1598
%U https://proceedings.mlr.press/v247/erez24a.html
%V 247
%X We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}(\min\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|}\})}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
APA
Erez, L., Cohen, A., Koren, T., Mansour, Y. & Moran, S. (2024). The Real Price of Bandit Information in Multiclass Classification. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1573-1598. Available from https://proceedings.mlr.press/v247/erez24a.html.