Multiclass Online Learnability under Bandit Feedback

Ananth Raman, Vinod Raman, Unique Subedi, Idan Mehalel, Ambuj Tewari
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:997-1012, 2024.

Abstract

We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-raman24a,
  title     = {Multiclass Online Learnability under Bandit Feedback},
  author    = {Raman, Ananth and Raman, Vinod and Subedi, Unique and Mehalel, Idan and Tewari, Ambuj},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {997--1012},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/raman24a/raman24a.pdf},
  url       = {https://proceedings.mlr.press/v237/raman24a.html},
  abstract  = {We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.}
}
Endnote
%0 Conference Paper
%T Multiclass Online Learnability under Bandit Feedback
%A Ananth Raman
%A Vinod Raman
%A Unique Subedi
%A Idan Mehalel
%A Ambuj Tewari
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-raman24a
%I PMLR
%P 997--1012
%U https://proceedings.mlr.press/v237/raman24a.html
%V 237
%X We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.
APA
Raman, A., Raman, V., Subedi, U., Mehalel, I., & Tewari, A. (2024). Multiclass Online Learnability under Bandit Feedback. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:997-1012. Available from https://proceedings.mlr.press/v237/raman24a.html.