New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

Philip M. Long
Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:3-10, 2017.

Abstract

This paper is about two generalizations of the mistake bound model to online multiclass classification. In the standard model, the learner receives the correct classification at the end of each round, and in the bandit model, the learner only finds out whether its prediction was correct or not. For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal bounds for learning $F$ according to these two models. We show that $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|) \mathrm{opt}_{\mathrm{std}}(F), $$ and that this bound is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.
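The difference between the two feedback models can be illustrated with a toy version-space learner (this is a minimal sketch for intuition, not the algorithm analyzed in the paper; the class of constant classifiers and all names below are illustrative). With standard feedback the learner sees the true label and can eliminate every inconsistent classifier at once; with bandit feedback a wrong prediction only rules out classifiers that agree with the rejected guess, so progress is slower:

```python
def run_online(F, examples, feedback):
    """Count mistakes of a halving-style version-space learner.

    F: list of classifiers, each a dict mapping input x -> predicted label
    examples: list of (x, true_label) pairs, realizable by some f in F
    feedback: "standard" (true label revealed) or "bandit" (correct/incorrect only)
    """
    version_space = list(F)
    mistakes = 0
    for x, y in examples:
        # Predict the plurality label among surviving classifiers.
        votes = {}
        for f in version_space:
            votes[f[x]] = votes.get(f[x], 0) + 1
        yhat = max(votes, key=votes.get)
        if yhat != y:
            mistakes += 1
        if feedback == "standard":
            # Full feedback: keep only classifiers consistent with the true label.
            version_space = [f for f in version_space if f[x] == y]
        elif yhat == y:
            # Bandit, correct guess: the guessed label is the true label.
            version_space = [f for f in version_space if f[x] == yhat]
        else:
            # Bandit, wrong guess: we only learn the true label is NOT yhat.
            version_space = [f for f in version_space if f[x] != yhat]
    return mistakes

# Three constant classifiers over the single input x = 0, labels Y = {0, 1, 2};
# the target is the constant-2 classifier.
F = [{0: 0}, {0: 1}, {0: 2}]
examples = [(0, 2)] * 3
print(run_online(F, examples, "standard"))  # → 1 (one wrong guess, then done)
print(run_online(F, examples, "bandit"))    # → 2 (labels eliminated one at a time)
```

In this toy run the bandit learner pays roughly a factor of $|Y| - 1$ more mistakes in the worst case, which is the kind of multiplicative price in $|Y|$ that the paper's $(1 + o(1))(|Y| \ln |Y|)$ bound pins down.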

Cite this Paper


BibTeX
@InProceedings{pmlr-v76-long17a,
  title     = {New bounds on the price of bandit feedback for mistake-bounded online multiclass learning},
  author    = {Long, Philip M.},
  booktitle = {Proceedings of the 28th International Conference on Algorithmic Learning Theory},
  pages     = {3--10},
  year      = {2017},
  editor    = {Hanneke, Steve and Reyzin, Lev},
  volume    = {76},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v76/long17a/long17a.pdf},
  url       = {https://proceedings.mlr.press/v76/long17a.html},
  abstract  = {This paper is about two generalizations of the mistake bound model to online multiclass classification. In the standard model, the learner receives the correct classification at the end of each round, and in the bandit model, the learner only finds out whether its prediction was correct or not. For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal bounds for learning $F$ according to these two models. We show that an $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|) \mathrm{opt}_{\mathrm{std}}(F) $$ bound is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.}
}
Endnote
%0 Conference Paper
%T New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
%A Philip M. Long
%B Proceedings of the 28th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Steve Hanneke
%E Lev Reyzin
%F pmlr-v76-long17a
%I PMLR
%P 3--10
%U https://proceedings.mlr.press/v76/long17a.html
%V 76
%X This paper is about two generalizations of the mistake bound model to online multiclass classification. In the standard model, the learner receives the correct classification at the end of each round, and in the bandit model, the learner only finds out whether its prediction was correct or not. For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal bounds for learning $F$ according to these two models. We show that an $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|) \mathrm{opt}_{\mathrm{std}}(F) $$ bound is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.
APA
Long, P.M. (2017). New bounds on the price of bandit feedback for mistake-bounded online multiclass learning. Proceedings of the 28th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 76:3-10. Available from https://proceedings.mlr.press/v76/long17a.html.