[edit]

# New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

*Proceedings of the 28th International Conference on Algorithmic Learning Theory*, PMLR 76:3-10, 2017.

#### Abstract

This paper is about two generalizations of the mistake bound model to online multiclass classification. In the

*standard model*, the learner receives the correct classification at the end of each round, and in the*bandit model*, the learner only finds out whether its prediction was correct or not. For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal bounds for learning $F$ according to these two models. We show that an $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|) \mathrm{opt}_{\mathrm{std}}(F) $$ bound is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.