New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
Proceedings of the 28th International Conference on Algorithmic Learning Theory, PMLR 76:3-10, 2017.
Abstract
This paper concerns two generalizations of the mistake bound model to online multiclass classification over a label set $Y$. In the standard model, the learner receives the correct label at the end of each round; in the bandit model, it only finds out whether its prediction was correct. For a set $F$ of multiclass classifiers, let $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ be the optimal mistake bounds for learning $F$ in these two models. We show that the bound $$ \mathrm{opt}_{\mathrm{bandit}}(F) \leq (1 + o(1)) (|Y| \ln |Y|)\, \mathrm{opt}_{\mathrm{std}}(F) $$ is the best possible up to the leading constant, closing a $\Theta(\log |Y|)$ factor gap.
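To make the two feedback models concrete, here is a minimal, hypothetical sketch (not from the paper) contrasting them with a halving-style learner over a finite class of constant classifiers; the names `HalvingLearner`, `run`, and the toy class `F` are illustrative assumptions, not the paper's construction. Under standard feedback the learner keeps only classifiers consistent with the revealed label, while under bandit feedback a wrong prediction only rules out classifiers that agree with that prediction, which is the source of the extra $|Y|$-dependent cost the paper quantifies.

```python
class HalvingLearner:
    """Toy learner over a finite class F of classifiers (functions x -> label)."""

    def __init__(self, F):
        self.version_space = list(F)

    def predict(self, x):
        # Predict the majority vote of the surviving classifiers.
        votes = [f(x) for f in self.version_space]
        return max(set(votes), key=votes.count)

    def update(self, x, y_pred, y_true=None, correct=None):
        if y_true is not None:          # standard model: true label revealed
            keep = lambda f: f(x) == y_true
        elif correct:                   # bandit model, correct: label inferred
            keep = lambda f: f(x) == y_pred
        else:                           # bandit model, wrong: only rule out y_pred
            keep = lambda f: f(x) != y_pred
        self.version_space = [f for f in self.version_space if keep(f)]


def run(learner, examples, bandit=False):
    """Run one online session and count mistakes under the chosen feedback model."""
    mistakes = 0
    for x, y in examples:
        y_pred = learner.predict(x)
        if y_pred != y:
            mistakes += 1
        if bandit:
            learner.update(x, y_pred, correct=(y_pred == y))
        else:
            learner.update(x, y_pred, y_true=y)
    return mistakes


if __name__ == "__main__":
    labels = range(5)
    F = [lambda x, c=c: c for c in labels]     # one constant classifier per label
    target = F[3]
    examples = [(i, target(i)) for i in range(10)]

    std = run(HalvingLearner(F), examples, bandit=False)
    ban = run(HalvingLearner(F), examples, bandit=True)
    print(std, ban)  # bandit feedback typically incurs more mistakes
```

On this toy class the standard learner makes at most one mistake (the first revealed label pins down the target), whereas the bandit learner may need up to $|Y| - 1$ wrong guesses before it stumbles on the right constant, a crude illustration of the $|Y|$-dependent price of bandit feedback.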