Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:488-497, 2017.
Abstract
We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, ranging from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem posed in (Abernethy, J. and Rakhlin, A. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally and show that it performs favorably against earlier algorithms.
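The abstract names the two endpoints of the loss family but does not define the family itself. The following is a minimal sketch under stated assumptions. It uses the standard Crammer-Singer multiclass hinge loss on a score vector $Wx$. It also uses a hypothetical convex combination, `eta_loss`, of hinge and squared hinge, chosen only because it reproduces the endpoints $\eta=0$ and $\eta=1$ described above. The paper's own definition may differ, and neither function name comes from the source.

```python
from typing import Sequence


def multiclass_hinge(scores: Sequence[float], y: int) -> float:
    """Crammer-Singer multiclass hinge loss on a score vector (e.g. W x):
    max(0, 1 - s_y + max_{r != y} s_r)."""
    if len(scores) < 2:
        raise ValueError("need at least two classes")
    # Highest score among the incorrect labels.
    best_other = max(s for r, s in enumerate(scores) if r != y)
    return max(0.0, 1.0 - scores[y] + best_other)


def eta_loss(scores: Sequence[float], y: int, eta: float) -> float:
    """HYPOTHETICAL interpolation between hinge (eta=0) and squared hinge
    (eta=1): (1 - eta) * h + eta * h**2. Illustrative assumption only;
    not necessarily the loss family defined in the paper."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    h = multiclass_hinge(scores, y)
    return (1.0 - eta) * h + eta * h * h


if __name__ == "__main__":
    scores = [0.0, 1.0, 0.5]  # true label 0 is outscored by label 1
    for eta in (0.0, 0.5, 1.0):
        print(eta, eta_loss(scores, 0, eta))
```

For the example scores the hinge loss is $1 - 0 + 1 = 2$, so the sketch returns 2.0, 3.0, and 4.0 for $\eta = 0, 0.5, 1$. When the true label wins by a margin of at least 1, every member of this illustrative family is zero.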