Sparse Stochastic Bandits
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1269-1270, 2017.
Abstract
In the classical multi-armed bandit problem, $d$ arms are available to the decision maker, who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with $d$ (or with $\sqrt{d}$ in the minimax sense). We here consider the \emph{sparse} case of this classical problem in the sense that only a small number of arms, namely $s < d$, have a positive expected reward.
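To make the setting concrete, the following sketch simulates a sparse stochastic bandit instance in which only $s$ of the $d$ arms have a positive mean reward, and runs the standard UCB1 index policy on it. This is an illustration of the problem setup only, not the algorithm proposed in the paper; the arm means, noise model, and horizon are arbitrary choices for the example.

```python
import numpy as np

def ucb1(means, horizon, rng):
    """Run UCB1 on arms with Gaussian noise; return cumulative pseudo-regret."""
    d = len(means)
    counts = np.zeros(d)
    sums = np.zeros(d)
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < d:
            arm = t  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean plus confidence radius
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = means[arm] + rng.standard_normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

# Sparse instance: s of the d arms have positive expected reward,
# the remaining d - s arms have negative expected reward.
rng = np.random.default_rng(0)
d, s = 20, 3
means = np.concatenate([rng.uniform(0.1, 0.5, s),
                        -rng.uniform(0.1, 0.5, d - s)])
print(ucb1(means, horizon=5000, rng=rng))
```

A sparsity-aware algorithm would aim to pay a regret price depending on $s$ rather than $d$; the generic UCB1 baseline above explores all $d$ arms and so does not exploit the sparse structure.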