Sparse Stochastic Bandits
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1269-1270, 2017.
Abstract
In the classical multi-armed bandit problem, $d$ arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with $d$ (or with $\sqrt{d}$ in the minimax sense). We here consider the \emph{sparse} case of this classical problem in the sense that only a small number of arms, namely $s < d$, have a positive expected reward.
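
To make the setting concrete, below is a minimal simulation sketch of a sparse stochastic bandit environment, paired with a standard UCB1 baseline. This is an illustration of the problem described in the abstract, not the paper's algorithm; the helper names (`make_sparse_bandit`, the `gap` parameter) and the Bernoulli reward model are assumptions made for the example.

```python
import numpy as np

def make_sparse_bandit(d, s, gap=0.5, rng=None):
    """Hypothetical sparse bandit: d Bernoulli arms, only s with positive mean."""
    rng = rng or np.random.default_rng(0)
    means = np.zeros(d)
    means[rng.choice(d, size=s, replace=False)] = gap
    return means

def ucb1(means, horizon, rng=None):
    """Standard UCB1 baseline (not the paper's method); returns cumulative regret."""
    rng = rng or np.random.default_rng(1)
    d = len(means)
    counts = np.zeros(d)
    sums = np.zeros(d)
    best = means.max()
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1  # pull each arm once for initialization
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = rng.binomial(1, means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret accumulated per round
    return regret

# Example: 100 arms, only 3 with positive expected reward.
means = make_sparse_bandit(d=100, s=3)
print(ucb1(means, horizon=20000))
```

A plain UCB1 run like this pays a regret that grows with the total number of arms $d$; the point of the sparse setting is that an algorithm exploiting the sparsity assumption can instead scale with $s$.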