Sparse Stochastic Bandits
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1269-1270, 2017.
Abstract
In the classical multi-armed bandit problem, d arms are available to the decision maker, who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with √d in the minimax sense). We here consider the sparse case of this classical problem, in the sense that only a small number of arms, namely $s < d$ of them, have a positive expected reward.
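To illustrate the setting, here is a minimal sketch (not the paper's algorithm) that runs the standard UCB1 strategy on a hypothetical sparse instance with Gaussian rewards, where only s of the d arms have a positive mean; the instance sizes, means, and horizon are illustrative assumptions.

```python
import numpy as np

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Gaussian-reward arms; return cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    d = len(means)
    counts = np.zeros(d)   # number of pulls per arm
    sums = np.zeros(d)     # cumulative observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < d:
            arm = t  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = means[arm] + rng.normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses true means
    return regret

# Hypothetical sparse instance: d = 20 arms, only s = 2 with positive mean.
d, s = 20, 2
means = [0.0] * d
means[0], means[1] = 0.5, 0.3
r = ucb1_regret(means, horizon=5000)
```

A plain UCB1 learner must explore all d arms, so its regret grows with d; the paper's point is that exploiting sparsity should yield guarantees driven by s instead.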