The True Sample Complexity of Identifying Good Arms
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1781-1791, 2020.
Abstract
We consider two multi-armed bandit problems with $n$ arms: \emph{(i)} given an $\epsilon > 0$, identify an arm whose mean is within $\epsilon$ of the largest mean and \emph{(ii)} given a threshold $\mu_0$ and integer $k$, identify $k$ arms with means larger than $\mu_0$. Existing lower bounds and algorithms for the PAC framework suggest that both of these problems require $\Omega(n)$ samples. However, we argue that the PAC framework not only conflicts with how these algorithms are used in practice, but also that these results disagree with the intuition that \emph{(i)} requires only $\Theta(\frac{n}{m})$ samples where $m = |\{ i : \mu_i > \max_{j \in [n]} \mu_j - \epsilon\}|$ and \emph{(ii)} requires $\Theta(\frac{n}{m}k)$ samples where $m = |\{ i : \mu_i > \mu_0 \}|$. We provide definitions that formalize these intuitions, obtain lower bounds that match the above sample complexities, and develop explicit, practical algorithms that achieve nearly matching upper bounds.
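To make the two sample-complexity claims concrete, the following minimal sketch (the helper name and the toy instance are hypothetical, not from the paper) computes the set size $m$ for each problem from a vector of arm means and reports the corresponding intuitive budgets, $\Theta(n/m)$ for problem (i) and $\Theta(\frac{n}{m}k)$ for problem (ii), ignoring the constants hidden by $\Theta(\cdot)$.

```python
import numpy as np

def intuitive_sample_complexities(means, epsilon, mu_0, k):
    """Hypothetical helper: compute m for problems (i) and (ii) and the
    intuitive sample budgets n/m and n*k/m stated in the abstract
    (constants hidden by Theta(.) are ignored)."""
    means = np.asarray(means, dtype=float)
    n = len(means)

    # Problem (i): arms within epsilon of the largest mean.
    m_good = int(np.sum(means > means.max() - epsilon))
    budget_i = n / m_good  # Theta(n / m)

    # Problem (ii): arms with mean above the threshold mu_0.
    m_above = int(np.sum(means > mu_0))
    budget_ii = n * k / m_above if m_above > 0 else float("inf")  # Theta(n k / m)

    return m_good, budget_i, m_above, budget_ii

# Toy instance: 1000 arms, 50 of which are "good" under either criterion.
means = np.concatenate([np.full(50, 0.9), np.full(950, 0.5)])
print(intuitive_sample_complexities(means, epsilon=0.1, mu_0=0.8, k=10))
# m = 50 in both cases, so the intuitive budgets scale as n/m = 20 for (i)
# and nk/m = 200 for (ii), far below the Omega(n) suggested by the PAC bounds.
```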