Towards Instance Optimal Bounds for Best Arm Identification
[edit]
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:535592, 2017.
Abstract
In the classical best arm identification (Best$1$Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. Upon each play of an arm, we can get a reward sampled i.i.d. from its reward distribution. We would like to identify the arm with the largest mean with probability at least $1δ$, using as few samples as possible. The problem has a long history and understanding its sample complexity has attracted significant attention since the last decade. However, the optimal sample complexity of the problem is still unknown. Recently, Chen and Li (2016) made an interesting conjecture, called gapentropy conjecture, concerning the instance optimal sample complexity of Best$1$Arm. Given a Best$1$Arm instance $I$ (i.e., a set of arms), let $\mu_[i]$ denote the $i$th largest mean and $\Delta_[i]=\mu_[1]\mu_[i]$ denote the corresponding gap. $H(I)=\sum_i=2^n\Delta_[i]^2$ denotes the complexity of the instance. The gapentropy conjecture states that for any instance $I$, $Ω\left(H(I)⋅\left(\lnδ^1 + \mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropylike term determined by the gaps, and there is a $δ$correct algorithm for Best$1$Arm with sample complexity $O\left(H(I)⋅\left(\lnδ^1 + \mathsf{Ent}(I)\right)+\Delta_[2]^2\ln\ln\Delta_[2]^1\right)$. We note that $Θ\left(\Delta_[2]^2\ln\ln\Delta_[2]^1\right)$ is necessary and sufficient to solve the twoarm instance with the best and second best arms. If the conjecture is true, we would have a complete understanding of the instancewise sample complexity of Best$1$Arm (up to constant factors). In this paper, we make significant progress towards a complete resolution of the gapentropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)⋅\left(\lnδ^1 + \mathsf{Ent}(I)\right)+\Delta_[2]^2\ln\ln\Delta_[2]^1\mathrmpolylog(n,δ^1)\right)\]samples in expectation for any instance $I$. For the lower bound, we show that for any Gaussian Best$1$Arm instance with gaps of the form $2^k$, any $δ$correct monotone algorithm requires at least \[Ω\left(H(I)⋅\left(\lnδ^1 + \mathsf{Ent}(I)\right)\right)\]samples in expectation. Here, a monotone algorithm is one which uses no more samples (in expectation) on $I’$ than on $I$, if $I’$ is a subinstance of $I$ obtained by removing some suboptimal arms.
Related Material


