Open Problem: Best Arm Identification: Almost Instance-Wise Optimality and the Gap Entropy Conjecture
29th Annual Conference on Learning Theory, PMLR 49:1643-1646, 2016.
The best arm identification problem (BEST-1-ARM) is the most basic pure exploration problem in stochastic multi-armed bandits. The problem has a long history and attracted significant attention for the last decade. However, we do not yet have a complete understanding of the optimal sample complexity of the problem: The state-of-the-art algorithms achieve a sample complexity of O(\sum_i=2^n \Delta_i^-2(\lnδ^-1 + \ln\ln\Delta_i^-1)) (\Delta_i is the difference between the largest mean and the i^th mean), while the best known lower bound is Ω(\sum_i=2^n \Delta_i^-2\lnδ^-1) for general instances and Ω(∆^-2 \ln\ln ∆^-1) for the two-arm instances. We propose to study the instance-wise optimality for the BEST-1-ARM problem. Previous work has proved that it is impossible to have an instance optimal algorithm for the 2-arm problem. However, we conjecture that modulo the additive term Ω(\Delta_2^-2 \ln\ln \Delta_2^-1) (which is an upper bound and worst case lower bound for the 2-arm problem), there is an instance optimal algorithm for BEST-1-ARM. Moreover, we introduce a new quantity, called the gap entropy for a best-arm problem instance, and conjecture that it is the instance-wise lower bound. Hence, resolving this conjecture would provide a final answer to the old and basic problem.