The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1794-1834, 2017.
Abstract
We propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to the limitations of either method. For concreteness, we apply our techniques to a structured multi-arm bandit problem in the fixed-confidence pure exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity and the asymptotic sample complexity as the confidence parameter delta tends to zero, as found in the literature. We also prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log factors. Moreover, our lower bounds zero in on the number of times each individual arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem which removes extraneous log factors, and outperforms the state-of-the-art in experiments.
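To make the fixed-confidence pure exploration setting concrete: an algorithm adaptively pulls arms and must output the best arm with probability at least 1 - delta, using as few samples as possible. The sketch below is a generic successive-elimination routine for this setting, not the algorithm from the paper; the function name, the confidence-radius constant, and the `pull` interface are all illustrative assumptions.

```python
import math
import random


def successive_elimination(pull, n_arms, delta, max_pulls=100_000):
    """Generic successive-elimination sketch for fixed-confidence
    best-arm identification (illustrative only, NOT the paper's
    algorithm). `pull(i)` returns a reward in [0, 1] for arm i.

    All surviving arms are pulled once per round; an arm is dropped
    when its upper confidence bound falls below the lower confidence
    bound of the empirically best surviving arm."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0
    while len(active) > 1 and total < max_pulls:
        for i in active:
            r = pull(i)
            counts[i] += 1
            total += 1
            means[i] += (r - means[i]) / counts[i]  # running mean
        t = counts[active[0]]  # all active arms share the same count
        # Anytime Hoeffding-style radius; the 4*n*t^2 factor is a
        # standard union bound over arms and rounds (an assumption
        # here, not taken from the paper).
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(means[i] for i in active)
        active = [i for i in active if means[i] + rad >= best - rad]
    return active[0]


# Usage: four Bernoulli arms with a large gap, confidence 0.95.
random.seed(0)
mus = [0.9, 0.2, 0.2, 0.2]
winner = successive_elimination(
    lambda i: 1.0 if random.random() < mus[i] else 0.0, 4, delta=0.05
)
```

The aggregate sample complexity of such elimination schemes hides exactly the per-arm pull counts that the paper's lower bounds examine individually.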