Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
Proceedings of Algorithmic Learning Theory, PMLR 83:3-24, 2018.
Abstract
We consider the problem of near-optimal arm identification in the fixed-confidence setting of the infinitely-armed bandit problem, when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our $\log^2 \frac{1}{\delta}$ dependence is inescapable for "two-phase" (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.
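The "two-phase" structure mentioned in the abstract (select a finite set of arms from the reservoir, then identify the best among them with confidence $1-\delta$) can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a generic illustration assuming Bernoulli rewards, a `reservoir_sample` function that draws an arm, a `pull` function that plays it, and Hoeffding-style confidence radii with an illustrative union bound.

```python
import math
import random

def two_phase_pac_arm(reservoir_sample, pull, epsilon=0.1, delta=0.05, n_arms=8):
    """Hedged sketch of a generic two-phase (epsilon, delta)-PAC procedure.

    Phase 1: draw n_arms arms from the unknown reservoir.
    Phase 2: run successive elimination among them, stopping once the
    confidence radius shows the surviving arms are within epsilon.
    All names and constants are illustrative, not from the paper.
    """
    # Phase 1: select a finite pool of candidate arms from the reservoir.
    arms = [reservoir_sample() for _ in range(n_arms)]
    active = list(range(n_arms))
    means = [0.0] * n_arms   # empirical mean reward per arm
    counts = [0] * n_arms    # number of pulls per arm
    t = 1
    # Phase 2: successive elimination with an anytime confidence radius.
    while len(active) > 1:
        for i in active:
            r = pull(arms[i])
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # running mean update
        # Hoeffding-style radius with a crude union bound over arms and rounds.
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        if 2 * rad <= epsilon:
            break  # survivors are epsilon-close with high probability
        best = max(means[i] for i in active)
        # Eliminate arms whose upper confidence bound falls below the
        # lower confidence bound of the empirically best arm.
        active = [i for i in active if means[i] + rad >= best - rad]
        t += 1
    return arms[max(active, key=lambda i: means[i])]
```

A usage example with a uniform reservoir of Bernoulli arms: `two_phase_pac_arm(lambda: random.random(), lambda p: 1.0 if random.random() < p else 0.0)` returns (with high probability) an arm whose mean is near the best of the sampled pool. Note that sampling only a fixed `n_arms` from the reservoir caps how good the returned arm can be; the paper's analysis concerns how many arms to draw and how to allocate pulls so that the result is near-optimal with respect to the reservoir itself.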