Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

Maryam Aziz, Jesse Anderton, Emilie Kaufmann, Javed Aslam
Proceedings of Algorithmic Learning Theory, PMLR 83:3-24, 2018.

Abstract

We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our $\log^2 \frac{1}{\delta}$ dependence is inescapable for “two-phase” (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.
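To make the “two-phase” template concrete, here is a minimal sketch, not the paper’s actual algorithm: phase one draws arms from the reservoir, phase two identifies a near-optimal one among them by uniform sampling with a Hoeffding bound. The function name, the `reservoir` interface, and the assumption that the reservoir’s probability `alpha` of producing a near-optimal arm is known are all illustrative; the paper’s setting assumes nothing about the reservoir.

```python
import math
import random


def two_phase_near_optimal_arm(reservoir, alpha, epsilon, delta):
    """Hypothetical two-phase sketch (not the paper's algorithm).

    Phase 1: draw enough arms so that, with probability >= 1 - delta/2,
    at least one drawn arm is near-optimal (each draw succeeds with
    probability alpha, assumed known here purely for illustration).
    Phase 2: pull every drawn arm uniformly often and return the one
    with the highest empirical mean; Hoeffding's inequality plus a
    union bound makes this (epsilon, delta/2)-PAC over the drawn arms.

    `reservoir()` returns a fresh arm: a zero-argument callable
    yielding rewards in [0, 1].
    """
    # Phase 1: P(no near-optimal arm among n draws) <= (1 - alpha)^n,
    # so n = ceil(log(2/delta) / alpha) draws suffice.
    n = math.ceil(math.log(2 / delta) / alpha)
    arms = [reservoir() for _ in range(n)]

    # Phase 2: t pulls per arm so that all n empirical means are within
    # epsilon/2 of their true means simultaneously, w.p. >= 1 - delta/2.
    t = math.ceil((2 / epsilon**2) * math.log(4 * n / delta))
    means = [sum(arm() for _ in range(t)) / t for arm in arms]
    return arms[max(range(n), key=means.__getitem__)]
```

Note the $\log^2 \frac{1}{\delta}$ flavor of such schemes: phase one contributes a $\log\frac{1}{\delta}$ factor through the number of arms drawn, and phase two pulls each of those arms a number of times that itself grows with $\log\frac{1}{\delta}$.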

Cite this Paper


BibTeX
@InProceedings{pmlr-v83-aziz18a,
  title = {Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence},
  author = {Aziz, Maryam and Anderton, Jesse and Kaufmann, Emilie and Aslam, Javed},
  booktitle = {Proceedings of Algorithmic Learning Theory},
  pages = {3--24},
  year = {2018},
  editor = {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume = {83},
  series = {Proceedings of Machine Learning Research},
  month = {07--09 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v83/aziz18a/aziz18a.pdf},
  url = {https://proceedings.mlr.press/v83/aziz18a.html},
  abstract = {We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our $\log^2 \frac{1}{\delta}$ dependence is inescapable for ``two-phase'' (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.}
}
Endnote
%0 Conference Paper
%T Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
%A Maryam Aziz
%A Jesse Anderton
%A Emilie Kaufmann
%A Javed Aslam
%B Proceedings of Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Firdaus Janoos
%E Mehryar Mohri
%E Karthik Sridharan
%F pmlr-v83-aziz18a
%I PMLR
%P 3--24
%U https://proceedings.mlr.press/v83/aziz18a.html
%V 83
%X We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity which is within a log factor of our lower bound; and (4) discuss whether our $\log^2 \frac{1}{\delta}$ dependence is inescapable for “two-phase” (select arms first, identify the best later) algorithms in the infinite setting. This work permits the application of bandit models to a broader class of problems where fewer assumptions hold.
APA
Aziz, M., Anderton, J., Kaufmann, E., & Aslam, J. (2018). Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:3-24. Available from https://proceedings.mlr.press/v83/aziz18a.html.
