A fully adaptive algorithm for pure exploration in linear bandits

Liyuan Xu, Junya Honda, Masashi Sugiyama
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:843-851, 2018.

Abstract

We propose the first fully adaptive algorithm for pure exploration in linear bandits: the task of finding the arm with the largest expected reward, which depends linearly on an unknown parameter. While existing methods partially or entirely fix the sequence of arm selections before observing any rewards, our method adaptively changes its arm selection strategy based on past observations at each round. We show that the sample complexity of our method matches the achievable lower bound up to a constant factor in an extreme case. Furthermore, we evaluate the performance of the methods in simulations on both synthetic settings and real-world data, in which our method shows a vast improvement over existing ones.
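
To make the setting concrete, below is a minimal, self-contained sketch of adaptive pure exploration in a linear bandit. It illustrates the problem setup with a heuristic adaptive selection and stopping rule; it is not the algorithm analyzed in the paper, and the instance sizes, regularizer, and threshold eps are assumptions chosen for illustration.

```python
# Sketch of pure exploration in a linear bandit (illustration only, not the
# paper's algorithm): rewards are linear in a hidden parameter theta, and the
# arm selection adapts each round via a regularized least-squares estimate.
import numpy as np

rng = np.random.default_rng(0)

# Problem instance (all values are assumptions for illustration):
# K arms with feature vectors in R^d, hidden parameter theta, Gaussian noise.
d, K = 5, 20
arms = rng.normal(size=(K, d))
theta = rng.normal(size=d)   # hidden from the learner
noise_sd = 1.0

# Regularized least-squares statistics, updated after every pull.
lam = 1.0
A = lam * np.eye(d)          # A_t = lam * I + sum_s x_s x_s^T
b = np.zeros(d)              # b_t = sum_s x_s * r_s

def post_pull_width(x, direction):
    """Uncertainty of <direction, theta_hat> if arm x were pulled next."""
    A_next_inv = np.linalg.inv(A + np.outer(x, x))
    return np.sqrt(direction @ A_next_inv @ direction)

budget, eps = 5000, 0.05
for t in range(budget):
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    order = np.argsort(arms @ theta_hat)
    best, runner = order[-1], order[-2]
    gap_dir = arms[best] - arms[runner]
    # Stop once the gap between the empirical top-two arms is known to eps
    # accuracy (a stand-in for a proper confidence-based stopping rule).
    if np.sqrt(gap_dir @ A_inv @ gap_dir) < eps:
        break
    # Fully adaptive rule: pull the arm that most reduces uncertainty in the
    # direction of the current top-two gap.
    i = int(np.argmin([post_pull_width(x, gap_dir) for x in arms]))
    r = arms[i] @ theta + noise_sd * rng.normal()
    A += np.outer(arms[i], arms[i])
    b += arms[i] * r

print(f"pulls: {t + 1}, guessed best: {best}, true best: {np.argmax(arms @ theta)}")
```

Note how both the stopping condition and the next pull depend on the current estimate theta_hat; this round-by-round dependence on past observations is what distinguishes a fully adaptive method from ones that fix the selection sequence in advance.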

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-xu18d,
  title     = {A fully adaptive algorithm for pure exploration in linear bandits},
  author    = {Xu, Liyuan and Honda, Junya and Sugiyama, Masashi},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {843--851},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/xu18d/xu18d.pdf},
  url       = {https://proceedings.mlr.press/v84/xu18d.html}
}
Endnote
%0 Conference Paper
%T A fully adaptive algorithm for pure exploration in linear bandits
%A Liyuan Xu
%A Junya Honda
%A Masashi Sugiyama
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-xu18d
%I PMLR
%P 843--851
%U https://proceedings.mlr.press/v84/xu18d.html
%V 84
APA
Xu, L., Honda, J. & Sugiyama, M. (2018). A fully adaptive algorithm for pure exploration in linear bandits. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:843-851. Available from https://proceedings.mlr.press/v84/xu18d.html.