Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Chao Tao, Saúl Blanco, Yuan Zhou
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4877-4886, 2018.

Abstract

We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-tao18a, title = {Best Arm Identification in Linear Bandits with Linear Dimension Dependency}, author = {Tao, Chao and Blanco, Sa{\'u}l and Zhou, Yuan}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {4877--4886}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/tao18a/tao18a.pdf}, url = {https://proceedings.mlr.press/v80/tao18a.html}, abstract = {We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.} }
Endnote
%0 Conference Paper %T Best Arm Identification in Linear Bandits with Linear Dimension Dependency %A Chao Tao %A Saúl Blanco %A Yuan Zhou %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-tao18a %I PMLR %P 4877--4886 %U https://proceedings.mlr.press/v80/tao18a.html %V 80 %X We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.
APA
Tao, C., Blanco, S. & Zhou, Y.. (2018). Best Arm Identification in Linear Bandits with Linear Dimension Dependency. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4877-4886 Available from https://proceedings.mlr.press/v80/tao18a.html.

Related Material