Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Chao Tao, Saúl Blanco, Yuan Zhou
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4877-4886, 2018.

Abstract

We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector θ, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized θ estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-tao18a,
  title     = {Best Arm Identification in Linear Bandits with Linear Dimension Dependency},
  author    = {Tao, Chao and Blanco, Sa{\'u}l and Zhou, Yuan},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {4877--4886},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/tao18a/tao18a.pdf},
  url       = {https://proceedings.mlr.press/v80/tao18a.html},
  abstract  = {We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.}
}
Endnote
%0 Conference Paper
%T Best Arm Identification in Linear Bandits with Linear Dimension Dependency
%A Chao Tao
%A Saúl Blanco
%A Yuan Zhou
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-tao18a
%I PMLR
%P 4877--4886
%U https://proceedings.mlr.press/v80/tao18a.html
%V 80
%X We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.
APA
Tao, C., Blanco, S. & Zhou, Y. (2018). Best Arm Identification in Linear Bandits with Linear Dimension Dependency. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4877-4886. Available from https://proceedings.mlr.press/v80/tao18a.html.