Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Chao Tao; Saúl Blanco; Yuan Zhou

Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Chao Tao, Saúl Blanco, Yuan Zhou

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4877-4886, 2018.

Abstract

We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-tao18a,
  title = 	 {Best Arm Identification in Linear Bandits with Linear Dimension Dependency},
  author =       {Tao, Chao and Blanco, Sa{\'u}l and Zhou, Yuan},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {4877--4886},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/tao18a/tao18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/tao18a.html},
  abstract = 	 {We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.}
}

Endnote

%0 Conference Paper
%T Best Arm Identification in Linear Bandits with Linear Dimension Dependency
%A Chao Tao
%A Saúl Blanco
%A Yuan Zhou
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-tao18a
%I PMLR
%P 4877--4886
%U https://proceedings.mlr.press/v80/tao18a.html
%V 80
%X We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.

APA


Tao, C., Blanco, S. & Zhou, Y.. (2018). Best Arm Identification in Linear Bandits with Linear Dimension Dependency. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4877-4886 Available from https://proceedings.mlr.press/v80/tao18a.html.

Best Arm Identification in Linear Bandits with Linear Dimension Dependency

Abstract

Cite this Paper

Related Material