[edit]
Best Arm Identification in Linear Bandits with Linear Dimension Dependency
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4877-4886, 2018.
Abstract
We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown d-dimensional parameter vector θ, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized θ estimator based on the solution to the convex relaxation of an optimal G-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension d, as well as an algorithm with sample complexity dependent on the reward gaps of the best d arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.