[edit]
Posterior Tracking Algorithm for Classification Bandits
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10994-11022, 2023.
Abstract
The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.