Posterior Tracking Algorithm for Classification Bandits

Koji Tabata, Junpei Komiyama, Atsuyoshi Nakamura, Tamiki Komatsuzaki
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10994-11022, 2023.

Abstract

The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.
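To make the problem setting concrete, the following is a toy sketch of a classification bandit instance with Bernoulli arms and a simple posterior-sampling pull rule. This is an illustrative heuristic only, not the paper's P-tracking algorithm; the arm-selection rule (pull the arm whose posterior sample lies closest to the threshold) and the fixed budget are assumptions made for the sketch.

```python
import random

def classification_bandit_sketch(true_means, threshold, L, budget=20000, seed=0):
    """Decide whether at least L of the K Bernoulli arms have mean >= threshold.
    Illustrative posterior-sampling heuristic, NOT the paper's P-tracking."""
    rng = random.Random(seed)
    K = len(true_means)
    # Beta(1, 1) priors for Bernoulli rewards; posterior is Beta(alpha, beta)
    alpha = [1] * K
    beta = [1] * K
    for _ in range(budget):
        # Thompson-style draw: one sample from each arm's posterior
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        # Pull the arm whose sampled mean is closest to the threshold,
        # i.e. the arm we are least certain to classify as good or bad.
        i = min(range(K), key=lambda j: abs(samples[j] - threshold))
        reward = 1 if rng.random() < true_means[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
    # Classify an arm as good if its posterior mean clears the threshold
    post_means = [alpha[i] / (alpha[i] + beta[i]) for i in range(K)]
    return sum(m >= threshold for m in post_means) >= L

# Two arms (0.7 and 0.8) exceed the threshold 0.5, so the answer for L=2 is "yes"
print(classification_bandit_sketch([0.2, 0.7, 0.8, 0.4], 0.5, L=2))
```

The threshold-proximity pull rule is a crude stand-in for the optimal sampling proportions that P-tracking tracks; the paper's contribution is precisely that those proportions can be computed in time linear in $K$ and tracked without forced exploration.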

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-tabata23a,
  title     = {Posterior Tracking Algorithm for Classification Bandits},
  author    = {Tabata, Koji and Komiyama, Junpei and Nakamura, Atsuyoshi and Komatsuzaki, Tamiki},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {10994--11022},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/tabata23a/tabata23a.pdf},
  url       = {https://proceedings.mlr.press/v206/tabata23a.html},
  abstract  = {The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.}
}
Endnote
%0 Conference Paper
%T Posterior Tracking Algorithm for Classification Bandits
%A Koji Tabata
%A Junpei Komiyama
%A Atsuyoshi Nakamura
%A Tamiki Komatsuzaki
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-tabata23a
%I PMLR
%P 10994--11022
%U https://proceedings.mlr.press/v206/tabata23a.html
%V 206
%X The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.
APA
Tabata, K., Komiyama, J., Nakamura, A. & Komatsuzaki, T. (2023). Posterior Tracking Algorithm for Classification Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10994-11022. Available from https://proceedings.mlr.press/v206/tabata23a.html.