Posterior Tracking Algorithm for Classification Bandits

Koji Tabata; Junpei Komiyama; Atsuyoshi Nakamura; Tamiki Komatsuzaki

Posterior Tracking Algorithm for Classification Bandits

Koji Tabata, Junpei Komiyama, Atsuyoshi Nakamura, Tamiki Komatsuzaki

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10994-11022, 2023.

Abstract

The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-tabata23a,
  title = 	 {Posterior Tracking Algorithm for Classification Bandits},
  author =       {Tabata, Koji and Komiyama, Junpei and Nakamura, Atsuyoshi and Komatsuzaki, Tamiki},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {10994--11022},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/tabata23a/tabata23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/tabata23a.html},
  abstract = 	 {The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.}
}

Endnote

%0 Conference Paper
%T Posterior Tracking Algorithm for Classification Bandits
%A Koji Tabata
%A Junpei Komiyama
%A Atsuyoshi Nakamura
%A Tamiki Komatsuzaki
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent	
%F pmlr-v206-tabata23a
%I PMLR
%P 10994--11022
%U https://proceedings.mlr.press/v206/tabata23a.html
%V 206
%X The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold. To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.

APA


Tabata, K., Komiyama, J., Nakamura, A. & Komatsuzaki, T.. (2023). Posterior Tracking Algorithm for Classification Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10994-11022 Available from https://proceedings.mlr.press/v206/tabata23a.html.

Posterior Tracking Algorithm for Classification Bandits

Abstract

Cite this Paper

Related Material