Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Shintaro Nakamura, Masashi Sugiyama
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1225-1233, 2024.

Abstract

We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound on the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, matching a lower bound. Finally, we experimentally compare our algorithms with previous methods and show that they perform better.
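
For readers unfamiliar with the fixed-budget setting the abstract refers to, the sketch below shows the classical Successive Accepts and Rejects routine for identifying the top-m arms, one of the simplest combinatorial action classes. It is a minimal illustration only: the unit-variance Gaussian rewards, the toy arm means, and the function name successive_accepts_rejects are assumptions made for the example, and this is not the paper's CSA or Minimax-CombSAR algorithm.

```python
import numpy as np

def successive_accepts_rejects(means, m, budget, rng=None):
    """Fixed-budget Successive Accepts and Rejects (SAR) for top-m arm
    identification, simulating unit-variance Gaussian rewards from `means`.
    Generic illustration only -- not the paper's CSA or Minimax-CombSAR."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active, accepted = list(range(K)), []
    counts, sums = np.zeros(K, dtype=int), np.zeros(K)
    n_prev = 0
    for k in range(1, K):                 # K - 1 phases, each deactivates one arm
        n_k = int(np.ceil((budget - K) / (log_bar * (K + 1 - k))))
        pulls = max(n_k - n_prev, 1)      # guard for very small budgets
        for i in active:                  # pull every surviving arm equally
            rewards = rng.normal(means[i], 1.0, size=pulls)
            sums[i] += rewards.sum()
            counts[i] += pulls
        n_prev = n_k
        emp = {i: sums[i] / counts[i] for i in active}
        order = sorted(active, key=lambda i: emp[i], reverse=True)
        m_left = m - len(accepted)
        if m_left == 0:                   # answer is full: only rejections remain
            active.remove(order[-1])
            continue
        if m_left == len(active):         # every surviving arm must be accepted
            break
        gap_top = emp[order[0]] - emp[order[m_left]]       # accept gap
        gap_bot = emp[order[m_left - 1]] - emp[order[-1]]  # reject gap
        if gap_top >= gap_bot:
            accepted.append(order[0])
            active.remove(order[0])
        else:
            active.remove(order[-1])
    if len(accepted) < m:                 # remaining active arms complete the set
        accepted += active
    return sorted(accepted[:m])

# Toy run: 6 arms, identify the best 2 with a total budget of 600 pulls.
print(successive_accepts_rejects([0.9, 0.8, 0.5, 0.4, 0.3, 0.1], m=2, budget=600))
```

The fixed budget is split across K-1 phases; in each phase every surviving arm is pulled equally, and one arm is either accepted into or rejected from the answer, so the whole budget is spent before the final answer is returned.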

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-nakamura24a,
  title     = {Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit},
  author    = {Nakamura, Shintaro and Sugiyama, Masashi},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {1225--1233},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/nakamura24a/nakamura24a.pdf},
  url       = {https://proceedings.mlr.press/v238/nakamura24a.html},
  abstract  = {We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound on the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, matching a lower bound. Finally, we experimentally compare our algorithms with previous methods and show that they perform better.}
}
Endnote
%0 Conference Paper
%T Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
%A Shintaro Nakamura
%A Masashi Sugiyama
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-nakamura24a
%I PMLR
%P 1225--1233
%U https://proceedings.mlr.press/v238/nakamura24a.html
%V 238
%X We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound on the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, matching a lower bound. Finally, we experimentally compare our algorithms with previous methods and show that they perform better.
APA
Nakamura, S. & Sugiyama, M. (2024). Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1225-1233. Available from https://proceedings.mlr.press/v238/nakamura24a.html.
