Model-Based Best Arm Identification for Decreasing Bandits

Sho Takemori, Yuhei Umeda, Aditya Gopalan
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1567-1575, 2024.

Abstract

We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-armed bandit when the expected loss of each arm is monotone decreasing as a function of its pull count. This models, for instance, scenarios where each arm itself represents an optimization algorithm for finding the minimizer of a common function, and there is a limited time available to test the algorithms before committing to one of them. We assume that the decreasing expected loss of each arm depends on the number of its pulls as a (inverse) polynomial with unknown coefficients. We propose two fixed-budget best arm identification algorithms – one for the case of sparse polynomial decay models and the other for general polynomial models – along with bounds on the identification error probability. We also derive algorithm-independent lower bounds on the error probability. These bounds are seen to be factored into the product of the usual problem complexity and the model complexity that only depends on the parameters of the model. This indicates that our methods can identify the best arm even when the budget is smaller. We conduct empirical studies of our algorithms to complement our theoretical findings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-takemori24a, title = {Model-Based Best Arm Identification for Decreasing Bandits}, author = {Takemori, Sho and Umeda, Yuhei and Gopalan, Aditya}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {1567--1575}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/takemori24a/takemori24a.pdf}, url = {https://proceedings.mlr.press/v238/takemori24a.html}, abstract = {We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-armed bandit when the expected loss of each arm is monotone decreasing as a function of its pull count. This models, for instance, scenarios where each arm itself represents an optimization algorithm for finding the minimizer of a common function, and there is a limited time available to test the algorithms before committing to one of them. We assume that the decreasing expected loss of each arm depends on the number of its pulls as a (inverse) polynomial with unknown coefficients. We propose two fixed-budget best arm identification algorithms – one for the case of sparse polynomial decay models and the other for general polynomial models – along with bounds on the identification error probability. We also derive algorithm-independent lower bounds on the error probability. These bounds are seen to be factored into the product of the usual problem complexity and the model complexity that only depends on the parameters of the model. This indicates that our methods can identify the best arm even when the budget is smaller. We conduct empirical studies of our algorithms to complement our theoretical findings.} }
Endnote
%0 Conference Paper %T Model-Based Best Arm Identification for Decreasing Bandits %A Sho Takemori %A Yuhei Umeda %A Aditya Gopalan %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-takemori24a %I PMLR %P 1567--1575 %U https://proceedings.mlr.press/v238/takemori24a.html %V 238 %X We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-armed bandit when the expected loss of each arm is monotone decreasing as a function of its pull count. This models, for instance, scenarios where each arm itself represents an optimization algorithm for finding the minimizer of a common function, and there is a limited time available to test the algorithms before committing to one of them. We assume that the decreasing expected loss of each arm depends on the number of its pulls as a (inverse) polynomial with unknown coefficients. We propose two fixed-budget best arm identification algorithms – one for the case of sparse polynomial decay models and the other for general polynomial models – along with bounds on the identification error probability. We also derive algorithm-independent lower bounds on the error probability. These bounds are seen to be factored into the product of the usual problem complexity and the model complexity that only depends on the parameters of the model. This indicates that our methods can identify the best arm even when the budget is smaller. We conduct empirical studies of our algorithms to complement our theoretical findings.
APA
Takemori, S., Umeda, Y. & Gopalan, A.. (2024). Model-Based Best Arm Identification for Decreasing Bandits. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1567-1575 Available from https://proceedings.mlr.press/v238/takemori24a.html.

Related Material