On Multi-Armed Bandit with Impatient Arms

Yuming Shao, Zhixuan Fang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44429-44473, 2024.

Abstract

In this paper, we investigate a Multi-Armed Bandit (MAB) setting where an arm exits the game if the algorithm continuously neglects it. This setup is motivated by real-world scenarios, such as online advertising and crowdsourcing, where arms only gain benefits after being pulled by the algorithm. We identify the intrinsic hardness of this problem and the limitations of existing approaches. We propose the FC-SE algorithm, with expected regret upper bounds, as our solution to this problem. As an extension, we further allow new arms to enter after the game starts and design the FC-Entry algorithm with performance guarantees for this setup. Finally, we conduct experiments to validate our theoretical results.
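To make the setting concrete, the sketch below simulates an impatient-arms environment: each arm leaves permanently once it has gone a fixed number of consecutive rounds without being pulled. This is a minimal illustration only; the `patience` threshold and the naive round-robin policy are assumptions for demonstration, not the paper's FC-SE or FC-Entry algorithms.

```python
import random

def simulate_impatient_bandit(means, patience, horizon, seed=0):
    """Simulate a bandit where an arm exits the game after being
    neglected for `patience` consecutive rounds.

    means    -- Bernoulli reward mean of each arm
    patience -- consecutive unpulled rounds an arm tolerates (assumed)
    horizon  -- number of rounds to play
    """
    rng = random.Random(seed)
    neglect = {i: 0 for i in range(len(means))}  # consecutive unpulled rounds
    active = set(neglect)                        # arms still in the game
    total_reward = 0.0
    for t in range(horizon):
        if not active:
            break  # every arm has departed
        # Naive round-robin over surviving arms (placeholder policy).
        arm = sorted(active)[t % len(active)]
        total_reward += rng.random() < means[arm]  # Bernoulli reward draw
        for i in list(active):
            neglect[i] = 0 if i == arm else neglect[i] + 1
            if neglect[i] >= patience:
                active.discard(i)  # arm exits after continuous neglect
    return total_reward, active
```

Running this with a small `patience` shows the core tension the paper studies: a policy that explores too slowly permanently loses arms (and their potential reward), so regret cannot be controlled by standard exploration schedules alone.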

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-shao24b,
  title     = {On Multi-Armed Bandit with Impatient Arms},
  author    = {Shao, Yuming and Fang, Zhixuan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {44429--44473},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/shao24b/shao24b.pdf},
  url       = {https://proceedings.mlr.press/v235/shao24b.html},
  abstract  = {In this paper, we investigate a Multi-Armed Bandit (MAB) setting where an arm exits the game if the algorithm continuously neglects it. This setup is motivated by real-world scenarios, such as online advertising and crowdsourcing, where arms only gain benefits after being pulled by the algorithm. We identify the intrinsic hardness of this problem and limitations in existing approaches. We propose FC-SE algorithm with expected regret upper bounds as our solution to this problem. As an extension, we even allow new arms to enter after the game starts and design FC-Entry algorithm with performance guarantees for this setup. Finally, we conduct experiments to validate our theoretical results.}
}
Endnote
%0 Conference Paper
%T On Multi-Armed Bandit with Impatient Arms
%A Yuming Shao
%A Zhixuan Fang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-shao24b
%I PMLR
%P 44429--44473
%U https://proceedings.mlr.press/v235/shao24b.html
%V 235
%X In this paper, we investigate a Multi-Armed Bandit (MAB) setting where an arm exits the game if the algorithm continuously neglects it. This setup is motivated by real-world scenarios, such as online advertising and crowdsourcing, where arms only gain benefits after being pulled by the algorithm. We identify the intrinsic hardness of this problem and limitations in existing approaches. We propose FC-SE algorithm with expected regret upper bounds as our solution to this problem. As an extension, we even allow new arms to enter after the game starts and design FC-Entry algorithm with performance guarantees for this setup. Finally, we conduct experiments to validate our theoretical results.
APA
Shao, Y. & Fang, Z. (2024). On Multi-Armed Bandit with Impatient Arms. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44429-44473. Available from https://proceedings.mlr.press/v235/shao24b.html.
