Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity

Hanyang LI, Hong Xie, Defu Lian
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:223-238, 2025.

Abstract

This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named PC-CapUL for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.
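To make the setting concrete, below is a minimal sketch of one common model of multi-play bandits with shareable arm capacities: each arm has an unknown per-play mean reward and an unknown capacity, and plays assigned beyond an arm's capacity contribute nothing. The instance values (`mu`, `cap`, `m`) and the greedy oracle are illustrative assumptions, not the paper's actual algorithm or instances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: K = 3 arms, each with an unknown per-play mean
# reward mu[k] and an unknown shareable capacity cap[k]; the learner
# assigns m plays across arms every round.
mu = np.array([0.9, 0.6, 0.4])   # per-play mean rewards (assumed)
cap = np.array([2, 1, 3])        # shareable arm capacities (assumed)
m = 4                            # total plays per round

def round_reward(assignment, mu, cap, rng):
    """An arm's expected payoff saturates once the number of plays
    assigned to it exceeds its capacity."""
    effective = np.minimum(assignment, cap)  # plays beyond capacity add nothing
    mean = np.sum(effective * mu)            # expected total reward
    return mean + rng.normal(scale=0.1)      # noisy observation

def oracle_assignment(mu, cap, m):
    """Greedy assignment with known mu and cap: fill the highest-mean
    arms up to their capacities first."""
    a = np.zeros_like(cap)
    left = m
    for k in np.argsort(-mu):
        take = min(cap[k], left)
        a[k] = take
        left -= take
        if left == 0:
            break
    return a

a_star = oracle_assignment(mu, cap, m)
print(a_star)  # [2 1 1] under the assumed instance
```

In the capacity-scarce regime studied by the paper, capacities are small relative to the number of plays, so misjudging even one capacity forces plays onto saturated arms; this is why learning the capacities themselves, not just the mean rewards, drives the regret.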

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-li25a,
  title = {Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity},
  author = {LI, Hanyang and Xie, Hong and Lian, Defu},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages = {223--238},
  year = {2025},
  editor = {Lee, Hung-yi and Liu, Tongliang},
  volume = {304},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/li25a/li25a.pdf},
  url = {https://proceedings.mlr.press/v304/li25a.html},
  abstract = {This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named \texttt{PC-CapUL} for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.}
}
Endnote
%0 Conference Paper
%T Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity
%A Hanyang LI
%A Hong Xie
%A Defu Lian
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-li25a
%I PMLR
%P 223--238
%U https://proceedings.mlr.press/v304/li25a.html
%V 304
%X This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named PC-CapUL for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.
APA
LI, H., Xie, H. & Lian, D. (2025). Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:223-238. Available from https://proceedings.mlr.press/v304/li25a.html.