Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity

Hanyang LI, Hong Xie, Defu Lian
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:223-238, 2025.

Abstract

This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named PC-CapUL for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.
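To make the setting concrete, below is a minimal sketch of one common model of multi-play bandits with shareable arm capacities: each arm has an unknown per-play mean reward and an unknown capacity, and plays assigned beyond an arm's capacity contribute nothing. The instance values (`mu`, `cap`, `m`) and the greedy oracle are illustrative assumptions, not the paper's actual algorithm or instances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: K = 3 arms, each with an unknown per-play mean
# reward mu[k] and an unknown shareable capacity cap[k]; the learner
# assigns m plays across arms every round.
mu = np.array([0.9, 0.6, 0.4])   # per-play mean rewards (assumed)
cap = np.array([2, 1, 3])        # shareable arm capacities (assumed)
m = 4                            # total plays per round

def round_reward(assignment, mu, cap, rng):
    """An arm's expected payoff saturates once the number of plays
    assigned to it exceeds its capacity."""
    effective = np.minimum(assignment, cap)  # plays beyond capacity add nothing
    mean = np.sum(effective * mu)            # expected total reward
    return mean + rng.normal(scale=0.1)      # noisy observation

def oracle_assignment(mu, cap, m):
    """Greedy assignment with known mu and cap: fill the highest-mean
    arms up to their capacities first."""
    a = np.zeros_like(cap)
    left = m
    for k in np.argsort(-mu):
        take = min(cap[k], left)
        a[k] = take
        left -= take
        if left == 0:
            break
    return a

a_star = oracle_assignment(mu, cap, m)
print(a_star)  # [2 1 1] under the assumed instance
```

In the capacity-scarce regime studied by the paper, capacities are small relative to the number of plays, so misjudging even one capacity forces plays onto saturated arms; this is why learning the capacities themselves, not just the mean rewards, drives the regret.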

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-li25a,
  title = {Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity},
  author = {LI, Hanyang and Xie, Hong and Lian, Defu},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages = {223--238},
  year = {2025},
  editor = {Lee, Hung-yi and Liu, Tongliang},
  volume = {304},
  series = {Proceedings of Machine Learning Research},
  month = {09--12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/li25a/li25a.pdf},
  url = {https://proceedings.mlr.press/v304/li25a.html},
  abstract = {This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named \texttt{PC-CapUL} for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.}
}
Endnote
%0 Conference Paper
%T Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity
%A Hanyang LI
%A Hong Xie
%A Defu Lian
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-li25a
%I PMLR
%P 223--238
%U https://proceedings.mlr.press/v304/li25a.html
%V 304
%X This paper revisits the multi-play multi-armed bandit with shareable arm capacities problem, which is tailored to resource allocation problems arising in LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions in this setting because they rely heavily on the assumption of abundant capacities. This paper presents a fairly complete solution to this setting and makes three key contributions. First, we establish a minimax lower bound on the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. Second, we derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. Third, we introduce an efficient exploration algorithm named PC-CapUL for the capacity-scarce setting; it matches the regret lower bounds up to an acceptable constant and features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.
APA
LI, H., Xie, H. & Lian, D. (2025). Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:223-238. Available from https://proceedings.mlr.press/v304/li25a.html.