[edit]
Multi-play Multi-armed Bandits with Shareable Arm Capacities Revisited: Settling Scarce Capacity
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:223-238, 2025.
Abstract
This paper revisits multi-play multi-armed bandit with shareable arm capacities problem, which is tailored for resource allocation problems arising from LLM inference serving, edge intelligence, etc. We investigate the capacity-scarce setting, a common dilemma in resource allocation problems. Existing works yield sub-optimal solutions under this setting, as they rely heavily on the assumption of abundant capacities. This paper present a rather complete solution to this setting and it makes three key contributions. We establish a minmax lower bound for the sample complexity of learning the arm capacities and propose an algorithm that exactly matches this lower bound. We derive both instance-independent and instance-dependent regret lower bounds for learning the optimal play assignment. We introduce an efficient exploration algorithm named \\texttt\{PC-CapUL\} for the capacity-scarce setting and \\texttt\{PC-CapUL\} matches the regret lower bounds up to an acceptable constant. \\texttt\{PC-CapUL\} features a novel index for coordinating the exploration of multiple plays. Experiments show significant improvement over existing methods.