Optimal Algorithm for Max-Min Fair Bandit

Zilong Wang, Zhiyao Zhang, Shuai Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:65480-65495, 2025.

Abstract

Multi-player multi-armed bandit (MP-MAB) has been widely studied owing to its diverse applications across numerous domains. We consider an MP-MAB problem where $N$ players compete for $K$ arms in $T$ rounds. The reward distributions are heterogeneous where each player has a different expected reward for the same arm. When multiple players select the same arm, they collide and obtain zero rewards. In this paper, our target is to find the max-min fairness matching that maximizes the reward of the player who receives the lowest reward. This paper improves the existing max-min regret upper bound of $O(\exp(1/\Delta) + K^3 \log T\log \log T)$. More specifically, our decentralized fair elimination algorithm (DFE) deals with heterogeneity and collision carefully and attains a regret upper bound of $O((N^2+K)\log T / \Delta)$, where $\Delta$ is the minimum reward gap between max-min value and sub-optimal arms. In addition, this paper also provides an $\Omega(\max\\{ N^2, K \\} \log T / \Delta)$ regret lower bound for this problem, which indicates that our algorithm is optimal with respect to key parameters $T, N, K$, and $\Delta$. Additional numerical experiments also show the efficiency and improvement of our algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25eo, title = {Optimal Algorithm for Max-Min Fair Bandit}, author = {Wang, Zilong and Zhang, Zhiyao and Li, Shuai}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {65480--65495}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25eo/wang25eo.pdf}, url = {https://proceedings.mlr.press/v267/wang25eo.html}, abstract = {Multi-player multi-armed bandit (MP-MAB) has been widely studied owing to its diverse applications across numerous domains. We consider an MP-MAB problem where $N$ players compete for $K$ arms in $T$ rounds. The reward distributions are heterogeneous where each player has a different expected reward for the same arm. When multiple players select the same arm, they collide and obtain zero rewards. In this paper, our target is to find the max-min fairness matching that maximizes the reward of the player who receives the lowest reward. This paper improves the existing max-min regret upper bound of $O(\exp(1/\Delta) + K^3 \log T\log \log T)$. More specifically, our decentralized fair elimination algorithm (DFE) deals with heterogeneity and collision carefully and attains a regret upper bound of $O((N^2+K)\log T / \Delta)$, where $\Delta$ is the minimum reward gap between max-min value and sub-optimal arms. In addition, this paper also provides an $\Omega(\max\\{ N^2, K \\} \log T / \Delta)$ regret lower bound for this problem, which indicates that our algorithm is optimal with respect to key parameters $T, N, K$, and $\Delta$. Additional numerical experiments also show the efficiency and improvement of our algorithms.} }
Endnote
%0 Conference Paper %T Optimal Algorithm for Max-Min Fair Bandit %A Zilong Wang %A Zhiyao Zhang %A Shuai Li %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-wang25eo %I PMLR %P 65480--65495 %U https://proceedings.mlr.press/v267/wang25eo.html %V 267 %X Multi-player multi-armed bandit (MP-MAB) has been widely studied owing to its diverse applications across numerous domains. We consider an MP-MAB problem where $N$ players compete for $K$ arms in $T$ rounds. The reward distributions are heterogeneous where each player has a different expected reward for the same arm. When multiple players select the same arm, they collide and obtain zero rewards. In this paper, our target is to find the max-min fairness matching that maximizes the reward of the player who receives the lowest reward. This paper improves the existing max-min regret upper bound of $O(\exp(1/\Delta) + K^3 \log T\log \log T)$. More specifically, our decentralized fair elimination algorithm (DFE) deals with heterogeneity and collision carefully and attains a regret upper bound of $O((N^2+K)\log T / \Delta)$, where $\Delta$ is the minimum reward gap between max-min value and sub-optimal arms. In addition, this paper also provides an $\Omega(\max\\{ N^2, K \\} \log T / \Delta)$ regret lower bound for this problem, which indicates that our algorithm is optimal with respect to key parameters $T, N, K$, and $\Delta$. Additional numerical experiments also show the efficiency and improvement of our algorithms.
APA
Wang, Z., Zhang, Z. & Li, S.. (2025). Optimal Algorithm for Max-Min Fair Bandit. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:65480-65495 Available from https://proceedings.mlr.press/v267/wang25eo.html.

Related Material