Best Arm Identification in Multi-Agent Multi-Armed Bandits

Filippo Vannella; Alexandre Proutiere; Jaeseong Jeong

Best Arm Identification in Multi-Agent Multi-Armed Bandits

Filippo Vannella, Alexandre Proutiere, Jaeseong Jeong

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:34875-34907, 2023.

Abstract

We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. The objective is to find an optimal global action with a prescribed level of confidence and minimal sample complexity. We derive a tight instance-specific lower bound of the sample complexity and characterize the corresponding optimal sampling strategy. Unfortunately, this bound is obtained by solving a combinatorial optimization problem with a number of variables and constraints exponentially growing with the number of agents. We leverage Mean Field (MF) techniques to obtain, in a computationally efficient manner, an approximation of the lower bound. The approximation scales at most as

$\rho K^d$ (where

$\rho$ ,

$K$ , and

$d$ denote the number of factors in the graph, the number of possible actions per agent, and the maximal degree of the factor graph). We devise MF-TaS (Mean-Field-Track-and-Stop), an algorithm whose sample complexity provably matches our approximated lower bound. We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-vannella23a,
  title = 	 {Best Arm Identification in Multi-Agent Multi-Armed Bandits},
  author =       {Vannella, Filippo and Proutiere, Alexandre and Jeong, Jaeseong},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {34875--34907},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/vannella23a/vannella23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/vannella23a.html},
  abstract = 	 {We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. The objective is to find an optimal global action with a prescribed level of confidence and minimal sample complexity. We derive a tight instance-specific lower bound of the sample complexity and characterize the corresponding optimal sampling strategy. Unfortunately, this bound is obtained by solving a combinatorial optimization problem with a number of variables and constraints exponentially growing with the number of agents. We leverage Mean Field (MF) techniques to obtain, in a computationally efficient manner, an approximation of the lower bound. The approximation scales at most as $\rho K^d$ (where $\rho$, $K$, and $d$ denote the number of factors in the graph, the number of possible actions per agent, and the maximal degree of the factor graph). We devise MF-TaS (Mean-Field-Track-and-Stop), an algorithm whose sample complexity provably matches our approximated lower bound. We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).}
}

Endnote

%0 Conference Paper
%T Best Arm Identification in Multi-Agent Multi-Armed Bandits
%A Filippo Vannella
%A Alexandre Proutiere
%A Jaeseong Jeong
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-vannella23a
%I PMLR
%P 34875--34907
%U https://proceedings.mlr.press/v202/vannella23a.html
%V 202
%X We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. The objective is to find an optimal global action with a prescribed level of confidence and minimal sample complexity. We derive a tight instance-specific lower bound of the sample complexity and characterize the corresponding optimal sampling strategy. Unfortunately, this bound is obtained by solving a combinatorial optimization problem with a number of variables and constraints exponentially growing with the number of agents. We leverage Mean Field (MF) techniques to obtain, in a computationally efficient manner, an approximation of the lower bound. The approximation scales at most as $\rho K^d$ (where $\rho$, $K$, and $d$ denote the number of factors in the graph, the number of possible actions per agent, and the maximal degree of the factor graph). We devise MF-TaS (Mean-Field-Track-and-Stop), an algorithm whose sample complexity provably matches our approximated lower bound. We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).

APA


Vannella, F., Proutiere, A. & Jeong, J.. (2023). Best Arm Identification in Multi-Agent Multi-Armed Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:34875-34907 Available from https://proceedings.mlr.press/v202/vannella23a.html.

Best Arm Identification in Multi-Agent Multi-Armed Bandits

Abstract

Cite this Paper

Related Material