Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems

Eugenio Bargiacchi; Timothy Verstraeten; Diederik Roijers; Ann Nowé; Hado van Hasselt

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:482-490, 2018.

Abstract

Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-bargiacchi18a,
  title = 	 {Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems},
  author =       {Bargiacchi, Eugenio and Verstraeten, Timothy and Roijers, Diederik and Now{\'e}, Ann and van Hasselt, Hado},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {482--490},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/bargiacchi18a/bargiacchi18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/bargiacchi18a.html},
  abstract = 	 {Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.}
}

Endnote

%0 Conference Paper
%T Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems
%A Eugenio Bargiacchi
%A Timothy Verstraeten
%A Diederik Roijers
%A Ann Nowé
%A Hado van Hasselt
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-bargiacchi18a
%I PMLR
%P 482--490
%U https://proceedings.mlr.press/v80/bargiacchi18a.html
%V 80
%X Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.

APA


Bargiacchi, E., Verstraeten, T., Roijers, D., Nowé, A. & van Hasselt, H. (2018). Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:482-490. Available from https://proceedings.mlr.press/v80/bargiacchi18a.html.
