Learning to Act in Decentralized Partially Observable MDPs

Jilles Dibangoye, Olivier Buffet
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1233-1242, 2018.

Abstract

We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
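For illustration only, the following is a minimal, hypothetical Python sketch (using the PuLP library, assumed installed) of the key idea mentioned in the abstract: casting the greedy selection of a decentralized joint decision rule as a mixed-integer linear program. The toy problem sizes, the occupancy-weighted values q, and all identifiers are invented for this example; this is not the authors' algorithm or code.

# A minimal, illustrative sketch (not the paper's implementation) of a greedy step:
# selecting a decentralized joint decision rule that maximizes an occupancy-weighted,
# linear value function, cast as a mixed-integer linear program.
import itertools
import random

import pulp

random.seed(0)

n_agents = 2
histories = [['h0', 'h1'], ['g0', 'g1']]   # individual histories per agent (toy)
actions = [['a0', 'a1'], ['b0', 'b1']]     # individual actions per agent (toy)

# q[(o1, o2, u1, u2)] plays the role of sum_x s(x, o) * Q(x, o, u): the
# occupancy-weighted value of joint action (u1, u2) after joint history (o1, o2).
# Here it is filled with random numbers purely for illustration.
q = {}
for o in itertools.product(*histories):
    for u in itertools.product(*actions):
        q[o + u] = random.uniform(-1.0, 1.0)

prob = pulp.LpProblem("greedy_decision_rule", pulp.LpMaximize)

# Binary variable a[i][(o_i, u_i)] = 1 iff agent i plays action u_i after its history o_i.
a = [
    {(o_i, u_i): pulp.LpVariable(f"a{i}_{o_i}_{u_i}", cat="Binary")
     for o_i in histories[i] for u_i in actions[i]}
    for i in range(n_agents)
]

# Continuous z[(o, u)] in [0, 1] linearizes the product of the individual indicators.
z = {key: pulp.LpVariable(f"z_{'_'.join(key)}", lowBound=0, upBound=1) for key in q}

# Each agent picks exactly one action per individual history (i.e., a decision rule).
for i in range(n_agents):
    for o_i in histories[i]:
        prob += pulp.lpSum(a[i][(o_i, u_i)] for u_i in actions[i]) == 1

# Standard linearization forcing z to equal the product of the individual indicators.
for o in itertools.product(*histories):
    for u in itertools.product(*actions):
        key = o + u
        for i in range(n_agents):
            prob += z[key] <= a[i][(o[i], u[i])]
        prob += z[key] >= pulp.lpSum(a[i][(o[i], u[i])] for i in range(n_agents)) - (n_agents - 1)

# Objective: occupancy-weighted value of the induced joint decision rule.
prob += pulp.lpSum(q[key] * z[key] for key in q)

prob.solve(pulp.PULP_CBC_CMD(msg=0))

for i in range(n_agents):
    rule = {o_i: u_i for (o_i, u_i), var in a[i].items() if var.value() > 0.5}
    print(f"agent {i} decision rule: {rule}")
print("greedy value:", pulp.value(prob.objective))

The binary variables encode each agent's individual decision rule (one action per individual history), while the auxiliary z variables keep the objective linear despite the product of indicators, which is what makes the greedy maximization solvable as a MILP rather than by enumerating joint decision rules.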

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-dibangoye18a,
  title     = {Learning to Act in Decentralized Partially Observable {MDP}s},
  author    = {Dibangoye, Jilles and Buffet, Olivier},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1233--1242},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/dibangoye18a/dibangoye18a.pdf},
  url       = {https://proceedings.mlr.press/v80/dibangoye18a.html},
  abstract  = {We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.}
}
Endnote
%0 Conference Paper
%T Learning to Act in Decentralized Partially Observable MDPs
%A Jilles Dibangoye
%A Olivier Buffet
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-dibangoye18a
%I PMLR
%P 1233--1242
%U https://proceedings.mlr.press/v80/dibangoye18a.html
%V 80
%X We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
APA
Dibangoye, J. & Buffet, O. (2018). Learning to Act in Decentralized Partially Observable MDPs. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1233-1242. Available from https://proceedings.mlr.press/v80/dibangoye18a.html.