Monte-Carlo Search for an Equilibrium in Dec-POMDPs

Yang You; Vincent Thomas; Francis Colas; Olivier Buffet

Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:2444-2453, 2023.

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP-complete), but seeking a Nash equilibrium, in which each agent’s policy is a best response to the other agents’ policies, is more accessible and has made it possible to address infinite-horizon problems with solutions in the form of finite-state controllers (FSCs). In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent’s FSC node by node. A related process is used to heuristically derive initial FSCs. Experiments on benchmarks show that the resulting algorithm, MC-JESP, is competitive with existing Dec-POMDP solvers and even outperforms many offline methods that use explicit models.
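
To make the equilibrium idea in the abstract concrete, the following is a minimal sketch of alternating best responses in a two-agent, common-payoff matrix game. It is not MC-JESP and does not involve a Dec-POMDP, a simulator, or finite-state controllers; the names `alternate_best_responses`, `is_nash`, and `PAYOFF` are hypothetical and chosen only for this illustration. It shows that the fixed point reached is a Nash equilibrium that may fall short of the global optimum, depending on the starting point.

```python
from typing import List, Tuple


def alternate_best_responses(
    payoff: List[List[float]],
    start: Tuple[int, int] = (0, 0),
    max_rounds: int = 100,
) -> Tuple[int, int]:
    """Alternate best responses in a two-agent game with a shared payoff.

    payoff[i][j] is the common reward when agent 1 plays i and agent 2 plays j.
    Illustrative toy only: this is NOT the MC-JESP algorithm.
    """
    a, b = start
    for _ in range(max_rounds):
        # Agent 1 best-responds to agent 2's current action (ties keep the current action).
        new_a = max(range(len(payoff)), key=lambda i: (payoff[i][b], i == a))
        # Agent 2 best-responds to agent 1's updated action (same tie-breaking).
        new_b = max(range(len(payoff[0])), key=lambda j: (payoff[new_a][j], j == b))
        if (new_a, new_b) == (a, b):
            break  # no agent can improve by deviating alone
        a, b = new_a, new_b
    return a, b


def is_nash(payoff: List[List[float]], a: int, b: int) -> bool:
    """Check that neither agent gains from a unilateral deviation."""
    row_ok = all(payoff[a][b] >= payoff[i][b] for i in range(len(payoff)))
    col_ok = all(payoff[a][b] >= payoff[a][j] for j in range(len(payoff[0])))
    return row_ok and col_ok


# Hypothetical payoff matrix with two equilibria: (0, 0) worth 5 and (1, 1) worth 10.
PAYOFF = [[5.0, 0.0], [0.0, 10.0]]

if __name__ == "__main__":
    for start in [(0, 0), (0, 1), (1, 1)]:
        a, b = alternate_best_responses(PAYOFF, start)
        print(start, "->", (a, b), "payoff", PAYOFF[a][b], "nash", is_nash(PAYOFF, a, b))
```

Starting from `(0, 0)` the procedure stays at the equilibrium worth 5, while other starting points reach the one worth 10. This sensitivity to the starting point is one plausible reason an approach of this kind would pay attention to how the initial controllers are chosen.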

Cite this Paper

BibTeX


@InProceedings{pmlr-v216-you23a,
  title = 	 {Monte-{C}arlo Search for an Equilibrium in {Dec-POMDPs}},
  author =       {You, Yang and Thomas, Vincent and Colas, Francis and Buffet, Olivier},
  booktitle = 	 {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {2444--2453},
  year = 	 {2023},
  editor = 	 {Evans, Robin J. and Shpitser, Ilya},
  volume = 	 {216},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {31 Jul--04 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v216/you23a/you23a.pdf},
  url = 	 {https://proceedings.mlr.press/v216/you23a.html},
  abstract = 	 {Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP complete), but seeking a Nash equilibrium - each agent policy being a best response to the other agents - is more accessible, and allowed addressing infinite-horizon problems with solutions in the form of finite state controllers. In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent’s FSC node by node. A related process is used to heuristically derive initial FSCs. Experiment with benchmarks shows that MC-JESP is competitive with existing Dec-POMDP solvers, even better than many offline methods using explicit models.}
}

Endnote

%0 Conference Paper
%T Monte-Carlo Search for an Equilibrium in Dec-POMDPs
%A Yang You
%A Vincent Thomas
%A Francis Colas
%A Olivier Buffet
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-you23a
%I PMLR
%P 2444--2453
%U https://proceedings.mlr.press/v216/you23a.html
%V 216
%X Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability. Seeking a global optimum is difficult (NEXP complete), but seeking a Nash equilibrium - each agent policy being a best response to the other agents - is more accessible, and allowed addressing infinite-horizon problems with solutions in the form of finite state controllers. In this paper, we show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available. This requires relying on a simulation-based POMDP solver to construct an agent’s FSC node by node. A related process is used to heuristically derive initial FSCs. Experiment with benchmarks shows that MC-JESP is competitive with existing Dec-POMDP solvers, even better than many offline methods using explicit models.

APA


You, Y., Thomas, V., Colas, F., & Buffet, O. (2023). Monte-Carlo Search for an Equilibrium in Dec-POMDPs. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:2444-2453. Available from https://proceedings.mlr.press/v216/you23a.html.
