Communicating via Markov Decision Processes

Samuel Sokota; Christian A Schroeder De Witt; Maximilian Igl; Luisa M Zintgraf; Philip Torr; Martin Strohmeier; Zico Kolter; Shimon Whiteson; Jakob Foerster

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20314-20328, 2022.

Abstract

We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available—namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
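The abstract names minimum entropy coupling as one of MEME's two ingredients. The sketch below illustrates only that primitive. It is not the paper's MEME implementation. Given two marginals, a coupling is a joint distribution whose row sums equal the first marginal and whose column sums equal the second. A coupling with low joint entropy makes each row's conditional distribution nearly deterministic. The code uses a simple greedy heuristic: repeatedly pair the largest remaining mass in each marginal. Interpreting rows as messages and columns as actions, and the helper names `greedy_mec`, `entropy`, and `action_given_message`, are illustrative assumptions.

```python
import math


def greedy_mec(p, q, tol=1e-12):
    """Greedy approximate minimum-entropy coupling of marginals p and q.

    Returns a dict {(i, j): mass} whose row sums match p and whose
    column sums match q (up to floating-point tolerance).
    """
    p, q = list(p), list(q)
    joint = {}
    while True:
        # Pick the largest remaining mass in each marginal (first index on ties).
        i = max(range(len(p)), key=lambda k: p[k])
        j = max(range(len(q)), key=lambda k: q[k])
        mass = min(p[i], q[j])
        if mass <= tol:
            break
        # Assign as much mass as both marginals allow to cell (i, j).
        joint[(i, j)] = joint.get((i, j), 0.0) + mass
        p[i] -= mass
        q[j] -= mass
    return joint


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(m * math.log2(m) for m in probs if m > 0)


def action_given_message(joint, message):
    """Hypothetical helper: conditional distribution over columns (actions)
    given a row (message) of the coupling."""
    row = {j: m for (i, j), m in joint.items() if i == message}
    total = sum(row.values())
    if total == 0:
        raise ValueError("message has zero probability under the coupling")
    return {j: m / total for j, m in row.items()}


if __name__ == "__main__":
    messages = [0.5, 0.5]        # assumed posterior over two messages
    actions = [0.5, 0.25, 0.25]  # assumed policy over three actions
    joint = greedy_mec(messages, actions)
    print(joint)
    print(entropy(joint.values()))
    print(action_given_message(joint, 0), action_given_message(joint, 1))
```

For these two marginals the greedy coupling places mass 0.5, 0.25, and 0.25 on three cells, so its joint entropy is 1.5 bits. That matches the lower bound max(H(p), H(q)) = 1.5 bits. In general the greedy heuristic is only an approximation, since computing an exact minimum entropy coupling is hard.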

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-sokota22a,
  title = 	 {Communicating via {M}arkov Decision Processes},
  author =       {Sokota, Samuel and De Witt, Christian A Schroeder and Igl, Maximilian and Zintgraf, Luisa M and Torr, Philip and Strohmeier, Martin and Kolter, Zico and Whiteson, Shimon and Foerster, Jakob},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {20314--20328},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/sokota22a/sokota22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/sokota22a.html},
  abstract = 	 {We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available—namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.}
}

Endnote

%0 Conference Paper
%T Communicating via Markov Decision Processes
%A Samuel Sokota
%A Christian A Schroeder De Witt
%A Maximilian Igl
%A Luisa M Zintgraf
%A Philip Torr
%A Martin Strohmeier
%A Zico Kolter
%A Shimon Whiteson
%A Jakob Foerster
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-sokota22a
%I PMLR
%P 20314--20328
%U https://proceedings.mlr.press/v162/sokota22a.html
%V 162
%X We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available—namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.

APA

Sokota, S., De Witt, C.A.S., Igl, M., Zintgraf, L.M., Torr, P., Strohmeier, M., Kolter, Z., Whiteson, S. & Foerster, J. (2022). Communicating via Markov Decision Processes. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20314-20328. Available from https://proceedings.mlr.press/v162/sokota22a.html.
