Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path

Muhammad Aneeq Uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Basar
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10178-10206, 2023.

Abstract

We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional approaches, we alleviate the need for a mean-field oracle by developing an algorithm that approximates the Mean-Field Equilibrium (MFE) using a single sample path of the generic agent. We call this Sandbox Learning, as it can be used as a warm start for any agent learning in a multi-agent non-cooperative setting. We adopt a two time-scale approach in which an online fixed-point recursion for the mean-field operates on a slower time-scale, in tandem with a control policy update on a faster time-scale for the generic agent. Given that the underlying Markov Decision Process (MDP) of the agent is communicating, we provide finite-sample guarantees on the convergence of both the mean-field estimate and the control policy to the mean-field equilibrium. The sample complexity of the Sandbox Learning algorithm is $O(\epsilon^{-4})$, where $\epsilon$ is the MFE approximation error; this matches the complexity of works that assume access to a mean-field oracle. Finally, we empirically demonstrate the effectiveness of the Sandbox Learning algorithm in diverse scenarios, including those where the MDP does not have a single communicating class.
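To make the two time-scale structure concrete, below is a minimal Python sketch of one learning loop along a single sample path. The environment interface (`step`, `reward`), the epsilon-greedy discounted Q-learning update, the empirical-occupation recursion for the mean-field estimate `mu`, and the step-size exponents are all illustrative assumptions; they stand in for, but are not, the paper's exact operators or step-size schedule.

```python
import numpy as np

def sandbox_learning(step, reward, n_states, n_actions, horizon,
                     gamma=0.99, eps=0.1, a_exp=0.6, b_exp=0.9, seed=0):
    """Two time-scale sketch of oracle-free learning along one sample path.

    `step(s, a, mu, rng)` samples the next state and `reward(s, a, mu)`
    returns the reward; both depend on the current mean-field estimate
    `mu` (a distribution over states). Both callables are assumed
    interfaces, not part of the paper.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # mean-field estimate
    s = int(rng.integers(n_states))
    for t in range(1, horizon + 1):
        # Fast time-scale: epsilon-greedy action, then a Q-learning update
        # against the *current* mean-field estimate.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = step(s, a, mu, rng)            # one step of the sample path
        lr_fast = 1.0 / t ** a_exp
        Q[s, a] += lr_fast * (reward(s, a, mu) + gamma * np.max(Q[s_next]) - Q[s, a])
        # Slow time-scale: online fixed-point recursion that nudges the
        # mean-field estimate toward the trajectory's state occupation.
        # Choosing b_exp > a_exp makes this recursion slower than the Q-update.
        lr_slow = 1.0 / t ** b_exp
        onehot = np.zeros(n_states)
        onehot[s_next] = 1.0
        mu += lr_slow * (onehot - mu)
        s = s_next
    return Q, mu
```

The ordering of the step-size exponents, with the policy update decaying more slowly than the mean-field recursion, is what lets the policy track a near-best response to a slowly drifting mean-field, mirroring the two time-scale structure described in the abstract.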

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zaman23a,
  title     = {Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path},
  author    = {Zaman, Muhammad Aneeq Uz and Koppel, Alec and Bhatt, Sujay and Basar, Tamer},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {10178--10206},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zaman23a/zaman23a.pdf},
  url       = {https://proceedings.mlr.press/v206/zaman23a.html}
}
Endnote
%0 Conference Paper
%T Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path
%A Muhammad Aneeq Uz Zaman
%A Alec Koppel
%A Sujay Bhatt
%A Tamer Basar
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zaman23a
%I PMLR
%P 10178--10206
%U https://proceedings.mlr.press/v206/zaman23a.html
%V 206
APA
Zaman, M.A.U., Koppel, A., Bhatt, S. & Basar, T. (2023). Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10178-10206. Available from https://proceedings.mlr.press/v206/zaman23a.html.
