Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie; Zhuoran Yang; Zhaoran Wang; Andreea Minca

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11436-11447, 2021.

Abstract

We study reinforcement learning in mean-field games. To achieve the Nash equilibrium, which consists of a policy and a mean-field state, existing algorithms require obtaining the optimal policy while fixing any mean-field state. In practice, however, the policy and the mean-field state evolve simultaneously, as each agent is learning while playing. To bridge such a gap, we propose a fictitious play algorithm, which alternatively updates the policy (learning) and the mean-field state (playing) by one step of policy optimization and gradient descent, respectively. Despite the nonstationarity induced by such an alternating scheme, we prove that the proposed algorithm converges to the Nash equilibrium with an explicit convergence rate. To the best of our knowledge, it is the first provably efficient algorithm that achieves learning while playing via alternating updates.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-xie21g,
  title = 	 {Learning While Playing in Mean-Field Games: Convergence and Optimality},
  author =       {Xie, Qiaomin and Yang, Zhuoran and Wang, Zhaoran and Minca, Andreea},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {11436--11447},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/xie21g/xie21g.pdf},
  url = 	 {https://proceedings.mlr.press/v139/xie21g.html},
  abstract = 	 {We study reinforcement learning in mean-field games. To achieve the Nash equilibrium, which consists of a policy and a mean-field state, existing algorithms require obtaining the optimal policy while fixing any mean-field state. In practice, however, the policy and the mean-field state evolve simultaneously, as each agent is learning while playing. To bridge such a gap, we propose a fictitious play algorithm, which alternatively updates the policy (learning) and the mean-field state (playing) by one step of policy optimization and gradient descent, respectively. Despite the nonstationarity induced by such an alternating scheme, we prove that the proposed algorithm converges to the Nash equilibrium with an explicit convergence rate. To the best of our knowledge, it is the first provably efficient algorithm that achieves learning while playing via alternating updates.}
}

Endnote

%0 Conference Paper
%T Learning While Playing in Mean-Field Games: Convergence and Optimality
%A Qiaomin Xie
%A Zhuoran Yang
%A Zhaoran Wang
%A Andreea Minca
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-xie21g
%I PMLR
%P 11436--11447
%U https://proceedings.mlr.press/v139/xie21g.html
%V 139
%X We study reinforcement learning in mean-field games. To achieve the Nash equilibrium, which consists of a policy and a mean-field state, existing algorithms require obtaining the optimal policy while fixing any mean-field state. In practice, however, the policy and the mean-field state evolve simultaneously, as each agent is learning while playing. To bridge such a gap, we propose a fictitious play algorithm, which alternatively updates the policy (learning) and the mean-field state (playing) by one step of policy optimization and gradient descent, respectively. Despite the nonstationarity induced by such an alternating scheme, we prove that the proposed algorithm converges to the Nash equilibrium with an explicit convergence rate. To the best of our knowledge, it is the first provably efficient algorithm that achieves learning while playing via alternating updates.

APA

Xie, Q., Yang, Z., Wang, Z., & Minca, A. (2021). Learning While Playing in Mean-Field Games: Convergence and Optimality. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11436-11447. Available from https://proceedings.mlr.press/v139/xie21g.html.
