Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:343-351, 2025.

Abstract

We consider discrete-time stationary mean field games (MFGs) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, the Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, given suitable access to a single trajectory of Markovian samples. Unlike prior methods, ASAC-MFG is single-loop and uses a single sample path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, an MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state of the art. Previous works derive their complexity assuming a contraction of the Bellman operator, which does not hold for average-reward MDPs. We match the rate while removing this untenable assumption through an improved Lyapunov function.
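To make the "single-loop, single-sample-path, multi-time-scale" idea concrete, the sketch below shows a generic actor-critic loop in which the mean-field estimate, the average-reward critic, and the softmax actor are each updated once per observed transition, with different step-size schedules. This is a hypothetical illustration under assumed toy dynamics and step sizes, not the paper's ASAC-MFG algorithm; all names, feature choices, and schedules are the editor's assumptions.

```python
# Hypothetical single-loop, multi-time-scale actor-critic sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

def transition(s, a, mu):
    # Toy mean-field-dependent dynamics: the population distribution mu biases
    # the next-state distribution (purely illustrative).
    p = np.ones(n_states) + 0.5 * mu
    p[(s + a) % n_states] += 1.0
    return rng.choice(n_states, p=p / p.sum())

def reward(s, a, mu):
    # Toy reward with a congestion-style dependence on the mean field.
    return float(a == s % n_actions) - mu[s]

theta = np.zeros((n_states, n_actions))   # actor (softmax policy) parameters
w = np.zeros(n_states)                     # critic (differential value) parameters
rho = 0.0                                  # average-reward estimate
mu = np.ones(n_states) / n_states          # mean-field (state distribution) estimate
s = 0

for k in range(1, 50_001):
    # Three time scales on one loop: mean field slowest, critic fastest (assumed schedules).
    a_mu, a_w, a_th = 1.0 / k**0.9, 1.0 / k**0.5, 1.0 / k**0.7

    logits = theta[s]
    pi = np.exp(logits - logits.max()); pi /= pi.sum()
    a = rng.choice(n_actions, p=pi)
    s_next = transition(s, a, mu)
    r = reward(s, a, mu)

    # Critic: average-reward TD(0) update of the differential value function.
    delta = r - rho + w[s_next] - w[s]
    rho += a_w * (r - rho)
    w[s] += a_w * delta

    # Actor: policy-gradient step using the TD error as the advantage estimate.
    grad_log = -pi; grad_log[a] += 1.0
    theta[s] += a_th * delta * grad_log

    # Mean field: track the empirical state distribution along the same trajectory.
    e = np.zeros(n_states); e[s] = 1.0
    mu += a_mu * (e - mu)

    s = s_next

print("estimated average reward:", round(rho, 3))
print("mean-field estimate:", np.round(mu, 3))
```

Note that all three estimates are refreshed from the same Markovian trajectory, which is what distinguishes a single-loop scheme from nested fixed-point iterations; the step-size exponents here are illustrative placeholders rather than the rates analyzed in the paper.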

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-zeng25a,
  title     = {Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis},
  author    = {Zeng, Sihan and Bhatt, Sujay and Koppel, Alec and Ganesh, Sumitra},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {343--351},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/zeng25a/zeng25a.pdf},
  url       = {https://proceedings.mlr.press/v258/zeng25a.html},
  abstract  = {We consider discrete-time stationary mean field games (MFGs) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, the Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, given suitable access to a single trajectory of Markovian samples. Unlike prior methods, ASAC-MFG is single-loop and uses a single sample path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, an MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state of the art. Previous works derive their complexity assuming a contraction of the Bellman operator, which does not hold for average-reward MDPs. We match the rate while removing this untenable assumption through an improved Lyapunov function.}
}
Endnote
%0 Conference Paper
%T Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis
%A Sihan Zeng
%A Sujay Bhatt
%A Alec Koppel
%A Sumitra Ganesh
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-zeng25a
%I PMLR
%P 343--351
%U https://proceedings.mlr.press/v258/zeng25a.html
%V 258
%X We consider discrete-time stationary mean field games (MFGs) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, the Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, given suitable access to a single trajectory of Markovian samples. Unlike prior methods, ASAC-MFG is single-loop and uses a single sample path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, an MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state of the art. Previous works derive their complexity assuming a contraction of the Bellman operator, which does not hold for average-reward MDPs. We match the rate while removing this untenable assumption through an improved Lyapunov function.
APA
Zeng, S., Bhatt, S., Koppel, A. & Ganesh, S. (2025). Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:343-351. Available from https://proceedings.mlr.press/v258/zeng25a.html.