Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games

Antonio Ocello, Daniil Tiapkin, Lorenzo Mancini, Mathieu Lauriere, Eric Moulines
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46897-46942, 2025.

Abstract

We introduce Mean Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean Field Games (MFGs) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
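The paper itself specifies MF-TRPO and its analysis precisely; as a rough illustration only, the sketch below alternates a generic KL-regularized (mirror-descent-style) policy update with a stationary mean-field update on a small synthetic finite state-action MFG. Everything in it (the random kernel, the crowding-penalty reward, the step size eta, the discounted evaluation proxy) is an assumption made for illustration, not the authors' algorithm or guarantees.

import numpy as np

# Illustrative sketch only: a fixed-point loop for a toy finite MFG that
# alternates (i) the stationary state distribution induced by the current
# policy and (ii) a KL-regularized (TRPO-flavoured) policy improvement step.
# All quantities below are hypothetical placeholders, not MF-TRPO itself.

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)

# Transition kernel P[s, a, s'] and a mean-field-dependent reward r(s, a, mu).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
base_reward = rng.normal(size=(n_states, n_actions))

def reward(mu):
    # Example coupling: agents are penalized for visiting crowded states.
    return base_reward - mu[:, None]

def stationary_distribution(policy):
    # Stationary state distribution of the chain induced by the policy.
    P_pi = np.einsum("sa,sap->sp", policy, P)
    evals, evecs = np.linalg.eig(P_pi.T)
    mu = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
    return mu / mu.sum()

def q_values(policy, mu, gamma=0.95):
    # Discounted Q-values as a simple proxy for the ergodic evaluation step.
    r = reward(mu)
    P_pi = np.einsum("sa,sap->sp", policy, P)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi,
                        (policy * r).sum(axis=1))
    return r + gamma * P @ v

policy = np.full((n_states, n_actions), 1.0 / n_actions)
eta = 0.5  # step size of the KL-regularized update (arbitrary choice)

for _ in range(200):
    mu = stationary_distribution(policy)
    q = q_values(policy, mu)
    # Proximal step: exponentiated update, the closed-form solution of
    # max_pi <pi, q> - (1/eta) KL(pi || policy) at each state.
    policy = policy * np.exp(eta * (q - q.max(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)

print("approximate equilibrium policy:\n", np.round(policy, 3))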

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ocello25a,
  title     = {Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games},
  author    = {Ocello, Antonio and Tiapkin, Daniil and Mancini, Lorenzo and Lauriere, Mathieu and Moulines, Eric},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {46897--46942},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ocello25a/ocello25a.pdf},
  url       = {https://proceedings.mlr.press/v267/ocello25a.html},
  abstract  = {We introduce Mean Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean Field Games (MFGs) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.}
}
Endnote
%0 Conference Paper
%T Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games
%A Antonio Ocello
%A Daniil Tiapkin
%A Lorenzo Mancini
%A Mathieu Lauriere
%A Eric Moulines
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-ocello25a
%I PMLR
%P 46897--46942
%U https://proceedings.mlr.press/v267/ocello25a.html
%V 267
%X We introduce Mean Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean Field Games (MFGs) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
APA
Ocello, A., Tiapkin, D., Mancini, L., Lauriere, M. & Moulines, E. (2025). Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean Field Games. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46897-46942. Available from https://proceedings.mlr.press/v267/ocello25a.html.