A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach

Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:18206-18227, 2025.

Abstract

This work examines average-reward reinforcement learning with general policy parametrization. Existing state-of-the-art (SOTA) guarantees for this problem are either suboptimal or hindered by several challenges, including poor scalability with respect to the size of the state-action space, high iteration complexity, and a significant dependence on knowledge of mixing times and hitting times. To address these limitations, we propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ for average-reward Markov Decision Processes (MDPs), where $T$ is the horizon length, using an Actor-Critic approach. Moreover, the convergence rate does not scale with the size of the state space, making it applicable even to infinite state spaces.
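
The paper itself is not reproduced on this page, so the following is only a rough illustration of the multi-level Monte Carlo (MLMC) estimation idea that the algorithm's name refers to: draw a geometrically distributed level, average that many consecutive (possibly Markovian) samples, and add a reweighted telescoping correction so that the estimator behaves as if a very long trajectory had been averaged, without needing to know the mixing time. This is not the paper's MLMC-NAC update; the sampler interface and all names below are hypothetical.

    import numpy as np

    def mlmc_mean_estimate(sample_fn, max_level, rng=None):
        # Generic multi-level Monte Carlo (MLMC) estimate of a stationary mean
        # from a stream of consecutive samples of a Markov chain.
        # sample_fn(n) -> array of the next n consecutive samples (hypothetical interface).
        # max_level    -> truncation level: at most 2**max_level samples are used per call.
        rng = np.random.default_rng() if rng is None else rng

        # Draw a random level J with P(J = j) = 2**(-j), j = 1, 2, ...
        J = int(rng.geometric(p=0.5))

        if 2 ** J <= 2 ** max_level:
            x = sample_fn(2 ** J)
            # Telescoping correction: difference between the averages of the first
            # 2**J and the first 2**(J-1) samples, reweighted by 2**J = 1/P(J = j),
            # so that in expectation the corrections telescope up to the average
            # over 2**max_level samples.
            correction = (2 ** J) * (x.mean() - x[: 2 ** (J - 1)].mean())
            return x[0] + correction

        # Levels beyond the truncation point are dropped: fall back to one sample.
        return sample_fn(1)[0]

    # Toy usage on an AR(1) chain whose stationary mean is 1 / (1 - 0.9) = 10.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        state = {"s": 0.0}

        def sample_fn(n):
            out = np.empty(n)
            for i in range(n):
                state["s"] = 0.9 * state["s"] + rng.normal(loc=1.0)
                out[i] = state["s"]
            return out

        estimates = [mlmc_mean_estimate(sample_fn, max_level=10, rng=rng)
                     for _ in range(2000)]
        print("MLMC estimate of the stationary mean:", np.mean(estimates))  # roughly 10

Averaging many such estimates recovers the long-run mean even though each call only touches a random, usually short, stretch of the trajectory; this is the mechanism that lets MLMC-style estimators avoid explicit dependence on mixing-time knowledge.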

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ganesh25b,
  title     = {A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach},
  author    = {Ganesh, Swetha and Mondal, Washim Uddin and Aggarwal, Vaneet},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {18206--18227},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ganesh25b/ganesh25b.pdf},
  url       = {https://proceedings.mlr.press/v267/ganesh25b.html},
  abstract  = {This work examines average-reward reinforcement learning with general policy parametrization. Existing state-of-the-art (SOTA) guarantees for this problem are either suboptimal or hindered by several challenges, including poor scalability with respect to the size of the state-action space, high iteration complexity, and a significant dependence on knowledge of mixing times and hitting times. To address these limitations, we propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ for average-reward Markov Decision Processes (MDPs) (where $T$ is the horizon length), using an Actor-Critic approach. Moreover, the convergence rate does not scale with the size of the state space, therefore even being applicable to infinite state spaces.}
}
Endnote
%0 Conference Paper
%T A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach
%A Swetha Ganesh
%A Washim Uddin Mondal
%A Vaneet Aggarwal
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-ganesh25b
%I PMLR
%P 18206--18227
%U https://proceedings.mlr.press/v267/ganesh25b.html
%V 267
%X This work examines average-reward reinforcement learning with general policy parametrization. Existing state-of-the-art (SOTA) guarantees for this problem are either suboptimal or hindered by several challenges, including poor scalability with respect to the size of the state-action space, high iteration complexity, and a significant dependence on knowledge of mixing times and hitting times. To address these limitations, we propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ for average-reward Markov Decision Processes (MDPs) (where $T$ is the horizon length), using an Actor-Critic approach. Moreover, the convergence rate does not scale with the size of the state space, therefore even being applicable to infinite state spaces.
APA
Ganesh, S., Mondal, W.U. & Aggarwal, V. (2025). A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:18206-18227. Available from https://proceedings.mlr.press/v267/ganesh25b.html.
