Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

Mudit Gaur, Amrit Bedi, Di Wang, Vaneet Aggarwal
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:15153-15179, 2024.

Abstract

The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, Markovian sampling, Continuous state-action spaces, the performance of the Last iterate, and Global optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left( \epsilon^{-3} \right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-gaur24a,
  title = {Closing the Gap: Achieving Global Convergence ({L}ast Iterate) of Actor-Critic under {M}arkovian Sampling with Neural Network Parametrization},
  author = {Gaur, Mudit and Bedi, Amrit and Wang, Di and Aggarwal, Vaneet},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {15153--15179},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gaur24a/gaur24a.pdf},
  url = {https://proceedings.mlr.press/v235/gaur24a.html},
  abstract = {The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, Markovian sampling, Continuous state-action spaces, the performance of the Last iterate, and Global optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left( \epsilon^{-3} \right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.}
}
Endnote
%0 Conference Paper
%T Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization
%A Mudit Gaur
%A Amrit Bedi
%A Di Wang
%A Vaneet Aggarwal
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gaur24a
%I PMLR
%P 15153--15179
%U https://proceedings.mlr.press/v235/gaur24a.html
%V 235
%X The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, Markovian sampling, Continuous state-action spaces, the performance of the Last iterate, and Global optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (covers MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left( \epsilon^{-3} \right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.
APA
Gaur, M., Bedi, A., Wang, D., & Aggarwal, V. (2024). Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:15153-15179. Available from https://proceedings.mlr.press/v235/gaur24a.html.
