WAVE: Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

Ali Baheri, Zahra Shahrooei, Chirayu Salgarkar
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:920-931, 2025.

Abstract

We present WAVE (Wasserstein Adaptive Value Estimation for Actor-Critic), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic’s loss function. We prove that WAVE achieves $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic’s mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent’s performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
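To make the abstract's core idea concrete, the sketch below shows a critic loss that combines the mean squared error with a Sinkhorn-approximated Wasserstein term under an externally supplied weight. This is a minimal illustration under assumptions: the function names (sinkhorn_distance, wave_critic_loss), the 1-D empirical value distributions, the absolute-difference ground cost, the entropic parameter eps, and the fixed weight lam are all illustrative choices, not the paper's exact formulation or adaptation rule.

# Hypothetical sketch of the loss described in the abstract: mean squared
# TD error plus a Sinkhorn (entropy-regularized Wasserstein) penalty between
# the empirical distributions of predicted and target values. The adaptive
# weighting of lam is abstracted away here; the paper defines its own rule.
import numpy as np

def sinkhorn_distance(x, y, eps=0.1, n_iters=100):
    """Sinkhorn approximation of the 1-Wasserstein distance between two
    1-D empirical distributions given as sample arrays x and y."""
    n, m = len(x), len(y)
    C = np.abs(x[:, None] - y[None, :])   # pairwise ground cost |x_i - y_j|
    K = np.exp(-C / eps)                  # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):              # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # approximate transport plan
    return float(np.sum(P * C))

def wave_critic_loss(v_pred, v_target, lam):
    """MSE critic loss plus a Wasserstein regularizer weighted by lam."""
    mse = np.mean((v_pred - v_target) ** 2)
    return mse + lam * sinkhorn_distance(v_pred, v_target)

# Usage: in an actor-critic loop, v_pred would be critic outputs and
# v_target the bootstrapped targets; lam would be adapted from a running
# performance statistic (an assumption here, set constant for illustration).
rng = np.random.default_rng(0)
v_pred = rng.normal(0.0, 1.0, size=64)
v_target = rng.normal(0.5, 1.2, size=64)
print(wave_critic_loss(v_pred, v_target, lam=0.1))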

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-baheri25a,
  title     = {WAVE: Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning},
  author    = {Baheri, Ali and Shahrooei, Zahra and Salgarkar, Chirayu},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {920--931},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/baheri25a/baheri25a.pdf},
  url       = {https://proceedings.mlr.press/v283/baheri25a.html}
}
Endnote
%0 Conference Paper
%T WAVE: Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning
%A Ali Baheri
%A Zahra Shahrooei
%A Chirayu Salgarkar
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-baheri25a
%I PMLR
%P 920--931
%U https://proceedings.mlr.press/v283/baheri25a.html
%V 283
APA
Baheri, A., Shahrooei, Z. & Salgarkar, C. (2025). WAVE: Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:920-931. Available from https://proceedings.mlr.press/v283/baheri25a.html.
