Off Policy Lyapunov Stability in Reinforcement Learning

Sarvan Gill, Daniela Constantinescu
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4093-4102, 2025.

Abstract

Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
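To make the core idea concrete, below is a minimal sketch of what an off-policy Lyapunov critic might look like: a non-negative candidate function trained on replay-buffer transitions to satisfy a standard Lyapunov decrease condition. All names (LyapunovCritic, lyapunov_loss, alpha) and the specific loss are illustrative assumptions, not the formulation used in the paper.

import torch
import torch.nn as nn

class LyapunovCritic(nn.Module):
    """Parameterizes a candidate Lyapunov function V(s) >= 0."""
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        # Squaring the output keeps the candidate non-negative.
        return self.net(state).pow(2)

def lyapunov_loss(critic, states, next_states, alpha=0.1):
    """Penalize violations of the decrease condition
    V(s') - V(s) <= -alpha * V(s) on sampled transitions."""
    v = critic(states)
    v_next = critic(next_states)
    violation = torch.relu(v_next - (1.0 - alpha) * v)
    return violation.mean()

if __name__ == "__main__":
    # The transitions can come from any replay buffer, i.e. they need not be
    # generated by the current policy -- this is what makes the update off-policy.
    critic = LyapunovCritic(state_dim=3)
    opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
    states = torch.randn(256, 3)                      # placeholder batch
    next_states = states + 0.05 * torch.randn(256, 3) # placeholder successors
    loss = lyapunov_loss(critic, states, next_states)
    opt.zero_grad()
    loss.backward()
    opt.step()

In an actual SAC- or PPO-style pipeline, such a critic would be trained alongside the policy and value networks, and its decrease condition would be used to constrain or penalize the policy update; the paper's precise integration may differ from this sketch.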

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-gill25a,
  title     = {Off Policy Lyapunov Stability in Reinforcement Learning},
  author    = {Gill, Sarvan and Constantinescu, Daniela},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {4093--4102},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/gill25a/gill25a.pdf},
  url       = {https://proceedings.mlr.press/v305/gill25a.html},
  abstract  = {Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.}
}
Endnote
%0 Conference Paper
%T Off Policy Lyapunov Stability in Reinforcement Learning
%A Sarvan Gill
%A Daniela Constantinescu
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-gill25a
%I PMLR
%P 4093--4102
%U https://proceedings.mlr.press/v305/gill25a.html
%V 305
%X Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
APA
Gill, S. & Constantinescu, D. (2025). Off Policy Lyapunov Stability in Reinforcement Learning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4093-4102. Available from https://proceedings.mlr.press/v305/gill25a.html.