Off Policy Lyapunov Stability in Reinforcement Learning

Sarvan Gill; Daniela Constantinescu

Proceedings of The 9th Conference on Robot Learning, PMLR 305:4093-4102, 2025.

Abstract

Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data-efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.

Cite this Paper

BibTeX

@InProceedings{pmlr-v305-gill25a,
  title = 	 {Off Policy Lyapunov Stability in Reinforcement Learning},
  author =       {Gill, Sarvan and Constantinescu, Daniela},
  booktitle = 	 {Proceedings of The 9th Conference on Robot Learning},
  pages = 	 {4093--4102},
  year = 	 {2025},
  editor = 	 {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = 	 {305},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {27--30 Sep},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v305/main/assets/gill25a/gill25a.pdf},
  url = 	 {https://proceedings.mlr.press/v305/gill25a.html},
  abstract = 	 {Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.}
}

Endnote

%0 Conference Paper
%T Off Policy Lyapunov Stability in Reinforcement Learning
%A Sarvan Gill
%A Daniela Constantinescu
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-gill25a
%I PMLR
%P 4093--4102
%U https://proceedings.mlr.press/v305/gill25a.html
%V 305
%X Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.

APA

Gill, S. & Constantinescu, D. (2025). Off Policy Lyapunov Stability in Reinforcement Learning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4093-4102. Available from https://proceedings.mlr.press/v305/gill25a.html.
