A Policy Optimization Method Towards Optimal-time Stability

Shengjie Wang, Lan Fengb, Xiang Zheng, Yuxue Cao, Oluwatosin OluwaPelumi Oseni, Haotian Xu, Tao Zhang, Yang Gao
Proceedings of The 7th Conference on Robot Learning, PMLR 229:1154-1182, 2023.

Abstract

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system’s state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system’s state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.
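Note: the page above gives only the abstract, not the ALAC update rule itself. As a rough, illustrative sketch of the general idea the abstract describes, the snippet below shows a generic sampling-based Lyapunov decrease condition, L(s', pi(s')) - L(s, a) <= -alpha * L(s, a), used as a penalty term that could be added to an actor-critic policy loss. The network shapes, the names policy and lyapunov, and the coefficients alpha and beta are assumptions for illustration only, not the paper's formulation.

# Hypothetical sketch of a sampling-based Lyapunov penalty for an actor loss.
# Not the ALAC algorithm from the paper; shapes and coefficients are assumed.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim))
# Lyapunov candidate L(s, a) >= 0, enforced here via a Softplus output.
lyapunov = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, 1), nn.Softplus())

def lyapunov_penalty(s, s_next, alpha=0.1, beta=1.0):
    a = policy(s)
    a_next = policy(s_next)
    L = lyapunov(torch.cat([s, a], dim=-1))
    L_next = lyapunov(torch.cat([s_next, a_next], dim=-1))
    # Sampled decrease condition: penalize violations of
    # L(s', pi(s')) - L(s, a) + alpha * L(s, a) <= 0 on transition samples.
    violation = torch.relu(L_next - L + alpha * L)
    return beta * violation.mean()

# Usage on a batch of sampled transitions (s, s').
s = torch.randn(32, state_dim)
s_next = torch.randn(32, state_dim)
penalty = lyapunov_penalty(s, s_next)
penalty.backward()

In practice such a penalty would be combined with the usual return-maximizing actor objective; how ALAC adapts the constraint to obtain optimal-time (rather than infinite-time) convergence is described in the paper itself.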

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-wang23d, title = {A Policy Optimization Method Towards Optimal-time Stability}, author = {Wang, Shengjie and Fengb, Lan and Zheng, Xiang and Cao, Yuxue and Oseni, Oluwatosin OluwaPelumi and Xu, Haotian and Zhang, Tao and Gao, Yang}, booktitle = {Proceedings of The 7th Conference on Robot Learning}, pages = {1154--1182}, year = {2023}, editor = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh}, volume = {229}, series = {Proceedings of Machine Learning Research}, month = {06--09 Nov}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v229/wang23d/wang23d.pdf}, url = {https://proceedings.mlr.press/v229/wang23d.html}, abstract = {In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system’s state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system’s state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.} }
Endnote
%0 Conference Paper %T A Policy Optimization Method Towards Optimal-time Stability %A Shengjie Wang %A Lan Fengb %A Xiang Zheng %A Yuxue Cao %A Oluwatosin OluwaPelumi Oseni %A Haotian Xu %A Tao Zhang %A Yang Gao %B Proceedings of The 7th Conference on Robot Learning %C Proceedings of Machine Learning Research %D 2023 %E Jie Tan %E Marc Toussaint %E Kourosh Darvish %F pmlr-v229-wang23d %I PMLR %P 1154--1182 %U https://proceedings.mlr.press/v229/wang23d.html %V 229 %X In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system’s state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system’s state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.
APA
Wang, S., Fengb, L., Zheng, X., Cao, Y., Oseni, O.O., Xu, H., Zhang, T. & Gao, Y.. (2023). A Policy Optimization Method Towards Optimal-time Stability. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:1154-1182 Available from https://proceedings.mlr.press/v229/wang23d.html.
