Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective

Yansong Li, Shuo Han
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:304-315, 2022.

Abstract

We develop an algorithm that combines model-based and model-free methods for solving a nonlinear optimal control problem with a quadratic cost in which the system model is given by a linear state-space model with a small additive nonlinear perturbation. We decompose the cost into a sum of two functions, one having an explicit form obtained from the approximate linear model, the other being a black-box model representing the unknown modeling error. The decomposition allows us to formulate the problem as a composite optimization problem. To solve the optimization problem, our algorithm performs gradient descent using the gradient obtained from the approximate linear model until backtracking line search fails, upon which the model-based gradient is compared with the exact gradient obtained from a model-free algorithm. The difference between the model gradient and the exact gradient is then used for compensating future gradient-based updates. Our algorithm is shown to decrease the number of function evaluations compared with traditional model-free methods both in theory and in practice.
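To make the idea in the abstract concrete, below is a minimal sketch of the described scheme: minimize f(x) = g(x) + h(x), where the gradient of g is available from the approximate linear model and h is the unknown modeling error, queryable only through evaluations of f. All names (composite_descent, zeroth_order_gradient), the Gaussian-smoothing gradient estimator, and the Armijo line-search constants are illustrative assumptions, not the paper's exact construction; the paper's precise line-search test, model-free estimator, and convergence guarantees are in the full text.

import numpy as np

def zeroth_order_gradient(f, x, num_samples=100, radius=1e-3):
    """Model-free estimate of the exact gradient of f at x via Gaussian
    smoothing (one standard zeroth-order estimator; assumed here, the
    paper's estimator may differ)."""
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = np.random.randn(x.size)
        grad += (f(x + radius * u) - f(x)) / radius * u
    return grad / num_samples

def composite_descent(f, model_grad, x0, step0=1.0, beta=0.5,
                      c=1e-4, min_step=1e-8, max_iters=200):
    """Gradient descent driven by the model-based gradient; when
    backtracking line search fails, query a model-free gradient and
    keep the difference as a correction for future updates."""
    x = x0.copy()
    correction = np.zeros_like(x)   # running estimate of (exact grad - model grad)
    for _ in range(max_iters):
        direction = model_grad(x) + correction
        # Backtracking (Armijo) line search on the true black-box cost f.
        step = step0
        while f(x - step * direction) > f(x) - c * step * (direction @ direction):
            step *= beta
            if step < min_step:
                break
        if step < min_step:
            # Line search failed: the model gradient is no longer a good
            # descent direction. Compare it against the model-free exact
            # gradient and compensate subsequent updates.
            correction = zeroth_order_gradient(f, x) - model_grad(x)
            continue
        x = x - step * direction
    return x

The point of the design, as the abstract describes it, is that the expensive model-free gradient query is issued only when the cheap model-based direction stops producing sufficient decrease, which is what reduces the total number of function evaluations relative to a purely model-free method.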

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-li22a,
  title     = {Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective},
  author    = {Li, Yansong and Han, Shuo},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {304--315},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/li22a/li22a.pdf},
  url       = {https://proceedings.mlr.press/v168/li22a.html}
}
Endnote
%0 Conference Paper
%T Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective
%A Yansong Li
%A Shuo Han
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-li22a
%I PMLR
%P 304--315
%U https://proceedings.mlr.press/v168/li22a.html
%V 168
APA
Li, Y. & Han, S. (2022). Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:304-315. Available from https://proceedings.mlr.press/v168/li22a.html.
