Adaptive Variants of Optimal Feedback Policies

Brett Lopez, Jean-Jacques Slotine
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:1125-1136, 2022.

Abstract

The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-lopez22a,
  title     = {Adaptive Variants of Optimal Feedback Policies},
  author    = {Lopez, Brett and Slotine, Jean-Jacques},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {1125--1136},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/lopez22a/lopez22a.pdf},
  url       = {https://proceedings.mlr.press/v168/lopez22a.html},
  abstract  = {The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.}
}
Endnote
%0 Conference Paper
%T Adaptive Variants of Optimal Feedback Policies
%A Brett Lopez
%A Jean-Jacques Slotine
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-lopez22a
%I PMLR
%P 1125--1136
%U https://proceedings.mlr.press/v168/lopez22a.html
%V 168
%X The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.
APA
Lopez, B. & Slotine, J.-J. (2022). Adaptive Variants of Optimal Feedback Policies. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:1125-1136. Available from https://proceedings.mlr.press/v168/lopez22a.html.