Adaptive Variants of Optimal Feedback Policies

Brett Lopez, Jean-Jacques Slotine
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:1125-1136, 2022.

Abstract

The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-lopez22a,
  title     = {Adaptive Variants of Optimal Feedback Policies},
  author    = {Lopez, Brett and Slotine, Jean-Jacques},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {1125--1136},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/lopez22a/lopez22a.pdf},
  url       = {https://proceedings.mlr.press/v168/lopez22a.html},
  abstract  = {The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.}
}
Endnote
%0 Conference Paper
%T Adaptive Variants of Optimal Feedback Policies
%A Brett Lopez
%A Jean-Jacques Slotine
%B Proceedings of The 4th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Roya Firoozi
%E Negar Mehr
%E Esen Yel
%E Rika Antonova
%E Jeannette Bohg
%E Mac Schwager
%E Mykel Kochenderfer
%F pmlr-v168-lopez22a
%I PMLR
%P 1125--1136
%U https://proceedings.mlr.press/v168/lopez22a.html
%V 168
%X The stable combination of optimal feedback policies with online learning is studied in a new control-theoretic framework for uncertain nonlinear systems. The framework can be systematically used in transfer learning and sim-to-real applications, where an optimal policy learned for a nominal system needs to remain effective in the presence of significant variations in parameters. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. Online adjustment of the learning rate is used as a key stability mechanism, and preserves certainty equivalence when designing optimal policies. The approach is illustrated on the familiar mountain car problem, where it yields near-optimal performance despite the presence of parametric model uncertainty.
APA
Lopez, B. & Slotine, J.-J. (2022). Adaptive Variants of Optimal Feedback Policies. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:1125-1136. Available from https://proceedings.mlr.press/v168/lopez22a.html.