HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints

Michael Lutter; Boris Belousov; Kim Listmann; Debora Clever; Jan Peters

Proceedings of the Conference on Robot Learning, PMLR 100:640-650, 2020.

Abstract

Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose HJB control to learn an optimal feedback policy rather than a single trajectory using principles from optimal control. By exploiting the inherent structure of the robot dynamics and strictly convex action cost, we derive principled cost functions such that the optimal policy naturally obeys the action limits, is globally optimal and stable on the training domain given the optimal value function. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning. Our proposed approach enables us to learn an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state-space without the need for replanning. The resulting approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning.

Cite this Paper

BibTeX


@InProceedings{pmlr-v100-lutter20a,
  title = {HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints},
  author = {Lutter, Michael and Belousov, Boris and Listmann, Kim and Clever, Debora and Peters, Jan},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages = {640--650},
  year = {2020},
  editor = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume = {100},
  series = {Proceedings of Machine Learning Research},
  month = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v100/lutter20a/lutter20a.pdf},
  url = {https://proceedings.mlr.press/v100/lutter20a.html},
  abstract = {Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose HJB control to learn an optimal feedback policy rather than a single trajectory using principles from optimal control. By exploiting the inherent structure of the robot dynamics and strictly convex action cost, we derive principled cost functions such that the optimal policy naturally obeys the action limits, is globally optimal and stable on the training domain given the optimal value function. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning. Our proposed approach enables us to learn an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state-space without the need for replanning. The resulting approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning.}
}

Endnote

%0 Conference Paper
%T HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints
%A Michael Lutter
%A Boris Belousov
%A Kim Listmann
%A Debora Clever
%A Jan Peters
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-lutter20a
%I PMLR
%P 640--650
%U https://proceedings.mlr.press/v100/lutter20a.html
%V 100
%X Learning optimal feedback control laws capable of executing optimal trajectories is essential for many robotic applications. Such policies can be learned using reinforcement learning or planned using optimal control. While reinforcement learning is sample inefficient, optimal control only plans an optimal trajectory from a specific starting configuration. In this paper we propose HJB control to learn an optimal feedback policy rather than a single trajectory using principles from optimal control. By exploiting the inherent structure of the robot dynamics and strictly convex action cost, we derive principled cost functions such that the optimal policy naturally obeys the action limits, is globally optimal and stable on the training domain given the optimal value function. The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning. Our proposed approach enables us to learn an optimal feedback control law in continuous time that, in contrast to existing approaches, generates an optimal trajectory from any point in state-space without the need for replanning. The resulting approach is evaluated on non-linear systems and achieves optimal feedback control, where standard optimal control methods require frequent replanning.

APA


Lutter, M., Belousov, B., Listmann, K., Clever, D. & Peters, J. (2020). HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:640-650. Available from https://proceedings.mlr.press/v100/lutter20a.html.
