A Pontryagin Perspective on Reinforcement Learning

Onno Eberhard, Claire Vernade, Michael Muehlebach
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:233-244, 2025.

Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
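The abstract describes learning a fixed open-loop action sequence with gradients derived from Pontryagin's principle rather than Bellman's equation. As a rough, self-contained illustration of that idea (not the paper's algorithms), the sketch below runs gradient descent on an action sequence for a pendulum swing-up, where each gradient is computed with the backward costate (adjoint) recursion. The dynamics, cost, horizon, and step sizes are all illustrative assumptions.

# Minimal sketch of open-loop gradient descent via Pontryagin's costate recursion.
# All models and hyperparameters are illustrative assumptions, not the paper's.
import numpy as np

dt, T = 0.05, 100            # integration step and horizon (assumed)

def f(x, u):                 # pendulum dynamics, x = (angle, angular velocity)
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (np.sin(theta) + u)])

def f_x(x, u):               # Jacobian of f w.r.t. the state
    theta, _ = x
    return np.array([[1.0, dt],
                     [dt * np.cos(theta), 1.0]])

def f_u(x, u):               # Jacobian of f w.r.t. the action
    return np.array([0.0, dt])

def c(x, u):                 # running cost: distance from upright + control effort
    theta, omega = x
    return (theta - np.pi) ** 2 + 0.1 * omega ** 2 + 0.01 * u ** 2

def c_x(x, u):
    theta, omega = x
    return np.array([2.0 * (theta - np.pi), 0.2 * omega])

def c_u(x, u):
    return 0.02 * u

u_seq = np.zeros(T)          # open-loop action sequence to be learned
x0 = np.array([0.0, 0.0])    # hanging-down initial state

for it in range(500):
    # Forward pass: roll out the fixed action sequence.
    xs = [x0]
    for t in range(T):
        xs.append(f(xs[-1], u_seq[t]))

    # Backward pass: costate lambda_t = c_x + f_x^T lambda_{t+1},
    # gradient dJ/du_t = c_u + f_u^T lambda_{t+1} (no terminal cost here).
    lam = np.zeros(2)
    grad = np.zeros(T)
    for t in reversed(range(T)):
        grad[t] = c_u(xs[t], u_seq[t]) + f_u(xs[t], u_seq[t]) @ lam
        lam = c_x(xs[t], u_seq[t]) + f_x(xs[t], u_seq[t]).T @ lam

    u_seq -= 0.1 * grad      # gradient step on the action sequence

Because the learned object is the action sequence itself rather than a state-dependent policy, the rollout is executed open loop; the model-free variants mentioned in the abstract would replace the hand-coded Jacobians above with quantities estimated from sampled trajectories.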

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-eberhard25a,
  title     = {A Pontryagin Perspective on Reinforcement Learning},
  author    = {Eberhard, Onno and Vernade, Claire and Muehlebach, Michael},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {233--244},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/eberhard25a/eberhard25a.pdf},
  url       = {https://proceedings.mlr.press/v283/eberhard25a.html},
  abstract  = {Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.}
}
Endnote
%0 Conference Paper
%T A Pontryagin Perspective on Reinforcement Learning
%A Onno Eberhard
%A Claire Vernade
%A Michael Muehlebach
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-eberhard25a
%I PMLR
%P 233--244
%U https://proceedings.mlr.press/v283/eberhard25a.html
%V 283
%X Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman’s equation from dynamic programming, our work builds on Pontryagin’s principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
APA
Eberhard, O., Vernade, C. & Muehlebach, M. (2025). A Pontryagin Perspective on Reinforcement Learning. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:233-244. Available from https://proceedings.mlr.press/v283/eberhard25a.html.
