Policy Optimization for Unknown Systems using Differentiable MPC

Riccardo Zuliani, Efe C. Balta, John Lygeros
Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1275-1287, 2026.

Abstract

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.

Cite this Paper


BibTeX
@InProceedings{pmlr-v331-zuliani26a,
  title = {Policy Optimization for Unknown Systems using Differentiable MPC},
  author = {Zuliani, Riccardo and Balta, Efe C. and Lygeros, John},
  booktitle = {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages = {1275--1287},
  year = {2026},
  editor = {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume = {331},
  series = {Proceedings of Machine Learning Research},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v331/main/assets/zuliani26a/zuliani26a.pdf},
  url = {https://proceedings.mlr.press/v331/zuliani26a.html},
  abstract = {Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.}
}
Endnote
%0 Conference Paper
%T Policy Optimization for Unknown Systems using Differentiable MPC
%A Riccardo Zuliani
%A Efe C. Balta
%A John Lygeros
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov
%F pmlr-v331-zuliani26a
%I PMLR
%P 1275--1287
%U https://proceedings.mlr.press/v331/zuliani26a.html
%V 331
%X Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
APA
Zuliani, R., Balta, E. C. & Lygeros, J. (2026). Policy Optimization for Unknown Systems using Differentiable MPC. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1275-1287. Available from https://proceedings.mlr.press/v331/zuliani26a.html.
