Policy Optimization for Unknown Systems using Differentiable MPC

Riccardo Zuliani; Efe C. Balta; John Lygeros

Proceedings of The 8th Annual Learning for Dynamics and Control Conference, PMLR 331:1275-1287, 2026.

Abstract

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v331-zuliani26a,
  title     = {Policy Optimization for Unknown Systems using Differentiable MPC},
  author    = {Zuliani, Riccardo and Balta, Efe C. and Lygeros, John},
  booktitle = {Proceedings of The 8th Annual Learning for Dynamics and Control Conference},
  pages     = {1275--1287},
  year      = {2026},
  editor    = {Sukhatme, Gaurav and Lindemann, Lars and Tu, Stephen and Wierman, Adam and Atanasov, Nikolay},
  volume    = {331},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v331/main/assets/zuliani26a/zuliani26a.pdf},
  url       = {https://proceedings.mlr.press/v331/zuliani26a.html},
  abstract  = {Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.}
}

Endnote

%0 Conference Paper
%T Policy Optimization for Unknown Systems using Differentiable MPC
%A Riccardo Zuliani
%A Efe C. Balta
%A John Lygeros
%B Proceedings of The 8th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2026
%E Gaurav Sukhatme
%E Lars Lindemann
%E Stephen Tu
%E Adam Wierman
%E Nikolay Atanasov
%F pmlr-v331-zuliani26a
%I PMLR
%P 1275--1287
%U https://proceedings.mlr.press/v331/zuliani26a.html
%V 331
%X Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.

APA

Zuliani, R., Balta, E. C., & Lygeros, J. (2026). Policy Optimization for Unknown Systems using Differentiable MPC. Proceedings of The 8th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 331:1275-1287. Available from https://proceedings.mlr.press/v331/zuliani26a.html.
