Optimal Cost Design for Model Predictive Control

Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:1205-1217, 2021.

Abstract

Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that accumulates over that full horizon. For instance, an autonomous car may have a cost function that makes a desired trade-off between efficiency, safety risk, and obeying traffic laws. In this work, we challenge the common assumption that the cost we should specify for MPC should be the same as the ground truth cost for the task. We propose that, because MPC solvers have short horizons, suffer from local optima, and, importantly, fail to account for future replanning ability, in many tasks it could be beneficial to purposefully choose a different cost function for MPC to optimize: one that results in the MPC rollout, rather than the MPC planned trajectory, having low ground truth cost. We formalize this as an optimal cost design problem, and propose a zeroth-order optimization-based approach that enables us to design optimal costs for an MPC planning robot in continuous state and action MDPs. We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short-horizon, and local minima issues. As an example, planning with vanilla MPC under the learned cost incentivizes the car to delay its decision until later, implicitly accounting for the fact that it will get more information in the future and be able to make a better decision.
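The two nested loops the abstract describes — an inner receding-horizon MPC solver optimizing a designed proxy cost, and an outer zeroth-order search choosing that proxy cost so the resulting *rollout* has low ground-truth cost — can be sketched on a toy 1-D problem. Everything here (the dynamics, cost forms, the random-shooting inner solver, and the candidate-search outer loop) is an illustrative assumption, not the paper's actual setup:

```python
import random

GOAL = 10.0
TRUE_STEPS = 20      # full task horizon (longer than the planner can afford)
MPC_HORIZON = 3      # short planning horizon used by MPC

def true_cost(traj, actions):
    # ground-truth task cost: final distance to goal plus effort penalty
    return abs(traj[-1] - GOAL) + 0.1 * sum(a * a for a in actions)

def proxy_cost(x, a, w):
    # designed cost the MPC solver actually optimizes; w is the design knob
    return abs(x - GOAL) + w * a * a

def mpc_action(x, w, rng, n_samples=200):
    # inner solver: random-shooting trajectory optimization over the short horizon
    best_a, best_c = 0.0, float("inf")
    for _ in range(n_samples):
        plan = [rng.uniform(-1.0, 1.0) for _ in range(MPC_HORIZON)]
        c, xi = 0.0, x
        for a in plan:
            xi += a                      # toy dynamics: x' = x + a
            c += proxy_cost(xi, a, w)
        if c < best_c:
            best_c, best_a = c, plan[0]
    return best_a  # execute only the first action, then replan (receding horizon)

def rollout(w, seed=0):
    # run MPC under proxy weight w, score the executed rollout by ground truth
    rng = random.Random(seed)
    x, traj, acts = 0.0, [0.0], []
    for _ in range(TRUE_STEPS):
        a = mpc_action(x, w, rng)
        x += a
        traj.append(x)
        acts.append(a)
    return true_cost(traj, acts)

# outer zeroth-order cost design: evaluate candidate proxy weights by rollout
candidates = [0.0, 0.1, 0.5, 1.0, 2.0]
scores = {w: rollout(w) for w in candidates}
best_w = min(scores, key=scores.get)
```

The key point the sketch illustrates is that `best_w` need not equal the ground-truth effort weight (0.1 here): the proxy cost is chosen purely for how well the replanning loop performs under it, not for matching the task cost term by term.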

Cite this Paper

BibTeX

@InProceedings{pmlr-v144-jain21a,
title = {Optimal Cost Design for Model Predictive Control},
author = {Jain, Avik and Chan, Lawrence and Brown, Daniel S. and Dragan, Anca D.},
booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
pages = {1205--1217},
year = {2021},
editor = {Jadbabaie, Ali and Lygeros, John and Pappas, George J. and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire J. and Zeilinger, Melanie N.},
volume = {144},
series = {Proceedings of Machine Learning Research},
month = {07--08 June},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v144/jain21a/jain21a.pdf},
url = {https://proceedings.mlr.press/v144/jain21a.html},
abstract = {Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that accumulates over that full horizon. For instance, an autonomous car may have a cost function that makes a desired trade-off between efficiency, safety risk, and obeying traffic laws. In this work, we challenge the common assumption that the cost we should specify for MPC should be the same as the ground truth cost for the task. We propose that, because MPC solvers have short horizons, suffer from local optima, and, importantly, fail to account for future replanning ability, in many tasks it could be beneficial to purposefully choose a different cost function for MPC to optimize: one that results in the MPC rollout, rather than the MPC planned trajectory, having low ground truth cost. We formalize this as an optimal cost design problem, and propose a zeroth-order optimization-based approach that enables us to design optimal costs for an MPC planning robot in continuous state and action MDPs. We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short-horizon, and local minima issues. As an example, planning with vanilla MPC under the learned cost incentivizes the car to delay its decision until later, implicitly accounting for the fact that it will get more information in the future and be able to make a better decision.}
}

EndNote

%0 Conference Paper
%T Optimal Cost Design for Model Predictive Control
%A Avik Jain
%A Lawrence Chan
%A Daniel S. Brown
%A Anca D. Dragan
%B Proceedings of the 3rd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2021
%E Ali Jadbabaie
%E John Lygeros
%E George J. Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire J. Tomlin
%E Melanie N. Zeilinger
%F pmlr-v144-jain21a
%I PMLR
%P 1205--1217
%U https://proceedings.mlr.press/v144/jain21a.html
%V 144
%X Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that accumulates over that full horizon. For instance, an autonomous car may have a cost function that makes a desired trade-off between efficiency, safety risk, and obeying traffic laws. In this work, we challenge the common assumption that the cost we should specify for MPC should be the same as the ground truth cost for the task. We propose that, because MPC solvers have short horizons, suffer from local optima, and, importantly, fail to account for future replanning ability, in many tasks it could be beneficial to purposefully choose a different cost function for MPC to optimize: one that results in the MPC rollout, rather than the MPC planned trajectory, having low ground truth cost. We formalize this as an optimal cost design problem, and propose a zeroth-order optimization-based approach that enables us to design optimal costs for an MPC planning robot in continuous state and action MDPs. We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short-horizon, and local minima issues. As an example, planning with vanilla MPC under the learned cost incentivizes the car to delay its decision until later, implicitly accounting for the fact that it will get more information in the future and be able to make a better decision.

APA

Jain, A., Chan, L., Brown, D.S. & Dragan, A.D. (2021). Optimal Cost Design for Model Predictive Control. Proceedings of the 3rd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 144:1205-1217. Available from https://proceedings.mlr.press/v144/jain21a.html.