Learning Convex Optimization Control Policies

Akshay Agrawal, Shane Barratt, Stephen Boyd, Bartolomeo Stellato
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:361-373, 2020.

Abstract

Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://web.stanford.edu/~boyd/papers/learning_cocps.html.
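
To make the idea concrete, here is a minimal sketch (not taken from the paper or its code release) of tuning a COCP with approximate gradients. It uses the cvxpylayers package to differentiate through the solution of the convex problem and PyTorch for the outer gradient step; the dynamics, horizon, input constraint, and the quadratic value-function parameterization P_sqrt are illustrative assumptions, not the paper's examples.

```python
# Minimal sketch of tuning a convex optimization control policy (COCP) by
# gradient descent on a simulated cost. Assumes cvxpy, cvxpylayers, and
# PyTorch; the dynamics, horizon, constraints, and the P_sqrt
# parameterization below are illustrative placeholders.
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

np.random.seed(0)
torch.manual_seed(0)

n, m = 4, 2                                    # state and input dimensions (assumed)
A_np = np.eye(n) + 0.05 * np.random.randn(n, n)
B_np = 0.2 * np.random.randn(n, m)

# ADP-style policy: given the current state x, choose u by minimizing an
# input cost plus a quadratic approximate value function ||P_sqrt x_next||^2.
x = cp.Parameter(n)
P_sqrt = cp.Parameter((n, n))                  # tunable policy parameter
u = cp.Variable(m)
x_next = cp.Variable(n)
objective = cp.sum_squares(u) + cp.sum_squares(P_sqrt @ x_next)
constraints = [x_next == A_np @ x + B_np @ u, cp.norm(u, "inf") <= 1]
problem = cp.Problem(cp.Minimize(objective), constraints)
policy = CvxpyLayer(problem, parameters=[x, P_sqrt], variables=[u])

A_t = torch.tensor(A_np)                       # float64 tensors for the simulator
B_t = torch.tensor(B_np)
P_sqrt_t = torch.eye(n, dtype=torch.float64, requires_grad=True)
opt = torch.optim.SGD([P_sqrt_t], lr=0.1)
T = 20                                         # simulation horizon (assumed)

for it in range(50):
    x_t = torch.randn(n, dtype=torch.float64)  # random initial state
    cost = torch.zeros((), dtype=torch.float64)
    for t in range(T):
        u_t, = policy(x_t, P_sqrt_t)           # solve the convex problem, differentiably
        cost = cost + x_t @ x_t + u_t @ u_t    # stage cost of the true performance metric
        x_t = A_t @ x_t + B_t @ u_t + 0.05 * torch.randn(n, dtype=torch.float64)
    opt.zero_grad()
    (cost / T).backward()                      # approximate gradient w.r.t. P_sqrt
    opt.step()
```

Each outer iteration simulates a trajectory under the current policy, accumulates the cost, and backpropagates through every convex solve to update P_sqrt; this mirrors the paper's approach of replacing exact gradients of the expected performance metric with stochastic approximate gradients obtained from simulation.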

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-agrawal20a,
  title     = {Learning Convex Optimization Control Policies},
  author    = {Agrawal, Akshay and Barratt, Shane and Boyd, Stephen and Stellato, Bartolomeo},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages     = {361--373},
  year      = {2020},
  editor    = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume    = {120},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v120/agrawal20a/agrawal20a.pdf},
  url       = {https://proceedings.mlr.press/v120/agrawal20a.html},
  abstract  = {Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://web.stanford.edu/~boyd/papers/learning_cocps.html.}
}
Endnote
%0 Conference Paper
%T Learning Convex Optimization Control Policies
%A Akshay Agrawal
%A Shane Barratt
%A Stephen Boyd
%A Bartolomeo Stellato
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-agrawal20a
%I PMLR
%P 361--373
%U https://proceedings.mlr.press/v120/agrawal20a.html
%V 120
%X Many control policies used in applications compute the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex program with respect to its parameters. A longer version of this paper, which illustrates our method on many examples, is available at https://web.stanford.edu/~boyd/papers/learning_cocps.html.
APA
Agrawal, A., Barratt, S., Boyd, S. & Stellato, B. (2020). Learning Convex Optimization Control Policies. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:361-373. Available from https://proceedings.mlr.press/v120/agrawal20a.html.
