Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint

Malayandi Palan, Shane Barratt, Alex McCauley, Dorsa Sadigh, Vikas Sindhwani, Stephen Boyd
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:374-383, 2020.

Abstract

We consider the problem of learning a linear control policy for a linear dynamical system from demonstrations of an expert regulating the system. The standard approach to this problem is (linear) policy fitting, which fits a linear policy by minimizing a loss function between the demonstrations and the policy’s outputs plus a regularization function that encodes prior knowledge. Despite its simplicity, this method fails to learn policies with low or even finite cost when there are few demonstrations. We propose adding a constraint to policy fitting: that the policy is the solution to some LQR problem, i.e., optimal in the stochastic control sense for some choice of quadratic cost. We refer to this constraint as a Kalman constraint. Policy fitting with a Kalman constraint requires solving an optimization problem with convex cost and bilinear constraints. We propose a heuristic method, based on the alternating direction method of multipliers (ADMM), to approximately solve this problem. An illustrative numerical experiment demonstrates that adding the Kalman constraint allows us to learn good, i.e., low cost, policies even when very few data are available.
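To make the baseline concrete, below is a minimal sketch, not taken from the paper, of plain linear policy fitting with a squared loss and ridge regularization on demonstration pairs (x_i, u_i). The Kalman constraint described in the abstract would additionally require the fitted gain to be LQR-optimal for some quadratic cost, i.e., K = -(R + B^T P B)^{-1} B^T P A with P solving the associated Riccati equation for some Q and R; coupling K, P, Q, and R this way is what makes the constraints bilinear and motivates the ADMM heuristic. The function name, dimensions, and regularization weight below are illustrative assumptions.

    # Minimal sketch (assumed, not the paper's method): ridge-regularized
    # least-squares fit of a linear policy u ~ K x to expert demonstrations.
    import numpy as np

    def fit_linear_policy(X, U, lam=1e-2):
        """X: (N, n) states, U: (N, m) expert inputs. Returns K of shape (m, n)."""
        n = X.shape[1]
        # Closed-form solution of min_K sum_i ||u_i - K x_i||^2 + lam * ||K||_F^2
        K = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ U).T
        return K

    # Illustrative usage with synthetic demonstrations from a known gain K_true.
    rng = np.random.default_rng(0)
    n, m, N = 4, 2, 10  # few demonstrations, the regime the paper targets
    K_true = rng.standard_normal((m, n))
    X = rng.standard_normal((N, n))
    U = X @ K_true.T + 0.01 * rng.standard_normal((N, m))
    K_hat = fit_linear_policy(X, U)
    print(np.linalg.norm(K_hat - K_true))

With few demonstrations this unconstrained fit can return a gain that destabilizes the closed loop; the paper's Kalman constraint rules such gains out by construction, since any LQR-optimal gain stabilizes the system.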

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-palan20a, title = {Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint}, author = {Palan, Malayandi and Barratt, Shane and McCauley, Alex and Sadigh, Dorsa and Sindhwani, Vikas and Boyd, Stephen}, booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control}, pages = {374--383}, year = {2020}, editor = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie}, volume = {120}, series = {Proceedings of Machine Learning Research}, month = {10--11 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v120/palan20a/palan20a.pdf}, url = {https://proceedings.mlr.press/v120/palan20a.html}, abstract = {We consider the problem of learning a linear control policy for a linear dynamical system, from demonstrations of an expert regulating the system. The standard approach to this problem is (linear) policy fitting, which fits a linear policy by minimizing a loss function between the demonstrations and the policy’s outputs plus a regularization function that encodes prior knowledge. Despite its simplicity, this method fails to learn policies with low or even finite cost when there are few demonstrations. We propose to add an additional constraint to the regularization function in policy fitting, that the policy is the solution to some LQR problem, i.e., optimal in the stochastic control sense for some choice of quadratic cost. We refer to this constraint as a Kalman constraint. Policy fitting with a Kalman constraint requires solving an optimization problem with convex cost and bilinear constraints. We propose a heuristic method, based on the alternating direction method of multipliers (ADMM), to approximately solve this problem. An illustrative numerical experiment demonstrates that adding the Kalman constraint allows us to learn good, i.e., low cost, policies even when very few data are available.} }
Endnote
%0 Conference Paper %T Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint %A Malayandi Palan %A Shane Barratt %A Alex McCauley %A Dorsa Sadigh %A Vikas Sindhwani %A Stephen Boyd %B Proceedings of the 2nd Conference on Learning for Dynamics and Control %C Proceedings of Machine Learning Research %D 2020 %E Alexandre M. Bayen %E Ali Jadbabaie %E George Pappas %E Pablo A. Parrilo %E Benjamin Recht %E Claire Tomlin %E Melanie Zeilinger %F pmlr-v120-palan20a %I PMLR %P 374--383 %U https://proceedings.mlr.press/v120/palan20a.html %V 120 %X We consider the problem of learning a linear control policy for a linear dynamical system, from demonstrations of an expert regulating the system. The standard approach to this problem is (linear) policy fitting, which fits a linear policy by minimizing a loss function between the demonstrations and the policy’s outputs plus a regularization function that encodes prior knowledge. Despite its simplicity, this method fails to learn policies with low or even finite cost when there are few demonstrations. We propose to add an additional constraint to the regularization function in policy fitting, that the policy is the solution to some LQR problem, i.e., optimal in the stochastic control sense for some choice of quadratic cost. We refer to this constraint as a Kalman constraint. Policy fitting with a Kalman constraint requires solving an optimization problem with convex cost and bilinear constraints. We propose a heuristic method, based on the alternating direction method of multipliers (ADMM), to approximately solve this problem. An illustrative numerical experiment demonstrates that adding the Kalman constraint allows us to learn good, i.e., low cost, policies even when very few data are available.
APA
Palan, M., Barratt, S., McCauley, A., Sadigh, D., Sindhwani, V. & Boyd, S. (2020). Fitting a Linear Control Policy to Demonstrations with a Kalman Constraint. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:374-383. Available from https://proceedings.mlr.press/v120/palan20a.html.