Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Kaiqing Zhang, Bin Hu, Tamer Basar

Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:179-190, 2020.

Abstract

Policy optimization (PO) is a key ingredient of modern reinforcement learning (RL). In control design, certain constraints are usually enforced on the policies being optimized, to account for stability, robustness, or safety concerns about the system. Hence, PO is by nature a constrained (nonconvex) optimization problem in most cases, whose global convergence is challenging to analyze in general. More importantly, some safety-critical constraints, e.g., closed-loop stability, or the $\mathcal{H}_{\infty}$-norm constraint that guarantees the system's robustness, can be difficult to enforce on the controller being learned as the PO methods proceed. In this paper, we study the convergence theory of PO for $\mathcal{H}_{2}$ linear control with an $\mathcal{H}_{\infty}$ robustness guarantee. This general framework includes risk-sensitive linear control as a special case. One significant new feature of this problem, in contrast to standard $\mathcal{H}_{2}$ linear control, namely, the linear quadratic regulator (LQR) problem, is the lack of coercivity of the cost function. This makes it challenging to guarantee the feasibility, namely the $\mathcal{H}_{\infty}$ robustness, of the iterates. Interestingly, we propose two PO algorithms that enjoy an implicit regularization property: the iterates preserve the $\mathcal{H}_{\infty}$ robustness, as if they were regularized by the algorithms. Furthermore, convergence to the globally optimal policies, at globally sublinear and locally (super-)linear rates, is established under certain conditions, despite the nonconvexity of the problem. To the best of our knowledge, our work offers the first results on the implicit regularization property and global convergence of PO methods for robust/risk-sensitive control.
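The constraint the abstract describes can be made concrete on a toy system. The sketch below is not the paper's two algorithms; it runs a plain numerical policy gradient on a scalar discrete-time LQR and monitors, at every iterate, the $\mathcal{H}_\infty$ norm of the closed-loop map from the disturbance to the state. All system constants and the bound `gamma` are hypothetical choices for illustration.

```python
# Illustrative sketch only -- not the paper's algorithms. Plain numerical
# policy gradient on a scalar discrete-time LQR, with an explicit check of
# the H-infinity norm of the closed-loop disturbance-to-state channel.
# The constants below are hypothetical choices for this toy example.

a, b, q, r = 0.9, 1.0, 1.0, 1.0   # dynamics x_{t+1} = a x_t + b u_t + w_t
gamma = 4.0                        # H-infinity bound to maintain

def h2_cost(k):
    """H2 cost of the policy u = -k x under unit-variance noise.

    For the scalar closed loop a_cl = a - b k, the stationary state
    variance is p = 1 / (1 - a_cl^2), so the cost is (q + r k^2) p.
    """
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:           # destabilizing gain: infinite cost
        return float("inf")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)

def hinf_norm(k):
    """Peak gain of w -> x: max over frequencies of |1/(e^{jw} - a_cl)|."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - abs(a_cl))  # attained at frequency 0 or pi

# Reference solution from the scalar discrete-time Riccati recursion.
p = q
for _ in range(1000):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
k_opt = a * b * p / (r + b * b * p)

# Policy gradient with an explicit H-infinity feasibility check. (The
# point of the paper is that well-chosen updates need no such check --
# the iterates stay feasible implicitly; here we merely monitor it.)
k, step = 0.2, 0.05
for _ in range(200):
    eps = 1e-6
    grad = (h2_cost(k + eps) - h2_cost(k - eps)) / (2 * eps)
    k_new = k - step * grad
    if hinf_norm(k_new) < gamma:   # accept only robustness-preserving steps
        k = k_new

print(f"learned k = {k:.4f}, optimal k = {k_opt:.4f}, "
      f"Hinf norm = {hinf_norm(k):.3f} (bound gamma = {gamma})")
```

On this example the gradient iterates converge to the Riccati solution while every accepted iterate satisfies the $\mathcal{H}_\infty$ bound; the paper's contribution is proving such feasibility preservation and global convergence without any explicit check, for its two proposed updates.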

Cite this Paper

BibTeX

@InProceedings{pmlr-v120-zhang20a,
title = {Policy Optimization for $\mathcal{H}_{2}$ Linear Control with $\mathcal{H}_{\infty}$ Robustness Guarantee: Implicit Regularization and Global Convergence},
author = {Zhang, Kaiqing and Hu, Bin and Basar, Tamer},
booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
pages = {179--190},
year = {2020},
editor = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
volume = {120},
series = {Proceedings of Machine Learning Research},
month = {10--11 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v120/zhang20a/zhang20a.pdf},
url = {https://proceedings.mlr.press/v120/zhang20a.html},
}

Endnote

%0 Conference Paper
%T Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence
%A Kaiqing Zhang
%A Bin Hu
%A Tamer Basar
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-zhang20a
%I PMLR
%P 179--190
%U https://proceedings.mlr.press/v120/zhang20a.html
%V 120

APA

Zhang, K., Hu, B. & Basar, T. (2020). Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:179-190. Available from https://proceedings.mlr.press/v120/zhang20a.html.