Policy Evaluation in Distributional LQR

Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl Henrik Johansson
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:1245-1256, 2023.

Abstract

Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on the linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression for the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR that uses the Conditional Value at Risk (CVaR) as the measure of risk. Numerical experiments are provided to illustrate our theoretical results.
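To make the algorithmic part of the abstract concrete, the following is a minimal, hypothetical sketch (Python/NumPy) of risk-averse LQR via a zeroth-order policy gradient on the CVaR of sampled returns, for a linear state-feedback policy u_t = -K x_t with i.i.d. Gaussian disturbances. It is not the authors' implementation: the system matrices, step sizes, and function names are illustrative placeholders, and the CVaR here is estimated directly from Monte Carlo rollouts of a truncated return rather than from the paper's closed-form or approximate return distribution.

import numpy as np

def sampled_return(K, A, B, Q, R, x0, horizon=50, gamma=0.95, rng=None):
    """Simulate one truncated, discounted quadratic return under u_t = -K x_t."""
    rng = np.random.default_rng() if rng is None else rng
    x, ret = x0.copy(), 0.0
    for t in range(horizon):
        u = -K @ x
        ret += gamma**t * (x @ Q @ x + u @ R @ u)
        w = rng.standard_normal(x.shape)      # i.i.d. disturbance (Gaussian by assumption)
        x = A @ x + B @ u + w
    return ret

def cvar(costs, alpha=0.1):
    """CVaR_alpha of a cost distribution: mean of the worst alpha-fraction of samples."""
    costs = np.sort(costs)
    k = int(np.ceil((1 - alpha) * len(costs)))
    return costs[k:].mean()

def zeroth_order_grad(obj, K, mu=0.05, n_dirs=20, rng=None):
    """Two-point zeroth-order (gradient-free) estimate of d obj / d K."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)
        grad += (obj(K + mu * U) - obj(K - mu * U)) / (2.0 * mu) * U
    return grad / n_dirs

# Illustrative double-integrator-style system and a plain CVaR-descent loop.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
x0 = np.array([1.0, 0.0])
rng = np.random.default_rng(0)

def objective(K, seed=0):
    rng_local = np.random.default_rng(seed)   # common random numbers across evaluations
    returns = np.array([sampled_return(K, A, B, Q, R, x0, rng=rng_local) for _ in range(200)])
    return cvar(returns, alpha=0.1)

K = np.zeros((1, 2))
for _ in range(50):
    K -= 1e-3 * zeroth_order_grad(objective, K, rng=rng)

Using common random numbers (the fixed seed inside objective) keeps the two-point gradient estimate from being dominated by sampling noise; in the paper's setting, the finitely-parameterized approximate return distribution would take the place of the raw rollout estimate of the CVaR.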

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-wang23c,
  title     = {Policy Evaluation in Distributional LQR},
  author    = {Wang, Zifan and Gao, Yulong and Wang, Siyi and Zavlanos, Michael M. and Abate, Alessandro and Johansson, Karl Henrik},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {1245--1256},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/wang23c/wang23c.pdf},
  url       = {https://proceedings.mlr.press/v211/wang23c.html},
  abstract  = {Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.}
}
Endnote
%0 Conference Paper
%T Policy Evaluation in Distributional LQR
%A Zifan Wang
%A Yulong Gao
%A Siyi Wang
%A Michael M. Zavlanos
%A Alessandro Abate
%A Karl Henrik Johansson
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-wang23c
%I PMLR
%P 1245--1256
%U https://proceedings.mlr.press/v211/wang23c.html
%V 211
%X Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
APA
Wang, Z., Gao, Y., Wang, S., Zavlanos, M.M., Abate, A. & Johansson, K.H. (2023). Policy Evaluation in Distributional LQR. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:1245-1256. Available from https://proceedings.mlr.press/v211/wang23c.html.
