Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

Yingying Li, Yujie Tang, Runyu Zhang, Na Li
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:814-814, 2020.

Abstract

This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system.
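
To make the algorithmic outline in the abstract concrete, the Python sketch below mimics one possible ZODPO-style iteration: each agent perturbs its local controller parameters, the agents estimate the global cost by consensus averaging of their local costs, and each agent takes a zero-order gradient step. This is an illustrative reconstruction, not the authors' implementation; the agent interface (theta, local_cost), the consensus weight matrix W, the number of consensus rounds, and all hyperparameter values are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)

def zodpo_sketch(agents, W, eta=1e-4, r=1e-1, n_iters=200, consensus_rounds=10):
    """Illustrative zero-order distributed policy-optimization loop (not the paper's code).

    agents : list of objects exposing
             .theta               local controller parameters (np.ndarray)
             .local_cost(theta)   sampled local cost under the perturbed policy
    W      : doubly stochastic consensus weight matrix, shape (N, N)
    eta, r : step size and smoothing radius (placeholder values)
    """
    for _ in range(n_iters):
        # 1. Each agent draws a perturbation of its own parameters, uniform on a
        #    sphere of radius r, and rolls out the perturbed local policy.
        perturbs, costs = [], []
        for agent in agents:
            u = rng.standard_normal(agent.theta.shape)
            u = r * u / np.linalg.norm(u)
            perturbs.append(u)
            costs.append(agent.local_cost(agent.theta + u))
        mu = np.asarray(costs, dtype=float)

        # 2. Consensus: a few rounds of weighted averaging over the network give
        #    every agent an estimate of the global (average) cost.
        for _ in range(consensus_rounds):
            mu = W @ mu

        # 3. Zero-order gradient step: each agent scales its own perturbation by
        #    its consensus cost estimate (single-point gradient estimator).
        for i, agent in enumerate(agents):
            d = agent.theta.size
            grad_est = (d / r**2) * mu[i] * perturbs[i]
            agent.theta = agent.theta - eta * grad_est

In this sketch each agent only communicates scalar cost estimates with its neighbors (through W) and stores only its own parameters, which reflects the limited communication and storage requirements highlighted in the abstract.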

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-li20c,
  title     = {Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach},
  author    = {Li, Yingying and Tang, Yujie and Zhang, Runyu and Li, Na},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages     = {814--814},
  year      = {2020},
  editor    = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume    = {120},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v120/li20c/li20c.pdf},
  url       = {https://proceedings.mlr.press/v120/li20c.html},
  abstract  = {This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system.}
}
Endnote
%0 Conference Paper
%T Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach
%A Yingying Li
%A Yujie Tang
%A Runyu Zhang
%A Na Li
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-li20c
%I PMLR
%P 814--814
%U https://proceedings.mlr.press/v120/li20c.html
%V 120
%X This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system.
APA
Li, Y., Tang, Y., Zhang, R. & Li, N. (2020). Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:814-814. Available from https://proceedings.mlr.press/v120/li20c.html.
