Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4485-4495, 2020.

Abstract

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of the group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with neighbors. Despite their simplicity and widespread use, our theoretical understanding of such decentralized TD learning algorithms remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an ‘additional’ projection step to control the ‘gradient’ bias incurred by Markovian observations. In this paper, we provide a finite-time analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resulting error bounds are the first of their kind, in the sense that they hold under the most practical assumptions, and are made possible by a novel multi-step Lyapunov approach.
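
To make the setup concrete, below is a minimal, hypothetical sketch (not the authors' pseudocode) of decentralized TD(0) with linear function approximation: each agent mixes its parameter vector with its neighbors' through a doubly stochastic matrix W, then takes a local TD(0) step using its privately observed reward along a single shared Markovian trajectory. The ring network, random features, reward assignment, and step size are illustrative assumptions only; the paper's exact algorithm and step-size conditions are in the full text.

import numpy as np

# Illustrative sketch of decentralized TD(0) with linear function approximation
# (an assumption-laden example, not the authors' exact algorithm).

rng = np.random.default_rng(0)

n_agents, n_states, dim = 4, 10, 3
gamma, alpha, T = 0.9, 0.02, 20000

# Shared environment under the fixed policy: a random ergodic transition matrix.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Unit-norm feature map phi(s) and private per-agent rewards (hypothetical).
Phi = rng.standard_normal((n_states, dim))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
R_local = rng.random((n_agents, n_states))

# Doubly stochastic mixing matrix for a ring communication graph.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

theta = np.zeros((n_agents, dim))   # local parameter estimates
s = rng.integers(n_states)

for t in range(T):
    s_next = rng.choice(n_states, p=P[s])
    # 1) Consensus step: average parameters with neighbors' current estimates.
    theta = W @ theta
    # 2) Local TD(0) step using each agent's privately observed reward.
    for i in range(n_agents):
        delta = R_local[i, s] + gamma * Phi[s_next] @ theta[i] - Phi[s] @ theta[i]
        theta[i] += alpha * delta * Phi[s]
    s = s_next

# Local estimates should end up close to one another (consensus) and within a
# neighborhood of the TD fixed point associated with the network-average reward.
print("max pairwise distance:",
      np.max(np.linalg.norm(theta[:, None] - theta[None, :], axis=-1)))
print("mean estimate:", theta.mean(axis=0))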

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-sun20a,
  title     = {Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation},
  author    = {Sun, Jun and Wang, Gang and Giannakis, Georgios B. and Yang, Qinmin and Yang, Zaiyue},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {4485--4495},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/sun20a/sun20a.pdf},
  url       = {https://proceedings.mlr.press/v108/sun20a.html}
}
Endnote
%0 Conference Paper
%T Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation
%A Jun Sun
%A Gang Wang
%A Georgios B. Giannakis
%A Qinmin Yang
%A Zaiyue Yang
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-sun20a
%I PMLR
%P 4485--4495
%U https://proceedings.mlr.press/v108/sun20a.html
%V 108
APA
Sun, J., Wang, G., Giannakis, G.B., Yang, Q. & Yang, Z. (2020). Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4485-4495. Available from https://proceedings.mlr.press/v108/sun20a.html.