A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4357-4366, 2020.

Abstract

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
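The abstract's claims can be illustrated numerically. The sketch below (a hypothetical example, not taken from the paper) runs constant step-size TD(0) on a small two-state Markov reward process: the iterates do not converge to a point but settle into a stationary distribution, and averaging them after a burn-in gives a mean close to the true value function, consistent with the paper's result for algorithms whose expected target is a Bellman update. The MRP, step size, and iteration counts are all arbitrary choices for illustration; the small residual bias of the iterate average shrinks with the step size.

```python
import random

# Hypothetical 2-state Markov reward process (illustrative only).
P = [[0.5, 0.5], [0.3, 0.7]]  # transition probabilities
r = [1.0, 0.0]                # reward received on leaving each state
gamma = 0.9                   # discount factor

# True value function: solve the 2x2 linear system V = r + gamma * P @ V.
a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
c, d = -gamma * P[1][0], 1 - gamma * P[1][1]
det = a * d - b * c
V_true = [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]

# Constant step-size TD(0): the value-function iterates form a Markov
# chain over functions; we average them after a burn-in period.
random.seed(0)
alpha = 0.02                  # constant step size
V = [0.0, 0.0]
mean = [0.0, 0.0]
s, n = 0, 0
burn_in, steps = 50_000, 250_000
for t in range(steps):
    s_next = 0 if random.random() < P[s][0] else 1
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])  # TD(0) update
    if t >= burn_in:          # running average of post-burn-in iterates
        n += 1
        mean[0] += (V[0] - mean[0]) / n
        mean[1] += (V[1] - mean[1]) / n
    s = s_next

print("true V:", [round(v, 2) for v in V_true])
print("iterate mean:", [round(m, 2) for m in mean])
```

With a small step size the iterate average lands near `V_true`, while the individual iterates keep fluctuating around it, matching the concentration-as-the-step-size-shrinks behaviour the abstract describes.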

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-amortila20a,
  title     = {A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms},
  author    = {Amortila, Philip and Precup, Doina and Panangaden, Prakash and Bellemare, Marc G.},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {4357--4366},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/amortila20a/amortila20a.pdf},
  url       = {http://proceedings.mlr.press/v108/amortila20a.html},
  abstract  = {We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.}
}
Endnote
%0 Conference Paper
%T A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
%A Philip Amortila
%A Doina Precup
%A Prakash Panangaden
%A Marc G. Bellemare
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-amortila20a
%I PMLR
%P 4357--4366
%U http://proceedings.mlr.press/v108/amortila20a.html
%V 108
%X We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
APA
Amortila, P., Precup, D., Panangaden, P. & Bellemare, M.G. (2020). A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4357-4366. Available from http://proceedings.mlr.press/v108/amortila20a.html.