A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila; Doina Precup; Prakash Panangaden; Marc G. Bellemare

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4357-4366, 2020.

Abstract

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(?) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-amortila20a,
  title = 	 {A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms},
  author =       {Amortila, Philip and Precup, Doina and Panangaden, Prakash and Bellemare, Marc G.},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4357--4366},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/amortila20a/amortila20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/amortila20a.html},
  abstract = 	 {We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(?) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.}
}

Endnote

%0 Conference Paper
%T A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
%A Philip Amortila
%A Doina Precup
%A Prakash Panangaden
%A Marc G. Bellemare
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra	
%F pmlr-v108-amortila20a
%I PMLR
%P 4357--4366
%U https://proceedings.mlr.press/v108/amortila20a.html
%V 108
%X We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(?) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.

APA


Amortila, P., Precup, D., Panangaden, P. & Bellemare, M.G.. (2020). A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4357-4366 Available from https://proceedings.mlr.press/v108/amortila20a.html.

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Abstract

Cite this Paper

Related Material