An Analysis of Categorical Distributional Reinforcement Learning

Mark Rowland, Marc Bellemare, Will Dabney, Remi Munos, Yee Whye Teh
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:29-37, 2018.

Abstract

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
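To make the projected distributional Bellman operator concrete, the sketch below illustrates the categorical projection step at the heart of C51: each atom of a target distribution has its probability mass split linearly between the two nearest atoms of a fixed, evenly spaced support (the projection the paper shows is orthogonal in the Cramér geometry). This is a minimal NumPy sketch, not code from the paper; the function name categorical_projection and all variable names are illustrative assumptions.

    import numpy as np

    def categorical_projection(support, target_atoms, target_probs):
        # support: fixed, evenly spaced atoms z_1 < ... < z_K
        # (target_atoms, target_probs): discrete target distribution to project
        v_min, v_max = support[0], support[-1]
        delta = support[1] - support[0]
        projected = np.zeros_like(support, dtype=float)
        for z, p in zip(target_atoms, target_probs):
            z = np.clip(z, v_min, v_max)      # mass outside the support is clipped in
            b = (z - v_min) / delta           # fractional index of z on the support
            lo, hi = int(np.floor(b)), int(np.ceil(b))
            if lo == hi:                      # z sits exactly on an atom
                projected[lo] += p
            else:                             # split mass by proximity to neighbours
                projected[lo] += p * (hi - b)
                projected[hi] += p * (b - lo)
        return projected

    # One sample-based application of the projected distributional Bellman
    # operator: shift and shrink the support by (r, gamma), then project back.
    support = np.linspace(-10.0, 10.0, 51)    # e.g. the 51 atoms of C51
    next_state_probs = np.full(51, 1.0 / 51)  # current estimate at the next state
    r, gamma = 1.0, 0.99                      # hypothetical sampled transition
    target = categorical_projection(support, r + gamma * support, next_state_probs)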

Cite this Paper

BibTeX
@InProceedings{pmlr-v84-rowland18a,
  title     = {An Analysis of Categorical Distributional Reinforcement Learning},
  author    = {Rowland, Mark and Bellemare, Marc and Dabney, Will and Munos, Remi and Teh, Yee Whye},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {29--37},
  year      = {2018},
  editor    = {Storkey, Amos and Perez-Cruz, Fernando},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/rowland18a/rowland18a.pdf},
  url       = {https://proceedings.mlr.press/v84/rowland18a.html},
  abstract  = {Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.}
}
Endnote
%0 Conference Paper
%T An Analysis of Categorical Distributional Reinforcement Learning
%A Mark Rowland
%A Marc Bellemare
%A Will Dabney
%A Remi Munos
%A Yee Whye Teh
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-rowland18a
%I PMLR
%P 29--37
%U https://proceedings.mlr.press/v84/rowland18a.html
%V 84
%X Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
APA
Rowland, M., Bellemare, M., Dabney, W., Munos, R. & Teh, Y. W. (2018). An Analysis of Categorical Distributional Reinforcement Learning. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:29-37. Available from https://proceedings.mlr.press/v84/rowland18a.html.
