An Analysis of Categorical Distributional Reinforcement Learning

Mark Rowland, Marc Bellemare, Will Dabney, Remi Munos, Yee Whye Teh
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:29-37, 2018.

Abstract

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramer distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
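The abstract centers on two objects: the projected distributional Bellman operator used by C51 and the Cramer distance. As an illustrative sketch only (not code from the paper), the following shows the standard C51-style categorical projection onto a fixed support, plus a Cramer distance between two categorical distributions on that support; the function names and NumPy implementation are assumptions for illustration.

```python
import numpy as np

def categorical_projection(atoms, probs, reward, gamma):
    """Project the backed-up distribution (reward + gamma * atoms, probs)
    onto the fixed, evenly spaced support `atoms` (C51-style projection)."""
    v_min, v_max = atoms[0], atoms[-1]
    delta_z = atoms[1] - atoms[0]
    # Distributional Bellman backup applied to each atom, clipped to the support.
    tz = np.clip(reward + gamma * atoms, v_min, v_max)
    # Fractional index of each backed-up atom on the fixed support.
    b = (tz - v_min) / delta_z
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    projected = np.zeros_like(probs)
    for j in range(len(atoms)):
        if lower[j] == upper[j]:
            # Backed-up atom lands exactly on a support point.
            projected[lower[j]] += probs[j]
        else:
            # Split the mass between the two neighbouring support points.
            projected[lower[j]] += probs[j] * (upper[j] - b[j])
            projected[upper[j]] += probs[j] * (b[j] - lower[j])
    return projected

def cramer_distance(atoms, p, q):
    """Cramer (l2) distance between two categorical distributions
    supported on the same evenly spaced atoms: the l2 norm of the
    difference of their step-function CDFs."""
    delta_z = atoms[1] - atoms[0]
    cdf_diff = np.cumsum(p - q)
    return np.sqrt(delta_z * np.sum(cdf_diff ** 2))
```

Note that the projection redistributes probability mass to neighbouring atoms in proportion to distance, so the projected vector remains a valid distribution; this projection is exactly the one the paper interprets as an orthogonal projection under the Cramer geometry.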

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-rowland18a,
  title     = {An Analysis of Categorical Distributional Reinforcement Learning},
  author    = {Mark Rowland and Marc Bellemare and Will Dabney and Remi Munos and Yee Whye Teh},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages     = {29--37},
  year      = {2018},
  editor    = {Amos Storkey and Fernando Perez-Cruz},
  volume    = {84},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--11 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v84/rowland18a/rowland18a.pdf},
  url       = {http://proceedings.mlr.press/v84/rowland18a.html},
  abstract  = {Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramer distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.}
}
Endnote
%0 Conference Paper
%T An Analysis of Categorical Distributional Reinforcement Learning
%A Mark Rowland
%A Marc Bellemare
%A Will Dabney
%A Remi Munos
%A Yee Whye Teh
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-rowland18a
%I PMLR
%P 29--37
%U http://proceedings.mlr.press/v84/rowland18a.html
%V 84
%X Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramer distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
APA
Rowland, M., Bellemare, M., Dabney, W., Munos, R. & Teh, Y.W. (2018). An Analysis of Categorical Distributional Reinforcement Learning. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:29-37. Available from http://proceedings.mlr.press/v84/rowland18a.html.