Implicit Quantile Networks for Distributional Reinforcement Learning

Will Dabney, Georg Ostrovski, David Silver, Remi Munos
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1096-1105, 2018.

Abstract

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
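For readers who want to see the mechanics the abstract describes, the sketch below illustrates the two components the paper builds on: a cosine embedding of sampled quantile fractions τ that is combined multiplicatively with the state embedding, and a quantile Huber loss for training. This is a minimal illustrative sketch assuming a PyTorch setup; the names IQNHead and quantile_huber_loss, the layer sizes, and the choice of n_cos = 64 are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of an implicit-quantile head and its training loss (assumed PyTorch setup).
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class IQNHead(nn.Module):
    """Maps a state embedding and sampled quantile fractions tau to
    estimated quantile values Z_tau(x, a), one per action."""

    def __init__(self, embed_dim: int, n_actions: int, n_cos: int = 64):
        super().__init__()
        # Cosine embedding of tau followed by a linear layer (the phi(tau) of the paper).
        self.tau_embed = nn.Linear(n_cos, embed_dim)
        self.fc = nn.Linear(embed_dim, 512)
        self.out = nn.Linear(512, n_actions)
        # Fixed integer frequencies i = 0 .. n_cos - 1 used inside the cosine features.
        self.register_buffer("freqs", torch.arange(n_cos, dtype=torch.float32))

    def forward(self, state_embed: torch.Tensor, taus: torch.Tensor) -> torch.Tensor:
        # state_embed: (batch, embed_dim); taus: (batch, n_taus), each entry in [0, 1].
        # cos(pi * i * tau) for every frequency i -> (batch, n_taus, n_cos).
        cos = torch.cos(math.pi * taus.unsqueeze(-1) * self.freqs)
        phi = F.relu(self.tau_embed(cos))            # (batch, n_taus, embed_dim)
        # Elementwise combination of the state and quantile embeddings.
        h = state_embed.unsqueeze(1) * phi           # (batch, n_taus, embed_dim)
        h = F.relu(self.fc(h))
        return self.out(h)                           # (batch, n_taus, n_actions)


def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        taus: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    # pred: (batch, n_taus, 1); target: (batch, 1, n_target_taus); taus: (batch, n_taus, 1).
    td = target - pred                               # pairwise TD errors across tau samples
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weighting by |tau - 1{td < 0}| turns Huber regression into quantile regression.
    loss = (taus - (td.detach() < 0).float()).abs() * huber / kappa
    return loss.sum(dim=1).mean()
```

In a full agent, a fresh set of τ samples would be drawn for both the online and target networks at every update, and actions would be chosen greedily with respect to the average of the sampled quantile values (or a distorted expectation of them, which is what yields the risk-sensitive policies studied in the paper).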

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-dabney18a,
  title     = {Implicit Quantile Networks for Distributional Reinforcement Learning},
  author    = {Dabney, Will and Ostrovski, Georg and Silver, David and Munos, Remi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1096--1105},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf},
  url       = {https://proceedings.mlr.press/v80/dabney18a.html},
  abstract  = {In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.}
}
Endnote
%0 Conference Paper
%T Implicit Quantile Networks for Distributional Reinforcement Learning
%A Will Dabney
%A Georg Ostrovski
%A David Silver
%A Remi Munos
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-dabney18a
%I PMLR
%P 1096--1105
%U https://proceedings.mlr.press/v80/dabney18a.html
%V 80
%X In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
APA
Dabney, W., Ostrovski, G., Silver, D. & Munos, R. (2018). Implicit Quantile Networks for Distributional Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1096-1105. Available from https://proceedings.mlr.press/v80/dabney18a.html.
