Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4424-4434, 2019.

Abstract

In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On the Atari 2600 benchmark, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
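
The abstract describes an action-selection rule built from two pieces: an exploration bonus taken from the upper quantiles of the learned return distribution, and a decaying schedule that suppresses the intrinsic (aleatoric) part of that spread over time. The sketch below is a minimal illustration of that idea, not the authors' exact algorithm: it assumes per-action quantile estimates (e.g. from a QR-DQN head) are available as a NumPy array, and the particular bonus definition, the c / sqrt(t) schedule, and the function name select_action are illustrative assumptions.

    import numpy as np

    def select_action(quantiles, t, c=1.0):
        """Illustrative upper-quantile exploration bonus with a decaying schedule.

        quantiles : array of shape (num_actions, num_quantiles), estimated return
                    quantiles per action, assumed sorted along the last axis.
        t         : current time step (>= 1), drives the decaying schedule.
        c         : exploration coefficient (hypothetical knob).
        """
        # Mean over all quantiles approximates the expected return (exploitation term).
        q_mean = quantiles.mean(axis=1)

        # Bonus from the upper half of the distribution: average spread of the
        # upper quantiles above the median, a rough stand-in for optimism.
        median = np.median(quantiles, axis=1)
        upper = quantiles[:, quantiles.shape[1] // 2:]
        bonus = (upper - median[:, None]).mean(axis=1)

        # Decaying schedule: shrinks the bonus over time so the intrinsic part of
        # the spread stops driving exploration asymptotically.
        schedule = c / np.sqrt(t)

        return int(np.argmax(q_mean + schedule * bonus))

    # Example usage: 4 actions, 32 quantiles per action.
    rng = np.random.default_rng(0)
    quantiles = np.sort(rng.normal(size=(4, 32)), axis=1)
    action = select_action(quantiles, t=100)

As t grows the schedule shrinks the bonus, so exploitation of the mean estimate dominates; the paper's actual schedule and its separation of parametric from intrinsic uncertainty differ in detail.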

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-mavrin19a,
  title     = {Distributional Reinforcement Learning for Efficient Exploration},
  author    = {Mavrin, Borislav and Yao, Hengshuai and Kong, Linglong and Wu, Kaiwen and Yu, Yaoliang},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4424--4434},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/mavrin19a/mavrin19a.pdf},
  url       = {https://proceedings.mlr.press/v97/mavrin19a.html},
  abstract  = {In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On the Atari 2600 benchmark, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.}
}
Endnote
%0 Conference Paper
%T Distributional Reinforcement Learning for Efficient Exploration
%A Borislav Mavrin
%A Hengshuai Yao
%A Linglong Kong
%A Kaiwen Wu
%A Yaoliang Yu
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-mavrin19a
%I PMLR
%P 4424--4434
%U https://proceedings.mlr.press/v97/mavrin19a.html
%V 97
%X In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. On the Atari 2600 benchmark, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
APA
Mavrin, B., Yao, H., Kong, L., Wu, K., & Yu, Y. (2019). Distributional Reinforcement Learning for Efficient Exploration. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4424-4434. Available from https://proceedings.mlr.press/v97/mavrin19a.html.
