Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin; Hengshuai Yao; Linglong Kong; Kaiwen Wu; Yaoliang Yu

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4424-4434, 2019.

Abstract

In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
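
The abstract names the two components but gives no formulas, so the following is only a minimal sketch of how they might combine during action selection, not the authors' implementation. It assumes each action has a list of learned quantile estimates (as in QR-DQN), that the bonus is the spread of the upper-half quantiles around the median, and that the schedule has the form c * sqrt(log t / t). The function names, the constant `c`, and both functional forms are hypothetical choices made for illustration.

```python
import math


def upper_quantile_bonus(quantiles):
    """Assumed bonus: spread of the upper-half quantiles around the median."""
    qs = sorted(quantiles)
    n = len(qs)
    median = qs[n // 2] if n % 2 else 0.5 * (qs[n // 2 - 1] + qs[n // 2])
    upper = qs[n // 2:]  # only the upper quantiles contribute to the bonus
    return math.sqrt(sum((q - median) ** 2 for q in upper) / (2 * n))


def decay(t, c=1.0):
    """Assumed decaying schedule c * sqrt(log t / t) for a step count t >= 1."""
    return c * math.sqrt(math.log(t) / t)


def select_action(quantiles_per_action, t, c=1.0):
    """Pick the action maximising mean value plus the decayed bonus."""
    scores = []
    for qs in quantiles_per_action:
        mean = sum(qs) / len(qs)  # value estimate as the mean of the quantiles
        scores.append(mean + decay(t, c) * upper_quantile_bonus(qs))
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical quantile estimates for two actions.
example = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 3.6]]
early_choice = select_action(example, t=2)
late_choice = select_action(example, t=10**6)
```

In this toy setup the second action has a lower mean but a heavy upper tail, so it is preferred early on; once the schedule has decayed, the choice falls back to the action with the higher mean.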

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-mavrin19a,
  title = 	 {Distributional Reinforcement Learning for Efficient Exploration},
  author =       {Mavrin, Borislav and Yao, Hengshuai and Kong, Linglong and Wu, Kaiwen and Yu, Yaoliang},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {4424--4434},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/mavrin19a/mavrin19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/mavrin19a.html},
  abstract = 	 {In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method achieves a 483\% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.}
}

Endnote

%0 Conference Paper
%T Distributional Reinforcement Learning for Efficient Exploration
%A Borislav Mavrin
%A Hengshuai Yao
%A Linglong Kong
%A Kaiwen Wu
%A Yaoliang Yu
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-mavrin19a
%I PMLR
%P 4424--4434
%U https://proceedings.mlr.press/v97/mavrin19a.html
%V 97
%X In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method achieves a 483% average gain in cumulative rewards over QR-DQN across 49 games. We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.

APA

Mavrin, B., Yao, H., Kong, L., Wu, K. & Yu, Y. (2019). Distributional Reinforcement Learning for Efficient Exploration. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4424-4434. Available from https://proceedings.mlr.press/v97/mavrin19a.html.
