Discovering Options for Exploration by Minimizing Cover Time

Yuu Jinnai; Jee Won Park; David Abel; George Konidaris

Discovering Options for Exploration by Minimizing Cover Time

Yuu Jinnai, Jee Won Park, David Abel, George Konidaris

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3130-3139, 2019.

Abstract

One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP’s transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. We introduce a new option discovery algorithm that diminishes the expected cover time by connecting the most distant states in the state-space graph with options. We show empirically that the proposed algorithm improves learning in several domains with sparse rewards.

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-jinnai19b,
  title = 	 {Discovering Options for Exploration by Minimizing Cover Time},
  author =       {Jinnai, Yuu and Park, Jee Won and Abel, David and Konidaris, George},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {3130--3139},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/jinnai19b/jinnai19b.pdf},
  url = 	 {https://proceedings.mlr.press/v97/jinnai19b.html},
  abstract = 	 {One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP’s transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. We introduce a new option discovery algorithm that diminishes the expected cover time by connecting the most distant states in the state-space graph with options. We show empirically that the proposed algorithm improves learning in several domains with sparse rewards.}
}

Endnote

%0 Conference Paper
%T Discovering Options for Exploration by Minimizing Cover Time
%A Yuu Jinnai
%A Jee Won Park
%A David Abel
%A George Konidaris
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov	
%F pmlr-v97-jinnai19b
%I PMLR
%P 3130--3139
%U https://proceedings.mlr.press/v97/jinnai19b.html
%V 97
%X One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP’s transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. We introduce a new option discovery algorithm that diminishes the expected cover time by connecting the most distant states in the state-space graph with options. We show empirically that the proposed algorithm improves learning in several domains with sparse rewards.

APA


Jinnai, Y., Park, J.W., Abel, D. & Konidaris, G.. (2019). Discovering Options for Exploration by Minimizing Cover Time. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3130-3139 Available from https://proceedings.mlr.press/v97/jinnai19b.html.

Discovering Options for Exploration by Minimizing Cover Time

Abstract

Cite this Paper

Related Material