Bayesian RL for Goal-Only Rewards

Philippe Morere, Fabio Ramos
Proceedings of The 2nd Conference on Robot Learning, PMLR 87:386-398, 2018.

Abstract

We address the challenging problem of reinforcement learning under goal-only rewards [1], where rewards are non-zero only when the goal is achieved. This reward definition alleviates the need for cumbersome reward engineering, making the reward formulation trivial. Classic exploration heuristics such as Boltzmann or epsilon-greedy exploration are highly inefficient in domains with goal-only rewards. We solve this problem by leveraging value function posterior variance information to direct exploration where uncertainty is higher. The proposed algorithm (EMU-Q) achieves data-efficient exploration and balances exploration and exploitation explicitly at the policy level, granting users more control over the learning process. We introduce general kernel-approximating features, which greatly reduce the algorithm's complexity from O(N^3) in the number of transitions N to O(M^2) in the number of features M. We demonstrate that EMU-Q is competitive with other exploration techniques on a variety of continuous control tasks and on a robotic manipulator.
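The two mechanisms highlighted in the abstract, kernel-approximating features and posterior-variance-driven exploration, can be sketched with Bayesian linear regression over random Fourier features. The snippet below is an illustrative assumption, not the authors' EMU-Q implementation; the feature map, prior, noise variance, and upper-confidence action-scoring rule are placeholder choices.

# Hypothetical sketch (not the authors' EMU-Q implementation): Bayesian linear
# regression over M random Fourier features approximating an RBF kernel.
# Posterior updates cost O(M^2) per transition instead of the O(N^3) cost of
# exact kernel regression over N transitions, and the posterior variance can
# serve as an exploration signal.
import numpy as np

rng = np.random.default_rng(0)

d = 4            # state-action input dimension (illustrative)
M = 100          # number of random features
lengthscale = 1.0
noise_var = 0.1

# Random Fourier features approximating an RBF kernel.
W = rng.normal(scale=1.0 / lengthscale, size=(M, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def phi(x):
    """Map input x of shape (d,) to M approximate-kernel features."""
    return np.sqrt(2.0 / M) * np.cos(W @ x + b)

# Bayesian linear regression posterior over feature weights.
A = np.eye(M)        # precision matrix (identity prior)
bvec = np.zeros(M)   # accumulated weighted targets

def update(x, y):
    """Incorporate one observed (input, value-target) pair; O(M^2) cost."""
    global A, bvec
    f = phi(x)
    A += np.outer(f, f) / noise_var
    bvec += f * y / noise_var

def posterior(x):
    """Posterior mean and variance of the value estimate at x."""
    f = phi(x)
    cov = np.linalg.inv(A)  # O(M^3) here; kept simple for clarity
    return f @ cov @ bvec, f @ cov @ f

# Toy usage: fit on a few random transitions, then score candidate inputs with
# an upper-confidence rule that trades off estimated value against uncertainty.
for _ in range(50):
    x = rng.normal(size=d)
    update(x, np.sin(x[0]))

kappa = 2.0  # exploration weight (user-controlled trade-off)
candidates = rng.normal(size=(10, d))
scores = [m + kappa * np.sqrt(v) for m, v in (posterior(c) for c in candidates)]
best = candidates[int(np.argmax(scores))]

Maintaining statistics over the M features rather than over all observed transitions is what keeps the per-update cost independent of the dataset size, which is the complexity reduction the abstract refers to.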

Cite this Paper


BibTeX
@InProceedings{pmlr-v87-morere18a,
  title = {Bayesian RL for Goal-Only Rewards},
  author = {Morere, Philippe and Ramos, Fabio},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages = {386--398},
  year = {2018},
  editor = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume = {87},
  series = {Proceedings of Machine Learning Research},
  month = {29--31 Oct},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v87/morere18a/morere18a.pdf},
  url = {https://proceedings.mlr.press/v87/morere18a.html},
  abstract = {We address the challenging problem of reinforcement learning under goal-only rewards [1], where rewards are only non-zero when the goal is achieved. This reward definition alleviates the need for cumbersome reward engineering, making the reward formulation trivial. Classic exploration heuristics such as Boltzmann or epsilon-greedy exploration are highly inefficient in domains with goal-only rewards. We solve this problem by leveraging value function posterior variance information to direct exploration where uncertainty is higher. The proposed algorithm (EMU-Q) achieves data-efficient exploration, and balances exploration and exploitation explicitly at a policy level granting users more control over the learning process. We introduce general features approximating kernels, allowing to greatly reduce the algorithm complexity from O(N^3) in the number of transitions to O(M^2) in the number of features. We demonstrate EMU-Q is competitive with other exploration techniques on a variety of continuous control tasks and on a robotic manipulator.}
}
Endnote
%0 Conference Paper
%T Bayesian RL for Goal-Only Rewards
%A Philippe Morere
%A Fabio Ramos
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-morere18a
%I PMLR
%P 386--398
%U https://proceedings.mlr.press/v87/morere18a.html
%V 87
%X We address the challenging problem of reinforcement learning under goal-only rewards [1], where rewards are only non-zero when the goal is achieved. This reward definition alleviates the need for cumbersome reward engineering, making the reward formulation trivial. Classic exploration heuristics such as Boltzmann or epsilon-greedy exploration are highly inefficient in domains with goal-only rewards. We solve this problem by leveraging value function posterior variance information to direct exploration where uncertainty is higher. The proposed algorithm (EMU-Q) achieves data-efficient exploration, and balances exploration and exploitation explicitly at a policy level granting users more control over the learning process. We introduce general features approximating kernels, allowing to greatly reduce the algorithm complexity from O(N^3) in the number of transitions to O(M^2) in the number of features. We demonstrate EMU-Q is competitive with other exploration techniques on a variety of continuous control tasks and on a robotic manipulator.
APA
Morere, P. & Ramos, F. (2018). Bayesian RL for Goal-Only Rewards. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:386-398. Available from https://proceedings.mlr.press/v87/morere18a.html.
