What Can Learned Intrinsic Rewards Capture?

Zeyu Zheng; Junhyuk Oh; Matteo Hessel; Zhongwen Xu; Manuel Kroiss; Hado Van Hasselt; David Silver; Satinder Singh

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11436-11446, 2020.

Abstract

The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture “how” the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing “what” the agent should strive to do.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-zheng20b,
  title = 	 {What Can Learned Intrinsic Rewards Capture?},
  author =       {Zheng, Zeyu and Oh, Junhyuk and Hessel, Matteo and Xu, Zhongwen and Kroiss, Manuel and Van Hasselt, Hado and Silver, David and Singh, Satinder},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {11436--11446},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/zheng20b/zheng20b.pdf},
  url = 	 {https://proceedings.mlr.press/v119/zheng20b.html},
  abstract = 	 {The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture “how” the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing “what” the agent should strive to do.}
}

Endnote

%0 Conference Paper
%T What Can Learned Intrinsic Rewards Capture?
%A Zeyu Zheng
%A Junhyuk Oh
%A Matteo Hessel
%A Zhongwen Xu
%A Manuel Kroiss
%A Hado Van Hasselt
%A David Silver
%A Satinder Singh
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-zheng20b
%I PMLR
%P 11436--11446
%U https://proceedings.mlr.press/v119/zheng20b.html
%V 119
%X The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture “how” the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing “what” the agent should strive to do.

APA

Zheng, Z., Oh, J., Hessel, M., Xu, Z., Kroiss, M., Van Hasselt, H., Silver, D., & Singh, S. (2020). What Can Learned Intrinsic Rewards Capture? Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11436-11446. Available from https://proceedings.mlr.press/v119/zheng20b.html.
