Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Junhyuk Oh; Satinder Singh; Honglak Lee; Pushmeet Kohli

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2661-2670, 2017.

Abstract

As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-oh17a,
  title = 	 {Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning},
  author =       {Junhyuk Oh and Satinder Singh and Honglak Lee and Pushmeet Kohli},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {2661--2670},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/oh17a/oh17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/oh17a.html},
  abstract = 	 {As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.}
}

Endnote

%0 Conference Paper
%T Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
%A Junhyuk Oh
%A Satinder Singh
%A Honglak Lee
%A Pushmeet Kohli
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-oh17a
%I PMLR
%P 2661--2670
%U https://proceedings.mlr.press/v70/oh17a.html
%V 70
%X As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks. In this problem, we consider two types of generalizations: to previously unseen instructions and to longer sequences of instructions. For generalization over unseen instructions, we propose a new objective which encourages learning correspondences between similar subtasks by making analogies. For generalization over sequential instructions, we present a hierarchical architecture where a meta controller learns to use the acquired skills for executing the instructions. To deal with delayed reward, we propose a new neural architecture in the meta controller that learns when to update the subtask, which makes learning more efficient. Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions.

APA


Oh, J., Singh, S., Lee, H. & Kohli, P. (2017). Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2661-2670. Available from https://proceedings.mlr.press/v70/oh17a.html.
