Reverse Curriculum Generation for Reinforcement Learning

Carlos Florensa; David Held; Markus Wulfmeier; Michael Zhang; Pieter Abbeel

Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:482-495, 2017.

Abstract

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in “reverse”, gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
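
The idea of training in reverse can be pictured with a small toy sketch. This is not the authors' implementation: the 1-D chain, the linear `success_prob` model standing in for a learned policy, the 0.1 to 0.9 success band, and every function name below are assumptions made purely for illustration. The sketch starts from the single known goal state, proposes nearby start states, keeps those the agent solves only some of the time, and treats "training" on them as extending the agent's reach.

```python
import random

GOAL = 0                  # the single known goal state (toy 1-D chain, an assumption)
R_MIN, R_MAX = 0.1, 0.9   # hypothetical band for "neither too easy nor too hard"


def success_prob(start, skill):
    """Toy stand-in for a policy: certain success within `skill` steps of the
    goal, decaying linearly to zero over the next 5 steps."""
    dist = abs(start - GOAL)
    return max(0.0, min(1.0, 1.0 - (dist - skill) / 5.0))


def estimate_success(start, skill, rng, rollouts=50):
    """Estimate the success rate from `start` with simulated rollouts."""
    p = success_prob(start, skill)
    return sum(rng.random() < p for _ in range(rollouts)) / rollouts


def propose_starts(starts, rng, n_new=30, max_step=2):
    """Perturb existing starts with short random moves to find nearby states."""
    return [rng.choice(starts) + rng.randint(-max_step, max_step)
            for _ in range(n_new)]


def reverse_curriculum(iterations=10, seed=0):
    rng = random.Random(seed)
    starts, skill, history = [GOAL], 0.0, []
    for _ in range(iterations):
        candidates = set(propose_starts(starts, rng))
        rates = {s: estimate_success(s, skill, rng) for s in candidates}
        # Keep only starts of intermediate difficulty for the current agent.
        good = [s for s, r in rates.items() if R_MIN <= r <= R_MAX]
        if good:
            starts = good
            # "Training" on these starts is modelled as extending the reach.
            skill = max(skill, max(abs(s - GOAL) for s in good))
        history.append(skill)
    return starts, history


if __name__ == "__main__":
    final_starts, skill_history = reverse_curriculum()
    print("skill per iteration:", skill_history)
    print("final start states:", sorted(final_starts))
```

In this toy setting the kept start states drift away from the goal as the modelled skill grows, which mirrors the "increasingly far from the goal" behaviour described in the abstract. A real system would replace `success_prob` with rollouts of an actual policy and a genuine policy-update step.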

Cite this Paper

BibTeX


@InProceedings{pmlr-v78-florensa17a,
  title = 	 {Reverse Curriculum Generation for Reinforcement Learning},
  author = 	 {Florensa, Carlos and Held, David and Wulfmeier, Markus and Zhang, Michael and Abbeel, Pieter},
  booktitle = 	 {Proceedings of the 1st Annual Conference on Robot Learning},
  pages = 	 {482--495},
  year = 	 {2017},
  editor = 	 {Levine, Sergey and Vanhoucke, Vincent and Goldberg, Ken},
  volume = 	 {78},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v78/florensa17a/florensa17a.pdf},
  url = 	 {https://proceedings.mlr.press/v78/florensa17a.html},
  abstract = 	 {Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in “reverse”, gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.}
}

Endnote

%0 Conference Paper
%T Reverse Curriculum Generation for Reinforcement Learning
%A Carlos Florensa
%A David Held
%A Markus Wulfmeier
%A Michael Zhang
%A Pieter Abbeel
%B Proceedings of the 1st Annual Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Sergey Levine
%E Vincent Vanhoucke
%E Ken Goldberg
%F pmlr-v78-florensa17a
%I PMLR
%P 482--495
%U https://proceedings.mlr.press/v78/florensa17a.html
%V 78
%X Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in “reverse”, gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.

APA


Florensa, C., Held, D., Wulfmeier, M., Zhang, M. & Abbeel, P. (2017). Reverse Curriculum Generation for Reinforcement Learning. Proceedings of the 1st Annual Conference on Robot Learning, in Proceedings of Machine Learning Research 78:482-495. Available from https://proceedings.mlr.press/v78/florensa17a.html.
