Learning Latent Plans from Play

Corey Lynch; Mohi Khansari; Ted Xiao; Vikash Kumar; Jonathan Tompson; Sergey Levine; Pierre Sermanet

Learning Latent Plans from Play

Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet

Proceedings of the Conference on Robot Learning, PMLR 100:1113-1132, 2020.

Abstract

Acquiring a diverse repertoire of general-purpose skills remains an open challenge for robotics. In this work, we propose self-supervising control on top of human teleoperated play data as a way to scale up skill learning. Play has two properties that make it attractive compared to conventional task demonstrations. Play is cheap, as it can be collected in large quantities quickly without task segmenting, labeling, or resetting to an initial state. Play is naturally rich, covering ∼4x more interaction space than task demonstrations for the same amount of collection time. To learn control from play, we introduce Play-LMP, a self-supervised method that learns to organize play behaviors in a latent space, then reuse them at test time to achieve specific goals. Combining self-supervised control with a diverse play dataset shifts the focus of skill learning from a narrow and discrete set of tasks to the full continuum of behaviors available in an environment. We find that this combination generalizes well empirically—after self-supervising on unlabeled play, our method substantially outperforms individual expert-trained policies on 18 difficult user-specified visual manipulation tasks in a simulated robotic tabletop environment. We additionally find that play-supervised models, unlike their expert-trained counterparts, are more robust to perturbations and exhibit retrying-till-success behaviors. Finally, we find that our agent organizes its latent plan space around functional tasks, despite never being trained with task labels. Videos, code and data are available at learning-from-play.github.io

Cite this Paper

BibTeX

@InProceedings{pmlr-v100-lynch20a,
  title = 	 {Learning Latent Plans from Play},
  author =       {Lynch, Corey and Khansari, Mohi and Xiao, Ted and Kumar, Vikash and Tompson, Jonathan and Levine, Sergey and Sermanet, Pierre},
  booktitle = 	 {Proceedings of the Conference on Robot Learning},
  pages = 	 {1113--1132},
  year = 	 {2020},
  editor = 	 {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume = 	 {100},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Oct--01 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v100/lynch20a/lynch20a.pdf},
  url = 	 {https://proceedings.mlr.press/v100/lynch20a.html},
  abstract = 	 {Acquiring a diverse repertoire of general-purpose skills remains an open challenge for robotics. In this work, we propose self-supervising control on top of human teleoperated play data as a way to scale up skill learning. Play has two properties that make it attractive compared to conventional task demonstrations. Play is cheap, as it can be collected in large quantities quickly without task segmenting, labeling, or resetting to an initial state. Play is naturally rich, covering ∼4x more interaction space than task demonstrations for the same amount of collection time. To learn control from play, we introduce Play-LMP, a self-supervised method that learns to organize play behaviors in a latent space, then reuse them at test time to achieve specific goals. Combining self-supervised control with a diverse play dataset shifts the focus of skill learning from a narrow and discrete set of tasks to the full continuum of behaviors available in an environment. We find that this combination generalizes well empirically—after self-supervising on unlabeled play, our method substantially outperforms individual expert-trained policies on 18 difficult user-specified visual manipulation tasks in a simulated robotic tabletop environment. We additionally find that play-supervised models, unlike their expert-trained counterparts, are more robust to perturbations and exhibit retrying-till-success behaviors. Finally, we find that our agent organizes its latent plan space around functional tasks, despite never being trained with task labels. Videos, code and data are available at learning-from-play.github.io}
}

Endnote

%0 Conference Paper
%T Learning Latent Plans from Play
%A Corey Lynch
%A Mohi Khansari
%A Ted Xiao
%A Vikash Kumar
%A Jonathan Tompson
%A Sergey Levine
%A Pierre Sermanet
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura	
%F pmlr-v100-lynch20a
%I PMLR
%P 1113--1132
%U https://proceedings.mlr.press/v100/lynch20a.html
%V 100
%X Acquiring a diverse repertoire of general-purpose skills remains an open challenge for robotics. In this work, we propose self-supervising control on top of human teleoperated play data as a way to scale up skill learning. Play has two properties that make it attractive compared to conventional task demonstrations. Play is cheap, as it can be collected in large quantities quickly without task segmenting, labeling, or resetting to an initial state. Play is naturally rich, covering ∼4x more interaction space than task demonstrations for the same amount of collection time. To learn control from play, we introduce Play-LMP, a self-supervised method that learns to organize play behaviors in a latent space, then reuse them at test time to achieve specific goals. Combining self-supervised control with a diverse play dataset shifts the focus of skill learning from a narrow and discrete set of tasks to the full continuum of behaviors available in an environment. We find that this combination generalizes well empirically—after self-supervising on unlabeled play, our method substantially outperforms individual expert-trained policies on 18 difficult user-specified visual manipulation tasks in a simulated robotic tabletop environment. We additionally find that play-supervised models, unlike their expert-trained counterparts, are more robust to perturbations and exhibit retrying-till-success behaviors. Finally, we find that our agent organizes its latent plan space around functional tasks, despite never being trained with task labels. Videos, code and data are available at learning-from-play.github.io

APA

Lynch, C., Khansari, M., Xiao, T., Kumar, V., Tompson, J., Levine, S. & Sermanet, P.. (2020). Learning Latent Plans from Play. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:1113-1132 Available from https://proceedings.mlr.press/v100/lynch20a.html.

Learning Latent Plans from Play

Abstract

Cite this Paper

Related Material