Guided Imitation of Task and Motion Planning

Michael James McDonald; Dylan Hadfield-Menell

Proceedings of the 5th Conference on Robot Learning, PMLR 164:630-640, 2022.

Abstract

While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver’s output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).

Cite this Paper

BibTeX

@InProceedings{pmlr-v164-mcdonald22a,
  title = 	 {Guided Imitation of Task and Motion Planning},
  author =       {McDonald, Michael James and Hadfield-Menell, Dylan},
  booktitle = 	 {Proceedings of the 5th Conference on Robot Learning},
  pages = 	 {630--640},
  year = 	 {2022},
  editor = 	 {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume = 	 {164},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--11 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v164/mcdonald22a/mcdonald22a.pdf},
  url = 	 {https://proceedings.mlr.press/v164/mcdonald22a.html},
  abstract = 	 {While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver’s output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).}
}

Endnote

%0 Conference Paper
%T Guided Imitation of Task and Motion Planning
%A Michael James McDonald
%A Dylan Hadfield-Menell
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-mcdonald22a
%I PMLR
%P 630--640
%U https://proceedings.mlr.press/v164/mcdonald22a.html
%V 164
%X While modern policy optimization methods can do complex manipulation from sensory data, they struggle on problems with extended time horizons and multiple sub-goals. On the other hand, task and motion planning (TAMP) methods scale to long horizons but they are computationally expensive and need to precisely track world state. We propose a method that draws on the strength of both methods: we train a policy to imitate a TAMP solver’s output. This produces a feed-forward policy that can accomplish multi-step tasks from sensory data. First, we build an asynchronous distributed TAMP solver that can produce supervision data fast enough for imitation learning. Then, we propose a hierarchical policy architecture that lets us use partially trained control policies to speed up the TAMP solver. In robotic manipulation tasks with 7-DoF joint control, the partially trained policies reduce the time needed for planning by a factor of up to 2.6. Among these tasks, we can learn a policy that solves the RoboSuite 4-object pick-place task 88% of the time from object pose observations and a policy that solves the RoboDesk 9-goal benchmark 79% of the time from RGB images (averaged across the 9 disparate tasks).

APA

McDonald, M.J. & Hadfield-Menell, D. (2022). Guided Imitation of Task and Motion Planning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:630-640. Available from https://proceedings.mlr.press/v164/mcdonald22a.html.
