Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Siddhant Haldar; Vaibhav Mathur; Denis Yarats; Lerrel Pinto

Proceedings of The 6th Conference on Robot Learning, PMLR 205:32-43, 2023.

Abstract

Imitation learning holds tremendous promise for learning policies efficiently for complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark show that ROT reaches 90% of expert performance 7.8x faster on average than prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
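The abstract's central idea, rewarding the agent for matching expert trajectories under optimal transport and mixing in a behavior-cloning (BC) term with an adaptive weight, can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. The cosine cost, the entropic Sinkhorn-Knopp solver with uniform marginals, the parameters `eps` and `n_iters`, the Q-filtering-style weight in `soft_q_filter_weight`, and the loss form in `regularized_actor_loss` (with its `alpha` parameter) are all assumptions chosen for illustration. Inputs are assumed to be per-timestep feature vectors, for example the outputs of an image encoder.

```python
import numpy as np


def cosine_cost(agent_feats: np.ndarray, expert_feats: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between agent (T x d) and expert (T' x d) features."""
    a = agent_feats / np.linalg.norm(agent_feats, axis=1, keepdims=True)
    e = expert_feats / np.linalg.norm(expert_feats, axis=1, keepdims=True)
    # Clip tiny negatives caused by floating-point error; cosine distance lies in [0, 2].
    return np.clip(1.0 - a @ e.T, 0.0, 2.0)


def sinkhorn(cost: np.ndarray, eps: float = 0.05, n_iters: int = 200) -> np.ndarray:
    """Entropy-regularized transport plan between uniform marginals (Sinkhorn-Knopp)."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)  # uniform mass on each agent timestep
    nu = np.full(m, 1.0 / m)  # uniform mass on each expert timestep
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_rewards(agent_feats, expert_feats, eps=0.05, n_iters=200) -> np.ndarray:
    """Per-timestep reward: minus the transport cost assigned to each agent step."""
    cost = cosine_cost(agent_feats, expert_feats)
    plan = sinkhorn(cost, eps, n_iters)
    return -(plan * cost).sum(axis=1)


def soft_q_filter_weight(q_bc: np.ndarray, q_pi: np.ndarray) -> float:
    """Assumed adaptive weight: fraction of expert states where the BC action
    has a higher Q-value than the current policy's action."""
    return float(np.mean(q_bc > q_pi))


def regularized_actor_loss(q_pi, bc_sq_err, lam: float, alpha: float = 1.0) -> float:
    """Hypothetical actor loss: maximize Q while pulling toward expert actions,
    with the BC term scaled by the adaptive weight lam."""
    return float(-(1.0 - lam) * np.mean(q_pi) + alpha * lam * np.mean(bc_sq_err))
```

In this sketch, rewards are computed once per completed episode, because the transport plan is defined over the whole agent trajectory. The weight `lam` shrinks toward zero as the policy's own actions earn higher Q-values than the BC actions, so the BC term fades once online learning overtakes the demonstrations.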

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-haldar23a,
  title     = {Watch and Match: Supercharging Imitation with Regularized Optimal Transport},
  author    = {Haldar, Siddhant and Mathur, Vaibhav and Yarats, Denis and Pinto, Lerrel},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {32--43},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/haldar23a/haldar23a.pdf},
  url       = {https://proceedings.mlr.press/v205/haldar23a.html},
  abstract  = {Imitation learning holds tremendous promise for learning policies efficiently for complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark show that ROT reaches 90% of expert performance 7.8x faster on average than prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.}
}

Endnote

%0 Conference Paper
%T Watch and Match: Supercharging Imitation with Regularized Optimal Transport
%A Siddhant Haldar
%A Vaibhav Mathur
%A Denis Yarats
%A Lerrel Pinto
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-haldar23a
%I PMLR
%P 32--43
%U https://proceedings.mlr.press/v205/haldar23a.html
%V 205
%X Imitation learning holds tremendous promise for learning policies efficiently for complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark show that ROT reaches 90% of expert performance 7.8x faster on average than prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.

APA


Haldar, S., Mathur, V., Yarats, D. & Pinto, L. (2023). Watch and Match: Supercharging Imitation with Regularized Optimal Transport. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:32-43. Available from https://proceedings.mlr.press/v205/haldar23a.html.
