Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

Letian Chen; Sravan Jayanthi; Rohan R Paleja; Daniel Martin; Viacheslav Zakharov; Matthew Gombolay

Proceedings of The 6th Conference on Robot Learning, PMLR 205:2083-2094, 2023.

Abstract

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are capable of neither fast adaptation to heterogeneous human demonstrations nor large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task ($p<.05$) and personalization ($p<.05$) performance.

Cite this Paper

BibTeX


@InProceedings{pmlr-v205-chen23e,
  title = {Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations},
  author = {Chen, Letian and Jayanthi, Sravan and Paleja, Rohan R and Martin, Daniel and Zakharov, Viacheslav and Gombolay, Matthew},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages = {2083--2094},
  year = {2023},
  editor = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume = {205},
  series = {Proceedings of Machine Learning Research},
  month = {14--18 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v205/chen23e/chen23e.pdf},
  url = {https://proceedings.mlr.press/v205/chen23e.html},
  abstract = 	 {Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task ($p<.05$) and personalization ($p<.05$) performance. }
}

Endnote

%0 Conference Paper
%T Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
%A Letian Chen
%A Sravan Jayanthi
%A Rohan R Paleja
%A Daniel Martin
%A Viacheslav Zakharov
%A Matthew Gombolay
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-chen23e
%I PMLR
%P 2083--2094
%U https://proceedings.mlr.press/v205/chen23e.html
%V 205
%X Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task ($p<.05$) and personalization ($p<.05$) performance.

APA


Chen, L., Jayanthi, S., Paleja, R.R., Martin, D., Zakharov, V. & Gombolay, M. (2023). Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:2083-2094. Available from https://proceedings.mlr.press/v205/chen23e.html.
