KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3847-3865, 2025.

Abstract

Online Imitation Learning methods struggle with the gap between the extensive online exploration space and limited expert trajectories, which hinders efficient exploration due to inaccurate task-aware reward estimation. Inspired by findings from cognitive neuroscience that task decomposition can facilitate cognitive processing for efficient learning, we hypothesize that an agent can estimate precise task-aware imitation rewards for efficient online exploration by decomposing the target task into the objectives of “what to do” and the mechanisms of “how to do”. In this work, we introduce the hybrid Key-state guided Online Imitation (KOI) learning approach, which leverages the integration of semantic and motion key states as guidance for task-aware reward estimation. Initially, we utilize vision-language models to segment the expert trajectory into semantic key states, indicating the objectives of “what to do”. Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the process of “how to do”. By integrating a thorough grasp of both semantic and motion key states, we refine the trajectory-matching reward computation, encouraging task-aware exploration for efficient online imitation learning. Our experimental results show that our method is more sample-efficient than previous state-of-the-art approaches in the Meta-World and LIBERO environments. We also conduct real-world robotic manipulation experiments to validate the efficacy of our method, demonstrating the practical applicability of our KOI method.
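The key-state selection described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes semantic key states have already been identified (e.g., by a vision-language model) and that a per-frame optical-flow magnitude has been precomputed; within each interval between consecutive semantic key states, the frame with peak motion magnitude is taken as a motion key state. All function and variable names are hypothetical.

```python
def motion_key_states(flow_magnitudes, semantic_keys):
    """Pick one motion key state per interval between semantic key states.

    flow_magnitudes: per-frame optical-flow magnitude of the expert trajectory.
    semantic_keys:   frame indices of semantic key states ("what to do").
    Returns frame indices of motion key states ("how to do").
    """
    bounds = [0] + sorted(semantic_keys) + [len(flow_magnitudes) - 1]
    keys = []
    for start, end in zip(bounds, bounds[1:]):
        interior = range(start + 1, end)  # frames strictly inside the interval
        if interior:
            # The frame with the strongest motion marks the motion key state.
            keys.append(max(interior, key=lambda t: flow_magnitudes[t]))
    return keys


# Toy trajectory: semantic key state at frame 4, motion peaks at frames 2 and 6.
magnitudes = [0.0, 1.0, 5.0, 2.0, 0.0, 3.0, 9.0, 1.0, 0.0]
print(motion_key_states(magnitudes, semantic_keys=[4]))  # → [2, 6]
```

The hybrid set of semantic and motion key states could then weight a trajectory-matching reward so that exploration is scored more heavily near those states; that weighting scheme is specific to the paper and is not reproduced here.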

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-lu25a,
  title     = {KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance},
  author    = {Lu, Jingxian and Xia, Wenke and Wang, Dong and Wang, Zhigang and Zhao, Bin and Hu, Di and Li, Xuelong},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {3847--3865},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/lu25a/lu25a.pdf},
  url       = {https://proceedings.mlr.press/v270/lu25a.html},
  abstract  = {Online Imitation Learning methods struggle with the gap between extensive online exploration space and limited expert trajectories, which hinder efficient exploration due to inaccurate task-aware reward estimation. Inspired by the findings from cognitive neuroscience that task decomposition could facilitate cognitive processing for efficient learning, we hypothesize that an agent could estimate precise task-aware imitation rewards for efficient online exploration by decomposing the target task into the objectives of “what to do” and the mechanisms of “how to do”. In this work, we introduce the hybrid Key-state guided Online Imitation (KOI) learning approach, which leverages the integration of semantic and motion key states as guidance for task-aware reward estimation. Initially, we utilize the visual-language models to segment the expert trajectory into semantic key states, indicating the objectives of “what to do”. Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the process of “how to do”. By integrating a thorough grasp of both semantic and motion key states, we refine the trajectory-matching reward computation, encouraging task-aware exploration for efficient online imitation learning. Our experiment results prove that our method is more sample efficient than previous state-of-the-art approaches in the Meta-World and LIBERO environments. We also conduct real-world robotic manipulation experiments to validate the efficacy of our method, demonstrating the practical applicability of our KOI method.}
}
Endnote
%0 Conference Paper
%T KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
%A Jingxian Lu
%A Wenke Xia
%A Dong Wang
%A Zhigang Wang
%A Bin Zhao
%A Di Hu
%A Xuelong Li
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-lu25a
%I PMLR
%P 3847--3865
%U https://proceedings.mlr.press/v270/lu25a.html
%V 270
%X Online Imitation Learning methods struggle with the gap between extensive online exploration space and limited expert trajectories, which hinder efficient exploration due to inaccurate task-aware reward estimation. Inspired by the findings from cognitive neuroscience that task decomposition could facilitate cognitive processing for efficient learning, we hypothesize that an agent could estimate precise task-aware imitation rewards for efficient online exploration by decomposing the target task into the objectives of “what to do” and the mechanisms of “how to do”. In this work, we introduce the hybrid Key-state guided Online Imitation (KOI) learning approach, which leverages the integration of semantic and motion key states as guidance for task-aware reward estimation. Initially, we utilize the visual-language models to segment the expert trajectory into semantic key states, indicating the objectives of “what to do”. Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the process of “how to do”. By integrating a thorough grasp of both semantic and motion key states, we refine the trajectory-matching reward computation, encouraging task-aware exploration for efficient online imitation learning. Our experiment results prove that our method is more sample efficient than previous state-of-the-art approaches in the Meta-World and LIBERO environments. We also conduct real-world robotic manipulation experiments to validate the efficacy of our method, demonstrating the practical applicability of our KOI method.
APA
Lu, J., Xia, W., Wang, D., Wang, Z., Zhao, B., Hu, D. & Li, X. (2025). KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3847-3865. Available from https://proceedings.mlr.press/v270/lu25a.html.