Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

Vivek Myers, Chunyuan Zheng, Oier Mees, Kuan Fang, Sergey Levine
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1402-1426, 2025.

Abstract

Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions sampled from a VLM to enable rapid nonparametric adaptation, avoiding the need for a larger fine-tuning dataset. We evaluate PALO in extensive real-world experiments consisting of challenging unseen, long-horizon robot manipulation tasks. We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies and methods that have access to the same demonstrations.
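
The abstract describes a concrete adaptation mechanism: sample candidate language decompositions of the task instruction from a VLM, then use a handful of demonstrations to select the decomposition that the pre-trained language-conditioned policy can best imitate, with no parameter updates. The following is a minimal, hypothetical Python sketch of such a selection loop. The names vlm_propose and policy_log_prob, and the uniform split of demonstration timesteps across subtasks, are illustrative assumptions for the sketch, not the paper's actual interface or algorithm.

import numpy as np

def score_decomposition(policy_log_prob, demos, subtasks):
    """Average log-likelihood of demonstrated actions under the pre-trained
    policy when each demo segment is conditioned on its proposed subtask
    instruction (uniform segmentation, a simplifying assumption here)."""
    total, count = 0.0, 0
    for obs_seq, act_seq in demos:
        # Split each demonstration evenly across the proposed subtasks.
        bounds = np.linspace(0, len(obs_seq), len(subtasks) + 1).astype(int)
        for k, subtask in enumerate(subtasks):
            for t in range(bounds[k], bounds[k + 1]):
                total += policy_log_prob(obs_seq[t], subtask, act_seq[t])
                count += 1
    return total / max(count, 1)

def palo_adapt(vlm_propose, policy_log_prob, demos, instruction, n_samples=16):
    """Nonparametric adaptation: keep the VLM-proposed decomposition that
    best explains the few demonstrations; the policy weights never change."""
    candidates = vlm_propose(instruction, n_samples)  # list of subtask lists
    return max(candidates,
               key=lambda c: score_decomposition(policy_log_prob, demos, c))

At execution time, the selected subtask instructions would then be fed to the frozen language-conditioned policy in sequence; since only the conditioning text is optimized, a handful of demonstrations suffices and no fine-tuning dataset is needed.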

Cite this Paper

BibTeX
@InProceedings{pmlr-v270-myers25a,
  title     = {Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation},
  author    = {Myers, Vivek and Zheng, Chunyuan and Mees, Oier and Fang, Kuan and Levine, Sergey},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {1402--1426},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/myers25a/myers25a.pdf},
  url       = {https://proceedings.mlr.press/v270/myers25a.html},
  abstract  = {Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions sampled from a VLM to enable rapid nonparametric adaptation, avoiding the need for a larger fine-tuning dataset. We evaluate PALO in extensive real-world experiments consisting of challenging unseen, long-horizon robot manipulation tasks. We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies and methods that have access to the same demonstrations.}
}
Endnote
%0 Conference Paper
%T Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation
%A Vivek Myers
%A Chunyuan Zheng
%A Oier Mees
%A Kuan Fang
%A Sergey Levine
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-myers25a
%I PMLR
%P 1402--1426
%U https://proceedings.mlr.press/v270/myers25a.html
%V 270
%X Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions sampled from a VLM to enable rapid nonparametric adaptation, avoiding the need for a larger fine-tuning dataset. We evaluate PALO in extensive real-world experiments consisting of challenging unseen, long-horizon robot manipulation tasks. We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies and methods that have access to the same demonstrations.
APA
Myers, V., Zheng, C., Mees, O., Fang, K. & Levine, S. (2025). Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1402-1426. Available from https://proceedings.mlr.press/v270/myers25a.html.