LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video

Noriaki Hirose, Catherine Glossop, Ajay Sridhar, Oier Mees, Sergey Levine
Proceedings of The 8th Conference on Robot Learning, PMLR 270:666-688, 2025.

Abstract

We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data, allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
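To make the grounding step concrete, the sketch below illustrates one plausible way to turn an object's segmentation mask and a monocular depth estimate into a metric goal position in the camera frame, which could then serve as a supervision target for a language-conditioned navigation policy. This is a minimal illustration, not the authors' implementation: the function name, the pinhole intrinsics (fx, fy, cx, cy), and the median aggregation over the mask are illustrative assumptions rather than details taken from the paper.

import numpy as np

def object_goal_from_mask_and_depth(mask, depth, fx, fy, cx, cy):
    """Back-project masked pixels through a pinhole camera model.

    mask  : (H, W) boolean segmentation mask for the described object
    depth : (H, W) metric depth estimate in meters
    fx, fy, cx, cy : pinhole intrinsics (assumed known or estimated)

    Returns an (x, y, z) goal position in the camera frame, taken as the
    median of the back-projected masked points (a simple robust choice;
    the paper's exact aggregation may differ).
    """
    ys, xs = np.nonzero(mask)                 # pixel coordinates inside the mask
    z = depth[ys, xs]                         # depth at those pixels
    valid = np.isfinite(z) & (z > 0)          # drop missing/invalid depth
    xs, ys, z = xs[valid], ys[valid], z[valid]
    x = (xs - cx) * z / fx                    # pinhole back-projection
    y = (ys - cy) * z / fy
    return np.median(np.stack([x, y, z], axis=1), axis=0)

# Toy usage with synthetic data: a 10x10 object patch roughly 2 m away.
H, W = 480, 640
mask = np.zeros((H, W), dtype=bool)
mask[200:210, 300:310] = True
depth = np.full((H, W), 2.0)
goal = object_goal_from_mask_and_depth(mask, depth, fx=500.0, fy=500.0, cx=W / 2, cy=H / 2)
print(goal)  # approximate object position in the camera frame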

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-hirose25b,
  title     = {LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video},
  author    = {Hirose, Noriaki and Glossop, Catherine and Sridhar, Ajay and Mees, Oier and Levine, Sergey},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {666--688},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/hirose25b/hirose25b.pdf},
  url       = {https://proceedings.mlr.press/v270/hirose25b.html},
  abstract  = {We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.}
}
Endnote
%0 Conference Paper
%T LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video
%A Noriaki Hirose
%A Catherine Glossop
%A Ajay Sridhar
%A Oier Mees
%A Sergey Levine
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-hirose25b
%I PMLR
%P 666--688
%U https://proceedings.mlr.press/v270/hirose25b.html
%V 270
%X We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
APA
Hirose, N., Glossop, C., Sridhar, A., Mees, O. & Levine, S. (2025). LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:666-688. Available from https://proceedings.mlr.press/v270/hirose25b.html.
