LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video

Noriaki Hirose, Catherine Glossop, Ajay Sridhar, Oier Mees, Sergey Levine
Proceedings of The 8th Conference on Robot Learning, PMLR 270:666-688, 2025.

Abstract

We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data, allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
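To make the grounding step concrete, the sketch below illustrates one plausible way to turn an object's segmentation mask and a monocular depth estimate into a metric goal position in the camera frame, which could then serve as a supervision target for a language-conditioned navigation policy. This is a minimal illustration, not the authors' implementation: the function name, the pinhole intrinsics (fx, fy, cx, cy), and the median aggregation over the mask are illustrative assumptions rather than details taken from the paper.

import numpy as np

def object_goal_from_mask_and_depth(mask, depth, fx, fy, cx, cy):
    """Back-project masked pixels through a pinhole camera model.

    mask  : (H, W) boolean segmentation mask for the described object
    depth : (H, W) metric depth estimate in meters
    fx, fy, cx, cy : pinhole intrinsics (assumed known or estimated)

    Returns an (x, y, z) goal position in the camera frame, taken as the
    median of the back-projected masked points (a simple robust choice;
    the paper's exact aggregation may differ).
    """
    ys, xs = np.nonzero(mask)                 # pixel coordinates inside the mask
    z = depth[ys, xs]                         # depth at those pixels
    valid = np.isfinite(z) & (z > 0)          # drop missing/invalid depth
    xs, ys, z = xs[valid], ys[valid], z[valid]
    x = (xs - cx) * z / fx                    # pinhole back-projection
    y = (ys - cy) * z / fy
    return np.median(np.stack([x, y, z], axis=1), axis=0)

# Toy usage with synthetic data: a 10x10 object patch roughly 2 m away.
H, W = 480, 640
mask = np.zeros((H, W), dtype=bool)
mask[200:210, 300:310] = True
depth = np.full((H, W), 2.0)
goal = object_goal_from_mask_and_depth(mask, depth, fx=500.0, fy=500.0, cx=W / 2, cy=H / 2)
print(goal)  # approximate object position in the camera frame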

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-hirose25b,
  title     = {LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video},
  author    = {Hirose, Noriaki and Glossop, Catherine and Sridhar, Ajay and Mees, Oier and Levine, Sergey},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {666--688},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/hirose25b/hirose25b.pdf},
  url       = {https://proceedings.mlr.press/v270/hirose25b.html},
  abstract  = {We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.}
}
Endnote
%0 Conference Paper
%T LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video
%A Noriaki Hirose
%A Catherine Glossop
%A Ajay Sridhar
%A Oier Mees
%A Sergey Levine
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-hirose25b
%I PMLR
%P 666--688
%U https://proceedings.mlr.press/v270/hirose25b.html
%V 270
%X We present our method, LeLaN, which uses action-free egocentric data to learn robust language-conditioned object navigation. By leveraging the knowledge of large vision and language models and grounding this knowledge using pre-trained segmentation and depth estimation models, we can label in-the-wild data from a variety of indoor and outdoor environments with diverse instructions that capture a range of objects with varied granularity and noise in their descriptions. Leveraging this method to label over 50 hours of data collected in indoor and outdoor environments, including robot observations, YouTube video tours, and human-collected walking data allows us to train a policy that can outperform state-of-the-art methods on the zero-shot object navigation task in both success rate and precision.
APA
Hirose, N., Glossop, C., Sridhar, A., Mees, O. & Levine, S. (2025). LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Video. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:666-688. Available from https://proceedings.mlr.press/v270/hirose25b.html.
