GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions

Kei Katsumata, Yui Iioka, Naoki Hosomi, Teruhisa Misu, Kentaro Yamada, Komei Sugiura
Proceedings of The 9th Conference on Robot Learning, PMLR 305:5195-5217, 2025.

Abstract

We focus on the task of identifying the location of target regions from a natural language instruction and a front camera image captured by a mobility platform. This task is challenging because it requires both existence prediction and segmentation mask generation, particularly for stuff-type target regions with ambiguous boundaries. Existing methods often underperform in handling stuff-type target regions, in addition to absent or multiple targets. To overcome these limitations, we propose GENNAV, which predicts target existence and generates segmentation masks for multiple stuff-type target regions. To evaluate GENNAV, we constructed a novel benchmark called GRiN-Drive, which includes three distinct types of samples: no-target, single-target, and multi-target. GENNAV achieved superior performance over baseline methods on standard evaluation metrics. Furthermore, we conducted real-world experiments with four automobiles operated in five geographically distinct urban areas to validate its zero-shot transfer performance. In these experiments, GENNAV outperformed baseline methods and demonstrated its robustness across diverse real-world environments.

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-katsumata25a,
  title = {GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions},
  author = {Katsumata, Kei and Iioka, Yui and Hosomi, Naoki and Misu, Teruhisa and Yamada, Kentaro and Sugiura, Komei},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages = {5195--5217},
  year = {2025},
  editor = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = {305},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/katsumata25a/katsumata25a.pdf},
  url = {https://proceedings.mlr.press/v305/katsumata25a.html},
  abstract = {We focus on the task of identifying the location of target regions from a natural language instruction and a front camera image captured by a mobility. This task is challenging because it requires both existence prediction and segmentation mask generation, particularly for stuff-type target regions with ambiguous boundaries. Existing methods often underperform in handling stuff-type target regions, in addition to absent or multiple targets. To overcome these limitations, we propose GENNAV, which predicts target existence and generates segmentation masks for multiple stuff-type target regions. To evaluate GENNAV, we constructed a novel benchmark called GRiN-Drive, which includes three distinct types of samples: no-target, single-target, and multi-target. GENNAV achieved superior performance over baseline methods on standard evaluation metrics. Furthermore, we conducted real-world experiments with four automobiles operated in five geographically distinct urban areas to validate its zero-shot transfer performance. In these experiments, GENNAV outperformed baseline methods and demonstrated its robustness across diverse real-world environments.}
}
Endnote
%0 Conference Paper
%T GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
%A Kei Katsumata
%A Yui Iioka
%A Naoki Hosomi
%A Teruhisa Misu
%A Kentaro Yamada
%A Komei Sugiura
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-katsumata25a
%I PMLR
%P 5195--5217
%U https://proceedings.mlr.press/v305/katsumata25a.html
%V 305
%X We focus on the task of identifying the location of target regions from a natural language instruction and a front camera image captured by a mobility. This task is challenging because it requires both existence prediction and segmentation mask generation, particularly for stuff-type target regions with ambiguous boundaries. Existing methods often underperform in handling stuff-type target regions, in addition to absent or multiple targets. To overcome these limitations, we propose GENNAV, which predicts target existence and generates segmentation masks for multiple stuff-type target regions. To evaluate GENNAV, we constructed a novel benchmark called GRiN-Drive, which includes three distinct types of samples: no-target, single-target, and multi-target. GENNAV achieved superior performance over baseline methods on standard evaluation metrics. Furthermore, we conducted real-world experiments with four automobiles operated in five geographically distinct urban areas to validate its zero-shot transfer performance. In these experiments, GENNAV outperformed baseline methods and demonstrated its robustness across diverse real-world environments.
APA
Katsumata, K., Iioka, Y., Hosomi, N., Misu, T., Yamada, K., & Sugiura, K. (2025). GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:5195-5217. Available from https://proceedings.mlr.press/v305/katsumata25a.html.
