Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning

Jiaqi Cheng, Mingfeng Fan, Xuefeng Zhang, Jingsong Liang, Yuhong Cao, Guohua Wu, Guillaume Adrien Sartoretti
Proceedings of The 9th Conference on Robot Learning, PMLR 305:1562-1575, 2025.

Abstract

Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task planning schemes in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.
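The GTSP described above can be stated concretely: given several clusters of candidate locations, visit exactly one location per cluster with a minimum-length closed tour. The sketch below is a minimal brute-force formulation for tiny instances, included purely to illustrate the problem; it is not the paper's MMFL policy, and all names are hypothetical.

```python
from itertools import product, permutations
from math import dist, inf

def gtsp_brute_force(clusters):
    """Exhaustively solve a small GTSP instance.

    clusters: list of clusters, each a list of (x, y) points.
    Returns (best_length, best_tour), where the tour visits exactly
    one point from every cluster and returns to its start.
    """
    best_len, best_tour = inf, None
    # Choose one point from each cluster, then try every visiting order.
    for choice in product(*clusters):
        for order in permutations(range(len(choice))):
            tour = [choice[i] for i in order]
            length = sum(dist(tour[k], tour[(k + 1) % len(tour)])
                         for k in range(len(tour)))
            if length < best_len:
                best_len, best_tour = length, tour
    return best_len, best_tour
```

Enumeration like this is exponential in both cluster count and cluster size, which is precisely why learned real-time policies such as the one proposed here are of practical interest for robots.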

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-cheng25a,
  title     = {Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning},
  author    = {Cheng, Jiaqi and Fan, Mingfeng and Zhang, Xuefeng and Liang, Jingsong and Cao, Yuhong and Wu, Guohua and Sartoretti, Guillaume Adrien},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {1562--1575},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/cheng25a/cheng25a.pdf},
  url       = {https://proceedings.mlr.press/v305/cheng25a.html},
  abstract  = {Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task planning schemes in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.}
}
Endnote
%0 Conference Paper
%T Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning
%A Jiaqi Cheng
%A Mingfeng Fan
%A Xuefeng Zhang
%A Jingsong Liang
%A Yuhong Cao
%A Guohua Wu
%A Guillaume Adrien Sartoretti
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-cheng25a
%I PMLR
%P 1562--1575
%U https://proceedings.mlr.press/v305/cheng25a.html
%V 305
%X Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task planning schemes in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.
APA
Cheng, J., Fan, M., Zhang, X., Liang, J., Cao, Y., Wu, G. & Sartoretti, G.A. (2025). Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:1562-1575. Available from https://proceedings.mlr.press/v305/cheng25a.html.