Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding

Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang, Feng Wu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74271-74287, 2025.

Abstract

A series of pre-trained models have demonstrated promising results on point cloud understanding tasks and are widely applied to downstream tasks through fine-tuning. However, full fine-tuning leads to forgetting of pre-trained knowledge and substantial storage costs on edge devices. To address these issues, Parameter-Efficient Transfer Learning (PETL) methods have been proposed. Our analysis shows that existing 3D PETL methods cannot adequately capture the semantic relationships among features that downstream tasks require, resulting in suboptimal performance. To ensure parameter efficiency while introducing rich semantic cues, we propose a novel fine-tuning paradigm for 3D pre-trained models: frozen 2D pre-trained models provide vision semantic prompts, and a new Hybrid Attention Adapter efficiently fuses these 2D semantic cues into 3D representations with minimal trainable parameters (1.8M). Extensive experiments on ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of the proposed paradigm. In particular, our method achieves 95.6% accuracy on ModelNet40 and 90.09% on the most challenging ScanObjectNN classification split (PB-T50-RS).
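The abstract describes the method only at a high level. As a rough illustration, the sketch below (PyTorch) shows one way a lightweight cross-attention adapter could fuse 2D vision prompts into frozen 3D backbone features; the class name, shapes, bottleneck design, and attention layout are all assumptions for illustration and are not taken from the paper.

import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """Hypothetical adapter: 3D point tokens (queries) attend to 2D vision
    prompt tokens (keys/values) from a frozen image encoder."""
    def __init__(self, dim=384, bottleneck=64, num_heads=4):
        super().__init__()
        # Bottleneck projections keep the trainable parameter count small,
        # in the usual PETL/adapter style.
        self.down = nn.Linear(dim, bottleneck)
        self.cross_attn = nn.MultiheadAttention(bottleneck, num_heads, batch_first=True)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens_3d, prompts_2d):
        # tokens_3d:  (B, N, dim) features from the frozen 3D backbone
        # prompts_2d: (B, M, dim) features from the frozen 2D encoder,
        #             assumed already projected to the same width
        q = self.down(tokens_3d)
        kv = self.down(prompts_2d)
        fused, _ = self.cross_attn(q, kv, kv)
        # Residual connection: the frozen 3D features remain the main signal.
        return tokens_3d + self.up(fused)

# Example: 128 point-patch tokens fused with 196 ViT patch tokens.
adapter = CrossAttentionAdapter(dim=384)
x3d = torch.randn(2, 128, 384)
x2d = torch.randn(2, 196, 384)
print(adapter(x3d, x2d).shape)  # torch.Size([2, 128, 384])

In a PETL setup of this kind, only the adapter weights and the task head are trained while both backbones stay frozen, which is how such methods keep the trainable parameter count small.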

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-zha25a,
  title     = {Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding},
  author    = {Zha, Yixin and Wang, Chuxin and Yang, Wenfei and Zhang, Tianzhu and Wu, Feng},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {74271--74287},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zha25a/zha25a.pdf},
  url       = {https://proceedings.mlr.press/v267/zha25a.html},
  abstract  = {A series of pre-trained models have demonstrated promising results in point cloud understanding tasks and are widely applied to downstream tasks through fine-tuning. However, full fine-tuning leads to the forgetting of pretrained knowledge and substantial storage costs on edge devices. To address these issues, Parameter-Efficient Transfer Learning (PETL) methods have been proposed. According to our analysis, we find that existing 3D PETL methods cannot adequately align with semantic relationships of features required by downstream tasks, resulting in suboptimal performance. To ensure parameter efficiency while introducing rich semantic cues, we propose a novel fine-tuning paradigm for 3D pre-trained models. We utilize frozen 2D pre-trained models to provide vision semantic prompts and design a new Hybrid Attention Adapter to efficiently fuse 2D semantic cues into 3D representations with minimal trainable parameters(1.8M). Extensive experiments conducted on datasets including ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed paradigm. In particular, our method achieves 95.6% accuracy on ModelNet40 and attains 90.09% performance on the most challenging classification split ScanObjectNN(PB-T50-RS).}
}
Endnote
%0 Conference Paper
%T Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding
%A Yixin Zha
%A Chuxin Wang
%A Wenfei Yang
%A Tianzhu Zhang
%A Feng Wu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zha25a
%I PMLR
%P 74271--74287
%U https://proceedings.mlr.press/v267/zha25a.html
%V 267
%X A series of pre-trained models have demonstrated promising results in point cloud understanding tasks and are widely applied to downstream tasks through fine-tuning. However, full fine-tuning leads to the forgetting of pretrained knowledge and substantial storage costs on edge devices. To address these issues, Parameter-Efficient Transfer Learning (PETL) methods have been proposed. According to our analysis, we find that existing 3D PETL methods cannot adequately align with semantic relationships of features required by downstream tasks, resulting in suboptimal performance. To ensure parameter efficiency while introducing rich semantic cues, we propose a novel fine-tuning paradigm for 3D pre-trained models. We utilize frozen 2D pre-trained models to provide vision semantic prompts and design a new Hybrid Attention Adapter to efficiently fuse 2D semantic cues into 3D representations with minimal trainable parameters(1.8M). Extensive experiments conducted on datasets including ScanObjectNN, ModelNet40, and ShapeNetPart demonstrate the effectiveness of our proposed paradigm. In particular, our method achieves 95.6% accuracy on ModelNet40 and attains 90.09% performance on the most challenging classification split ScanObjectNN(PB-T50-RS).
APA
Zha, Y., Wang, C., Yang, W., Zhang, T. & Wu, F. (2025). Exploring Vision Semantic Prompt for Efficient Point Cloud Understanding. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74271-74287. Available from https://proceedings.mlr.press/v267/zha25a.html.