Zero-shot object counting with visual feature extraction and language-guidance

Gaoxin Ma, Zhenfeng Zhu
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:23-29, 2025.

Abstract

Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-ma25a,
  title = {Zero-shot object counting with visual feature extraction and language-guidance},
  author = {Ma, Gaoxin and Zhu, Zhenfeng},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages = {23--29},
  year = {2025},
  editor = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume = {278},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/ma25a/ma25a.pdf},
  url = {https://proceedings.mlr.press/v278/ma25a.html},
  abstract = {Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.}
}
Endnote
%0 Conference Paper
%T Zero-shot object counting with visual feature extraction and language-guidance
%A Gaoxin Ma
%A Zhenfeng Zhu
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-ma25a
%I PMLR
%P 23--29
%U https://proceedings.mlr.press/v278/ma25a.html
%V 278
%X Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.
APA
Ma, G. & Zhu, Z. (2025). Zero-shot object counting with visual feature extraction and language-guidance. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:23-29. Available from https://proceedings.mlr.press/v278/ma25a.html.
