Zero-shot object counting with visual feature extraction and language-guidance

Gaoxin Ma; Zhenfeng Zhu

Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:23-29, 2025.

Abstract

Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.

Cite this Paper

BibTeX

@InProceedings{pmlr-v278-ma25a,
  title = 	 {Zero-shot object counting with visual feature extraction and language-guidance},
  author =       {Ma, Gaoxin and Zhu, Zhenfeng},
  booktitle = 	 {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages = 	 {23--29},
  year = 	 {2025},
  editor = 	 {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume = 	 {278},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v278/main/assets/ma25a/ma25a.pdf},
  url = 	 {https://proceedings.mlr.press/v278/ma25a.html},
  abstract = 	 {Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.}
}

Endnote

%0 Conference Paper
%T Zero-shot object counting with visual feature extraction and language-guidance
%A Gaoxin Ma
%A Zhenfeng Zhu
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-ma25a
%I PMLR
%P 23--29
%U https://proceedings.mlr.press/v278/ma25a.html
%V 278
%X Zero-Shot Object Counting (ZSC) focuses on counting objects of any class in a query image without the need for user-supplied exemplars. Recently, ZSC has attracted growing interest because of its broad applicability and higher efficiency when contrasted with Few-Shot Object Counting (FSC). Different from FSC, a significant problem in existing ZSC methods is their failure to efficiently recognize high-quality exemplar features. In this paper, we propose a Zero-Shot Object Counting network with Visual Feature Extraction and Language-Guidance (VELG). Through the visual feature extraction module, we progressively fuse the scale and geometric information of the exemplars. Meanwhile, we introduce a language-guidance module that helps the exemplar learn informative image-level visual representations and refine the exemplar features using Contrastive Language-Image Pre-training. Extensive experiments on the FSC147 and CARPK datasets verify the accuracy and strong generalizability of the proposed approach.

APA

Ma, G. & Zhu, Z. (2025). Zero-shot object counting with visual feature extraction and language-guidance. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:23-29. Available from https://proceedings.mlr.press/v278/ma25a.html.
