Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining

Sisi Fu; Zhiming Chen; Xiaocheng Fang; Jieyi Cai; Huanyu Liu; Huosheng Wen; Bingzhi Chen

Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:936-951, 2025.

Abstract

Small object detection (SOD) aims to precisely localize and accurately classify objects that occupy a limited spatial extent and offer few discernible features. Despite significant advances in object detection driven by CNN-based and Transformer-based methods, SOD remains a significant challenge. This is primarily because the small spatial size and scarce distinctive features of such objects create difficulties for both computational efficiency and effective supervision. In particular, Transformer-based detectors suffer from two problems. The first is the high computational cost of introducing a feature pyramid network (FPN). The second is sparse supervision of the encoder output, caused by insufficient positive queries. Current approaches attempt to mitigate these issues through sparse attention mechanisms and auxiliary one-to-many label assignment strategies. However, they often remain inefficient at processing multi-scale information and fail to generate adequate positive queries for small objects. To address these issues, we propose MRQM, a novel small object detector that integrates Multi-scale Refinement and Query-aided Mining. Its scale-aware encoder refines features across multiple scales from a bi-directional feature pyramid network (BiFPN) through iterative updates. This process not only reduces redundant computation but also significantly enhances the feature representation at each scale. Furthermore, its IoU-aware head combines a dynamic anchor mining strategy with one-to-many label assignment to mine potential high-quality auxiliary positive queries for small instances, thereby mitigating sparse supervision of the encoder. Extensive experiments on the SODA-D and VisDrone datasets consistently demonstrate the superiority and effectiveness of MRQM.
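The abstract relies on two generic building blocks. The first is BiFPN-style multi-scale feature fusion. The second is one-to-many label assignment, which gives each ground-truth box several positive queries rather than the single match produced by one-to-one (Hungarian) assignment in DETR. The plain-Python sketch below illustrates both, under stated assumptions. It is not the MRQM implementation. The fusion function follows the fast normalized fusion rule introduced with BiFPN in EfficientDet, simplified to flattened feature lists. The top-k IoU assignment rule and the names `one_to_many_assign`, `k` and `min_iou` are hypothetical simplifications. The paper's dynamic anchor mining and IoU-aware head are not reproduced here.

```python
# Hypothetical sketches of two generic components named in the abstract.
# Neither function is taken from the MRQM code base.
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def fast_normalized_fusion(inputs: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """BiFPN fast normalized fusion: O = sum_i w_i * I_i / (eps + sum_j w_j).

    Each input is a flattened feature map of the same length (a simplification:
    real BiFPN inputs are resized tensors). ReLU keeps every weight >= 0.
    """
    w = [max(0.0, x) for x in weights]
    denom = eps + sum(w)
    return [sum(w_i * feat[k] for w_i, feat in zip(w, inputs)) / denom
            for k in range(len(inputs[0]))]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def one_to_many_assign(queries: Sequence[Box], gts: Sequence[Box],
                       k: int = 3, min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Hypothetical rule: each ground truth takes its top-k queries by IoU.

    Candidates below min_iou are dropped. A query claimed by several ground
    truths keeps only the one it overlaps most. Returns (query, gt) pairs.
    """
    best: Dict[int, Tuple[float, int]] = {}
    for g_idx, gt in enumerate(gts):
        ranked = sorted(((iou(q, gt), q_idx) for q_idx, q in enumerate(queries)),
                        reverse=True)
        for score, q_idx in ranked[:k]:
            if score >= min_iou and score > best.get(q_idx, (-1.0, -1))[0]:
                best[q_idx] = (score, g_idx)
    return sorted((q_idx, g_idx) for q_idx, (_, g_idx) in best.items())


if __name__ == "__main__":
    gts = [(10, 10, 20, 20), (50, 50, 58, 58)]  # two small objects
    queries = [(10, 10, 20, 20), (11, 11, 21, 21), (9, 9, 19, 19),
               (50, 50, 58, 58), (51, 50, 59, 58), (100, 100, 120, 120)]
    print(one_to_many_assign(queries, gts))  # [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)]
    print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0))  # [2.0, 3.0]
```

With `k=3`, the two ground-truth boxes receive five positive queries in total. A one-to-one assignment would yield only two. This denser supervision is the general motivation for auxiliary one-to-many assignment; the toy numbers are not results from the paper.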

Cite this Paper

BibTeX

@InProceedings{pmlr-v260-fu25a,
  title = 	 {Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining},
  author =       {Fu, Sisi and Chen, Zhiming and Fang, Xiaocheng and Cai, Jieyi and Liu, Huanyu and Wen, Huosheng and Chen, Bingzhi},
  booktitle = 	 {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = 	 {936--951},
  year = 	 {2025},
  editor = 	 {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = 	 {260},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {05--08 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v260/main/assets/fu25a/fu25a.pdf},
  url = 	 {https://proceedings.mlr.press/v260/fu25a.html},
  abstract = 	 {Small object detection (SOD) aims to precisely localize and accurately classify objects that occupy a limited spatial extent and offer few discernible features. Despite significant advances in object detection driven by CNN-based and Transformer-based methods, SOD remains a significant challenge. This is primarily because the small spatial size and scarce distinctive features of such objects create difficulties for both computational efficiency and effective supervision. In particular, Transformer-based detectors suffer from two problems. The first is the high computational cost of introducing a feature pyramid network (FPN). The second is sparse supervision of the encoder output, caused by insufficient positive queries. Current approaches attempt to mitigate these issues through sparse attention mechanisms and auxiliary one-to-many label assignment strategies. However, they often remain inefficient at processing multi-scale information and fail to generate adequate positive queries for small objects. To address these issues, we propose MRQM, a novel small object detector that integrates Multi-scale Refinement and Query-aided Mining. Its scale-aware encoder refines features across multiple scales from a bi-directional feature pyramid network (BiFPN) through iterative updates. This process not only reduces redundant computation but also significantly enhances the feature representation at each scale. Furthermore, its IoU-aware head combines a dynamic anchor mining strategy with one-to-many label assignment to mine potential high-quality auxiliary positive queries for small instances, thereby mitigating sparse supervision of the encoder. Extensive experiments on the SODA-D and VisDrone datasets consistently demonstrate the superiority and effectiveness of MRQM.}
}

Endnote

%0 Conference Paper
%T Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining
%A Sisi Fu
%A Zhiming Chen
%A Xiaocheng Fang
%A Jieyi Cai
%A Huanyu Liu
%A Huosheng Wen
%A Bingzhi Chen
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-fu25a
%I PMLR
%P 936--951
%U https://proceedings.mlr.press/v260/fu25a.html
%V 260
%X Small object detection (SOD) aims to precisely localize and accurately classify objects that occupy a limited spatial extent and offer few discernible features. Despite significant advances in object detection driven by CNN-based and Transformer-based methods, SOD remains a significant challenge. This is primarily because the small spatial size and scarce distinctive features of such objects create difficulties for both computational efficiency and effective supervision. In particular, Transformer-based detectors suffer from two problems. The first is the high computational cost of introducing a feature pyramid network (FPN). The second is sparse supervision of the encoder output, caused by insufficient positive queries. Current approaches attempt to mitigate these issues through sparse attention mechanisms and auxiliary one-to-many label assignment strategies. However, they often remain inefficient at processing multi-scale information and fail to generate adequate positive queries for small objects. To address these issues, we propose MRQM, a novel small object detector that integrates Multi-scale Refinement and Query-aided Mining. Its scale-aware encoder refines features across multiple scales from a bi-directional feature pyramid network (BiFPN) through iterative updates. This process not only reduces redundant computation but also significantly enhances the feature representation at each scale. Furthermore, its IoU-aware head combines a dynamic anchor mining strategy with one-to-many label assignment to mine potential high-quality auxiliary positive queries for small instances, thereby mitigating sparse supervision of the encoder. Extensive experiments on the SODA-D and VisDrone datasets consistently demonstrate the superiority and effectiveness of MRQM.

APA

Fu, S., Chen, Z., Fang, X., Cai, J., Liu, H., Wen, H. & Chen, B. (2025). Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:936-951. Available from https://proceedings.mlr.press/v260/fu25a.html.
