Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers

Jiacheng Cheng, Xiwen Yao, Xiang Yuan, Junwei Han
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:10144-10158, 2025.

Abstract

The substantial computational demands of detection transformers (DETRs) hinder their deployment in resource-constrained scenarios, with the encoder consistently emerging as a critical bottleneck. A promising solution lies in reducing token redundancy within the encoder. However, existing methods perform static sparsification and ignore that the importance of tokens for object detection varies across feature levels and encoder blocks, leading to suboptimal sparsification and performance degradation. In this paper, we propose Dynamic DETR (Dynamic token aggregation for DEtection TRansformers), a novel strategy that leverages the inherent importance distribution of tokens to control token density and performs multi-level token sparsification. Within each stage, we apply a proximal aggregation paradigm to low-level tokens to maintain spatial integrity and a holistic strategy to high-level tokens to capture broader contextual information. Furthermore, we propose center-distance regularization to align the distribution of tokens throughout the sparsification process, thereby facilitating representation consistency and effectively preserving critical object-specific patterns. Extensive experiments on canonical DETR models demonstrate that Dynamic DETR is broadly applicable across various models and consistently outperforms existing token sparsification methods.
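
To make the general idea concrete, below is a minimal, hedged PyTorch sketch of importance-driven token sparsification in a DETR-style encoder: score tokens, keep the most important ones dense, and aggregate the remainder instead of discarding them. Everything here is an illustrative assumption, not the authors' Dynamic DETR implementation: the module name ToyTokenAggregation, the linear scoring head, the fixed keep_ratio, and the center_distance_reg function (one plausible reading of the paper's center-distance regularization as keeping the feature centroid stable across sparsification) are all made up for exposition, and the paper's distinction between proximal aggregation for low-level tokens and holistic aggregation for high-level tokens is not modeled.

# Hedged sketch only: a toy importance-driven token aggregation step for a
# DETR-style encoder. Names and design choices below are assumptions, not the
# authors' Dynamic DETR code.
import torch
import torch.nn as nn


class ToyTokenAggregation(nn.Module):
    """Score tokens, keep the top fraction dense, and aggregate the rest into a
    single context token so discarded regions still contribute a summary."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-token importance head (assumed)
        self.keep_ratio = keep_ratio     # fraction of tokens kept dense (assumed fixed)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim), e.g. flattened multi-level features
        b, n, d = tokens.shape
        importance = self.score(tokens).squeeze(-1)            # (b, n)
        k = max(1, int(n * self.keep_ratio))
        topk = importance.topk(k, dim=1).indices               # indices of kept tokens

        keep_mask = torch.zeros(b, n, dtype=torch.bool, device=tokens.device)
        keep_mask.scatter_(1, topk, True)

        kept = tokens[keep_mask].view(b, k, d)                  # important tokens, kept dense
        rest = tokens[~keep_mask].view(b, n - k, d)              # redundant tokens

        # Aggregate the pruned tokens into one context token, weighted by importance.
        w = importance[~keep_mask].view(b, n - k).softmax(dim=1).unsqueeze(-1)
        context = (rest * w).sum(dim=1, keepdim=True)             # (b, 1, d)

        return torch.cat([kept, context], dim=1)                  # (b, k + 1, d)


def center_distance_reg(before: torch.Tensor, after: torch.Tensor) -> torch.Tensor:
    # Assumed interpretation: penalize the distance between the feature centroids
    # of the token set before and after sparsification.
    return (before.mean(dim=1) - after.mean(dim=1)).norm(dim=-1).mean()


if __name__ == "__main__":
    x = torch.randn(2, 100, 256)             # 2 images, 100 tokens, 256-dim features
    sparsifier = ToyTokenAggregation(dim=256)
    y = sparsifier(x)
    print(y.shape)                            # torch.Size([2, 51, 256])
    print(center_distance_reg(x, y).item())   # scalar regularization term

The main point of the sketch is that pruned tokens are aggregated into a context token rather than dropped outright, so the sparsified sequence stays short while a summary of the discarded (typically background) regions is retained.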

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cheng25i,
  title     = {Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers},
  author    = {Cheng, Jiacheng and Yao, Xiwen and Yuan, Xiang and Han, Junwei},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {10144--10158},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cheng25i/cheng25i.pdf},
  url       = {https://proceedings.mlr.press/v267/cheng25i.html}
}
Endnote
%0 Conference Paper
%T Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers
%A Jiacheng Cheng
%A Xiwen Yao
%A Xiang Yuan
%A Junwei Han
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cheng25i
%I PMLR
%P 10144--10158
%U https://proceedings.mlr.press/v267/cheng25i.html
%V 267
APA
Cheng, J., Yao, X., Yuan, X., & Han, J. (2025). Not All Tokens Matter All The Time: Dynamic Token Aggregation Towards Efficient Detection Transformers. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:10144-10158. Available from https://proceedings.mlr.press/v267/cheng25i.html.

Related Material

Download PDF: https://raw.githubusercontent.com/mlresearch/v267/main/assets/cheng25i/cheng25i.pdf