Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

Yiming Cui, Linjie Yang, Haichao Yu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6591-6602, 2023.

Abstract

Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries still work well for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, which we call modulated queries, better capture the prior of object locations and categories in different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks (object detection, instance segmentation, panoptic segmentation) and on different benchmarks (MS COCO, CityScapes, YoutubeVIS).
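
The core idea in the abstract, mixing learned queries with image-dependent convex coefficients, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the coefficients are produced, so the pooled image feature, the linear projection, the softmax normalization, and every name below (`modulated_queries`, `image_feature`, `projection`) are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Map real scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_queries(basic_queries, image_feature, projection):
    """Mix learned queries with image-dependent convex coefficients.

    basic_queries: n vectors of length d (the learned queries).
    image_feature: one vector of length c (hypothetical pooled image feature).
    projection:    m blocks of n rows of c weights (hypothetical linear layer).
    Returns m vectors of length d.
    """
    dim = len(basic_queries[0])
    mixed = []
    for rows in projection:
        # One score per basic query, computed from the image feature.
        scores = [sum(w * f for w, f in zip(row, image_feature)) for row in rows]
        weights = softmax(scores)  # convex: weights > 0 and sum to 1
        mixed.append([
            sum(wt * q[k] for wt, q in zip(weights, basic_queries))
            for k in range(dim)
        ])
    return mixed


# Toy sizes: 6 basic queries, 3 modulated queries, query dim 4, feature dim 5.
rng = random.Random(0)
n, m, d, c = 6, 3, 4, 5
basic = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
feature = [rng.gauss(0, 1) for _ in range(c)]
proj = [[[rng.gauss(0, 1) for _ in range(c)] for _ in range(n)] for _ in range(m)]
queries = modulated_queries(basic, feature, proj)
print(len(queries), len(queries[0]))
```

Because the weights are non-negative and sum to one, every modulated query lies in the convex hull of the basic queries, and a different image feature yields a different mixture of the same learned set.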

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-cui23f,
  title     = {Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation},
  author    = {Cui, Yiming and Yang, Linjie and Yu, Haichao},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {6591--6602},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/cui23f/cui23f.pdf},
  url       = {https://proceedings.mlr.press/v202/cui23f.html},
  abstract  = {Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named as modulated queries, better capture the prior of object locations and categories in the different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks (object detection, instance segmentation, panoptic segmentation) and on different benchmarks (MS COCO, CityScapes, YoutubeVIS).}
}
Endnote
%0 Conference Paper
%T Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
%A Yiming Cui
%A Linjie Yang
%A Haichao Yu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-cui23f
%I PMLR
%P 6591--6602
%U https://proceedings.mlr.press/v202/cui23f.html
%V 202
%X Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named as modulated queries, better capture the prior of object locations and categories in the different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks (object detection, instance segmentation, panoptic segmentation) and on different benchmarks (MS COCO, CityScapes, YoutubeVIS).
APA
Cui, Y., Yang, L., & Yu, H. (2023). Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:6591-6602. Available from https://proceedings.mlr.press/v202/cui23f.html.
