MS-Former: Multi-Scale Self-Guided Transformer for Medical Image Segmentation

Sanaz Karimijafarbigloo, Reza Azad, Amirhossein Kazerouni, Dorit Merhof
Medical Imaging with Deep Learning, PMLR 227:680-694, 2024.

Abstract

Multi-scale representations have proven to be a powerful tool since they can take into account both the fine-grained details of objects in an image as well as the broader context. Inspired by this, we propose a novel dual-branch transformer network that operates on two different scales to encode global contextual dependencies while preserving local information. To learn in a self-supervised fashion, our approach exploits the semantic dependency between the two scales to generate a supervisory signal for inter-scale consistency and also imposes a spatial stability loss within each scale for self-supervised content clustering. While the intra-scale and inter-scale consistency losses aim to increase feature similarity within each cluster, we propose to include a cross-entropy loss on top of the clustering score map to effectively model each cluster's distribution and sharpen the decision boundaries between clusters. Iteratively, our algorithm learns to assign each pixel to a semantically related cluster to produce the segmentation map. Extensive experiments on skin lesion and lung segmentation datasets show the superiority of our method compared to state-of-the-art (SOTA) approaches.
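The three loss terms described in the abstract can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: the function names, the 2x2 average pooling used to align the two scales, and the equal loss weighting are all assumptions made for this sketch.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Numerically stable softmax over the cluster dimension.
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_scale_consistency(p_fine, p_coarse):
    # p_fine: (2H, 2W, K) cluster probabilities from the fine branch,
    # p_coarse: (H, W, K) from the coarse branch.
    # Average-pool the fine map 2x2 so both scales align, then penalize
    # the squared difference (a proxy for the inter-scale supervisory signal).
    H, W, K = p_coarse.shape
    pooled = p_fine.reshape(H, 2, W, 2, K).mean(axis=(1, 3))
    return float(((pooled - p_coarse) ** 2).mean())

def spatial_stability(p):
    # Intra-scale term: neighboring pixels should receive similar
    # cluster distributions.
    dh = ((p[1:, :, :] - p[:-1, :, :]) ** 2).mean()
    dw = ((p[:, 1:, :] - p[:, :-1, :]) ** 2).mean()
    return float(dh + dw)

def cluster_cross_entropy(scores):
    # Self-labelling cross-entropy: take the argmax of the score map as
    # pseudo-labels to sharpen the boundaries between clusters.
    p = softmax(scores)
    labels = p.argmax(axis=-1)
    picked = np.take_along_axis(p, labels[..., None], axis=-1)
    return float(-np.log(picked + 1e-8).mean())

def total_loss(scores_fine, scores_coarse, w=(1.0, 1.0, 1.0)):
    # Hypothetical combination of the three terms; the weights w are
    # placeholders, not values from the paper.
    p_f, p_c = softmax(scores_fine), softmax(scores_coarse)
    return (w[0] * inter_scale_consistency(p_f, p_c)
            + w[1] * (spatial_stability(p_f) + spatial_stability(p_c))
            + w[2] * (cluster_cross_entropy(scores_fine)
                      + cluster_cross_entropy(scores_coarse)))
```

Note that for constant score maps the consistency and stability terms vanish and only the cross-entropy term remains, which matches the intuition that the first two terms pull features together within clusters while the third models each cluster's distribution.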

Cite this Paper


BibTeX
@InProceedings{pmlr-v227-karimijafarbigloo24a,
  title     = {MS-Former: Multi-Scale Self-Guided Transformer for Medical Image Segmentation},
  author    = {Karimijafarbigloo, Sanaz and Azad, Reza and Kazerouni, Amirhossein and Merhof, Dorit},
  booktitle = {Medical Imaging with Deep Learning},
  pages     = {680--694},
  year      = {2024},
  editor    = {Oguz, Ipek and Noble, Jack and Li, Xiaoxiao and Styner, Martin and Baumgartner, Christian and Rusu, Mirabela and Heinmann, Tobias and Kontos, Despina and Landman, Bennett and Dawant, Benoit},
  volume    = {227},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--12 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v227/karimijafarbigloo24a/karimijafarbigloo24a.pdf},
  url       = {https://proceedings.mlr.press/v227/karimijafarbigloo24a.html},
  abstract  = {Multi-scale representations have proven to be a powerful tool since they can take into account both the fine-grained details of objects in an image as well as the broader context. Inspired by this, we propose a novel dual-branch transformer network that operates on two different scales to encode global contextual dependencies while preserving local information. To learn in a self-supervised fashion, our approach exploits the semantic dependency between the two scales to generate a supervisory signal for inter-scale consistency and also imposes a spatial stability loss within each scale for self-supervised content clustering. While the intra-scale and inter-scale consistency losses aim to increase feature similarity within each cluster, we propose to include a cross-entropy loss on top of the clustering score map to effectively model each cluster's distribution and sharpen the decision boundaries between clusters. Iteratively, our algorithm learns to assign each pixel to a semantically related cluster to produce the segmentation map. Extensive experiments on skin lesion and lung segmentation datasets show the superiority of our method compared to state-of-the-art (SOTA) approaches.}
}
Endnote
%0 Conference Paper
%T MS-Former: Multi-Scale Self-Guided Transformer for Medical Image Segmentation
%A Sanaz Karimijafarbigloo
%A Reza Azad
%A Amirhossein Kazerouni
%A Dorit Merhof
%B Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ipek Oguz
%E Jack Noble
%E Xiaoxiao Li
%E Martin Styner
%E Christian Baumgartner
%E Mirabela Rusu
%E Tobias Heinmann
%E Despina Kontos
%E Bennett Landman
%E Benoit Dawant
%F pmlr-v227-karimijafarbigloo24a
%I PMLR
%P 680--694
%U https://proceedings.mlr.press/v227/karimijafarbigloo24a.html
%V 227
%X Multi-scale representations have proven to be a powerful tool since they can take into account both the fine-grained details of objects in an image as well as the broader context. Inspired by this, we propose a novel dual-branch transformer network that operates on two different scales to encode global contextual dependencies while preserving local information. To learn in a self-supervised fashion, our approach exploits the semantic dependency between the two scales to generate a supervisory signal for inter-scale consistency and also imposes a spatial stability loss within each scale for self-supervised content clustering. While the intra-scale and inter-scale consistency losses aim to increase feature similarity within each cluster, we propose to include a cross-entropy loss on top of the clustering score map to effectively model each cluster's distribution and sharpen the decision boundaries between clusters. Iteratively, our algorithm learns to assign each pixel to a semantically related cluster to produce the segmentation map. Extensive experiments on skin lesion and lung segmentation datasets show the superiority of our method compared to state-of-the-art (SOTA) approaches.
APA
Karimijafarbigloo, S., Azad, R., Kazerouni, A., & Merhof, D. (2024). MS-Former: Multi-Scale Self-Guided Transformer for Medical Image Segmentation. Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 227:680-694. Available from https://proceedings.mlr.press/v227/karimijafarbigloo24a.html.