A Study on Context Length and Efficient Transformers for Biomedical Image Analysis

Sarah M. Hooper, Hui Xue
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:471-489, 2025.

Abstract

Biomedical images are often high-resolution and multi-dimensional, presenting computational challenges for deep neural networks. These computational challenges are compounded when training transformers due to the self-attention operator, which scales quadratically with context length. Recent works have proposed alternatives to self-attention that scale more favorably with context length, alleviating these computational difficulties and potentially enabling more efficient application of transformers to large biomedical images. However, a systematic evaluation on this topic is lacking. In this study, we investigate the impact of context length on biomedical image analysis and we evaluate the performance of recently proposed substitutes for self-attention. We first curate a suite of biomedical imaging datasets, including 2D and 3D data for segmentation, denoising, and classification tasks. We then analyze the impact of context length on network performance using the Vision Transformer and Swin Transformer. Our findings reveal a strong relationship between context length and performance, particularly for pixel-level prediction tasks. Finally, we show that recent attention-free models demonstrate significant improvements in efficiency while maintaining comparable performance to self-attention-based models.
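The abstract's central computational claim, that self-attention scales quadratically with context length, can be seen directly in the shape of the attention score matrix. The sketch below is not from the paper; it is a minimal single-head, projection-free illustration in NumPy, where the (L, L) score matrix is the quadratic term in both time and memory.

```python
import numpy as np

def self_attention(x):
    """Naive scaled dot-product self-attention (single head).

    x: (L, d) array of L token embeddings. Computing the score
    matrix q @ k.T produces an (L, L) array, so cost grows
    quadratically with context length L.
    """
    L, d = x.shape
    q, k, v = x, x, x                      # identity projections, for illustration only
    scores = q @ k.T / np.sqrt(d)          # (L, L): the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                     # (L, d)

# Doubling the context length quadruples the score-matrix size:
for L in (256, 512, 1024):
    print(L, L * L)   # number of entries in the (L, L) attention matrix
```

For a 3D biomedical volume, L is the number of patch tokens, which grows with the cube of resolution, compounding this quadratic cost; this is the motivation for the attention substitutes the paper evaluates.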

Cite this Paper

BibTeX
@InProceedings{pmlr-v259-hooper25a,
  title = {A Study on Context Length and Efficient Transformers for Biomedical Image Analysis},
  author = {Hooper, Sarah M. and Xue, Hui},
  booktitle = {Proceedings of the 4th Machine Learning for Health Symposium},
  pages = {471--489},
  year = {2025},
  editor = {Hegselmann, Stefan and Zhou, Helen and Healey, Elizabeth and Chang, Trenton and Ellington, Caleb and Mhasawade, Vishwali and Tonekaboni, Sana and Argaw, Peniel and Zhang, Haoran},
  volume = {259},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v259/main/assets/hooper25a/hooper25a.pdf},
  url = {https://proceedings.mlr.press/v259/hooper25a.html},
  abstract = {Biomedical images are often high-resolution and multi-dimensional, presenting computational challenges for deep neural networks. These computational challenges are compounded when training transformers due to the self-attention operator, which scales quadratically with context length. Recent works have proposed alternatives to self-attention that scale more favorably with context length, alleviating these computational difficulties and potentially enabling more efficient application of transformers to large biomedical images. However, a systematic evaluation on this topic is lacking. In this study, we investigate the impact of context length on biomedical image analysis and we evaluate the performance of recently proposed substitutes for self-attention. We first curate a suite of biomedical imaging datasets, including 2D and 3D data for segmentation, denoising, and classification tasks. We then analyze the impact of context length on network performance using the Vision Transformer and Swin Transformer. Our findings reveal a strong relationship between context length and performance, particularly for pixel-level prediction tasks. Finally, we show that recent attention-free models demonstrate significant improvements in efficiency while maintaining comparable performance to self-attention-based models.}
}
Endnote
%0 Conference Paper
%T A Study on Context Length and Efficient Transformers for Biomedical Image Analysis
%A Sarah M. Hooper
%A Hui Xue
%B Proceedings of the 4th Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2025
%E Stefan Hegselmann
%E Helen Zhou
%E Elizabeth Healey
%E Trenton Chang
%E Caleb Ellington
%E Vishwali Mhasawade
%E Sana Tonekaboni
%E Peniel Argaw
%E Haoran Zhang
%F pmlr-v259-hooper25a
%I PMLR
%P 471--489
%U https://proceedings.mlr.press/v259/hooper25a.html
%V 259
%X Biomedical images are often high-resolution and multi-dimensional, presenting computational challenges for deep neural networks. These computational challenges are compounded when training transformers due to the self-attention operator, which scales quadratically with context length. Recent works have proposed alternatives to self-attention that scale more favorably with context length, alleviating these computational difficulties and potentially enabling more efficient application of transformers to large biomedical images. However, a systematic evaluation on this topic is lacking. In this study, we investigate the impact of context length on biomedical image analysis and we evaluate the performance of recently proposed substitutes for self-attention. We first curate a suite of biomedical imaging datasets, including 2D and 3D data for segmentation, denoising, and classification tasks. We then analyze the impact of context length on network performance using the Vision Transformer and Swin Transformer. Our findings reveal a strong relationship between context length and performance, particularly for pixel-level prediction tasks. Finally, we show that recent attention-free models demonstrate significant improvements in efficiency while maintaining comparable performance to self-attention-based models.
APA
Hooper, S.M. & Xue, H. (2025). A Study on Context Length and Efficient Transformers for Biomedical Image Analysis. Proceedings of the 4th Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 259:471-489. Available from https://proceedings.mlr.press/v259/hooper25a.html.