MTVNet: Multi-Contextual Transformers for Volumes – Network for Super-Resolution with Long-Range Interactions

August Leander Høeg, Sophia W. Bardenfleth, Hans Martin Kjer, Tim B. Dyrby, Vedrana Andersen Dahl, Anders Dahl
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307:144-159, 2026.

Abstract

Recent advances in transformer-based models have led to significant improvements in 2D image super-resolution. However, leveraging these advances for volumetric super-resolution remains challenging due to the high memory demands of self-attention mechanisms in 3D volumes, which severely limit the receptive field. As a result, long-range interactions, one of the key strengths of transformers, are underutilized in 3D super-resolution. To investigate this, we propose MTVNet, a volumetric transformer model that leverages information from expanded contextual regions at multiple resolution scales. Here, coarse resolution information from broader context regions is carried forward to inform the super-resolution prediction of a smaller area. Using transformer layers at each resolution, our coarse-to-fine modeling limits the number of tokens at each scale and enables attention over larger regions than previously possible. We compare our method, MTVNet, against state-of-the-art models on five 3D datasets. Our results show that expanding the receptive field of transformer-based methods yields significant performance gains on high-resolution 3D data. While CNNs outperform transformers on low-resolution data, transformer-based methods excel on high-resolution volumes with exploitable long-range dependencies, with our MTVNet achieving state-of-the-art performance. Our code is available at https://github.com/AugustHoeg/MTVNet.
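To illustrate the coarse-to-fine idea from the abstract, the following is a minimal NumPy sketch, not the authors' implementation: it attends over a large 3D context at coarse resolution, crops the central sub-region, and repeats at finer resolution, so the token count per scale stays bounded. All function names, patch sizes, and scale factors here are illustrative assumptions; for the actual architecture, see the code at https://github.com/AugustHoeg/MTVNet.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head attention with identity projections, for brevity.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def tokenize(vol, patch=2):
    # Split a cubic volume into non-overlapping patch^3 tokens.
    s = vol.shape[0] // patch
    toks = vol.reshape(s, patch, s, patch, s, patch).transpose(0, 2, 4, 1, 3, 5)
    return toks.reshape(s**3, patch**3)

def coarse_to_fine(volume, scales=(4, 2, 1)):
    """Attend over a large context at coarse resolution, carry the
    resulting features forward, then crop the central sub-region and
    repeat at the next finer resolution."""
    feats = None
    region = volume
    for step in scales:
        ctx = region[::step, ::step, ::step]   # coarse view of current context
        tokens = tokenize(ctx)
        if feats is not None:
            tokens = tokens + feats            # coarse features inform finer scale
        feats = self_attention(tokens)
        c = region.shape[0] // 4               # keep the central half of the region
        region = region[c:-c, c:-c, c:-c]
    return feats

# Usage: a 32^3 volume yields the same 64 tokens at every scale,
# even though the first scale sees the full 32^3 context.
vol = np.random.rand(32, 32, 32)
out = coarse_to_fine(vol)
```

The key property shown is that each scale processes a fixed number of tokens (here 64), while the effective receptive field of the finest scale still covers the full input volume via the carried-forward coarse features.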

Cite this Paper


BibTeX
@InProceedings{pmlr-v307-hoeg26a,
  title = {{MTVN}et: Multi-Contextual Transformers for Volumes {–} Network for Super-Resolution with Long-Range Interactions},
  author = {H{\o}eg, August Leander and Bardenfleth, Sophia W. and Kjer, Hans Martin and Dyrby, Tim B. and Dahl, Vedrana Andersen and Dahl, Anders},
  booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
  pages = {144--159},
  year = {2026},
  editor = {Kim, Hyeongji and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume = {307},
  series = {Proceedings of Machine Learning Research},
  month = {06--08 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v307/main/assets/hoeg26a/hoeg26a.pdf},
  url = {https://proceedings.mlr.press/v307/hoeg26a.html},
  abstract = {Recent advances in transformer-based models have led to significant improvements in 2D image super-resolution. However, leveraging these advances for volumetric super-resolution remains challenging due to the high memory demands of self-attention mechanisms in 3D volumes, which severely limit the receptive field. As a result, long-range interactions, one of the key strengths of transformers, are underutilized in 3D super-resolution. To investigate this, we propose MTVNet, a volumetric transformer model that leverages information from expanded contextual regions at multiple resolution scales. Here, coarse resolution information from broader context regions is carried forward to inform the super-resolution prediction of a smaller area. Using transformer layers at each resolution, our coarse-to-fine modeling limits the number of tokens at each scale and enables attention over larger regions than previously possible. We compare our method, MTVNet, against state-of-the-art models on five 3D datasets. Our results show that expanding the receptive field of transformer-based methods yields significant performance gains on high-resolution 3D data. While CNNs outperform transformers on low-resolution data, transformer-based methods excel on high-resolution volumes with exploitable long-range dependencies, with our MTVNet achieving state-of-the-art performance. Our code is available at https://github.com/AugustHoeg/MTVNet.}
}
Endnote
%0 Conference Paper %T MTVNet: Multi-Contextual Transformers for Volumes – Network for Super-Resolution with Long-Range Interactions %A August Leander Høeg %A Sophia W. Bardenfleth %A Hans Martin Kjer %A Tim B. Dyrby %A Vedrana Andersen Dahl %A Anders Dahl %B Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL) %C Proceedings of Machine Learning Research %D 2026 %E Hyeongji Kim %E Adín Ramírez Rivera %E Benjamin Ricaud %F pmlr-v307-hoeg26a %I PMLR %P 144--159 %U https://proceedings.mlr.press/v307/hoeg26a.html %V 307 %X Recent advances in transformer-based models have led to significant improvements in 2D image super-resolution. However, leveraging these advances for volumetric super-resolution remains challenging due to the high memory demands of self-attention mechanisms in 3D volumes, which severely limit the receptive field. As a result, long-range interactions, one of the key strengths of transformers, are underutilized in 3D super-resolution. To investigate this, we propose MTVNet, a volumetric transformer model that leverages information from expanded contextual regions at multiple resolution scales. Here, coarse resolution information from broader context regions is carried forward to inform the super-resolution prediction of a smaller area. Using transformer layers at each resolution, our coarse-to-fine modeling limits the number of tokens at each scale and enables attention over larger regions than previously possible. We compare our method, MTVNet, against state-of-the-art models on five 3D datasets. Our results show that expanding the receptive field of transformer-based methods yields significant performance gains on high-resolution 3D data. While CNNs outperform transformers on low-resolution data, transformer-based methods excel on high-resolution volumes with exploitable long-range dependencies, with our MTVNet achieving state-of-the-art performance. Our code is available at https://github.com/AugustHoeg/MTVNet.
APA
Høeg, A.L., Bardenfleth, S.W., Kjer, H.M., Dyrby, T.B., Dahl, V.A. & Dahl, A. (2026). MTVNet: Multi-Contextual Transformers for Volumes – Network for Super-Resolution with Long-Range Interactions. Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 307:144-159. Available from https://proceedings.mlr.press/v307/hoeg26a.html.