DiffTAC: Temporal-Conditioned Latent Diffusion with Integrated Attention for Intermediate Frame Generation and Temporal Super-Resolution in Cardiac MRI

Shilajit Banerjee; Aniruddha Sinha

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:414-432, 2026.

Abstract

Cardiac cine MRI captures dynamic cardiac motion, yet its temporal resolution remains fundamentally constrained by long acquisition times and breath-hold requirements. We introduce DiffTAC, a latent diffusion framework that synthesizes intermediate cardiac phases by treating time as an explicit conditioning variable. Using the end-diastolic (ED) and end-systolic (ES) frames as anatomical anchors, DiffTAC performs denoising in the latent space of a pretrained variational autoencoder and conditions generation on a learnable temporal embedding that specifies the desired phase location within the cardiac cycle. To effectively fuse temporal conditioning with anatomical context, we propose the Integrated Attention Block (IAB), a unified module that combines self-attention and cross-attention to modulate spatial features according to the target temporal position. This design enables the model to synthesize anatomically coherent, temporally smooth intermediate frames. Experiments on multiple publicly available datasets demonstrate that DiffTAC produces highly realistic intermediate phases and achieves superior temporal consistency compared to classical interpolation, optical-flow–based reconstruction, and ablated variants of our architecture. These findings show that modeling time as a conditioning signal within a diffusion framework provides an effective and acquisition-free solution for temporal super-resolution in cardiac MRI.

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-banerjee26a,
  title = 	 {DiffTAC: Temporal-Conditioned Latent Diffusion with Integrated Attention for Intermediate Frame Generation and Temporal Super-Resolution in Cardiac MRI},
  author =       {Banerjee, Shilajit and Sinha, Aniruddha},
  booktitle = 	 {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {414--432},
  year = 	 {2026},
  editor = 	 {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = 	 {315},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v315/main/assets/banerjee26a/banerjee26a.pdf},
  url = 	 {https://proceedings.mlr.press/v315/banerjee26a.html},
  abstract = 	 {Cardiac cine MRI captures dynamic cardiac motion, yet its temporal resolution remains fundamentally constrained by long acquisition times and breath-hold requirements. We introduce DiffTAC, a latent diffusion framework that synthesizes intermediate cardiac phases by treating time as an explicit conditioning variable. Using the end-diastolic (ED) and end-systolic (ES) frames as anatomical anchors, DiffTAC performs denoising in the latent space of a pretrained variational autoencoder and conditions generation on a learnable temporal embedding that specifies the desired phase location within the cardiac cycle. To effectively fuse temporal conditioning with anatomical context, we propose the Integrated Attention Block (IAB), a unified module that combines self-attention and cross-attention to modulate spatial features according to the target temporal position. This design enables the model to synthesize anatomically coherent, temporally smooth intermediate frames. Experiments on multiple publicly available datasets demonstrate that DiffTAC produces highly realistic intermediate phases and achieves superior temporal consistency compared to classical interpolation, optical-flow–based reconstruction, and ablated variants of our architecture. These findings show that modeling time as a conditioning signal within a diffusion framework provides an effective and acquisition-free solution for temporal super-resolution in cardiac MRI.}
}

Endnote

%0 Conference Paper
%T DiffTAC: Temporal-Conditioned Latent Diffusion with Integrated Attention for Intermediate Frame Generation and Temporal Super-Resolution in Cardiac MRI
%A Shilajit Banerjee
%A Aniruddha Sinha
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-banerjee26a
%I PMLR
%P 414--432
%U https://proceedings.mlr.press/v315/banerjee26a.html
%V 315
%X Cardiac cine MRI captures dynamic cardiac motion, yet its temporal resolution remains fundamentally constrained by long acquisition times and breath-hold requirements. We introduce DiffTAC, a latent diffusion framework that synthesizes intermediate cardiac phases by treating time as an explicit conditioning variable. Using the end-diastolic (ED) and end-systolic (ES) frames as anatomical anchors, DiffTAC performs denoising in the latent space of a pretrained variational autoencoder and conditions generation on a learnable temporal embedding that specifies the desired phase location within the cardiac cycle. To effectively fuse temporal conditioning with anatomical context, we propose the Integrated Attention Block (IAB), a unified module that combines self-attention and cross-attention to modulate spatial features according to the target temporal position. This design enables the model to synthesize anatomically coherent, temporally smooth intermediate frames. Experiments on multiple publicly available datasets demonstrate that DiffTAC produces highly realistic intermediate phases and achieves superior temporal consistency compared to classical interpolation, optical-flow–based reconstruction, and ablated variants of our architecture. These findings show that modeling time as a conditioning signal within a diffusion framework provides an effective and acquisition-free solution for temporal super-resolution in cardiac MRI.

APA

Banerjee, S. & Sinha, A. (2026). DiffTAC: Temporal-Conditioned Latent Diffusion with Integrated Attention for Intermediate Frame Generation and Temporal Super-Resolution in Cardiac MRI. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:414-432. Available from https://proceedings.mlr.press/v315/banerjee26a.html.
