Predicting the Year of Total Knee Replacement: A Transformer-Based Multimodal Approach

Ozkan Cigdem; Refik Soyak; Kyunghyun Cho; Cem M Deniz

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:239-252, 2026.

Abstract

Accurate prediction of the year of total knee replacement (TKR) is challenging due to the complex interplay of factors influencing the surgical decision. Current deep learning models often rely on single-modality data, limiting their predictive power. Multimodal approaches integrating imaging and patient data offer the potential to improve predictions and support clinical decisions. This study presents an end-to-end trained, transformer-based multimodal model that integrates MR imaging with tabular data, including clinical variables and image readings, to predict the year of TKR for each subject. Our model leverages cross-modal attention to fuse features from an image encoder with a self-supervised pretrained tabular encoder, achieving the highest accuracy of 63.4% among tested models. We evaluated its performance against three unimodal models and four multimodal fusion strategies, including simple concatenation, DAFT, and multimodal interaction. The results demonstrate that our model’s cross-modal interaction approach with pretrained TabNet not only outperformed all unimodal models but also showed improvements over other multimodal fusion techniques, highlighting the effectiveness of cross-modal attention fusion for integrating complex data modalities in TKR year prediction tasks. Source code is available at https://github.com/denizlab/2025_MIDL_time2TKR.
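The general idea behind cross-modal attention fusion can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (that lives in the linked repository). It assumes a single attention head, no learned query/key/value projections, image tokens as queries and tabular tokens as keys and values, and both encoders emitting tokens of the same dimension d. The helper names (`cross_modal_attention`, `fuse`, `concat_fusion`) are hypothetical. A simple-concatenation baseline is included for contrast.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the row maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(image_tokens: Matrix, tabular_tokens: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention with image tokens as queries and tabular
    tokens as both keys and values. Single head and no learned projections:
    simplifying assumptions, not taken from the paper."""
    d = len(image_tokens[0])
    scores = matmul(image_tokens, transpose(tabular_tokens))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, tabular_tokens)  # one row per image token
    return attended, weights


def mean_pool(tokens: Matrix) -> List[float]:
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def fuse(image_tokens: Matrix, tabular_tokens: Matrix) -> List[float]:
    """Hypothetical cross-modal fusion: add the attended tabular context to
    each image token (residual connection), then mean-pool over tokens."""
    attended, _ = cross_modal_attention(image_tokens, tabular_tokens)
    fused = [[x + y for x, y in zip(img, att)] for img, att in zip(image_tokens, attended)]
    return mean_pool(fused)


def concat_fusion(image_tokens: Matrix, tabular_tokens: Matrix) -> List[float]:
    """Simple-concatenation baseline: pool each modality, then join the vectors."""
    return mean_pool(image_tokens) + mean_pool(tabular_tokens)
```

In a full model, the pooled vector would feed a classification head over candidate TKR years. The sketch stops before that step. The two functions differ in where the modalities meet. Concatenation joins them only after each has been pooled. Attention instead lets every image token weight the tabular features individually.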

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-cigdem26a,
  title = 	 {Predicting the Year of Total Knee Replacement: A Transformer-Based Multimodal Approach},
  author =       {Cigdem, Ozkan and Soyak, Refik and Cho, Kyunghyun and Deniz, Cem M},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {239--252},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/cigdem26a/cigdem26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/cigdem26a.html},
  abstract = 	 {Accurate prediction of the year of total knee replacement (TKR) is challenging due to the complex interplay of factors influencing the surgical decision. Current deep learning models often rely on single-modality data, limiting their predictive power. Multimodal approaches integrating imaging and patient data offer the potential to improve predictions and support clinical decisions. This study presents an end-to-end trained, transformer-based multimodal model that integrates MR imaging with tabular data, including clinical variables and image readings, to predict the year of TKR for each subject. Our model leverages cross-modal attention to fuse features from an image encoder with a self-supervised pretrained tabular encoder, achieving the highest accuracy of 63.4% among tested models. We evaluated its performance against three unimodal models and four multimodal fusion strategies, including simple concatenation, DAFT, and multimodal interaction. The results demonstrate that our model’s cross-modal interaction approach with pretrained TabNet not only outperformed all unimodal models but also showed improvements over other multimodal fusion techniques, highlighting the effectiveness of cross-modal attention fusion for integrating complex data modalities in TKR year prediction tasks. Source code is available at https://github.com/denizlab/2025_MIDL_time2TKR.}
}

Endnote

%0 Conference Paper
%T Predicting the Year of Total Knee Replacement: A Transformer-Based Multimodal Approach
%A Ozkan Cigdem
%A Refik Soyak
%A Kyunghyun Cho
%A Cem M Deniz
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-cigdem26a
%I PMLR
%P 239--252
%U https://proceedings.mlr.press/v301/cigdem26a.html
%V 301
%X Accurate prediction of the year of total knee replacement (TKR) is challenging due to the complex interplay of factors influencing the surgical decision. Current deep learning models often rely on single-modality data, limiting their predictive power. Multimodal approaches integrating imaging and patient data offer the potential to improve predictions and support clinical decisions. This study presents an end-to-end trained, transformer-based multimodal model that integrates MR imaging with tabular data, including clinical variables and image readings, to predict the year of TKR for each subject. Our model leverages cross-modal attention to fuse features from an image encoder with a self-supervised pretrained tabular encoder, achieving the highest accuracy of 63.4% among tested models. We evaluated its performance against three unimodal models and four multimodal fusion strategies, including simple concatenation, DAFT, and multimodal interaction. The results demonstrate that our model’s cross-modal interaction approach with pretrained TabNet not only outperformed all unimodal models but also showed improvements over other multimodal fusion techniques, highlighting the effectiveness of cross-modal attention fusion for integrating complex data modalities in TKR year prediction tasks. Source code is available at https://github.com/denizlab/2025_MIDL_time2TKR.

APA

Cigdem, O., Soyak, R., Cho, K. & Deniz, C.M. (2026). Predicting the Year of Total Knee Replacement: A Transformer-Based Multimodal Approach. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:239-252. Available from https://proceedings.mlr.press/v301/cigdem26a.html.