Predicting the Year of Total Knee Replacement: A Transformer-Based Multimodal Approach
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:239-252, 2026.
Abstract
Accurate prediction of the year of total knee replacement (TKR) is challenging due to the complex interplay of factors influencing the surgical decision. Current deep learning models often rely on single-modality data, limiting their predictive power. Multimodal approaches integrating imaging and patient data offer the potential to improve predictions and support clinical decisions. This study presents an end-to-end trained, transformer-based multimodal model that integrates MR imaging with tabular data, including clinical variables and image readings, to predict the year of TKR for each subject. Our model leverages cross-modal attention to fuse features from an image encoder with a self-supervised pretrained tabular encoder, achieving the highest accuracy of 63.4% among tested models. We evaluated its performance against three unimodal models and four multimodal fusion strategies, including simple concatenation, DAFT, and multimodal interaction. The results demonstrate that our model’s cross-modal interaction approach with pretrained TabNet not only outperformed all unimodal models but also showed improvements over other multimodal fusion techniques, highlighting the effectiveness of cross-modal attention fusion for integrating complex data modalities in TKR year prediction tasks. Source code is available at https://github.com/denizlab/2025_MIDL_time2TKR.
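The cross-modal attention fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a single attention head in which tabular features act as queries over image features, a residual connection, mean pooling, and a linear classification head. The abstract does not specify the attention direction, the feature dimensions, or the number of year classes, so all sizes, weights, and names here (`cross_modal_attention`, `w_q`, `w_k`, `w_v`, `w_out`) are hypothetical, and the inputs are random placeholders rather than real encoder outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(tab_tokens, img_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (assumed direction: tabular queries image).

    tab_tokens: (n_tab, d) features from a tabular encoder (queries)
    img_tokens: (n_img, d) features from an image encoder (keys and values)
    Returns fused features (n_tab, d) and attention weights (n_tab, n_img).
    """
    q = tab_tokens @ w_q
    k = img_tokens @ w_k
    v = img_tokens @ w_v
    # Scaled dot-product scores between every tabular and image token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection keeps the original tabular information.
    fused = tab_tokens + weights @ v
    return fused, weights


# Hypothetical sizes and random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
d, n_tab, n_img, n_classes = 16, 4, 8, 4
tab_tokens = rng.normal(size=(n_tab, d))
img_tokens = rng.normal(size=(n_img, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
w_out = rng.normal(size=(d, n_classes)) / np.sqrt(d)

fused, weights = cross_modal_attention(tab_tokens, img_tokens, w_q, w_k, w_v)
# Pool the fused tokens and map them to class probabilities over year bins.
probs = softmax(fused.mean(axis=0) @ w_out)
```

Each row of `weights` shows how strongly one tabular token attends to each image token, which is the mechanism that lets one modality condition on the other instead of the two feature vectors simply being concatenated.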