Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing

Jie Peng, Jenna L. Ballard, Mohan Zhang, Sukwon Yun, Jiayi Xin, Qi Long, Yanyong Zhang, Tianlong Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:48789-48807, 2025.

Abstract

Medical multi-modal learning requires the effective fusion of heterogeneous modalities. One vital challenge is how to fuse modalities effectively when data quality varies across modalities and patients. For example, in the TCGA benchmark, the performance of the same modality can differ between cancer types. Moreover, data collected at different times, at different locations, and with varying reagents can introduce inter-modal data quality differences ($i.e.$, the $\textbf{Modality Batch Effect}$). In response, we propose ${\textbf{A}}$daptive ${\textbf{M}}$odality Token Re-Balan${\textbf{C}}$ing ($\texttt{AMC}$), a novel top-down dynamic multi-modal fusion approach. The core of $\texttt{AMC}$ is to quantify the significance of each modality (Top) and then fuse the modalities according to their importance (Down). Specifically, we assess the quality of each input modality and replace its uninformative tokens with inter-modal tokens accordingly: the more important a modality is, the more of its informative tokens are retained. Self-attention then integrates these mixed tokens to fuse multi-modal knowledge. Comprehensive experiments on both medical and general multi-modal datasets demonstrate the effectiveness and generalizability of $\texttt{AMC}$.
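To make the top-down recipe concrete, the sketch below illustrates one way the abstract's three steps could look in code: score each modality's importance (Top), retain informative tokens in proportion to that importance while back-filling the freed slots with inter-modal tokens (Down), and let self-attention fuse the mixed sequence. This is a minimal PyTorch sketch under assumed design choices, not the authors' implementation: the norm-based importance and informativeness proxies, the back-fill policy, and all names (score_modalities, rebalance_tokens) are hypothetical.

# Minimal, illustrative PyTorch sketch of the top-down idea described above.
# NOT the authors' implementation: the norm-based importance/informativeness
# proxies, the back-fill policy, and all function names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def score_modalities(tokens_per_modality):
    # Top step: assign each modality an importance weight. Hypothetical
    # proxy: mean L2 norm of its tokens, normalized with a softmax.
    scores = torch.stack([t.norm(dim=-1).mean() for t in tokens_per_modality])
    return F.softmax(scores, dim=0)

def rebalance_tokens(tokens_per_modality, importance):
    # Down step: keep more tokens from more important modalities and
    # back-fill the remaining slots with inter-modal tokens. Assumes all
    # modalities share the same token count in this toy example.
    strongest = tokens_per_modality[int(importance.argmax())]
    mixed = []
    for t, w in zip(tokens_per_modality, importance):
        n = t.shape[1]
        keep = max(1, int(round(w.item() * n)))  # retained tokens ∝ importance
        # Hypothetical per-token informativeness proxy: L2 norm.
        idx = t.norm(dim=-1).argsort(dim=1, descending=True)[:, :keep]
        kept = torch.gather(t, 1, idx.unsqueeze(-1).expand(-1, -1, t.shape[-1]))
        fill = strongest[:, : n - keep]  # inter-modal tokens from the strongest modality
        mixed.append(torch.cat([kept, fill], dim=1))
    return torch.cat(mixed, dim=1)  # mixed sequence handed to self-attention

# Usage: two modalities, batch of 2, 16 tokens each, embedding dim 32.
mods = [torch.randn(2, 16, 32), torch.randn(2, 16, 32)]
weights = score_modalities(mods)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = rebalance_tokens(mods, weights)
fused, _ = attn(x, x, x)  # self-attention integrates the mixed tokens
print(weights, fused.shape)  # e.g. tensor([0.5, 0.5]) and torch.Size([2, 32, 32])

Since the total token count is unchanged, re-balancing alters which modality the fusion attends to without altering its compute cost.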

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-peng25a,
  title     = {Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing},
  author    = {Peng, Jie and Ballard, Jenna L. and Zhang, Mohan and Yun, Sukwon and Xin, Jiayi and Long, Qi and Zhang, Yanyong and Chen, Tianlong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {48789--48807},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/peng25a/peng25a.pdf},
  url       = {https://proceedings.mlr.press/v267/peng25a.html}
}
Endnote
%0 Conference Paper
%T Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing
%A Jie Peng
%A Jenna L. Ballard
%A Mohan Zhang
%A Sukwon Yun
%A Jiayi Xin
%A Qi Long
%A Yanyong Zhang
%A Tianlong Chen
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-peng25a
%I PMLR
%P 48789--48807
%U https://proceedings.mlr.press/v267/peng25a.html
%V 267
APA
Peng, J., Ballard, J.L., Zhang, M., Yun, S., Xin, J., Long, Q., Zhang, Y. & Chen, T. (2025). Modalities Contribute Unequally: Enhancing Medical Multi-modal Learning through Adaptive Modality Token Re-balancing. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:48789-48807. Available from https://proceedings.mlr.press/v267/peng25a.html.