Learning Modality Knowledge Alignment for Cross-Modality Transfer

Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:33777-33793, 2024.

Abstract

Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of the pretraining data. Existing works achieve some success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding of how the modality gap influences the transfer. In this work, we conduct a series of experiments focusing on source representation quality during transfer, revealing a connection between a larger modality gap and less knowledge reuse, i.e., ineffective transfer. We then formalize the gap as the knowledge misalignment between modalities, measured through the conditional distribution $P(Y|X)$. To address this problem, we present Modality kNowledge Alignment (MoNA), a meta-learning approach that learns a target data transformation to reduce the modality knowledge discrepancy ahead of the transfer. Experiments show that the approach significantly improves upon cross-modal finetuning methods and, most importantly, leads to better reuse of source modality knowledge.
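The abstract formalizes the modality gap as a misalignment of the conditional distribution $P(Y|X)$ across modalities. As a rough illustration only (not the authors' implementation, and with all function and variable names being hypothetical), one could proxy this misalignment by comparing a pretrained model's class-averaged predictive distributions on source-modality data against those on (transformed) target-modality data:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def knowledge_misalignment(source_probs, target_probs,
                           labels_s, labels_t, num_classes):
    """Toy proxy for modality knowledge misalignment: the average
    symmetric KL divergence between the class-averaged predictive
    distributions P(Y|X) on source vs. target data.
    `source_probs` / `target_probs`: (n, num_classes) softmax outputs."""
    total, counted = 0.0, 0
    for c in range(num_classes):
        ps = source_probs[labels_s == c]
        pt = target_probs[labels_t == c]
        if len(ps) == 0 or len(pt) == 0:
            continue  # class absent in one modality; skip it
        ps_mean = ps.mean(axis=0)
        pt_mean = pt.mean(axis=0)
        total += 0.5 * (kl(ps_mean, pt_mean) + kl(pt_mean, ps_mean))
        counted += 1
    return total / max(counted, 1)
```

Under this sketch, a target transformation (as MoNA meta-learns) would be optimized to drive such a discrepancy toward zero before finetuning; the actual objective in the paper may differ.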

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ma24d,
  title     = {Learning Modality Knowledge Alignment for Cross-Modality Transfer},
  author    = {Ma, Wenxuan and Li, Shuang and Cai, Lincan and Kang, Jingxuan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {33777--33793},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ma24d/ma24d.pdf},
  url       = {https://proceedings.mlr.press/v235/ma24d.html},
  abstract  = {Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation quality during transfer are conducted, revealing the connection between larger modality gap and lesser knowledge reuse which means ineffective transfer. We then formalize the gap as the knowledge misalignment between modalities using conditional distribution $P(Y|X)$. Towards this problem, we present Modality kNowledge Alignment (MoNA), a meta-learning approach that learns target data transformation to reduce the modality knowledge discrepancy ahead of the transfer. Experiments show that the approach significantly improves upon cross-modal finetuning methods, and most importantly leads to better reuse of source modality knowledge.}
}
Endnote
%0 Conference Paper
%T Learning Modality Knowledge Alignment for Cross-Modality Transfer
%A Wenxuan Ma
%A Shuang Li
%A Lincan Cai
%A Jingxuan Kang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ma24d
%I PMLR
%P 33777--33793
%U https://proceedings.mlr.press/v235/ma24d.html
%V 235
%X Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of pretraining data. Existing works achieve certain success in extending classical finetuning to cross-modal scenarios, yet we still lack understanding about the influence of modality gap on the transfer. In this work, a series of experiments focusing on the source representation quality during transfer are conducted, revealing the connection between larger modality gap and lesser knowledge reuse which means ineffective transfer. We then formalize the gap as the knowledge misalignment between modalities using conditional distribution $P(Y|X)$. Towards this problem, we present Modality kNowledge Alignment (MoNA), a meta-learning approach that learns target data transformation to reduce the modality knowledge discrepancy ahead of the transfer. Experiments show that the approach significantly improves upon cross-modal finetuning methods, and most importantly leads to better reuse of source modality knowledge.
APA
Ma, W., Li, S., Cai, L. & Kang, J. (2024). Learning Modality Knowledge Alignment for Cross-Modality Transfer. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:33777-33793. Available from https://proceedings.mlr.press/v235/ma24d.html.
