The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models

Zichao Li, Xueru Wen, Jie Lou, Yuqiu Ji, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:36281-36297, 2025.

Abstract

Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences, particularly as LLMs increasingly interact with multimodal data. However, we find that MM-RMs trained on existing datasets often struggle to generalize to out-of-distribution data because they rely on unimodal spurious correlations, primarily text-only shortcuts within the training distribution, which prevent them from learning true multimodal reward functions. To address this, we introduce a Shortcut-aware MM-RM learning algorithm that dynamically reweights training samples, shifting the distribution toward better multimodal understanding and reducing dependence on unimodal spurious correlations. Our experiments demonstrate significant improvements in generalization, downstream task performance, and scalability, establishing a more robust framework for multimodal reward modeling. Our source code is available at https://github.com/alignrm/Generalizable-MM-RM.
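The exact reweighting scheme is defined in the paper and the linked repository; as a rough illustration of the idea the abstract describes, the sketch below reweights a standard Bradley-Terry pairwise ranking loss by how poorly a text-only (shortcut) reward model separates each preference pair, so that pairs solvable without the image are down-weighted. All names (`shortcut_aware_loss`, `mm_margin`, `text_only_margin`, `tau`) and the particular weighting `1 - sigmoid(margin / tau)` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def shortcut_aware_loss(mm_margin, text_only_margin, tau=1.0):
    """Pairwise reward-model loss with per-sample shortcut-aware reweighting.

    mm_margin:        r(chosen) - r(rejected) from the multimodal RM.
    text_only_margin: the same margin from a text-only RM scored without
                      the image (detached; it only produces weights).
    """
    # Probability that the text-only shortcut model already prefers
    # the chosen response: high means the pair is shortcut-solvable.
    p_shortcut = torch.sigmoid(text_only_margin.detach() / tau)
    # Up-weight pairs the shortcut fails on (image genuinely needed).
    weight = 1.0 - p_shortcut
    # Standard Bradley-Terry loss, -log sigmoid(margin), reweighted.
    per_sample = -F.logsigmoid(mm_margin)
    return (weight * per_sample).sum() / weight.sum().clamp(min=1e-8)

# Example: margins for a batch of four preference pairs.
mm_margin = torch.tensor([0.8, -0.2, 1.5, 0.1])
text_only_margin = torch.tensor([2.0, -1.0, 0.0, 3.0])
loss = shortcut_aware_loss(mm_margin, text_only_margin)
```

Detaching the text-only margin keeps the shortcut model out of the gradient path, so only the multimodal RM is trained on the reweighted distribution.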

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-li25cw,
  title     = {The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models},
  author    = {Li, Zichao and Wen, Xueru and Lou, Jie and Ji, Yuqiu and Lu, Yaojie and Han, Xianpei and Zhang, Debing and Sun, Le},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {36281--36297},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25cw/li25cw.pdf},
  url       = {https://proceedings.mlr.press/v267/li25cw.html},
  abstract  = {Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences, particularly as LLMs increasingly interact with multimodal data. However, we find that MM-RMs trained on existing datasets often struggle to generalize to out-of-distribution data because they rely on unimodal spurious correlations, primarily text-only shortcuts within the training distribution, which prevent them from learning true multimodal reward functions. To address this, we introduce a Shortcut-aware MM-RM learning algorithm that dynamically reweights training samples, shifting the distribution toward better multimodal understanding and reducing dependence on unimodal spurious correlations. Our experiments demonstrate significant improvements in generalization, downstream task performance, and scalability, establishing a more robust framework for multimodal reward modeling. Our source code is available at https://github.com/alignrm/Generalizable-MM-RM.}
}
Endnote
%0 Conference Paper
%T The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
%A Zichao Li
%A Xueru Wen
%A Jie Lou
%A Yuqiu Ji
%A Yaojie Lu
%A Xianpei Han
%A Debing Zhang
%A Le Sun
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-li25cw
%I PMLR
%P 36281--36297
%U https://proceedings.mlr.press/v267/li25cw.html
%V 267
%X Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences, particularly as LLMs increasingly interact with multimodal data. However, we find that MM-RMs trained on existing datasets often struggle to generalize to out-of-distribution data because they rely on unimodal spurious correlations, primarily text-only shortcuts within the training distribution, which prevent them from learning true multimodal reward functions. To address this, we introduce a Shortcut-aware MM-RM learning algorithm that dynamically reweights training samples, shifting the distribution toward better multimodal understanding and reducing dependence on unimodal spurious correlations. Our experiments demonstrate significant improvements in generalization, downstream task performance, and scalability, establishing a more robust framework for multimodal reward modeling. Our source code is available at https://github.com/alignrm/Generalizable-MM-RM.
APA
Li, Z., Wen, X., Lou, J., Ji, Y., Lu, Y., Han, X., Zhang, D. & Sun, L. (2025). The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:36281-36297. Available from https://proceedings.mlr.press/v267/li25cw.html.