Efficient Quantification of Multimodal Interaction at Sample Level

Zequn Yang, Hongfa Wang, Di Hu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:71302-71317, 2025.

Abstract

Interactions between modalities—redundancy, uniqueness, and synergy—collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet their accurate sample-level quantification presents significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI’s precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.
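For context, the three interaction terms named above follow the standard partial information decomposition (PID) bookkeeping of Williams & Beer; a minimal sketch of that background accounting is given below. This is generic PID notation, not LSMI's exact pointwise formulation, and the symbols R, U_1, U_2, S are simply labels for redundancy, the two unique contributions, and synergy.

\begin{align}
I(X_1, X_2; Y) &= R + U_1 + U_2 + S,\\
I(X_1; Y) &= R + U_1,\\
I(X_2; Y) &= R + U_2.
\end{align}

The sample-wise perspective of the paper assigns such quantities per example (pointwise) rather than in expectation over the whole distribution.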

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yang25aj,
  title     = {Efficient Quantification of Multimodal Interaction at Sample Level},
  author    = {Yang, Zequn and Wang, Hongfa and Hu, Di},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {71302--71317},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yang25aj/yang25aj.pdf},
  url       = {https://proceedings.mlr.press/v267/yang25aj.html}
}
Endnote
%0 Conference Paper
%T Efficient Quantification of Multimodal Interaction at Sample Level
%A Zequn Yang
%A Hongfa Wang
%A Di Hu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-yang25aj
%I PMLR
%P 71302--71317
%U https://proceedings.mlr.press/v267/yang25aj.html
%V 267
APA
Yang, Z., Wang, H. & Hu, D. (2025). Efficient Quantification of Multimodal Interaction at Sample Level. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:71302-71317. Available from https://proceedings.mlr.press/v267/yang25aj.html.