Learning Optimal Multimodal Information Bottleneck Representations

Qilong Wu; Yiyang Shao; Jun Wang; Xiaobo Sun

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:67584-67622, 2025.

Abstract

Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning-based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate optimal MIB with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set regularization weights in an ad hoc manner and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting regularization weights per modality, ensuring the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretic foundation for OMIB’s optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate OMIB’s theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods in various downstream tasks.
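The abstract describes regularizing each modality's representation with its own weight and implementing the objective through a variational approximation. As a rough illustration of that general idea only, the sketch below is a standard variational information bottleneck style loss with one KL weight per modality, assuming Gaussian encoders q(z_m | x_m) and a standard-normal prior. The KL term is the usual variational upper bound on the compression term, and `task_nll` stands in for the task-prediction term. It is not OMIB's objective: the paper's derived bound on the regularization weight and its dynamic per-modality adjustment are not reproduced here, and the function names, the `betas` dictionary and the modality names are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def gaussian_kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def multimodal_vib_loss(
    task_nll: float,
    encodings: Mapping[str, Tuple[Sequence[float], Sequence[float]]],
    betas: Mapping[str, float],
) -> float:
    """Hypothetical generic loss: task term plus a per-modality weighted KL penalty.

    encodings maps a modality name to the (mu, logvar) of its Gaussian encoder;
    betas maps the same names to nonnegative regularization weights. How OMIB
    bounds and adapts these weights is NOT modeled here.
    """
    penalty = 0.0
    for name, (mu, logvar) in encodings.items():
        beta = betas[name]
        if beta < 0:
            raise ValueError(f"regularization weight for {name!r} must be nonnegative")
        penalty += beta * gaussian_kl_to_standard_normal(mu, logvar)
    return task_nll + penalty


if __name__ == "__main__":
    # Hypothetical two-modality example: the "image" encoder equals the prior (KL = 0),
    # while the "text" encoder has a unit mean shift in one dimension (KL = 0.5).
    enc = {"image": ([0.0, 0.0], [0.0, 0.0]), "text": ([1.0, 0.0], [0.0, 0.0])}
    print(multimodal_vib_loss(0.7, enc, {"image": 0.1, "text": 0.01}))  # about 0.7 + 0.01 * 0.5
```

Giving each modality its own `beta` is what lets a training procedure penalize one modality's representation more or less than another's; choosing and updating those values is exactly the part the paper addresses and this sketch leaves open.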

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-wu25x,
  title     = {Learning Optimal Multimodal Information Bottleneck Representations},
  author    = {Wu, Qilong and Shao, Yiyang and Wang, Jun and Sun, Xiaobo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {67584--67622},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wu25x/wu25x.pdf},
  url       = {https://proceedings.mlr.press/v267/wu25x.html},
  abstract  = {Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning-based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate optimal MIB with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set regularization weights in an ad hoc manner and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting regularization weights per modality, ensuring the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretic foundation for OMIB’s optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate OMIB’s theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods in various downstream tasks.}
}

Endnote

%0 Conference Paper
%T Learning Optimal Multimodal Information Bottleneck Representations
%A Qilong Wu
%A Yiyang Shao
%A Jun Wang
%A Xiaobo Sun
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-wu25x
%I PMLR
%P 67584--67622
%U https://proceedings.mlr.press/v267/wu25x.html
%V 267
%X Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning-based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate optimal MIB with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set regularization weights in an ad hoc manner and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting regularization weights per modality, ensuring the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretic foundation for OMIB’s optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate OMIB’s theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods in various downstream tasks.

APA

Wu, Q., Shao, Y., Wang, J., & Sun, X. (2025). Learning Optimal Multimodal Information Bottleneck Representations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:67584-67622. Available from https://proceedings.mlr.press/v267/wu25x.html.
