Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders

Rogelio A. Mancisidor, Robert Jenssen, Shujian Yu, Michael Kampffmeyer
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:1-26, 2025.

Abstract

Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product and mixture of experts, aggregate single-modality distributions under a simplifying independence assumption, which is overoptimistic. This research introduces a novel methodology for aggregating single-modality distributions by exploiting the principle of consensus of dependent experts (CoDE), which circumvents that assumption. Building on CoDE, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model better balances the trade-off between generative coherence and generative quality and yields more precise log-likelihood estimates. CoDE-VAE further narrows the generative quality gap as the number of modalities increases, in certain cases reaching a generative quality similar to that of unimodal VAEs, a desirable property that most current methods lack. Finally, CoDE-VAE achieves classification accuracy comparable to that of state-of-the-art multimodal VAE models.
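For intuition about the independence assumption the abstract refers to, the sketch below contrasts the standard product-of-experts (PoE) fusion of diagonal Gaussian posteriors, which adds expert precisions as if each expert carried independent evidence, with the classical consensus formula for dependent Gaussian experts (Winkler, 1981) that the CoDE principle builds on. This is a minimal illustration under those classical assumptions: the function names and toy numbers are ours, and the paper's actual ELBO and parameterization are not reproduced here.

import numpy as np

def poe_gaussian(mus, sigmas):
    # Product-of-experts fusion of diagonal Gaussians (as in MVAE-style
    # models): under independence, expert precisions simply add up.
    precisions = 1.0 / np.square(sigmas)       # (M, D) per-expert precisions
    var = 1.0 / precisions.sum(axis=0)         # fused variance
    mu = var * (precisions * mus).sum(axis=0)  # precision-weighted mean
    return mu, var                             # fused mean and variance

def dependent_consensus_1d(mus, Sigma):
    # Consensus of dependent experts for a scalar latent (Winkler, 1981):
    # expert errors are jointly Gaussian with covariance Sigma, so experts
    # are weighted by rows of Sigma^{-1} rather than assumed independent.
    # Note: this is the classical formula CoDE builds on, not necessarily
    # the paper's exact parameterization.
    ones = np.ones(len(mus))
    P = np.linalg.inv(Sigma)                   # joint precision matrix
    denom = ones @ P @ ones
    w = (P @ ones) / denom                     # consensus weights (sum to 1)
    return w @ mus, 1.0 / denom                # fused mean and variance

# Two experts that agree largely because they are correlated:
mus = np.array([1.0, 1.2])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
print(poe_gaussian(mus[:, None], np.ones((2, 1))))  # mean 1.1, variance 0.50
print(dependent_consensus_1d(mus, Sigma))           # mean 1.1, variance 0.90

With correlation 0.8, PoE halves the fused variance (0.5) as though the two experts were independent sources of evidence, while the dependent consensus barely tightens it (0.9). This overconfidence under independence is what the abstract calls an overoptimistic assumption.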

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-a-mancisidor25a,
  title     = {Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders},
  author    = {A. Mancisidor, Rogelio and Jenssen, Robert and Yu, Shujian and Kampffmeyer, Michael},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {1--26},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/a-mancisidor25a/a-mancisidor25a.pdf},
  url       = {https://proceedings.mlr.press/v267/a-mancisidor25a.html}
}
Endnote
%0 Conference Paper
%T Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders
%A Rogelio A. Mancisidor
%A Robert Jenssen
%A Shujian Yu
%A Michael Kampffmeyer
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-a-mancisidor25a
%I PMLR
%P 1--26
%U https://proceedings.mlr.press/v267/a-mancisidor25a.html
%V 267
APA
A. Mancisidor, R., Jenssen, R., Yu, S. & Kampffmeyer, M. (2025). Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:1-26. Available from https://proceedings.mlr.press/v267/a-mancisidor25a.html.
