Geometric Multimodal Contrastive Representation Learning

Petra Poklukar; Miguel Vasco; Hang Yin; Francisco S. Melo; Ana Paiva; Danica Kragic

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:17782-17800, 2022.

Abstract

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.
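The two components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class and function names, the layer sizes, the tanh nonlinearity, the untrained random weights, and the InfoNCE-style loss are all assumptions chosen for illustration, and the loss defined in the paper may differ. The sketch only shows the general shape: one base encoder per modality mapping inputs of different sizes to a fixed-size intermediate vector, a single projection head shared by all modalities, and a contrastive loss that rewards matched pairs for lying close together in the latent space.

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a trained layer (hypothetical)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product written with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def normalize(z):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


class TwoLevelEncoder:
    """Assumed layout: per-modality base encoders plus one shared head."""

    def __init__(self, modality_dims, hidden_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # One base encoder per modality; all output hidden_dim values.
        self.base = {
            name: linear(dim, hidden_dim, rng)
            for name, dim in modality_dims.items()
        }
        # A single projection head shared by every modality.
        self.head = linear(hidden_dim, latent_dim, rng)

    def encode(self, name, x):
        hidden = [math.tanh(v) for v in apply(self.base[name], x)]
        return normalize(apply(self.head, hidden))


def info_nce(za, zb, temperature=0.1):
    """InfoNCE-style loss: za[i] should match zb[i] and no other zb[j]."""
    total = 0.0
    for i, anchor in enumerate(za):
        logits = [
            sum(a * b for a, b in zip(anchor, other)) / temperature
            for other in zb
        ]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(za)


if __name__ == "__main__":
    # Hypothetical modalities with different input sizes.
    model = TwoLevelEncoder({"image": 8, "sound": 4}, hidden_dim=6, latent_dim=3)
    z_image = [model.encode("image", [0.1 * k] * 8) for k in range(1, 4)]
    z_sound = [model.encode("sound", [0.2 * k] * 4) for k in range(1, 4)]
    print("loss on untrained encoders:", info_nce(z_image, z_sound))
```

Because the projection head is shared, a representation can be computed from any single modality with the same `encode` call, which is the property that matters when some modalities are missing at test time. Training the weights to minimize such a loss is outside the scope of this sketch.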

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-poklukar22a,
  title = 	 {Geometric Multimodal Contrastive Representation Learning},
  author =       {Poklukar, Petra and Vasco, Miguel and Yin, Hang and Melo, Francisco S. and Paiva, Ana and Kragic, Danica},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {17782--17800},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/poklukar22a/poklukar22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/poklukar22a.html},
  abstract = 	 {Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.}
}

Endnote

%0 Conference Paper
%T Geometric Multimodal Contrastive Representation Learning
%A Petra Poklukar
%A Miguel Vasco
%A Hang Yin
%A Francisco S. Melo
%A Ana Paiva
%A Danica Kragic
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-poklukar22a
%I PMLR
%P 17782--17800
%U https://proceedings.mlr.press/v162/poklukar22a.html
%V 162
%X Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.

APA


Poklukar, P., Vasco, M., Yin, H., Melo, F.S., Paiva, A. & Kragic, D. (2022). Geometric Multimodal Contrastive Representation Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:17782-17800. Available from https://proceedings.mlr.press/v162/poklukar22a.html.
