Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

Adam Foster, Arpi Vezer, Craig A. Glastonbury, Paidi Creed, Samer Abujudeh, Aaron Sim
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6578-6621, 2022.

Abstract

Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-foster22a,
  title = {Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness},
  author = {Foster, Adam and Vezer, Arpi and Glastonbury, Craig A. and Creed, Paidi and Abujudeh, Samer and Sim, Aaron},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {6578--6621},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/foster22a/foster22a.pdf},
  url = {https://proceedings.mlr.press/v162/foster22a.html},
  abstract = {Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.}
}
Endnote
%0 Conference Paper
%T Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
%A Adam Foster
%A Arpi Vezer
%A Craig A. Glastonbury
%A Paidi Creed
%A Samer Abujudeh
%A Aaron Sim
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-foster22a
%I PMLR
%P 6578--6621
%U https://proceedings.mlr.press/v162/foster22a.html
%V 162
%X Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.
APA
Foster, A., Vezer, A., Glastonbury, C. A., Creed, P., Abujudeh, S., & Sim, A. (2022). Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6578-6621. Available from https://proceedings.mlr.press/v162/foster22a.html.
