A Supervised Contrastive Framework for Learning Disentangled Representations of Cellular Perturbation Data

Xinming Tu; Jan-Christian Hütter; Zitong Jerry Wang; Takamasa Kudo; Aviv Regev; Romain Lopez

Proceedings of the 18th Machine Learning in Computational Biology meeting, PMLR 240:90-100, 2024.

Abstract

CRISPR technology, combined with single-cell RNA-Seq, has opened the way to large-scale pooled perturbation screens, allowing more systematic interrogations of gene functions in cells at scale. However, such Perturb-seq data poses many analysis challenges, due to its high dimensionality, high level of technical noise, and variable Cas9 efficiency. The single-cell nature of the data also poses its own challenges, as we observe the heterogeneity of phenotypes in the unperturbed cells, along with the effect of the perturbations. All in all, these characteristics make it difficult to discern subtler effects. Existing tools, like mixscape and ContrastiveVI, provide partial solutions, but may oversimplify biological dynamics, or have low power to characterize perturbations with a smaller effect size. Here, we address these limitations by introducing the Supervised Contrastive Variational Autoencoder (SC-VAE). SC-VAE integrates guide RNA identity with gene expression data, ensuring a more discriminative analysis, and adopts the Hilbert-Schmidt Independence Criterion as a way to achieve disentangled representations, separating the heterogeneity in the control population from the effect of the perturbations. Evaluation on large-scale data sets highlights SC-VAE’s superior sensitivity in identifying perturbation effects compared to ContrastiveVI, scVI and PCA. The perturbation embeddings better reflect known protein complexes (evaluated on CORUM), while its classifier offers promise in identifying assignment errors and cells escaping the perturbation phenotype. SC-VAE is readily applicable across diverse perturbation data sets.
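
To illustrate the Hilbert-Schmidt Independence Criterion (HSIC) named in the abstract, here is a minimal sketch of the standard biased empirical HSIC estimator with Gaussian kernels. It is not taken from the SC-VAE code: the function names, the fixed bandwidth `sigma`, and the toy "latent" arrays are assumptions made for illustration only. A penalty of this kind is small when two sets of latent variables are statistically independent and larger when they share information.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x (hypothetical helper)."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data standing in for two latent spaces (assumed, not from the paper).
rng = np.random.default_rng(0)
z_background = rng.normal(size=(200, 2))
z_independent = rng.normal(size=(200, 2))  # drawn independently
z_dependent = z_background + 0.1 * rng.normal(size=(200, 2))  # nearly a copy

hsic_indep = hsic(z_background, z_independent)
hsic_dep = hsic(z_background, z_dependent)
print(f"independent pair: {hsic_indep:.4f}")
print(f"dependent pair:   {hsic_dep:.4f}")
```

In this toy setting the dependent pair yields a clearly larger value than the independent pair, which is the behavior a disentanglement penalty would rely on. How the criterion is weighted and combined with the other loss terms in SC-VAE is described in the paper itself.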

Cite this Paper

BibTeX


@InProceedings{pmlr-v240-tu24a,
  title = 	 {A Supervised Contrastive Framework for Learning Disentangled Representations of Cellular Perturbation Data},
  author =       {Tu, Xinming and H\"utter, Jan-Christian and Wang, Zitong Jerry and Kudo, Takamasa and Regev, Aviv and Lopez, Romain},
  booktitle = 	 {Proceedings of the 18th Machine Learning in Computational Biology meeting},
  pages = 	 {90--100},
  year = 	 {2024},
  editor = 	 {Knowles, David A. and Mostafavi, Sara},
  volume = 	 {240},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Nov--01 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v240/tu24a/tu24a.pdf},
  url = 	 {https://proceedings.mlr.press/v240/tu24a.html},
  abstract = 	 {CRISPR technology, combined with single-cell RNA-Seq, has opened the way to large scale pooled perturbation screens, allowing more systematic interrogations of gene functions in cells at scale. However, such Perturb-seq data poses many analysis challenges, due to its high-dimensionality, high level of technical noise, and variable Cas9 efficiency. The single-cell nature of the data also poses its own challenges, as we observe the heterogeneity of phenotypes in the unperturbed cells, along with the effect of the perturbations. All in all, these characteristics make it difficult to discern subtler effects. Existing tools, like mixscape and ContrastiveVI, provide partial solutions, but may oversimplify biological dynamics, or have low power to characterize perturbations with a smaller effect size. Here, we address these limitations by introducing the Supervised Contrastive Variational Autoencoder (SC-VAE). SC-VAE integrates guide RNA identity with gene expression data, ensuring a more discriminative analysis, and adopts the Hilbert-Schmidt Independence Criterion as a way to achieve disentangled representations, separating the heterogeneity in the control population from the effect of the perturbations. Evaluation on large-scale data sets highlights SC-VAE’s superior sensitivity in identifying perturbation effects compared to ContrastiveVI, scVI and PCA. The perturbation embeddings better reflect known protein complexes (evaluated on CORUM), while its classifier offers promise in identifying assignment errors and cells escaping the perturbation phenotype. SC-VAE is readily applicable across diverse perturbation data sets.}
}

Endnote

%0 Conference Paper
%T A Supervised Contrastive Framework for Learning Disentangled Representations of Cellular Perturbation Data
%A Xinming Tu
%A Jan-Christian Hütter
%A Zitong Jerry Wang
%A Takamasa Kudo
%A Aviv Regev
%A Romain Lopez
%B Proceedings of the 18th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2024
%E David A. Knowles
%E Sara Mostafavi
%F pmlr-v240-tu24a
%I PMLR
%P 90--100
%U https://proceedings.mlr.press/v240/tu24a.html
%V 240
%X CRISPR technology, combined with single-cell RNA-Seq, has opened the way to large scale pooled perturbation screens, allowing more systematic interrogations of gene functions in cells at scale. However, such Perturb-seq data poses many analysis challenges, due to its high-dimensionality, high level of technical noise, and variable Cas9 efficiency. The single-cell nature of the data also poses its own challenges, as we observe the heterogeneity of phenotypes in the unperturbed cells, along with the effect of the perturbations. All in all, these characteristics make it difficult to discern subtler effects. Existing tools, like mixscape and ContrastiveVI, provide partial solutions, but may oversimplify biological dynamics, or have low power to characterize perturbations with a smaller effect size. Here, we address these limitations by introducing the Supervised Contrastive Variational Autoencoder (SC-VAE). SC-VAE integrates guide RNA identity with gene expression data, ensuring a more discriminative analysis, and adopts the Hilbert-Schmidt Independence Criterion as a way to achieve disentangled representations, separating the heterogeneity in the control population from the effect of the perturbations. Evaluation on large-scale data sets highlights SC-VAE’s superior sensitivity in identifying perturbation effects compared to ContrastiveVI, scVI and PCA. The perturbation embeddings better reflect known protein complexes (evaluated on CORUM), while its classifier offers promise in identifying assignment errors and cells escaping the perturbation phenotype. SC-VAE is readily applicable across diverse perturbation data sets.

APA


Tu, X., Hütter, J.-C., Wang, Z. J., Kudo, T., Regev, A., & Lopez, R. (2024). A Supervised Contrastive Framework for Learning Disentangled Representations of Cellular Perturbation Data. Proceedings of the 18th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 240:90-100. Available from https://proceedings.mlr.press/v240/tu24a.html.
