CVQVAE: A representation learning based method for multi-omics single cell data integration

Tianyu Liu; Grant Greenberg; Ilan Shomorony

Proceedings of the 17th Machine Learning in Computational Biology meeting, PMLR 200:1-15, 2022.

Abstract

The rapid development of second-generation sequencing has brought about a significant increase in the amount of omics data. Integrating and analyzing these single-cell datasets is a challenging problem. In this paper, we propose a new model, called CVQVAE, based on a cross-trained VAE and strengthened by the vector quantization technique, for multi-omics data integration. CVQVAE projects data vectors from different omics onto a common latent space in such a way that (1) similar cells are close in the latent space and (2) the original biological information present in each of the omics (including cell cycle and trajectory) is preserved. Our model is trained and optimized solely on the multi-omics data and requires no additional information such as cell-type labels. We empirically demonstrate the stability and efficiency of our method in data integration (alignment) on datasets from a recent competition on Open Problems in Single Cell Analysis.
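As a rough illustration of the vector quantization idea named in the abstract, the sketch below snaps latent vectors onto the nearest entry of a shared codebook. It is not the authors' implementation: the function names, the 2-D latent space, the codebook values, and the two example embeddings are all hypothetical, and the encoders that would produce such embeddings are omitted.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return (index, code) of the codebook entry nearest to z."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


# Hypothetical shared codebook in a 2-D latent space.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]

# Hypothetical latent embeddings of the same cell, as two
# modality-specific encoders might produce them (assumed values).
z_rna = (0.9, 1.2)
z_atac = (1.1, 0.7)

idx_rna, code_rna = quantize(z_rna, codebook)
idx_atac, code_atac = quantize(z_atac, codebook)

# Both embeddings fall nearest to the same codebook entry, so the
# cell receives one discrete code regardless of modality.
print(idx_rna, idx_atac)
```

In this toy setting, two nearby embeddings of one cell map to the same code, which is the sense in which a shared codebook can encourage alignment across omics.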

Cite this Paper

BibTeX


@InProceedings{pmlr-v200-liu22a,
  title = 	 {CVQVAE: A representation learning based method for multi-omics single cell data integration},
  author =       {Liu, Tianyu and Greenberg, Grant and Shomorony, Ilan},
  booktitle = 	 {Proceedings of the 17th Machine Learning in Computational Biology meeting},
  pages = 	 {1--15},
  year = 	 {2022},
  editor = 	 {Knowles, David A and Mostafavi, Sara and Lee, Su-In},
  volume = 	 {200},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--22 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v200/liu22a/liu22a.pdf},
  url = 	 {https://proceedings.mlr.press/v200/liu22a.html},
  abstract = 	 {The rapid development of second-generation sequencing has brought about a significant increase in the amount of omics data. Integrating and analyzing these single-cell datasets is a challenging problem. In this paper, we propose a new model, called CVQVAE, based on a cross-trained VAE and strengthened by the vector quantization technique, for multi-omics data integration. CVQVAE projects data vectors from different omics onto a common latent space in such a way that (1) similar cells are close in the latent space and (2) the original biological information present in each of the omics (including cell cycle and trajectory) is preserved. Our model is trained and optimized solely on the multi-omics data and requires no additional information such as cell-type labels. We empirically demonstrate the stability and efficiency of our method in data integration (alignment) on datasets from a recent competition on Open Problems in Single Cell Analysis.}
}

Endnote

%0 Conference Paper
%T CVQVAE: A representation learning based method for multi-omics single cell data integration
%A Tianyu Liu
%A Grant Greenberg
%A Ilan Shomorony
%B Proceedings of the 17th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2022
%E David A Knowles
%E Sara Mostafavi
%E Su-In Lee
%F pmlr-v200-liu22a
%I PMLR
%P 1--15
%U https://proceedings.mlr.press/v200/liu22a.html
%V 200
%X The rapid development of second-generation sequencing has brought about a significant increase in the amount of omics data. Integrating and analyzing these single-cell datasets is a challenging problem. In this paper, we propose a new model, called CVQVAE, based on a cross-trained VAE and strengthened by the vector quantization technique, for multi-omics data integration. CVQVAE projects data vectors from different omics onto a common latent space in such a way that (1) similar cells are close in the latent space and (2) the original biological information present in each of the omics (including cell cycle and trajectory) is preserved. Our model is trained and optimized solely on the multi-omics data and requires no additional information such as cell-type labels. We empirically demonstrate the stability and efficiency of our method in data integration (alignment) on datasets from a recent competition on Open Problems in Single Cell Analysis.

APA


Liu, T., Greenberg, G., & Shomorony, I. (2022). CVQVAE: A representation learning based method for multi-omics single cell data integration. Proceedings of the 17th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 200:1-15. Available from https://proceedings.mlr.press/v200/liu22a.html.
