Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs

Vidhi Lalchand, Aditya Ravuri, Emma Dann, Natsuhiko Kumasaka, Dinithi Sumanaweera, Rik G. H. Lindeboom, Shaista Madad, Sarah Teichmann, Neil D. Lawrence
Proceedings of the 17th Machine Learning in Computational Biology meeting, PMLR 200:46-60, 2022.

Abstract

Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological and clinical contexts. Scalable dimensionality reduction techniques are needed to disentangle the biological variation in these datasets while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model (GPLVM), to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel that preserves the factorisability of the lower bound, allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct previously described latent signatures of innate immunity with a 9x speed-up in training time. We further analyse a dataset of blood cells from COVID-19 patients and demonstrate that this framework makes it possible to capture interpretable signatures of infection while integrating data across individuals and technical batches. Specifically, we explore COVID-19 severity as a latent dimension to refine patient stratification and capture disease-specific gene expression signatures.

Cite this Paper


BibTeX
@InProceedings{pmlr-v200-lalchand22a,
  title     = {Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs},
  author    = {Lalchand, Vidhi and Ravuri, Aditya and Dann, Emma and Kumasaka, Natsuhiko and Sumanaweera, Dinithi and Lindeboom, Rik G. H. and Madad, Shaista and Teichmann, Sarah and Lawrence, Neil D.},
  booktitle = {Proceedings of the 17th Machine Learning in Computational Biology meeting},
  pages     = {46--60},
  year      = {2022},
  editor    = {Knowles, David A and Mostafavi, Sara and Lee, Su-In},
  volume    = {200},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--22 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v200/lalchand22a/lalchand22a.pdf},
  url       = {https://proceedings.mlr.press/v200/lalchand22a.html},
  abstract  = {Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological/clinical contexts. Scalable dimensionality reduction techniques are in need to disentangle biological variation in them, while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct previously described latent signatures of innate immunity with 9x speed-up on training time. We further analyse a dataset of blood cells from COVID-19 patients and demonstrate that this framework enables to capture interpretable signatures of infection, while integrating data across individuals and technical batches. Specifically, we explore COVID-19 severity as a latent dimension to refine patient stratification and capture disease-specific gene expression signatures.}
}
Endnote
%0 Conference Paper
%T Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs
%A Vidhi Lalchand
%A Aditya Ravuri
%A Emma Dann
%A Natsuhiko Kumasaka
%A Dinithi Sumanaweera
%A Rik G. H. Lindeboom
%A Shaista Madad
%A Sarah Teichmann
%A Neil D. Lawrence
%B Proceedings of the 17th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2022
%E David A Knowles
%E Sara Mostafavi
%E Su-In Lee
%F pmlr-v200-lalchand22a
%I PMLR
%P 46--60
%U https://proceedings.mlr.press/v200/lalchand22a.html
%V 200
%X Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological/clinical contexts. Scalable dimensionality reduction techniques are in need to disentangle biological variation in them, while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel which preserves the factorisability of the lower bound allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct previously described latent signatures of innate immunity with 9x speed-up on training time. We further analyse a dataset of blood cells from COVID-19 patients and demonstrate that this framework enables to capture interpretable signatures of infection, while integrating data across individuals and technical batches. Specifically, we explore COVID-19 severity as a latent dimension to refine patient stratification and capture disease-specific gene expression signatures.
APA
Lalchand, V., Ravuri, A., Dann, E., Kumasaka, N., Sumanaweera, D., Lindeboom, R.G.H., Madad, S., Teichmann, S. & Lawrence, N.D. (2022). Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs. Proceedings of the 17th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 200:46-60. Available from https://proceedings.mlr.press/v200/lalchand22a.html.
