Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data

Sandhya Prabhakaran, Elham Azizi, Ambrose Carr, Dana Pe’er
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1070-1079, 2016.

Abstract

We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
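The baseline the abstract contrasts against, a global normalization applied before any clustering, can be illustrated with a minimal sketch. This is not the authors' model. The simulation (two cell types, uniform cell-specific scalings, Poisson counts) and the choice of library-size scaling as the "global normalization" are assumptions made only for illustration, and the function names `simulate_counts` and `global_normalize` are hypothetical.

```python
import numpy as np


def simulate_counts(n_cells=200, n_genes=50, seed=0):
    """Simulate counts from two hypothetical cell types with per-cell scaling."""
    rng = np.random.default_rng(seed)
    # Assumed: each cell type has its own mean expression profile.
    type_means = rng.gamma(shape=2.0, scale=5.0, size=(2, n_genes))
    labels = rng.integers(0, 2, size=n_cells)
    # Assumed: a cell-specific scaling stands in for technical variation.
    scalings = rng.uniform(0.2, 1.0, size=n_cells)
    rates = scalings[:, None] * type_means[labels]
    counts = rng.poisson(rates)
    return counts, labels, scalings


def global_normalize(counts):
    """Rescale every cell to the median library size (one global rule)."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    target = np.median(totals)
    # Avoid division by zero for cells with no counts at all.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals * target


counts, labels, scalings = simulate_counts()
normalized = global_normalize(counts)
```

Because every cell is forced to the same total regardless of its latent type, a rule like this cannot account for variation that depends on cell type, which is the limitation the abstract attributes to global normalization.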

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-prabhakaran16,
  title = {Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data},
  author = {Prabhakaran, Sandhya and Azizi, Elham and Carr, Ambrose and Pe’er, Dana},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages = {1070--1079},
  year = {2016},
  editor = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = {48},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {20--22 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v48/prabhakaran16.pdf},
  url = {https://proceedings.mlr.press/v48/prabhakaran16.html},
  abstract = {We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.}
}
Endnote
%0 Conference Paper
%T Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
%A Sandhya Prabhakaran
%A Elham Azizi
%A Ambrose Carr
%A Dana Pe’er
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-prabhakaran16
%I PMLR
%P 1070--1079
%U https://proceedings.mlr.press/v48/prabhakaran16.html
%V 48
%X We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
RIS
TY  - CPAPER
TI  - Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
AU  - Sandhya Prabhakaran
AU  - Elham Azizi
AU  - Ambrose Carr
AU  - Dana Pe’er
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-prabhakaran16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1070
EP  - 1079
L1  - http://proceedings.mlr.press/v48/prabhakaran16.pdf
UR  - https://proceedings.mlr.press/v48/prabhakaran16.html
AB  - We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
ER  -
APA
Prabhakaran, S., Azizi, E., Carr, A. & Pe’er, D. (2016). Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1070-1079. Available from https://proceedings.mlr.press/v48/prabhakaran16.html.
