Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data

Sandhya Prabhakaran; Elham Azizi; Ambrose Carr; Dana Pe’er

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1070-1079, 2016.

Abstract

We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-prabhakaran16,
  title = 	 {Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data},
  author = 	 {Prabhakaran, Sandhya and Azizi, Elham and Carr, Ambrose and Pe’er, Dana},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1070--1079},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/prabhakaran16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/prabhakaran16.html},
  abstract = 	 {We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.}
}

Endnote

%0 Conference Paper
%T Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
%A Sandhya Prabhakaran
%A Elham Azizi
%A Ambrose Carr
%A Dana Pe’er
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-prabhakaran16
%I PMLR
%P 1070--1079
%U https://proceedings.mlr.press/v48/prabhakaran16.html
%V 48
%X We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.

RIS


TY  - CPAPER
TI  - Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data
AU  - Sandhya Prabhakaran
AU  - Elham Azizi
AU  - Ambrose Carr
AU  - Dana Pe’er
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-prabhakaran16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1070
EP  - 1079
L1  - http://proceedings.mlr.press/v48/prabhakaran16.pdf
UR  - https://proceedings.mlr.press/v48/prabhakaran16.html
AB  - We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
ER  -

APA


Prabhakaran, S., Azizi, E., Carr, A. & Pe’er, D. (2016). Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1070-1079. Available from https://proceedings.mlr.press/v48/prabhakaran16.html.
