Bayesian Trees for Automated Cytometry Data Analysis
Proceedings of the 3rd Machine Learning for Healthcare Conference, PMLR 85:465-483, 2018.
Cytometry is an important single cell analysis technology in furthering our understanding of cellular biological processes and in supporting clinical diagnoses across a variety hematological and immunological conditions. Current data analysis workflows for cytometry data rely on a manual process called gating to classify cells into canonical types. This dependence on human annotation significantly limits the rate, reproducibility, and scope of cytometry’s use in both biological research and clinical practice. We develop a novel Bayesian approach for automated gating that classifies cells into different types by combining cell-level marker measurements with an informative prior. The Bayesian approach allows for the incorporation of biologically-meaningful prior information that captures the domain expertise of human experts. The inference algorithm results in a hierarchically-structured classification of individual cells in a manner that mimics the tree-structured recursive process of manual gating, making the results readily interpretable. The approach can be extended in a natural fashion to handle data from multiple different samples by the incorporation of random effects in the Bayesian model. The proposed approach is evaluated using mass cytometry data, on the problems of unsupervised cell classification and supervised clinical diagnosis, illustrating the benefits of both incorporating prior knowledge and sharing information across multiple samples.