FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking

Lorenzo Bini, Fatemeh Nassajian Mojarrad, Margarita Liarou, Thomas Matthes, Stephane Marchand-Maillet
Proceedings of the fifth Conference on Health, Inference, and Learning, PMLR 248:326-338, 2024.

Abstract

This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v248-bini24a, title = {FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking}, author = {Bini, Lorenzo and Mojarrad, Fatemeh Nassajian and Liarou, Margarita and Matthes, Thomas and Marchand-Maillet, Stephane}, booktitle = {Proceedings of the fifth Conference on Health, Inference, and Learning}, pages = {326--338}, year = {2024}, editor = {Pollard, Tom and Choi, Edward and Singhal, Pankhuri and Hughes, Michael and Sizikova, Elena and Mortazavi, Bobak and Chen, Irene and Wang, Fei and Sarker, Tasmie and McDermott, Matthew and Ghassemi, Marzyeh}, volume = {248}, series = {Proceedings of Machine Learning Research}, month = {27--28 Jun}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v248/main/assets/bini24a/bini24a.pdf}, url = {https://proceedings.mlr.press/v248/bini24a.html}, abstract = {This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.} }
Endnote
%0 Conference Paper %T FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking %A Lorenzo Bini %A Fatemeh Nassajian Mojarrad %A Margarita Liarou %A Thomas Matthes %A Stephane Marchand-Maillet %B Proceedings of the fifth Conference on Health, Inference, and Learning %C Proceedings of Machine Learning Research %D 2024 %E Tom Pollard %E Edward Choi %E Pankhuri Singhal %E Michael Hughes %E Elena Sizikova %E Bobak Mortazavi %E Irene Chen %E Fei Wang %E Tasmie Sarker %E Matthew McDermott %E Marzyeh Ghassemi %F pmlr-v248-bini24a %I PMLR %P 326--338 %U https://proceedings.mlr.press/v248/bini24a.html %V 248 %X This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.
APA
Bini, L., Mojarrad, F.N., Liarou, M., Matthes, T. & Marchand-Maillet, S.. (2024). FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking. Proceedings of the fifth Conference on Health, Inference, and Learning, in Proceedings of Machine Learning Research 248:326-338 Available from https://proceedings.mlr.press/v248/bini24a.html.

Related Material