Directed Graphical Models and Causal Discovery for Zero-Inflated Data

Shiqing Yu; Mathias Drton; Ali Shojaie

Proceedings of the Second Conference on Causal Learning and Reasoning, PMLR 213:27-67, 2023.

Abstract

With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their $0/1$ indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.

Cite this Paper

BibTeX


@InProceedings{pmlr-v213-yu23a,
  title = 	 {Directed Graphical Models and Causal Discovery for Zero-Inflated Data},
  author =       {Yu, Shiqing and Drton, Mathias and Shojaie, Ali},
  booktitle = 	 {Proceedings of the Second Conference on Causal Learning and Reasoning},
  pages = 	 {27--67},
  year = 	 {2023},
  editor = 	 {van der Schaar, Mihaela and Zhang, Cheng and Janzing, Dominik},
  volume = 	 {213},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {11--14 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v213/yu23a/yu23a.pdf},
  url = 	 {https://proceedings.mlr.press/v213/yu23a.html},
  abstract = 	 {With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their $0/1$ indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.}
}

Endnote

%0 Conference Paper
%T Directed Graphical Models and Causal Discovery for Zero-Inflated Data
%A Shiqing Yu
%A Mathias Drton
%A Ali Shojaie
%B Proceedings of the Second Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2023
%E Mihaela van der Schaar
%E Cheng Zhang
%E Dominik Janzing
%F pmlr-v213-yu23a
%I PMLR
%P 27--67
%U https://proceedings.mlr.press/v213/yu23a.html
%V 213
%X With advances in technology, gene expression measurements from single cells can be used to gain refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their $0/1$ indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell gene expression data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.

APA


Yu, S., Drton, M. & Shojaie, A. (2023). Directed Graphical Models and Causal Discovery for Zero-Inflated Data. Proceedings of the Second Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 213:27-67. Available from https://proceedings.mlr.press/v213/yu23a.html.
