Causal Discovery for Linear Mixed Data

Yan Zeng; Shohei Shimizu; Hidetoshi Matsui; Fuchun Sun

Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:994-1009, 2022.

Abstract

Discovery of causal relationships from observational data, especially from mixed data that consist of both continuous and discrete variables, is a fundamental yet challenging problem. Traditional methods focus on polishing the data type processing policy, which may lose data information. Compared with such methods, the constraint-based and score-based methods for mixed data derive certain conditional independence tests or score functions from the data’s characteristics. However, they may return only the Markov equivalence class due to the lack of identifiability guarantees, which may limit their applicability or hinder the interpretability of the resulting causal graphs. Thus, in this paper, based on the structural causal models of continuous and discrete variables, we provide sufficient identifiability conditions in bivariate as well as multivariate cases. We show that if the data follow our proposed restricted Linear Mixed causal model (LiM), such a model is identifiable. In addition, we propose a two-step hybrid method to discover the causal structure for mixed data. Experiments on both synthetic and real-world data empirically demonstrate the identifiability and efficacy of our proposed LiM model.

Cite this Paper

BibTeX


@InProceedings{pmlr-v177-zeng22a,
  title     = {Causal Discovery for Linear Mixed Data},
  author    = {Zeng, Yan and Shimizu, Shohei and Matsui, Hidetoshi and Sun, Fuchun},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages     = {994--1009},
  year      = {2022},
  editor    = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume    = {177},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--13 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v177/zeng22a/zeng22a.pdf},
  url       = {https://proceedings.mlr.press/v177/zeng22a.html},
  abstract = 	 {Discovery of causal relationships from observational data, especially from mixed data that consist of both continuous and discrete variables, is a fundamental yet challenging problem. Traditional methods focus on polishing the data type processing policy, which may lose data information. Compared with such methods, the constraint-based and score-based methods for mixed data derive certain conditional independence tests or score functions from the data’s characteristics. However, they may return the Markov equivalence class due to the lack of identifiability guarantees, which may limit their applicability or hinder their interpretability of causal graphs. Thus, in this paper, based on the structural causal models of continuous and discrete variables, we provide sufficient identifiability conditions in bivariate as well as multivariate cases. We show that if the data follow our proposed restricted Linear Mixed causal model (LiM), such a model is identifiable. In addition, we proposed a two-step hybrid method to discover the causal structure for mixed data. Experiments on both synthetic and real-world data empirically demonstrate the identifiability and efficacy of our proposed LiM model. }
}

Endnote

%0 Conference Paper
%T Causal Discovery for Linear Mixed Data
%A Yan Zeng
%A Shohei Shimizu
%A Hidetoshi Matsui
%A Fuchun Sun
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-zeng22a
%I PMLR
%P 994--1009
%U https://proceedings.mlr.press/v177/zeng22a.html
%V 177
%X Discovery of causal relationships from observational data, especially from mixed data that consist of both continuous and discrete variables, is a fundamental yet challenging problem. Traditional methods focus on polishing the data type processing policy, which may lose data information. Compared with such methods, the constraint-based and score-based methods for mixed data derive certain conditional independence tests or score functions from the data’s characteristics. However, they may return the Markov equivalence class due to the lack of identifiability guarantees, which may limit their applicability or hinder their interpretability of causal graphs. Thus, in this paper, based on the structural causal models of continuous and discrete variables, we provide sufficient identifiability conditions in bivariate as well as multivariate cases. We show that if the data follow our proposed restricted Linear Mixed causal model (LiM), such a model is identifiable. In addition, we proposed a two-step hybrid method to discover the causal structure for mixed data. Experiments on both synthetic and real-world data empirically demonstrate the identifiability and efficacy of our proposed LiM model.

APA


Zeng, Y., Shimizu, S., Matsui, H., & Sun, F. (2022). Causal Discovery for Linear Mixed Data. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:994-1009. Available from https://proceedings.mlr.press/v177/zeng22a.html.
