Bayesian Algorithms for Causal Data Mining

Subramani Mani; Constantin F. Aliferis; Alexander Statnikov

Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, PMLR 6:121-136, 2010.

Abstract

We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause-and-effect relationships from observational data without assuming causal sufficiency, an assumption that precludes hidden common causes of the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate between the parents and children of $X$. Using the set of parents and the set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A$ and $B$ and a child node $C$ such that there is no arc between $A$ and $B$, and neither $A$ nor $B$ is a parent of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
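
The $Y$ arc definition in the abstract can be illustrated with a short Python sketch. This is not the CD-B or CD-H algorithm; it only checks the stated graphical condition on a directed acyclic graph that is already known. The function name `find_y_arcs` and the parent-set dictionary representation are assumptions made for illustration.

```python
from itertools import combinations


def find_y_arcs(parents):
    """Return the set of Y arcs (X, C) in a DAG.

    `parents` maps each node to the set of its parent nodes (hypothetical
    representation chosen for this sketch). Following the abstract, the arc
    X -> C is a Y arc if X has two parents A and B with no arc between them,
    and neither A nor B is a parent of C.
    """
    # Collect every node, including those that appear only as parents.
    nodes = set(parents)
    for ps in parents.values():
        nodes |= set(ps)

    # Invert the parent map to get the children of each node.
    children = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    def has_arc(u, v):
        # True if there is an arc between u and v in either direction.
        return u in parents.get(v, set()) or v in parents.get(u, set())

    y_arcs = set()
    for x in nodes:
        for a, b in combinations(parents.get(x, set()), 2):
            if has_arc(a, b):
                continue  # the two parents must not be connected
            for c in children[x]:
                c_parents = parents.get(c, set())
                if a not in c_parents and b not in c_parents:
                    y_arcs.add((x, c))
    return y_arcs


# A -> X <- B and X -> C, with no other arcs: X -> C is a Y arc.
dag = {"X": {"A", "B"}, "C": {"X"}}
y_arcs = find_y_arcs(dag)
```

Adding an arc between `A` and `B`, or making `A` a parent of `C`, removes the $Y$ arc in this sketch, which mirrors the two exclusion conditions in the definition.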

Cite this Paper

BibTeX


@InProceedings{pmlr-v6-mani10a,
  title = 	 {Bayesian Algorithms for Causal Data Mining},
  author = 	 {Mani, Subramani and Aliferis, Constantin F. and Statnikov, Alexander},
  booktitle = 	 {Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008},
  pages = 	 {121--136},
  year = 	 {2010},
  editor = 	 {Guyon, Isabelle and Janzing, Dominik and Schölkopf, Bernhard},
  volume = 	 {6},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Whistler, Canada},
  month = 	 {12 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v6/mani10a/mani10a.pdf},
  url = 	 {https://proceedings.mlr.press/v6/mani10a.html},
  abstract = 	 {We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.}
}

Endnote

%0 Conference Paper
%T Bayesian Algorithms for Causal Data Mining
%A Subramani Mani
%A Constantin F. Aliferis
%A Alexander Statnikov
%B Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
%C Proceedings of Machine Learning Research
%D 2010
%E Isabelle Guyon
%E Dominik Janzing
%E Bernhard Schölkopf
%F pmlr-v6-mani10a
%I PMLR
%P 121--136
%U https://proceedings.mlr.press/v6/mani10a.html
%V 6
%X We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.

RIS


TY  - CPAPER
TI  - Bayesian Algorithms for Causal Data Mining
AU  - Subramani Mani
AU  - Constantin F. Aliferis
AU  - Alexander Statnikov
BT  - Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
DA  - 2010/02/18
ED  - Isabelle Guyon
ED  - Dominik Janzing
ED  - Bernhard Schölkopf
ID  - pmlr-v6-mani10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 6
SP  - 121
EP  - 136
L1  - http://proceedings.mlr.press/v6/mani10a/mani10a.pdf
UR  - https://proceedings.mlr.press/v6/mani10a.html
AB  - We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
ER  -

APA


Mani, S., Aliferis, C.F. & Statnikov, A. (2010). Bayesian Algorithms for Causal Data Mining. Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, in Proceedings of Machine Learning Research 6:121-136. Available from https://proceedings.mlr.press/v6/mani10a.html.
