Bayesian Algorithms for Causal Data Mining

Subramani Mani, Constantin F. Aliferis, Alexander Statnikov
Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, PMLR 6:121-136, 2010.

Abstract

We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause-and-effect relationships from observational data without assuming causal sufficiency, an assumption that precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and the set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A$ and $B$, and neither $A$ nor $B$ is a parent of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
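
The $Y$ arc definition recalled in the abstract can be checked mechanically once a directed graph is known. The following is a minimal sketch, assuming the graph is supplied as a list of (parent, child) pairs with string node labels. The function name `find_y_arcs` is hypothetical, and the code only tests the structural condition on a given graph; it is not the CD-B or CD-H procedure, which must first learn the structure from data.

```python
from itertools import combinations


def find_y_arcs(edges):
    """Return the set of arcs (X, C) that match the Y-arc condition.

    `edges` is an iterable of (parent, child) pairs describing a
    directed graph. This helper is illustrative only (an assumption,
    not part of the paper).
    """
    edges = set(edges)

    # Build parent and child lookup tables from the edge list.
    parents, children = {}, {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
        children.setdefault(src, set()).add(dst)

    def adjacent(u, v):
        # True if there is an arc between u and v in either direction.
        return (u, v) in edges or (v, u) in edges

    y_arcs = set()
    for x, pa_x in parents.items():
        # Consider every pair of parents A, B of X.
        for a, b in combinations(sorted(pa_x), 2):
            if adjacent(a, b):
                continue  # A and B must not be joined by an arc
            for c in children.get(x, ()):
                # Neither A nor B may be a parent of C.
                if (a, c) not in edges and (b, c) not in edges:
                    y_arcs.add((x, c))
    return y_arcs


if __name__ == "__main__":
    # A -> X <- B, X -> C is the basic Y structure.
    print(find_y_arcs([("A", "X"), ("B", "X"), ("X", "C")]))
```

Adding an arc between `A` and `B`, or an arc from `A` or `B` into `C`, removes `("X", "C")` from the result, which mirrors the two exclusions in the definition.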

Cite this Paper


BibTeX
@InProceedings{pmlr-v6-mani10a,
  title = {Bayesian Algorithms for Causal Data Mining},
  author = {Mani, Subramani and Aliferis, Constantin F. and Statnikov, Alexander},
  booktitle = {Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008},
  pages = {121--136},
  year = {2010},
  editor = {Guyon, Isabelle and Janzing, Dominik and Schölkopf, Bernhard},
  volume = {6},
  series = {Proceedings of Machine Learning Research},
  address = {Whistler, Canada},
  month = {12 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v6/mani10a/mani10a.pdf},
  url = {https://proceedings.mlr.press/v6/mani10a.html},
  abstract = {We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.}
}
Endnote
%0 Conference Paper
%T Bayesian Algorithms for Causal Data Mining
%A Subramani Mani
%A Constantin F. Aliferis
%A Alexander Statnikov
%B Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
%C Proceedings of Machine Learning Research
%D 2010
%E Isabelle Guyon
%E Dominik Janzing
%E Bernhard Schölkopf
%F pmlr-v6-mani10a
%I PMLR
%P 121--136
%U https://proceedings.mlr.press/v6/mani10a.html
%V 6
%X We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
RIS
TY  - CPAPER
TI  - Bayesian Algorithms for Causal Data Mining
AU  - Subramani Mani
AU  - Constantin F. Aliferis
AU  - Alexander Statnikov
BT  - Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
DA  - 2010/02/18
ED  - Isabelle Guyon
ED  - Dominik Janzing
ED  - Bernhard Schölkopf
ID  - pmlr-v6-mani10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 6
SP  - 121
EP  - 136
L1  - http://proceedings.mlr.press/v6/mani10a/mani10a.pdf
UR  - https://proceedings.mlr.press/v6/mani10a.html
AB  - We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node $X$ using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of $X$. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node $X$ based on the identification of $Y$ arcs. Recall that if a node $X$ has two parent nodes $A, B$ and a child node $C$ such that there is no arc between $A, B$ and $A, B$ are not parents of $C$, then the arc from $X$ to $C$ is called a $Y$ arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node $X$. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of $Y$ structures and $Y$ arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
ER  -
APA
Mani, S., Aliferis, C.F. & Statnikov, A. (2010). Bayesian Algorithms for Causal Data Mining. Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, in Proceedings of Machine Learning Research 6:121-136. Available from https://proceedings.mlr.press/v6/mani10a.html.
