Bayesian Algorithms for Causal Data Mining
Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, PMLR 6:121-136, 2010.
We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node \emphX using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of \emphX. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node \emphX based on the identification of Y arcs. Recall that if a node \emphX has two parent nodes \emphA, B and a child node \emphC such that there is no arc between \emphA, B and \emphA, B are not parents of \emphC, then the arc from \emphX to \emphC is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node \emphX. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.