Bayesian Algorithms for Causal Data Mining

Subramani Mani, Constantin F. Aliferis, Alexander Statnikov
Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, PMLR 6:121-136, 2010.

Abstract

We present two Bayesian algorithms, CD-B and CD-H, for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency, which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node X using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of X. Using the set of parents and the set of children, CD-B constructs a global Bayesian network and outputs the causal effects of a node X based on the identification of Y arcs. Recall that if a node X has two parent nodes A and B and a child node C such that there is no arc between A and B, and neither A nor B is a parent of C, then the arc from X to C is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node X. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
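The Y arc definition in the abstract can be checked mechanically on a graph whose structure is already known. The sketch below is only an illustration of that graphical pattern, not the CD-B or CD-H algorithms, which learn the structure from data. The function name `find_y_arcs` and the dictionary representation of the DAG (each node mapped to its set of parents) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def find_y_arcs(parents):
    """Return the set of (X, C) arcs matching the Y arc definition.

    `parents` maps each node of a DAG to the set of its parent nodes
    (a hypothetical representation chosen for this sketch).
    """
    # Derive child sets from the parent sets.
    children = defaultdict(set)
    for node, pa in parents.items():
        for p in pa:
            children[p].add(node)

    def adjacent(u, v):
        # True if there is an arc between u and v in either direction.
        return u in parents.get(v, set()) or v in parents.get(u, set())

    y_arcs = set()
    for x, pa in parents.items():
        # X needs two parents A and B with no arc between them.
        for a, b in combinations(sorted(pa), 2):
            if adjacent(a, b):
                continue
            # X -> C is a Y arc if neither A nor B is a parent of C.
            for c in children[x]:
                pa_c = parents.get(c, set())
                if a not in pa_c and b not in pa_c:
                    y_arcs.add((x, c))
    return y_arcs


if __name__ == "__main__":
    # A -> X <- B, X -> C: the smallest graph containing a Y arc.
    dag = {"A": set(), "B": set(), "X": {"A", "B"}, "C": {"X"}}
    print(find_y_arcs(dag))  # {('X', 'C')}
```

Adding an arc from A to B, or from A to C, to the example graph violates the definition, and the function then returns an empty set.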

Cite this Paper


BibTeX
@InProceedings{pmlr-v6-mani10a,
  title = {Bayesian Algorithms for Causal Data Mining},
  author = {Subramani Mani and Constantin F. Aliferis and Alexander Statnikov},
  pages = {121--136},
  year = {2010},
  editor = {Isabelle Guyon and Dominik Janzing and Bernhard Schölkopf},
  volume = {6},
  series = {Proceedings of Machine Learning Research},
  address = {Whistler, Canada},
  month = {12 Dec},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v6/mani10a/mani10a.pdf},
  url = {http://proceedings.mlr.press/v6/mani10a.html},
  abstract = {We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node \emph{X} using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of \emph{X}. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node \emph{X} based on the identification of Y arcs. Recall that if a node \emph{X} has two parent nodes \emph{A, B} and a child node \emph{C} such that there is no arc between \emph{A, B} and \emph{A, B} are not parents of \emph{C}, then the arc from \emph{X} to \emph{C} is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node \emph{X}. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.}
}
Endnote
%0 Conference Paper
%T Bayesian Algorithms for Causal Data Mining
%A Subramani Mani
%A Constantin F. Aliferis
%A Alexander Statnikov
%B Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
%C Proceedings of Machine Learning Research
%D 2010
%E Isabelle Guyon
%E Dominik Janzing
%E Bernhard Schölkopf
%F pmlr-v6-mani10a
%I PMLR
%J Proceedings of Machine Learning Research
%P 121--136
%U http://proceedings.mlr.press
%V 6
%W PMLR
%X We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node \emph{X} using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of \emph{X}. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node \emph{X} based on the identification of Y arcs. Recall that if a node \emph{X} has two parent nodes \emph{A, B} and a child node \emph{C} such that there is no arc between \emph{A, B} and \emph{A, B} are not parents of \emph{C}, then the arc from \emph{X} to \emph{C} is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node \emph{X}. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
RIS
TY - CPAPER
TI - Bayesian Algorithms for Causal Data Mining
AU - Subramani Mani
AU - Constantin F. Aliferis
AU - Alexander Statnikov
BT - Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008
PY - 2010/02/18
DA - 2010/02/18
ED - Isabelle Guyon
ED - Dominik Janzing
ED - Bernhard Schölkopf
ID - pmlr-v6-mani10a
PB - PMLR
SP - 121
DP - PMLR
EP - 136
L1 - http://proceedings.mlr.press/v6/mani10a/mani10a.pdf
UR - http://proceedings.mlr.press/v6/mani10a.html
AB - We present two Bayesian algorithms CD-B and CD-H for discovering unconfounded cause and effect relationships from observational data without assuming causal sufficiency which precludes hidden common causes for the observed variables. The CD-B algorithm first estimates the Markov blanket of a node \emph{X} using a Bayesian greedy search method and then applies Bayesian scoring methods to discriminate the parents and children of \emph{X}. Using the set of parents and set of children CD-B constructs a global Bayesian network and outputs the causal effects of a node \emph{X} based on the identification of Y arcs. Recall that if a node \emph{X} has two parent nodes \emph{A, B} and a child node \emph{C} such that there is no arc between \emph{A, B} and \emph{A, B} are not parents of \emph{C}, then the arc from \emph{X} to \emph{C} is called a Y arc. The CD-H algorithm uses the MMPC algorithm to estimate the union of parents and children of a target node \emph{X}. The subsequent steps are similar to those of CD-B. We evaluated the CD-B and CD-H algorithms empirically based on simulated data from four different Bayesian networks. We also present comparative results based on the identification of Y structures and Y arcs from the output of the PC, MMHC and FCI algorithms. The results appear promising for mining causal relationships that are unconfounded by hidden variables from observational data.
ER -
APA
Mani, S., Aliferis, C.F. & Statnikov, A. (2010). Bayesian Algorithms for Causal Data Mining. Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, in PMLR 6:121-136.
