A Simulation Study of Three Related Causal Data Mining Algorithms
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:184-191, 2001.
Causality plays a significant role in all scientific domains. This study focused on evaluating and refining efficient algorithms for learning causal relationships from observational data. Evaluating learned causal output is difficult because real-world domains lack a gold standard. We therefore used data simulated from a known causal network in a medical domain, the Alarm network. For causal discovery we used three variants of the Local Causal Discovery (LCD) algorithm, referred to as LCDa, LCDb and LCDc. These algorithms use the framework of causal Bayesian networks to represent causal relationships among model variables. LCDa, LCDb and LCDc take as input a dataset and a partial node ordering, and output purported causal relationships of the form "variable $Y$ causally influences variable $Z$". Using the simulated Alarm dataset as input, LCDa had a false positive rate of $0.09$, LCDb $0.08$ and LCDc $0.04$. All three algorithms had a true positive rate of about $0.27$. Most of the false positives occurred when a causal relationship was confounded. LCDc output as causal only those confounded pairs whose confounding was very weak. We identify and discuss the causally confounded relationships that often seem to induce false positive results.
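For readers unfamiliar with LCD, the core inference can be sketched as follows. This is a minimal sketch of the basic LCD rule as usually stated, not of the LCDa, LCDb or LCDc variants evaluated here: given a variable $W$ assumed not to be caused by any other measured variable (the kind of information a partial node ordering supplies), LCD reports "$Y$ causally influences $Z$" when $W$ and $Y$ are dependent, $Y$ and $Z$ are dependent, and $W$ and $Z$ are independent given $Y$. The statistical test is an assumption: the hypothetical helper `fisher_z_pvalue` uses a partial-correlation Fisher z-test on continuous linear-Gaussian data, whereas the Alarm network is discrete. The helper `lcd_pairs`, the significance level `alpha` and the simulated chain are likewise illustrative.

```python
import math

import numpy as np


def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for zero partial correlation of columns i and j given cond.

    Assumption: continuous, roughly linear-Gaussian data (Alarm itself is discrete).
    """
    idx = [i, j, *cond]
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    # Partial correlation read off the inverse correlation (precision) matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def lcd_pairs(data, w, candidates, alpha=0.01):
    """Basic LCD rule (hypothetical helper; not the LCDa/b/c variants).

    w must be a column assumed not to be caused by any other measured variable.
    Returns (y, z) pairs read as "y causally influences z".
    """
    def dependent(a, b, cond=()):
        return fisher_z_pvalue(data, a, b, cond) < alpha

    found = []
    for y in candidates:
        for z in candidates:
            if y == z:
                continue
            # D(W, Y), D(Y, Z) and I(W, Z | Y)
            if dependent(w, y) and dependent(y, z) and not dependent(w, z, (y,)):
                found.append((y, z))
    return found


# Usage: simulated chain W -> Y -> Z (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 5000
w_col = rng.normal(size=n)
y_col = w_col + rng.normal(size=n)
z_col = y_col + rng.normal(size=n)
data = np.column_stack([w_col, y_col, z_col])
print(lcd_pairs(data, w=0, candidates=[1, 2], alpha=0.001))
```

On the simulated chain the rule should recover (Y, Z) but not (Z, Y), because conditioning on $Z$ does not make $W$ and $Y$ independent. If instead a hidden common cause of $Y$ and $Z$ is present, conditioning on $Y$ makes $W$ and $Z$ dependent, so strong confounding blocks the inference, while very weak confounding may go undetected by a finite-sample test and produce a false positive.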