A Simulation Study of Three Related Causal Data Mining Algorithms

Subramani Mani, Gregory F. Cooper
Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, PMLR R3:184-191, 2001.

Abstract

In all scientific domains causality plays a significant role. This study focused on evaluating and refining efficient algorithms to learn causal relationships from observational data. Evaluation of learned causal output is difficult, due to the lack of a gold standard in real-world domains. Therefore, we used simulated data from a known causal network in a medical domain, the Alarm network. For causal discovery we used three variants of the Local Causal Discovery (LCD) algorithm, which are referred to as LCDa, LCDb and LCDc. These algorithms use the framework of causal Bayesian networks to represent causal relationships among model variables. LCDa, LCDb and LCDc take as input a dataset and a partial node ordering, and output purported causes of the form variable $Y$ causally influences variable $Z$. Using the simulated Alarm dataset as input, LCDa had a false positive rate of 0.09, LCDb 0.08 and LCDc 0.04. All the algorithms had a true positive rate of about 0.27. Most of the false positives occurred when a causal relationship was confounded. LCDc output as causal only those causally confounded pairs that had very weak confounding. We identify and discuss the causally confounded relationships that often seem to induce false positive results.

Cite this Paper


BibTeX
@InProceedings{pmlr-vR3-mani01a,
  title = {A Simulation Study of Three Related Causal Data Mining Algorithms},
  author = {Mani, Subramani and Cooper, Gregory F.},
  booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
  pages = {184--191},
  year = {2001},
  editor = {Richardson, Thomas S. and Jaakkola, Tommi S.},
  volume = {R3},
  series = {Proceedings of Machine Learning Research},
  month = {04--07 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r3/mani01a/mani01a.pdf},
  url = {https://proceedings.mlr.press/r3/mani01a.html},
  abstract = {In all scientific domains causality plays a significant role. This study focused on evaluating and refining efficient algorithms to learn causal relationships from observational data. Evaluation of learned causal output is difficult, due to the lack of a gold standard in real-world domains. Therefore, we used simulated data from a known causal network in a medical domain, the Alarm network. For causal discovery we used three variants of the Local Causal Discovery (LCD) algorithm, which are referred to as LCDa, LCDb and LCDc. These algorithms use the framework of causal Bayesian networks to represent causal relationships among model variables. LCDa, LCDb and LCDc take as input a dataset and a partial node ordering, and output purported causes of the form variable $Y$ causally influences variable $Z$. Using the simulated Alarm dataset as input, LCDa had a false positive rate of 0.09, LCDb 0.08 and LCDc 0.04. All the algorithms had a true positive rate of about 0.27. Most of the false positives occurred when a causal relationship was confounded. LCDc output as causal only those causally confounded pairs that had very weak confounding. We identify and discuss the causally confounded relationships that often seem to induce false positive results.},
  note = {Reissued by PMLR on 31 March 2021.}
}
Endnote
%0 Conference Paper
%T A Simulation Study of Three Related Causal Data Mining Algorithms
%A Subramani Mani
%A Gregory F. Cooper
%B Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2001
%E Thomas S. Richardson
%E Tommi S. Jaakkola
%F pmlr-vR3-mani01a
%I PMLR
%P 184--191
%U https://proceedings.mlr.press/r3/mani01a.html
%V R3
%X In all scientific domains causality plays a significant role. This study focused on evaluating and refining efficient algorithms to learn causal relationships from observational data. Evaluation of learned causal output is difficult, due to the lack of a gold standard in real-world domains. Therefore, we used simulated data from a known causal network in a medical domain, the Alarm network. For causal discovery we used three variants of the Local Causal Discovery (LCD) algorithm, which are referred to as LCDa, LCDb and LCDc. These algorithms use the framework of causal Bayesian networks to represent causal relationships among model variables. LCDa, LCDb and LCDc take as input a dataset and a partial node ordering, and output purported causes of the form variable $Y$ causally influences variable $Z$. Using the simulated Alarm dataset as input, LCDa had a false positive rate of 0.09, LCDb 0.08 and LCDc 0.04. All the algorithms had a true positive rate of about 0.27. Most of the false positives occurred when a causal relationship was confounded. LCDc output as causal only those causally confounded pairs that had very weak confounding. We identify and discuss the causally confounded relationships that often seem to induce false positive results.
%Z Reissued by PMLR on 31 March 2021.
APA
Mani, S. & Cooper, G.F. (2001). A Simulation Study of Three Related Causal Data Mining Algorithms. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R3:184-191. Available from https://proceedings.mlr.press/r3/mani01a.html. Reissued by PMLR on 31 March 2021.

Related Material