Using Classification Trees to Improve Causal Inferences in Observational Studies

Louis Anthony Cox Jr
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:123-138, 1997.

Abstract

Much of the recent literature on AI and statistics has focused on how to use causal knowledge to enrich the set of valid causal inferences that can be drawn from available data and applied to practical problems such as decision-making, probabilistic diagnosis, and cost-effective control of systems. This paper examines classical problems of valid causal inference in observational studies, using epidemiological studies on the association between exposure to diesel exhaust (DE) and risk of lung cancer as a case study. It shows that one of the main applied computational tools of AI and statistics, classification tree analysis, can be adapted to help control or avoid many of the usual statistical threats to valid causal inference, and links this new use of classification trees to an established older literature on techniques for causal inference in social statistics based on elimination of competing (non-causal) explanations for observed associations. A strong link is then forged between an extension of classification tree analysis and modern AI and statistics approaches to causal modeling and inference based on directed acyclic graph (DAG) causal models and influence diagrams. This new link is based on the observation that classification tree analysis can be adapted to test the local Markov conditions that provide the critical defining structure of DAG models, as well as to quantify the conditional distributions of variables given the values of their parents - the key numerical information needed to quantify an influence diagram model. Finally, these insights are applied to available data on DE and lung cancer risks and are used to conclude that there is no evidence of a causal relation between them.
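The idea of using tree splits to check a local Markov condition can be illustrated with a minimal sketch. It is not the paper's procedure: the variable names `X`, `Y`, `Z`, the Gini criterion, the helper functions, and the synthetic counts are all assumptions made for illustration. The sketch first partitions the data into leaves by the values of a variable's assumed parents, then asks whether a further split on another variable still reduces impurity. A gain near zero is consistent with conditional independence given the parents.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(rows, target, candidate):
    """Impurity reduction in `target` from splitting `rows` on `candidate`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[candidate]].append(r[target])
    before = gini([r[target] for r in rows])
    after = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return before - after

def conditional_gain(rows, target, parents, candidate):
    """Weighted gain from `candidate` inside each leaf defined by `parents`."""
    leaves = defaultdict(list)
    for r in rows:
        leaves[tuple(r[p] for p in parents)].append(r)
    return sum(len(leaf) / len(rows) * split_gain(leaf, target, candidate)
               for leaf in leaves.values())

def make_rows(z, x, n_pos, n_neg):
    """Hypothetical helper: n_pos rows with Y=1 and n_neg rows with Y=0."""
    return ([{"Z": z, "X": x, "Y": 1}] * n_pos
            + [{"Z": z, "X": x, "Y": 0}] * n_neg)

# Synthetic data (an assumption, not from the paper): Y depends only on Z,
# and X is correlated with Z, so X and Y are associated marginally.
rows = (make_rows(0, 0, 3, 27) + make_rows(0, 1, 1, 9)
        + make_rows(1, 0, 5, 5) + make_rows(1, 1, 15, 15))

marginal = conditional_gain(rows, "Y", [], "X")     # no parents: one root leaf
given_z = conditional_gain(rows, "Y", ["Z"], "X")   # leaves defined by Z

print("gain from X alone:", marginal)
print("gain from X given Z:", given_z)
```

In this toy data the split on `X` reduces impurity at the root but adds essentially nothing once the leaves are defined by `Z`, which is the pattern expected when `Z` is the only parent of `Y`. The class frequencies of `Y` within each `Z` leaf would then serve as the estimated conditional distribution of `Y` given its parent.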

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-cox97a,
  title = {Using Classification Trees to Improve Causal Inferences in Observational Studies},
  author = {Cox, Jr, Louis Anthony},
  booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics},
  pages = {123--138},
  year = {1997},
  editor = {Madigan, David and Smyth, Padhraic},
  volume = {R1},
  series = {Proceedings of Machine Learning Research},
  month = {04--07 Jan},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/r1/cox97a/cox97a.pdf},
  url = {https://proceedings.mlr.press/r1/cox97a.html},
  abstract = {Much of the recent literature on AI and statistics has focused on how to use causal knowledge to enrich the set of valid causal inferences that can be drawn from available data and applied to practical problems such as decision-making, probabilistic diagnosis, and cost-effective control of systems. This paper examines classical problems of valid causal inference in observational studies, using epidemiological studies on the association between exposure to diesel exhaust (DE) and risk of lung cancer as a case study. It shows that one of the main applied computational tools of AI and statistics, classification tree analysis, can be adapted to help control or avoid many of the usual statistical threats to valid causal inference, and links this new use of classification trees to an established older literature on techniques for causal inference in social statistics based on elimination of competing (non-causal) explanations for observed associations. A strong link is then forged between an extension of classification tree analysis and modern AI and statistics approaches to causal modeling and inference based on directed acyclic graph (DAG) causal models and influence diagrams. This new link is based on the observation that classification tree analysis can be adapted to test the local Markov conditions that provide the critical defining structure of DAG models, as well as to quantify the conditional distributions of variables given the values of their parents - the key numerical information needed to quantify an influence diagram model. Finally, these insights are applied to available data on DE and lung cancer risks and are used to conclude that there is no evidence of a causal relation between them.},
  note = {Reissued by PMLR on 30 March 2021.}
}
Endnote
%0 Conference Paper
%T Using Classification Trees to Improve Causal Inferences in Observational Studies
%A Louis Anthony Cox, Jr
%B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1997
%E David Madigan
%E Padhraic Smyth
%F pmlr-vR1-cox97a
%I PMLR
%P 123--138
%U https://proceedings.mlr.press/r1/cox97a.html
%V R1
%X Much of the recent literature on AI and statistics has focused on how to use causal knowledge to enrich the set of valid causal inferences that can be drawn from available data and applied to practical problems such as decision-making, probabilistic diagnosis, and cost-effective control of systems. This paper examines classical problems of valid causal inference in observational studies, using epidemiological studies on the association between exposure to diesel exhaust (DE) and risk of lung cancer as a case study. It shows that one of the main applied computational tools of AI and statistics, classification tree analysis, can be adapted to help control or avoid many of the usual statistical threats to valid causal inference, and links this new use of classification trees to an established older literature on techniques for causal inference in social statistics based on elimination of competing (non-causal) explanations for observed associations. A strong link is then forged between an extension of classification tree analysis and modern AI and statistics approaches to causal modeling and inference based on directed acyclic graph (DAG) causal models and influence diagrams. This new link is based on the observation that classification tree analysis can be adapted to test the local Markov conditions that provide the critical defining structure of DAG models, as well as to quantify the conditional distributions of variables given the values of their parents - the key numerical information needed to quantify an influence diagram model. Finally, these insights are applied to available data on DE and lung cancer risks and are used to conclude that there is no evidence of a causal relation between them.
%Z Reissued by PMLR on 30 March 2021.
APA
Cox, L. A., Jr. (1997). Using Classification Trees to Improve Causal Inferences in Observational Studies. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:123-138. Available from https://proceedings.mlr.press/r1/cox97a.html. Reissued by PMLR on 30 March 2021.
