Using Classification Trees to Improve Causal Inferences in Observational Studies
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:123-138, 1997.
Much of the recent literature on AI and statistics has focused on how to use causal knowledge to enrich the set of valid causal inferences that can be drawn from available data and applied to practical problems such as decision-making, probabilistic diagnosis, and cost-effective control of systems. This paper examines classical problems of valid causal inference in observational studies, using epidemiological studies of the association between exposure to diesel exhaust (DE) and risk of lung cancer as a case study. It shows that one of the main applied computational tools of AI and statistics, classification tree analysis, can be adapted to help control or avoid many of the usual statistical threats to valid causal inference. It also links this new use of classification trees to an established older literature on causal inference in social statistics, which is based on eliminating competing (non-causal) explanations for observed associations. A strong link is then forged between an extension of classification tree analysis and modern AI and statistics approaches to causal modeling and inference based on directed acyclic graph (DAG) causal models and influence diagrams. This new link rests on the observation that classification tree analysis can be adapted to test the local Markov conditions that define the structure of DAG models, and to quantify the conditional distributions of variables given the values of their parents, which is the key numerical information needed to quantify an influence diagram model. Finally, these insights are applied to available data on DE and lung cancer risks and are used to conclude that there is no evidence of a causal relation between them.
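The idea that a classification tree can test a local Markov condition and estimate a conditional distribution can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's procedure: the function names (`grow_tree`, `info_gain`, `split_variables`) are hypothetical, splitting uses information gain with an arbitrary `min_gain` stopping threshold, and the data are synthetic counts in which a common cause Z drives both X and Y. If X never enters the tree once Z is available, the data are consistent with the conditional independence Y ⊥ X | Z that a DAG with Z as the only parent of Y would require, and the leaf frequencies estimate P(Y = 1 | Z).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, target, var):
    """Reduction in target entropy from partitioning rows on var."""
    gain = entropy([r[target] for r in rows])
    for value in set(r[var] for r in rows):
        child = [r[target] for r in rows if r[var] == value]
        gain -= len(child) / len(rows) * entropy(child)
    return gain


def grow_tree(rows, target, candidates, min_gain=0.01):
    """Greedy tree on discrete predictors (hypothetical, illustrative).

    A node splits on the candidate with the largest information gain,
    provided that gain exceeds min_gain; otherwise it becomes a leaf
    holding the observed frequency of target == 1.
    """
    labels = [r[target] for r in rows]
    best, best_gain = None, min_gain
    for var in candidates:
        if len(set(r[var] for r in rows)) < 2:
            continue  # variable is constant in this node
        g = info_gain(rows, target, var)
        if g > best_gain:
            best, best_gain = var, g
    if best is None:
        return {"leaf": sum(labels) / len(labels), "n": len(labels)}
    remaining = [v for v in candidates if v != best]
    return {
        "split": best,
        "children": {
            value: grow_tree([r for r in rows if r[best] == value],
                             target, remaining, min_gain)
            for value in sorted(set(r[best] for r in rows))
        },
    }


def split_variables(tree):
    """Set of variables that appear as splits anywhere in the tree."""
    if "leaf" in tree:
        return set()
    used = {tree["split"]}
    for child in tree["children"].values():
        used |= split_variables(child)
    return used


def make_rows():
    """Synthetic data: Z -> X and Z -> Y, with X and Y independent given Z."""
    counts = {  # (Z, X, Y): number of records
        (1, 1, 1): 64, (1, 1, 0): 16, (1, 0, 1): 16, (1, 0, 0): 4,
        (0, 1, 1): 4, (0, 1, 0): 16, (0, 0, 1): 16, (0, 0, 0): 64,
    }
    rows = []
    for (z, x, y), n in counts.items():
        rows.extend({"Z": z, "X": x, "Y": y} for _ in range(n))
    return rows


rows = make_rows()
tree = grow_tree(rows, "Y", ["X", "Z"])
print("marginal gain from X:", round(info_gain(rows, "Y", "X"), 3))
print("variables used as splits:", split_variables(tree))
print("P(Y=1 | Z=1) =", tree["children"][1]["leaf"])
print("P(Y=1 | Z=0) =", tree["children"][0]["leaf"])
```

In these synthetic data X is marginally associated with Y (P(Y=1 | X=1) = 0.68 versus 0.32 when X = 0), yet the tree splits only on Z, because within each level of Z, X carries no further information about Y. With real, noisy data the within-stratum gain will rarely be exactly zero, so some threshold or significance test is needed to decide when a variable has genuinely been eliminated; the 0.01 used here is arbitrary.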