How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference

Amanda M Gentzel; Purva Pruthi; David Jensen

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3660-3671, 2021.

Abstract

Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets, drawn from RCTs, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.
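The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way OSRCT-style subsampling could work, under stated assumptions: each RCT unit is assigned a "selected" treatment with a probability that depends on a covariate, and the unit is kept only when its actual randomized treatment matches that selection. The helper names (`osrct_sample`, `difference_in_means`, `make_toy_rct`), the logistic biasing function, and the toy data-generating process are all hypothetical illustrations, not the authors' code or data.

```python
import math
import random


def osrct_sample(rows, bias_prob, rng):
    """Subsample RCT rows so that the retained treatment depends on a covariate.

    rows: list of dicts with keys "x" (covariate), "t" (0/1 randomized
          treatment), and "y" (observed outcome).
    bias_prob: hypothetical function mapping a covariate value to the
          probability that treatment 1 is "selected" for that unit.
    """
    kept = []
    for row in rows:
        # Draw the covariate-dependent treatment selection for this unit.
        selected_t = 1 if rng.random() < bias_prob(row["x"]) else 0
        # Keep the unit only if the RCT actually gave it that treatment.
        if row["t"] == selected_t:
            kept.append(row)
    return kept


def difference_in_means(rows):
    """Naive effect estimate: mean treated outcome minus mean control outcome."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def make_toy_rct(n, rng):
    """Toy RCT (an assumption for illustration): true treatment effect is 2.0."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # covariate
        t = rng.randint(0, 1)            # randomized treatment, independent of x
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append({"x": x, "t": t, "y": y})
    return rows


rng = random.Random(0)
rct = make_toy_rct(20000, rng)

# Assumed logistic biasing function: units with larger x are more likely
# to end up in the treated group of the constructed data set.
observational = osrct_sample(rct, lambda x: 1.0 / (1.0 + math.exp(-2.0 * x)), rng)

rct_estimate = difference_in_means(rct)              # reference estimate from the RCT
naive_estimate = difference_in_means(observational)  # confounded by x after sampling
```

In this toy setting the difference in means on the full RCT stays near the true effect, while the same naive estimate on the constructed sample is pulled away from it because treatment now correlates with the covariate. That gap is what a causal inference method under evaluation would be asked to close.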

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-gentzel21a,
  title = 	 {How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference},
  author =       {Gentzel, Amanda M and Pruthi, Purva and Jensen, David},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {3660--3671},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/gentzel21a/gentzel21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/gentzel21a.html},
  abstract = 	 {Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets, drawn from RCTs, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.}
}

Endnote

%0 Conference Paper
%T How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference
%A Amanda M Gentzel
%A Purva Pruthi
%A David Jensen
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-gentzel21a
%I PMLR
%P 3660--3671
%U https://proceedings.mlr.press/v139/gentzel21a.html
%V 139
%X Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely due to the lack of observational data sets for which treatment effect is known. We describe and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). This method can be used to create constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We then perform a large-scale evaluation of seven causal inference methods over 37 data sets, drawn from RCTs, as well as simulators, real-world computational systems, and observational data sets augmented with a synthetic response variable. We find notable performance differences when comparing across data from different sources, demonstrating the importance of using data from a variety of sources when evaluating any causal inference method.

APA

Gentzel, A.M., Pruthi, P. & Jensen, D. (2021). How and Why to Use Experimental Data to Evaluate Methods for Observational Causal Inference. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3660-3671. Available from https://proceedings.mlr.press/v139/gentzel21a.html.
