Universal Causal Evaluation Engine: An API for empirically evaluating causal inference models
Proceedings of Machine Learning Research, PMLR 104:50-58, 2019.
A major driver in the success of predictive machine learning has been the “common task framework,” where community-wide benchmarks are shared for evaluating new algorithms. This pattern, however, is difficult to implement for causal learning tasks because the ground truth in these tasks is in general unobservable. Instead, causal inference methods are often evaluated on synthetic or semi-synthetic datasets that incorporate idiosyncratic assumptions about the underlying data-generating process. These evaluations are often proposed in conjunction with new causal inference methods—as a result, many methods are evaluated on incomparable benchmarks. To address this issue, we establish an API for generalized causal inference model assessment, with the goal of developing a platform that lets researchers deploy and evaluate new model classes in instances where treatments are explicitly known. The API uses a common interface for each of its components, and it allows for new methods and datasets to be evaluated and saved for future benchmarking.
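To illustrate the idea of a common interface for models and datasets, the sketch below shows one hypothetical way such an API could be organized. The class and method names (`CausalDataset`, `CausalModel`, `evaluate`, `predict_ite`, etc.) are illustrative assumptions, not the paper's actual interface; the dataset here is a toy synthetic example with a known treatment effect, and the metric is a PEHE-style RMSE between estimated and true individual effects.

```python
# Hypothetical sketch of a common evaluation interface for causal
# inference methods; names and signatures are illustrative only.
from abc import ABC, abstractmethod
import numpy as np


class CausalDataset(ABC):
    """Dataset interface: covariates X, treatment t, outcome y, true ITE."""

    @abstractmethod
    def get_data(self):
        """Return (X, t, y, true_ite) arrays."""


class CausalModel(ABC):
    """Model interface: fit on observed data, predict individual effects."""

    @abstractmethod
    def fit(self, X, t, y): ...

    @abstractmethod
    def predict_ite(self, X): ...


class SyntheticDataset(CausalDataset):
    """Toy semi-synthetic-style dataset with a known constant effect."""

    def __init__(self, n=500, seed=0):
        rng = np.random.default_rng(seed)
        self.X = rng.normal(size=(n, 2))
        self.t = rng.integers(0, 2, size=n)
        self.ite = np.full(n, 2.0)  # known ground-truth effect
        noise = rng.normal(scale=0.1, size=n)
        self.y = self.X[:, 0] + self.t * self.ite + noise

    def get_data(self):
        return self.X, self.t, self.y, self.ite


class DiffInMeans(CausalModel):
    """Baseline estimator: difference in treated/control outcome means."""

    def fit(self, X, t, y):
        self.effect_ = y[t == 1].mean() - y[t == 0].mean()

    def predict_ite(self, X):
        return np.full(len(X), self.effect_)


def evaluate(model, dataset):
    """Fit a model on a dataset and score it against the known effects."""
    X, t, y, true_ite = dataset.get_data()
    model.fit(X, t, y)
    est = model.predict_ite(X)
    return float(np.sqrt(np.mean((est - true_ite) ** 2)))  # PEHE-style RMSE
```

Because every model and dataset conforms to the same two abstract interfaces, any new estimator can be dropped into `evaluate` and scored on the same benchmarks, which is the interchangeability the abstract describes.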