Robust Direct Learning for Causal Data Fusion
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:611-626, 2023.
Abstract
In the era of big data, the explosive growth of
multi-source heterogeneous data offers many exciting
challenges and opportunities for improving the
inference of conditional average treatment
effects. In this paper, we investigate homogeneous
and heterogeneous causal data fusion problems under
a general setting that allows for the presence of
source-specific covariates. We provide a direct
learning framework for integrating multi-source data
that separates the treatment effect from other
nuisance functions and achieves double robustness
against certain forms of misspecification. To improve
estimation precision and stability, we propose a
causal information-aware weighting function
motivated by theoretical insights from
semiparametric efficiency theory. The function is
highly interpretable and assigns larger weights to
samples containing more causal information. We introduce
a two-step algorithm, the weighted multi-source
direct learner, based on constructing a
pseudo-outcome and regressing it on covariates under
a weighted least squares criterion. It provides a
practical tool for causal data fusion that is easy to
implement, doubly robust, and flexible in model
choice. In simulation studies, we
demonstrate the effectiveness of our proposed
methods in both homogeneous and heterogeneous causal
data fusion scenarios.
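
The two-step idea described above (construct a pseudo-outcome, then regress it on covariates under a weighted least squares criterion) can be illustrated with a minimal single-sample sketch. It is not the paper's estimator: the pseudo-outcome here is the standard augmented inverse-probability-weighted (AIPW) form, the weight e(x)(1 - e(x)) is only a stand-in for the causal information-aware weighting function, the propensity score is assumed known, and all names (`fit_wls`, `weighted_direct_learner`) and the simulated data are hypothetical. Source-specific covariates and multi-source fusion are not modeled.

```python
import numpy as np


def fit_wls(features, target, weights):
    """Solve min_b sum_i w_i * (y_i - x_i' b)^2 via rescaled least squares."""
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(features * sw[:, None], target * sw, rcond=None)
    return coef


def weighted_direct_learner(X, A, Y, propensity):
    """Two-step sketch: AIPW pseudo-outcome, then weighted linear regression.

    Assumptions (not from the paper): linear nuisance models, a known
    propensity score, and e(x) * (1 - e(x)) as the sample weight.
    """
    design = np.column_stack([np.ones(len(X)), X])
    unit = np.ones(len(Y))

    # Step 1a: fit outcome regressions separately in each treatment arm.
    treated, control = A == 1, A == 0
    b1 = fit_wls(design[treated], Y[treated], unit[treated])
    b0 = fit_wls(design[control], Y[control], unit[control])
    mu1, mu0 = design @ b1, design @ b0

    # Step 1b: AIPW pseudo-outcome, whose conditional mean is the CATE
    # when either the outcome models or the propensity score is correct.
    pseudo = (
        mu1 - mu0
        + A * (Y - mu1) / propensity
        - (1 - A) * (Y - mu0) / (1 - propensity)
    )

    # Step 2: regress the pseudo-outcome on covariates with weights that
    # down-weight samples with extreme propensity scores.
    weights = propensity * (1 - propensity)
    return fit_wls(design, pseudo, weights)


# Simulated data with a linear treatment effect tau(x) = 1 + 2 * x2.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
A = rng.binomial(1, propensity)
tau = 1.0 + 2.0 * X[:, 1]
Y = X[:, 0] + A * tau + rng.normal(size=n)

coef = weighted_direct_learner(X, A, Y, propensity)
print("estimated (intercept, x1, x2) effect coefficients:", coef)
```

In this sketch the returned coefficients estimate the intercept and slopes of a linear conditional average treatment effect; swapping in other nuisance models or a different weight changes only Step 1a and the `weights` line.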