Robust Direct Learning for Causal Data Fusion

Xinyu Li, Yilin Li, Qing Cui, Longfei Li, Jun Zhou
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:611-626, 2023.

Abstract

In the era of big data, the explosive growth of multi-source heterogeneous data offers many exciting challenges and opportunities for improving the inference of conditional average treatment effects. In this paper, we investigate homogeneous and heterogeneous causal data fusion problems under a general setting that allows for the presence of source-specific covariates. We provide a direct learning framework for integrating multi-source data that separates the treatment effect from other nuisance functions and achieves double robustness against certain misspecification. To improve estimation precision and stability, we propose a causal information-aware weighting function motivated by theoretical insights from semiparametric efficiency theory; it is highly interpretable and assigns larger weights to samples containing more causal information. We introduce a two-step algorithm, the weighted multi-source direct learner, which constructs a pseudo-outcome and regresses it on covariates under a weighted least squares criterion; it offers a powerful tool for causal data fusion, with the advantages of easy implementation, double robustness, and model flexibility. In simulation studies, we demonstrate the effectiveness of our proposed methods in both homogeneous and heterogeneous causal data fusion scenarios.
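The two-step pattern described in the abstract (build a pseudo-outcome, then regress it on covariates under a weighted least squares criterion) can be illustrated with a small sketch. This is not the authors' weighted multi-source direct learner: the abstract does not specify the pseudo-outcome or the causal information-aware weights, so the sketch swaps in a Robinson-style residualized pseudo-outcome with weights `(A - e)^2`. The two simulated sources, their known propensities (0.5 and 0.3), the linear effect `tau(x) = 1 + 2*x0`, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones to the covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_outcome_mean(X, Y):
    """Nuisance step: estimate m(x) = E[Y | X = x] by ordinary least squares."""
    design = add_intercept(X)
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return design @ beta


def fit_cate_wls(X, A, Y, propensity, outcome_mean):
    """Build a pseudo-outcome, then regress it on covariates by weighted least squares."""
    resid_a = A - propensity                   # treatment residual, never zero here
    pseudo = (Y - outcome_mean) / resid_a      # Robinson-style pseudo-outcome (assumed form)
    weights = resid_a ** 2                     # assumed weights, not the paper's weighting function
    root_w = np.sqrt(weights)
    design = add_intercept(X)
    # Weighted least squares via ordinary least squares on sqrt-weight-scaled rows.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], pseudo * root_w, rcond=None)
    return coef


def simulate_source(n, propensity, rng):
    """Hypothetical source with a shared linear treatment effect tau(x) = 1 + 2*x0."""
    X = rng.normal(size=(n, 2))
    A = rng.binomial(1, propensity, size=n).astype(float)
    tau = 1.0 + 2.0 * X[:, 0]
    Y = X[:, 1] + A * tau + rng.normal(scale=0.5, size=n)
    return X, A, Y


rng = np.random.default_rng(0)
parts = []
for e in (0.5, 0.3):                           # two sources with known randomization rates
    X_s, A_s, Y_s = simulate_source(2000, e, rng)
    m_s = fit_outcome_mean(X_s, Y_s)           # nuisance fitted separately per source
    parts.append((X_s, A_s, Y_s, np.full(len(A_s), e), m_s))

# Pool the sources and fit one shared treatment-effect model.
X, A, Y, propensity, outcome_mean = (np.concatenate(p) for p in zip(*parts))
coef = fit_cate_wls(X, A, Y, propensity, outcome_mean)
print(coef)  # estimates of (intercept, x0, x1); the simulated truth is (1, 2, 0)
```

In this homogeneous toy setting both sources share one treatment effect, so pooling them before the weighted regression is reasonable; a heterogeneous setting would need source-specific effect terms, which the sketch does not attempt.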

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-li23c,
  title = {Robust Direct Learning for Causal Data Fusion},
  author = {Li, Xinyu and Li, Yilin and Cui, Qing and Li, Longfei and Zhou, Jun},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages = {611--626},
  year = {2023},
  editor = {Khan, Emtiyaz and Gonen, Mehmet},
  volume = {189},
  series = {Proceedings of Machine Learning Research},
  month = {12--14 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v189/li23c/li23c.pdf},
  url = {https://proceedings.mlr.press/v189/li23c.html},
  abstract = {In the era of big data, the explosive growth of multi-source heterogeneous data offers many exciting challenges and opportunities for improving the inference of conditional average treatment effects. In this paper, we investigate homogeneous and heterogeneous causal data fusion problems under a general setting that allows for the presence of source-specific covariates. We provide a direct learning framework for integrating multi-source data that separates the treatment effect from other nuisance functions, and achieves double robustness against certain misspecification. To improve estimation precision and stability, we propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory; it assigns larger weights to samples containing more causal information with high interpretability. We introduce a two-step algorithm, the weighted multi-source direct learner, based on constructing a pseudo-outcome and regressing it on covariates under a weighted least square criterion; it offers us a powerful tool for causal data fusion, enjoying the advantages of easy implementation, double robustness and model flexibility. In simulation studies, we demonstrate the effectiveness of our proposed methods in both homogeneous and heterogeneous causal data fusion scenarios.}
}
Endnote
%0 Conference Paper
%T Robust Direct Learning for Causal Data Fusion
%A Xinyu Li
%A Yilin Li
%A Qing Cui
%A Longfei Li
%A Jun Zhou
%B Proceedings of The 14th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen
%F pmlr-v189-li23c
%I PMLR
%P 611--626
%U https://proceedings.mlr.press/v189/li23c.html
%V 189
%X In the era of big data, the explosive growth of multi-source heterogeneous data offers many exciting challenges and opportunities for improving the inference of conditional average treatment effects. In this paper, we investigate homogeneous and heterogeneous causal data fusion problems under a general setting that allows for the presence of source-specific covariates. We provide a direct learning framework for integrating multi-source data that separates the treatment effect from other nuisance functions, and achieves double robustness against certain misspecification. To improve estimation precision and stability, we propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory; it assigns larger weights to samples containing more causal information with high interpretability. We introduce a two-step algorithm, the weighted multi-source direct learner, based on constructing a pseudo-outcome and regressing it on covariates under a weighted least square criterion; it offers us a powerful tool for causal data fusion, enjoying the advantages of easy implementation, double robustness and model flexibility. In simulation studies, we demonstrate the effectiveness of our proposed methods in both homogeneous and heterogeneous causal data fusion scenarios.
APA
Li, X., Li, Y., Cui, Q., Li, L. & Zhou, J. (2023). Robust Direct Learning for Causal Data Fusion. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:611-626. Available from https://proceedings.mlr.press/v189/li23c.html.

Related Material