Sparse Causal Effect Estimation using Two-Sample Summary Statistics in the Presence of Unmeasured Confounding

Shimeng Huang, Niklas Pfister, Jack Bowden
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3394-3402, 2025.

Abstract

Observational genome-wide association studies are now widely used for causal inference in genetic epidemiology. To maintain privacy, such data is often only publicly available as summary statistics, and often studies for the endogenous covariates and the outcome are available separately. This has necessitated methods tailored to two-sample summary statistics. Current state-of-the-art methods modify linear instrumental variable (IV) regression—with genetic variants as instruments—to account for unmeasured confounding. However, since the endogenous covariates can be high dimensional, standard IV assumptions are generally insufficient to identify all causal effects simultaneously. We ensure identifiability by assuming the causal effects are sparse and propose a sparse causal effect two-sample IV estimator, spaceTSIV, adapting the spaceIV estimator by Pfister and Peters (2022) for two-sample summary statistics. We provide two methods, based on L0- and L1-penalization, respectively. We prove identifiability of the sparse causal effects in the two-sample setting and consistency of spaceTSIV. The performance of spaceTSIV is compared with existing two-sample IV methods in simulations. Finally, we showcase our methods using real proteomic and gene-expression data for drug-target discovery.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-huang25c, title = {Sparse Causal Effect Estimation using Two-Sample Summary Statistics in the Presence of Unmeasured Confounding}, author = {Huang, Shimeng and Pfister, Niklas and Bowden, Jack}, booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics}, pages = {3394--3402}, year = {2025}, editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz}, volume = {258}, series = {Proceedings of Machine Learning Research}, month = {03--05 May}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/huang25c/huang25c.pdf}, url = {https://proceedings.mlr.press/v258/huang25c.html}, abstract = {Observational genome-wide association studies are now widely used for causal inference in genetic epidemiology. To maintain privacy, such data is often only publicly available as summary statistics, and often studies for the endogenous covariates and the outcome are available separately. This has necessitated methods tailored to two-sample summary statistics. Current state-of-the-art methods modify linear instrumental variable (IV) regression—with genetic variants as instruments—to account for unmeasured confounding. However, since the endogenous covariates can be high dimensional, standard IV assumptions are generally insufficient to identify all causal effects simultaneously. We ensure identifiability by assuming the causal effects are sparse and propose a sparse causal effect two-sample IV estimator, spaceTSIV, adapting the spaceIV estimator by Pfister and Peters (2022) for two-sample summary statistics. We provide two methods, based on L0- and L1-penalization, respectively. We prove identifiability of the sparse causal effects in the two-sample setting and consistency of spaceTSIV. The performance of spaceTSIV is compared with existing two-sample IV methods in simulations. Finally, we showcase our methods using real proteomic and gene-expression data for drug-target discovery.} }
Endnote
%0 Conference Paper %T Sparse Causal Effect Estimation using Two-Sample Summary Statistics in the Presence of Unmeasured Confounding %A Shimeng Huang %A Niklas Pfister %A Jack Bowden %B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2025 %E Yingzhen Li %E Stephan Mandt %E Shipra Agrawal %E Emtiyaz Khan %F pmlr-v258-huang25c %I PMLR %P 3394--3402 %U https://proceedings.mlr.press/v258/huang25c.html %V 258 %X Observational genome-wide association studies are now widely used for causal inference in genetic epidemiology. To maintain privacy, such data is often only publicly available as summary statistics, and often studies for the endogenous covariates and the outcome are available separately. This has necessitated methods tailored to two-sample summary statistics. Current state-of-the-art methods modify linear instrumental variable (IV) regression—with genetic variants as instruments—to account for unmeasured confounding. However, since the endogenous covariates can be high dimensional, standard IV assumptions are generally insufficient to identify all causal effects simultaneously. We ensure identifiability by assuming the causal effects are sparse and propose a sparse causal effect two-sample IV estimator, spaceTSIV, adapting the spaceIV estimator by Pfister and Peters (2022) for two-sample summary statistics. We provide two methods, based on L0- and L1-penalization, respectively. We prove identifiability of the sparse causal effects in the two-sample setting and consistency of spaceTSIV. The performance of spaceTSIV is compared with existing two-sample IV methods in simulations. Finally, we showcase our methods using real proteomic and gene-expression data for drug-target discovery.
APA
Huang, S., Pfister, N. & Bowden, J.. (2025). Sparse Causal Effect Estimation using Two-Sample Summary Statistics in the Presence of Unmeasured Confounding. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3394-3402 Available from https://proceedings.mlr.press/v258/huang25c.html.

Related Material