[edit]
Inference of nonlinear causal effects with application to TWAS with GWAS summary data
Proceedings of the Third Conference on Causal Learning and Reasoning, PMLR 236:793-826, 2024.
Abstract
Large-scale genome-wide association studies (GWAS) have offered an exciting opportunity to discover putative causal genes or risk factors associated with diseases by using SNPs as instrumental variables (IVs). However, conventional approaches assume linear causal relations partly for simplicity and partly for the availability of GWAS summary data. In this work, we propose a novel model for transcriptome-wide association studies (TWAS) to incorporate nonlinear relationships across IVs, an exposure/gene, and an outcome, which is robust against violations of the valid IV assumptions, permits the use of GWAS summary data, and covers two-stage least squares (2SLS) as a special case. We decouple the estimation of a marginal causal effect and a nonlinear transformation, where the former is estimated via sliced inverse regression and a sparse instrumental variable regression, and the latter is estimated by a ratio-adjusted inverse regression. On this ground, we propose an inferential procedure. An application of the proposed method to the ADNI gene expression data and the IGAP GWAS summary data identifies 18 causal genes associated with Alzheimer’s disease, including APOE and TOMM40, in addition to 7 other genes missed by 2SLS considering only linear relationships. Our findings suggest that nonlinear modeling is required to unleash the power of IV regression for identifying potentially nonlinear gene-trait associations. The source code and accompanying software *nl-causal* can be accessed through the link: [https://github.com/statmlben/nonlinear-causal](https://github.com/statmlben/nonlinear-causal).