Causal Imputation via Synthetic Interventions

Chandler Squires, Dennis Shen, Anish Agarwal, Devavrat Shah, Caroline Uhler
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:688-711, 2022.

Abstract

Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only needing outcome data for a small _subset_ of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced _synthetic interventions_ (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-squires22b,
  title = {Causal Imputation via Synthetic Interventions},
  author = {Squires, Chandler and Shen, Dennis and Agarwal, Anish and Shah, Devavrat and Uhler, Caroline},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages = {688--711},
  year = {2022},
  editor = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume = {177},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v177/squires22b/squires22b.pdf},
  url = {https://proceedings.mlr.press/v177/squires22b.html},
  abstract = {Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only needing outcome data for a small _subset_ of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced _synthetic interventions_ (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.}
}
Endnote
%0 Conference Paper
%T Causal Imputation via Synthetic Interventions
%A Chandler Squires
%A Dennis Shen
%A Anish Agarwal
%A Devavrat Shah
%A Caroline Uhler
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-squires22b
%I PMLR
%P 688--711
%U https://proceedings.mlr.press/v177/squires22b.html
%V 177
%X Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only needing outcome data for a small _subset_ of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced _synthetic interventions_ (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.
APA
Squires, C., Shen, D., Agarwal, A., Shah, D. & Uhler, C. (2022). Causal Imputation via Synthetic Interventions. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:688-711. Available from https://proceedings.mlr.press/v177/squires22b.html.
