Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, Caroline Uhler
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:669-687, 2022.

Abstract

We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define "latent factor causal models" (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-squires22a,
  title = {Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors},
  author = {Squires, Chandler and Yun, Annie and Nichani, Eshaan and Agrawal, Raj and Uhler, Caroline},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages = {669--687},
  year = {2022},
  editor = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume = {177},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v177/squires22a/squires22a.pdf},
  url = {https://proceedings.mlr.press/v177/squires22a.html},
  abstract = {We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define "latent factor causal models" (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.}
}
Endnote
%0 Conference Paper
%T Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors
%A Chandler Squires
%A Annie Yun
%A Eshaan Nichani
%A Raj Agrawal
%A Caroline Uhler
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-squires22a
%I PMLR
%P 669--687
%U https://proceedings.mlr.press/v177/squires22a.html
%V 177
%X We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define "latent factor causal models" (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.
APA
Squires, C., Yun, A., Nichani, E., Agrawal, R. & Uhler, C. (2022). Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:669-687. Available from https://proceedings.mlr.press/v177/squires22a.html.