A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations

Alex Markham, Richeek Das, Moritz Grosse-Wentrup
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:542-558, 2022.

Abstract

We consider the problem of causal structure learning in the setting of heterogeneous populations, i.e., populations in which a single causal structure does not adequately represent all population members, as is common in biological and social sciences. To this end, we introduce a distance covariance-based kernel designed specifically to measure the similarity between the underlying nonlinear causal structures of different samples. Indeed, we prove that the corresponding feature map is a statistically consistent estimator of nonlinear independence structure, rendering the kernel itself a statistical test for the hypothesis that sets of samples come from different generating causal structures. Even stronger, we prove that the kernel space is isometric to the space of causal ancestral graphs, so that distance between samples in the kernel space is guaranteed to correspond to distance between their generating causal structures. This kernel thus enables us to perform clustering to identify the homogeneous subpopulations, for which we can then learn causal structures using existing methods. Though we focus on the theoretical aspects of the kernel, we also evaluate its performance on synthetic data and demonstrate its use on a real gene expression data set.
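As a rough illustration of the dependence measure the abstract refers to, the sketch below computes the standard sample distance correlation (the normalized form of distance covariance) between two variables. It is not the authors' kernel or feature map; the function names `centered_distances` and `distance_correlation` are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def centered_distances(v):
    """Pairwise Euclidean distance matrix of the samples in v, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat 1-D input as n samples of one feature
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (biased V-statistic form), a value in [0, 1]."""
    a = centered_distances(x)
    b = centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()     # product of squared distance variances
    if dvar2 == 0.0:
        return 0.0  # at least one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    # y depends on x nonlinearly, so Pearson correlation is near zero
    # while distance correlation is clearly positive.
    print("pearson(x, x^2):", np.corrcoef(x, x ** 2)[0, 1])
    print("dcor(x, x^2):   ", distance_correlation(x, x ** 2))
    print("dcor(x, noise): ", distance_correlation(x, rng.normal(size=500)))
```

Unlike Pearson correlation, the population distance correlation is zero only under independence, which is why a measure of this kind can be used to estimate nonlinear independence structure from samples.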

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-markham22a,
  title = {A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations},
  author = {Markham, Alex and Das, Richeek and Grosse-Wentrup, Moritz},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages = {542--558},
  year = {2022},
  editor = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume = {177},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v177/markham22a/markham22a.pdf},
  url = {https://proceedings.mlr.press/v177/markham22a.html},
  abstract = {We consider the problem of causal structure learning in the setting of heterogeneous populations, i.e., populations in which a single causal structure does not adequately represent all population members, as is common in biological and social sciences. To this end, we introduce a distance covariance-based kernel designed specifically to measure the similarity between the underlying nonlinear causal structures of different samples. Indeed, we prove that the corresponding feature map is a statistically consistent estimator of nonlinear independence structure, rendering the kernel itself a statistical test for the hypothesis that sets of samples come from different generating causal structures. Even stronger, we prove that the kernel space is isometric to the space of causal ancestral graphs, so that distance between samples in the kernel space is guaranteed to correspond to distance between their generating causal structures. This kernel thus enables us to perform clustering to identify the homogeneous subpopulations, for which we can then learn causal structures using existing methods. Though we focus on the theoretical aspects of the kernel, we also evaluate its performance on synthetic data and demonstrate its use on a real gene expression data set.}
}
Endnote
%0 Conference Paper
%T A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations
%A Alex Markham
%A Richeek Das
%A Moritz Grosse-Wentrup
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-markham22a
%I PMLR
%P 542--558
%U https://proceedings.mlr.press/v177/markham22a.html
%V 177
%X We consider the problem of causal structure learning in the setting of heterogeneous populations, i.e., populations in which a single causal structure does not adequately represent all population members, as is common in biological and social sciences. To this end, we introduce a distance covariance-based kernel designed specifically to measure the similarity between the underlying nonlinear causal structures of different samples. Indeed, we prove that the corresponding feature map is a statistically consistent estimator of nonlinear independence structure, rendering the kernel itself a statistical test for the hypothesis that sets of samples come from different generating causal structures. Even stronger, we prove that the kernel space is isometric to the space of causal ancestral graphs, so that distance between samples in the kernel space is guaranteed to correspond to distance between their generating causal structures. This kernel thus enables us to perform clustering to identify the homogeneous subpopulations, for which we can then learn causal structures using existing methods. Though we focus on the theoretical aspects of the kernel, we also evaluate its performance on synthetic data and demonstrate its use on a real gene expression data set.
APA
Markham, A., Das, R., & Grosse-Wentrup, M. (2022). A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:542-558. Available from https://proceedings.mlr.press/v177/markham22a.html.
