[edit]
On the Unlikelihood of D-Separation
Proceedings of The 12th International Conference on Probabilistic Graphical Models, PMLR 246:70-92, 2024.
Abstract
Causal discovery aims to recover a causal graph from data generated by it; constraint-based methods do so by searching for d-separating conditioning sets of nodes. In this paper, we provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist. Our analysis implies poor average-case performance of existing constraint-based methods, except on a vanishingly small class of extremely sparse graphs. We consider a set $V=\{v_1,\ldots,v_n\}$ of nodes, and generate a random DAG $G=(V,E)$ where $v_i \rightarrow v_j \in E$ with i.i.d probability $p_1$ if $i j$. For any d-separable pair of nodes $v_i$ and $v_j$, we provide upper bounds on the probability that a subset of $V\backslash\{v_i,v_j\}$ d-separates the pair, under different subset selection scenarios; our upper bounds decay exponentially fast to $0$ as $|V| \rightarrow \infty$ for any fixed expected density. We then analyze the average-case performance of constraint-based methods, including the PC Algorithm, a variant of the SGS Algorithm called UniformSGS, and also any constraint-based method limited to small conditioning sets (a limitation which holds in most of existing literature). We show that these algorithms usually suffer from low precision or exponential running time on all but extremely sparse graphs.