[edit]
Sample Complexity of Distinguishing Cause from Effect
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10487-10504, 2023.
Abstract
We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables $X$ and $Y$, we consider the classical scenario where either $X$ causes $Y$, $Y$ causes $X$, or there is an unmeasured confounder between $X$ and $Y$. Let $m_1$ be the number of observational samples of $(X,Y)$, and let $m_2$ be the number of interventional samples where either $X$ or $Y$ has been subject to an external intervention. We show that if $X$ and $Y$ are over a finite domain of size $k$ and are significantly correlated, the minimum $m_2$ needed is sublinear in $k$. Moreover, as $m_1$ grows, the minimum $m_2$ needed to identify the causal structure decreases. In fact, we can give a tight characterization of the tradeoff between $m_1$ and $m_2$ when $m_1 = O(k)$ or is sufficiently large. We build upon techniques for closeness testing when $m_1$ is small (e.g., sublinear in $k$), and for non-parametric density estimation when $m_2$ is large. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing.