Sample Complexity of Distinguishing Cause from Effect
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10487-10504, 2023.
Abstract
We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables X and Y, we consider the classical scenario where either X causes Y, Y causes X, or there is an unmeasured confounder between X and Y. Let m1 be the number of observational samples of (X,Y), and let m2 be the number of interventional samples where either X or Y has been subject to an external intervention. We show that if X and Y are over a finite domain of size k and are significantly correlated, the minimum m2 needed to identify the causal structure is sublinear in k. Moreover, as m1 grows, the minimum m2 needed decreases. In fact, we give a tight characterization of the tradeoff between m1 and m2 when m1 = O(k) or when m1 is sufficiently large. We build upon techniques for closeness testing when m1 is small (e.g., sublinear in k), and for non-parametric density estimation when m2 is large. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical problems in property testing.
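To make the sampling model concrete, the following minimal Python sketch illustrates what the quantities m1, m2, and k from the abstract refer to, simulated under one of the three candidate structures (X causes Y). It is purely illustrative and is not the paper's algorithm; the chosen distributions and helper names (random_dist, sample_observational, sample_interventional) are assumptions made only for this example.

```python
# Illustrative sketch (not the paper's algorithm): simulates the setting from the
# abstract under the structure X -> Y over a finite domain of size k, and shows
# what m1 observational and m2 interventional samples look like. All names here
# are hypothetical and chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)
k = 100  # domain size of X and Y

def random_dist(size):
    """Draw a random probability distribution over `size` outcomes."""
    p = rng.random(size)
    return p / p.sum()

# A candidate model "X causes Y": a marginal for X and a conditional P(Y | X=x).
p_x = random_dist(k)
p_y_given_x = np.array([random_dist(k) for _ in range(k)])

def sample_observational(m1):
    """m1 observational samples of (X, Y) under the model X -> Y."""
    x = rng.choice(k, size=m1, p=p_x)
    y = np.array([rng.choice(k, p=p_y_given_x[xi]) for xi in x])
    return x, y

def sample_interventional(m2, x_value):
    """m2 samples of Y after an external intervention do(X = x_value).

    Under X -> Y the intervention simply fixes X, so Y ~ P(Y | X=x_value).
    Under Y -> X or a confounder model, the post-intervention distribution of Y
    would in general differ from P(Y | X=x_value), which is what makes the
    three structures distinguishable from interventional data.
    """
    return rng.choice(k, size=m2, p=p_y_given_x[x_value])

obs_x, obs_y = sample_observational(m1=500)   # observational samples of (X, Y)
int_y = sample_interventional(m2=50, x_value=3)  # interventional samples of Y
```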