Sample Complexity of Distinguishing Cause from Effect

Jayadev Acharya, Sourbh Bhadane, Arnab Bhattacharyya, Saravanan Kandasamy, Ziteng Sun
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:10487-10504, 2023.

Abstract

We study the sample complexity of causal structure learning on a two-variable system with observational and experimental data. Specifically, for two variables $X$ and $Y$, we consider the classical scenario where either $X$ causes $Y$, $Y$ causes $X$, or there is an unmeasured confounder between $X$ and $Y$. Let $m_1$ be the number of observational samples of $(X,Y)$, and let $m_2$ be the number of interventional samples where either $X$ or $Y$ has been subject to an external intervention. We show that if $X$ and $Y$ are over a finite domain of size $k$ and are significantly correlated, the minimum $m_2$ needed is sublinear in $k$. Moreover, as $m_1$ grows, the minimum $m_2$ needed to identify the causal structure decreases. In fact, we can give a tight characterization of the tradeoff between $m_1$ and $m_2$ when $m_1 = O(k)$ or is sufficiently large. We build upon techniques for closeness testing when $m_1$ is small (e.g., sublinear in $k$), and for non-parametric density estimation when $m_2$ is large. Our hardness results are based on carefully constructing causal models whose marginal and interventional distributions form hard instances of canonical results on property testing.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-acharya23b,
  title     = {Sample Complexity of Distinguishing Cause from Effect},
  author    = {Acharya, Jayadev and Bhadane, Sourbh and Bhattacharyya, Arnab and Kandasamy, Saravanan and Sun, Ziteng},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {10487--10504},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/acharya23b/acharya23b.pdf},
  url       = {https://proceedings.mlr.press/v206/acharya23b.html}
}
Endnote
%0 Conference Paper
%T Sample Complexity of Distinguishing Cause from Effect
%A Jayadev Acharya
%A Sourbh Bhadane
%A Arnab Bhattacharyya
%A Saravanan Kandasamy
%A Ziteng Sun
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-acharya23b
%I PMLR
%P 10487--10504
%U https://proceedings.mlr.press/v206/acharya23b.html
%V 206
APA
Acharya, J., Bhadane, S., Bhattacharyya, A., Kandasamy, S., & Sun, Z. (2023). Sample Complexity of Distinguishing Cause from Effect. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:10487-10504. Available from https://proceedings.mlr.press/v206/acharya23b.html.