[edit]
Optimal Sample Complexity Lower Bounds on Conditional Independence Testing
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5828-5873, 2026.
Abstract
We study the sample complexity of conditional independence testing. In this problem, given i.i.d. samples from a discrete distribution $P_{ABC}$, the goal is to distinguish whether $A$ and $C$ are conditionally independent with respect to $B$, i.e., $P_{ABC}=P_{A|B}P_BP_{C|B}$, or whether $A$ and $C$ are conditionally dependent, $\Delta(P_{ABC},P_{A|B}P_BP_{C|B})\geq \varepsilon$ for some fixed threshold $\varepsilon$ and distance measure $\Delta$. We are interested in the cases where $\Delta$ is either the $\ell_1$ distance or the KL-divergence. The study for the case of $\ell_1$ distance was initiated by (Canonne et al., STOC 2018), and the KL-divergence was recently studied by (Seyfried et al., COLT 2025). Both works design algorithms whose sample complexities scale sublinearly in the dimensions of the subsystems, and showed tight lower bounds in some parameter regimes. While Canonne et al. derived partial lower bounds for the remaining regimes as well, the problem of fully resolving the sample complexity in all parameters remained open. In this work, we settle these open questions and prove optimal sample complexity lower bounds for both of these problems, thereby completely settling the sample complexities up to polylogarithmic factors.