Optimal Sample Complexity Lower Bounds on Conditional Independence Testing

Jan Seyfried, Neelkanth Mishra, Sayantan Sen, Marco Tomamichel
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5828-5873, 2026.

Abstract

We study the sample complexity of conditional independence testing. In this problem, given i.i.d. samples from a discrete distribution $P_{ABC}$, the goal is to distinguish whether $A$ and $C$ are conditionally independent with respect to $B$, i.e., $P_{ABC}=P_{A|B}P_BP_{C|B}$, or whether $A$ and $C$ are conditionally dependent, $\Delta(P_{ABC},P_{A|B}P_BP_{C|B})\geq \varepsilon$ for some fixed threshold $\varepsilon$ and distance measure $\Delta$. We are interested in the cases where $\Delta$ is either the $\ell_1$ distance or the KL-divergence. The study for the case of $\ell_1$ distance was initiated by (Canonne et al., STOC 2018), and the KL-divergence was recently studied by (Seyfried et al., COLT 2025). Both works design algorithms whose sample complexities scale sublinearly in the dimensions of the subsystems, and showed tight lower bounds in some parameter regimes. While Canonne et al. derived partial lower bounds for the remaining regimes as well, the problem of fully resolving the sample complexity in all parameters remained open. In this work, we settle these open questions and prove optimal sample complexity lower bounds for both of these problems, thereby completely settling the sample complexities up to polylogarithmic factors.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-seyfried26a, title = {Optimal Sample Complexity Lower Bounds on Conditional Independence Testing}, author = {Seyfried, Jan and Mishra, Neelkanth and Sen, Sayantan and Tomamichel, Marco}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {5828--5873}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/seyfried26a/seyfried26a.pdf}, url = {https://proceedings.mlr.press/v336/seyfried26a.html}, abstract = {We study the sample complexity of conditional independence testing. In this problem, given i.i.d. samples from a discrete distribution $P_{ABC}$, the goal is to distinguish whether $A$ and $C$ are conditionally independent with respect to $B$, i.e., $P_{ABC}=P_{A|B}P_BP_{C|B}$, or whether $A$ and $C$ are conditionally dependent, $\Delta(P_{ABC},P_{A|B}P_BP_{C|B})\geq \varepsilon$ for some fixed threshold $\varepsilon$ and distance measure $\Delta$. We are interested in the cases where $\Delta$ is either the $\ell_1$ distance or the KL-divergence. The study for the case of $\ell_1$ distance was initiated by (Canonne et al., STOC 2018), and the KL-divergence was recently studied by (Seyfried et al., COLT 2025). Both works design algorithms whose sample complexities scale sublinearly in the dimensions of the subsystems, and showed tight lower bounds in some parameter regimes. While Canonne et al. derived partial lower bounds for the remaining regimes as well, the problem of fully resolving the sample complexity in all parameters remained open. In this work, we settle these open questions and prove optimal sample complexity lower bounds for both of these problems, thereby completely settling the sample complexities up to polylogarithmic factors.} }
Endnote
%0 Conference Paper %T Optimal Sample Complexity Lower Bounds on Conditional Independence Testing %A Jan Seyfried %A Neelkanth Mishra %A Sayantan Sen %A Marco Tomamichel %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-seyfried26a %I PMLR %P 5828--5873 %U https://proceedings.mlr.press/v336/seyfried26a.html %V 336 %X We study the sample complexity of conditional independence testing. In this problem, given i.i.d. samples from a discrete distribution $P_{ABC}$, the goal is to distinguish whether $A$ and $C$ are conditionally independent with respect to $B$, i.e., $P_{ABC}=P_{A|B}P_BP_{C|B}$, or whether $A$ and $C$ are conditionally dependent, $\Delta(P_{ABC},P_{A|B}P_BP_{C|B})\geq \varepsilon$ for some fixed threshold $\varepsilon$ and distance measure $\Delta$. We are interested in the cases where $\Delta$ is either the $\ell_1$ distance or the KL-divergence. The study for the case of $\ell_1$ distance was initiated by (Canonne et al., STOC 2018), and the KL-divergence was recently studied by (Seyfried et al., COLT 2025). Both works design algorithms whose sample complexities scale sublinearly in the dimensions of the subsystems, and showed tight lower bounds in some parameter regimes. While Canonne et al. derived partial lower bounds for the remaining regimes as well, the problem of fully resolving the sample complexity in all parameters remained open. In this work, we settle these open questions and prove optimal sample complexity lower bounds for both of these problems, thereby completely settling the sample complexities up to polylogarithmic factors.
APA
Seyfried, J., Mishra, N., Sen, S. & Tomamichel, M.. (2026). Optimal Sample Complexity Lower Bounds on Conditional Independence Testing. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5828-5873 Available from https://proceedings.mlr.press/v336/seyfried26a.html.

Related Material