Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

Maryam Aliakbarpour, Alireza Azizi, Ria Stevens
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:106-157, 2026.

Abstract

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of augmented distribution testing. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction’s quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-aliakbarpour26a, title = {Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions}, author = {Aliakbarpour, Maryam and Azizi, Alireza and Stevens, Ria}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {106--157}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/aliakbarpour26a/aliakbarpour26a.pdf}, url = {https://proceedings.mlr.press/v336/aliakbarpour26a.html}, abstract = {Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of augmented distribution testing. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction’s quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.} }
Endnote
%0 Conference Paper %T Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions %A Maryam Aliakbarpour %A Alireza Azizi %A Ria Stevens %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-aliakbarpour26a %I PMLR %P 106--157 %U https://proceedings.mlr.press/v336/aliakbarpour26a.html %V 336 %X Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of augmented distribution testing. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction’s quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.
APA
Aliakbarpour, M., Azizi, A. & Stevens, R.. (2026). Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:106-157 Available from https://proceedings.mlr.press/v336/aliakbarpour26a.html.

Related Material