A Sample Efficient Conditional Independence Test in the Presence of Discretization

Boyang Sun, Yu Yao, Xinshuai Dong, Zongfang Liu, Tongliang Liu, Yumou Qiu, Kun Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:57828-57853, 2025.

Abstract

Conditional independence (CI) testing is a fundamental concept in statistics. In many real-world scenarios, some variables are difficult to measure accurately, so the data are often recorded as discretized values. Applying CI tests directly to discretized data, however, can lead to incorrect conclusions about the independence of the underlying latent variables. To address this, recent work has sought to infer the correct CI relationship between the latent variables by binarizing the observed data. Binarization discards information, which degrades the test’s performance, particularly with small sample sizes. Motivated by this, we introduce a new sample-efficient CI test that does not rely on binarization. We find that the CI relationship between the latent variables can be established by solving an over-identifying restriction problem with the Generalized Method of Moments (GMM). Based on this finding, we design a new test statistic and derive its asymptotic distribution. Empirical results across various datasets show that our method consistently outperforms existing ones.
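The claim that testing directly on discretized data can mislead is easy to reproduce in a toy setting. The sketch below is not the paper's test: it assumes a hypothetical linear-Gaussian model in which X and Y are independent given a latent Z, uses partial correlation as a stand-in CI measure, and discretizes Z by thresholding at zero. The coefficients, sample size, and function names are illustrative assumptions.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing out z."""
    design = np.column_stack([np.ones_like(z, dtype=float), z])
    # Residuals of x and y from an ordinary least-squares fit on [1, z].
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def simulate(n=20000, seed=0):
    """Return partial correlations given the latent Z and given discretized Z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                # latent conditioning variable
    x = z + 0.5 * rng.standard_normal(n)      # X depends on Z only
    y = z + 0.5 * rng.standard_normal(n)      # Y depends on Z only, so X is independent of Y given Z
    z_disc = (z > 0).astype(float)            # assumed discretization: threshold at zero
    return partial_corr(x, y, z), partial_corr(x, y, z_disc)


latent_pc, discretized_pc = simulate()
print(f"partial correlation given latent Z:      {latent_pc:.3f}")
print(f"partial correlation given discretized Z: {discretized_pc:.3f}")
```

Conditioning on the latent Z gives a partial correlation near zero, while conditioning on the thresholded Z leaves a clearly positive one, because the coarse variable no longer removes all of the variation that X and Y share through Z. A test applied to the discretized variable would therefore tend to reject an independence that actually holds at the latent level.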

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sun25y,
  title = {A Sample Efficient Conditional Independence Test in the Presence of Discretization},
  author = {Sun, Boyang and Yao, Yu and Dong, Xinshuai and Liu, Zongfang and Liu, Tongliang and Qiu, Yumou and Zhang, Kun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {57828--57853},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sun25y/sun25y.pdf},
  url = {https://proceedings.mlr.press/v267/sun25y.html}
}
Endnote
%0 Conference Paper
%T A Sample Efficient Conditional Independence Test in the Presence of Discretization
%A Boyang Sun
%A Yu Yao
%A Xinshuai Dong
%A Zongfang Liu
%A Tongliang Liu
%A Yumou Qiu
%A Kun Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sun25y
%I PMLR
%P 57828--57853
%U https://proceedings.mlr.press/v267/sun25y.html
%V 267
APA
Sun, B., Yao, Y., Dong, X., Liu, Z., Liu, T., Qiu, Y. & Zhang, K. (2025). A Sample Efficient Conditional Independence Test in the Presence of Discretization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:57828-57853. Available from https://proceedings.mlr.press/v267/sun25y.html.
