Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data

Xinshuai Dong, Ignavier Ng, Boyang Sun, Haoyue Dai, Guang-Yuan Hao, Shunxing Fan, Peter Spirtes, Yumou Qiu, Kun Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:14137-14151, 2025.

Abstract

Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery (code will be available at https://github.com/dongxinshuai/scm-identify).
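To make the permutation idea concrete, here is a minimal sketch of a generic permutation test for the simplest rank hypothesis, namely that the cross-covariance matrix between two variable sets has rank zero. It is not the MPRT procedure from the paper, which handles general ranks and establishes exchangeability under discretization. The function names, the squared Frobenius norm statistic, and the row-shuffling scheme are illustrative assumptions.

```python
import numpy as np


def cross_cov_stat(x, y):
    """Squared Frobenius norm of the sample cross-covariance of x and y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    c = xc.T @ yc / (len(x) - 1)
    return float(np.sum(c ** 2))


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for H0: x and y are independent (rank-zero cross-covariance).

    Hypothetical helper for illustration only. Rows of y are shuffled to
    break any pairing with x, which approximates the null distribution
    of the statistic.
    """
    rng = np.random.default_rng(seed)
    observed = cross_cov_stat(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        if cross_cov_stat(x, y[perm]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(300, 2))
    latent_y = x @ np.array([[1.0, 0.5], [0.0, 1.0]]) + 0.5 * rng.normal(size=(300, 2))
    # Discretize y into three ordered levels, e.g. disagree / neutral / agree.
    y = np.digitize(latent_y, [-0.5, 0.5]).astype(float)
    print(permutation_pvalue(x, y))
```

Shuffling whole rows keeps the dependence structure within each variable set intact and only breaks the pairing between the two sets, so the statistic can be computed directly on discretized values in this simple independence setting.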

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-dong25i,
  title = {Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data},
  author = {Dong, Xinshuai and Ng, Ignavier and Sun, Boyang and Dai, Haoyue and Hao, Guang-Yuan and Fan, Shunxing and Spirtes, Peter and Qiu, Yumou and Zhang, Kun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {14137--14151},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dong25i/dong25i.pdf},
  url = {https://proceedings.mlr.press/v267/dong25i.html},
  abstract = {Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery (code will be available at https://github.com/dongxinshuai/scm-identify).}
}
Endnote
%0 Conference Paper
%T Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data
%A Xinshuai Dong
%A Ignavier Ng
%A Boyang Sun
%A Haoyue Dai
%A Guang-Yuan Hao
%A Shunxing Fan
%A Peter Spirtes
%A Yumou Qiu
%A Kun Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dong25i
%I PMLR
%P 14137--14151
%U https://proceedings.mlr.press/v267/dong25i.html
%V 267
%X Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery (code will be available at https://github.com/dongxinshuai/scm-identify).
APA
Dong, X., Ng, I., Sun, B., Dai, H., Hao, G., Fan, S., Spirtes, P., Qiu, Y., & Zhang, K. (2025). Permutation-based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:14137-14151. Available from https://proceedings.mlr.press/v267/dong25i.html.
