Latent Variable Causal Discovery under Selection Bias

Haoyue Dai, Yiwen Qiu, Ignavier Ng, Xinshuai Dong, Peter Spirtes, Kun Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:12161-12178, 2025.

Abstract

Addressing selection bias in latent variable causal discovery is important yet underexplored, largely due to a lack of suitable statistical tools: while various tools beyond basic conditional independencies have been developed to handle latent variables, none have been adapted for selection bias. We make an attempt by studying rank constraints, which, as a generalization of conditional independence constraints, exploit the ranks of covariance submatrices in linear Gaussian models. We show that although selection can significantly complicate the joint distribution, interestingly, the ranks of the biased covariance submatrices still preserve meaningful information about both causal structures and selection mechanisms. We provide a graph-theoretic characterization of such rank constraints. Using this tool, we demonstrate that the one-factor model, a classical latent variable model, can be identified under selection bias. Simulations and real-world experiments confirm the effectiveness of using our rank constraints.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-dai25k,
  title     = {Latent Variable Causal Discovery under Selection Bias},
  author    = {Dai, Haoyue and Qiu, Yiwen and Ng, Ignavier and Dong, Xinshuai and Spirtes, Peter and Zhang, Kun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {12161--12178},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dai25k/dai25k.pdf},
  url       = {https://proceedings.mlr.press/v267/dai25k.html},
  abstract  = {Addressing selection bias in latent variable causal discovery is important yet underexplored, largely due to a lack of suitable statistical tools: While various tools beyond basic conditional independencies have been developed to handle latent variables, none have been adapted for selection bias. We make an attempt by studying rank constraints, which, as a generalization to conditional independence constraints, exploits the ranks of covariance submatrices in linear Gaussian models. We show that although selection can significantly complicate the joint distribution, interestingly, the ranks in the biased covariance matrices still preserve meaningful information about both causal structures and selection mechanisms. We provide a graph-theoretic characterization of such rank constraints. Using this tool, we demonstrate that the one-factor model, a classical latent variable model, can be identified under selection bias. Simulations and real-world experiments confirm the effectiveness of using our rank constraints.}
}
Endnote
%0 Conference Paper
%T Latent Variable Causal Discovery under Selection Bias
%A Haoyue Dai
%A Yiwen Qiu
%A Ignavier Ng
%A Xinshuai Dong
%A Peter Spirtes
%A Kun Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dai25k
%I PMLR
%P 12161--12178
%U https://proceedings.mlr.press/v267/dai25k.html
%V 267
%X Addressing selection bias in latent variable causal discovery is important yet underexplored, largely due to a lack of suitable statistical tools: While various tools beyond basic conditional independencies have been developed to handle latent variables, none have been adapted for selection bias. We make an attempt by studying rank constraints, which, as a generalization to conditional independence constraints, exploits the ranks of covariance submatrices in linear Gaussian models. We show that although selection can significantly complicate the joint distribution, interestingly, the ranks in the biased covariance matrices still preserve meaningful information about both causal structures and selection mechanisms. We provide a graph-theoretic characterization of such rank constraints. Using this tool, we demonstrate that the one-factor model, a classical latent variable model, can be identified under selection bias. Simulations and real-world experiments confirm the effectiveness of using our rank constraints.
APA
Dai, H., Qiu, Y., Ng, I., Dong, X., Spirtes, P. &amp; Zhang, K. (2025). Latent Variable Causal Discovery under Selection Bias. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:12161-12178. Available from https://proceedings.mlr.press/v267/dai25k.html.