Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model

Mario A. T. Figueiredo, Catarina Oliveira
Proceedings of the Second Conference on Causal Learning and Reasoning, PMLR 213:122-141, 2023.

Abstract

Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, most notably additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
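
The entropy property invoked in the abstract can be checked numerically. The following is a minimal sketch, not the paper's estimation or testing procedure: it assumes the common information-theoretic definition of a uniform channel as a conditional pmf matrix whose rows are permutations of one another, and the matrices, input distributions, and the helper name conditional_entropy are hypothetical choices made for illustration only.

```python
import numpy as np


def conditional_entropy(p_x, channel):
    """Return H(Y|X) in bits, where channel[x, y] = P(Y = y | X = x)."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(channel > 0, channel, 1.0)
    row_entropies = -np.sum(channel * np.log2(safe), axis=1)
    # H(Y|X) is the average of the per-row entropies under p_x.
    return float(p_x @ row_entropies)


# Assumed uniform channel: every row is a permutation of (0.7, 0.2, 0.1).
uniform_channel = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.7],
])

# A generic channel whose rows are not permutations of each other.
generic_channel = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.30, 0.30],
    [0.20, 0.10, 0.70],
])

# Two arbitrary (hypothetical) distributions for the cause X.
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.1, 0.1, 0.8])

h_uc_1 = conditional_entropy(p1, uniform_channel)
h_uc_2 = conditional_entropy(p2, uniform_channel)
h_gen_1 = conditional_entropy(p1, generic_channel)
h_gen_2 = conditional_entropy(p2, generic_channel)

print("uniform channel:", h_uc_1, h_uc_2)
print("generic channel:", h_gen_1, h_gen_2)
```

Under this assumed definition, every row of the uniform channel has the same entropy, so H(Y|X) does not change when the cause distribution changes, whereas for the generic channel it does.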

Cite this Paper


BibTeX
@InProceedings{pmlr-v213-figueiredo23a,
  title = {Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model},
  author = {Figueiredo, Mario A. T. and Oliveira, Catarina},
  booktitle = {Proceedings of the Second Conference on Causal Learning and Reasoning},
  pages = {122--141},
  year = {2023},
  editor = {van der Schaar, Mihaela and Zhang, Cheng and Janzing, Dominik},
  volume = {213},
  series = {Proceedings of Machine Learning Research},
  month = {11--14 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v213/figueiredo23a/figueiredo23a.pdf},
  url = {https://proceedings.mlr.press/v213/figueiredo23a.html},
  abstract = {Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.}
}
Endnote
%0 Conference Paper
%T Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model
%A Mario A. T. Figueiredo
%A Catarina Oliveira
%B Proceedings of the Second Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2023
%E Mihaela van der Schaar
%E Cheng Zhang
%E Dominik Janzing
%F pmlr-v213-figueiredo23a
%I PMLR
%P 122--141
%U https://proceedings.mlr.press/v213/figueiredo23a.html
%V 213
%X Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
APA
Figueiredo, M. A. T. & Oliveira, C. (2023). Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model. Proceedings of the Second Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 213:122-141. Available from https://proceedings.mlr.press/v213/figueiredo23a.html.

Related Material