Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:938-947, 2018.
Abstract
Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear dependencies. Here a fully nonparametric test for continuous data based on conditional mutual information combined with a local permutation scheme is presented. Numerical experiments covering sample sizes from $50$ to $2,000$ and dimensions up to $10$ demonstrate that the test reliably generates the null distribution. For smooth nonlinear dependencies, the test has higher power than kernel-based tests in lower dimensions and similar or slightly lower power in higher dimensions. For highly non-smooth densities the data-adaptive nearest-neighbor approach is particularly well-suited, while kernel methods yield much lower power. The experiments also show that kernel methods utilizing an analytical approximation of the null distribution are not well-calibrated for sample sizes below $1,000$. Combining the local permutation scheme with these kernel tests leads to better calibration but lower power. For smaller sample sizes and lower dimensions, the proposed test is faster than kernel tests based on random Fourier features if (embarrassingly) parallelized, but its runtime increases more sharply with sample size and dimensionality. Thus, more theoretical research to analytically approximate the null distribution and speed up the estimation is desirable. As illustrated on real data here, the test is ideally suited for use in combination with causal discovery algorithms.
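The two ingredients named in the abstract, a nearest-neighbor estimator of conditional mutual information $I(X;Y|Z)$ and a local permutation scheme that shuffles $X$ only among samples with similar $Z$, can be sketched roughly as below. This is a minimal illustrative reconstruction, not the paper's reference implementation: the estimator follows the standard Kozachenko-Leonenko/KSG-style construction with the max-norm, and the function names and the parameters `k`, `k_perm`, and `n_perm` are assumptions chosen for readability.

```python
# Minimal sketch (not the paper's reference implementation) of a kNN
# conditional-mutual-information estimator plus a local permutation test.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def _as_2d(a):
    """Ensure an (n, d) array so that 1-D inputs work too."""
    a = np.asarray(a, dtype=float)
    return a.reshape(len(a), -1)


def cmi_knn(x, y, z, k=10):
    """kNN estimate of I(X;Y|Z) with the max-norm."""
    x, y, z = map(_as_2d, (x, y, z))
    xyz = np.hstack([x, y, z])
    # Distance from each sample to its k-th neighbor in the joint space
    # (k+1 because the query point itself is returned at distance 0).
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]

    def n_within(data):
        tree = cKDTree(data)
        # Neighbors within eps in the subspace, excluding the point itself.
        return np.array([len(tree.query_ball_point(p, r, p=np.inf)) - 1
                         for p, r in zip(data, eps)])

    n_xz = n_within(np.hstack([x, z]))
    n_yz = n_within(np.hstack([y, z]))
    n_z = n_within(z)
    return (digamma(k) + np.mean(digamma(n_z + 1) - digamma(n_xz + 1)
                                 - digamma(n_yz + 1)))


def local_perm_test(x, y, z, k=10, k_perm=5, n_perm=200, seed=0):
    """p-value for H0: X independent of Y given Z, via local permutations."""
    rng = np.random.default_rng(seed)
    x2, z2 = _as_2d(x), _as_2d(z)
    stat = cmi_knn(x, y, z, k)
    # For each sample, its k_perm nearest neighbors in the Z subspace
    # (column 0 is the sample itself, so it is dropped).
    nbrs = cKDTree(z2).query(z2, k=k_perm + 1, p=np.inf)[1][:, 1:]
    n = len(x2)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Replace each x_i by the x value of a random Z-neighbor: this
        # breaks any X-Y link beyond Z but roughly preserves P(X|Z).
        idx = nbrs[np.arange(n), rng.integers(0, k_perm, size=n)]
        null[b] = cmi_knn(x2[idx], y, z, k)
    return stat, (1.0 + np.sum(null >= stat)) / (1.0 + n_perm)


# Example: X <- Z -> Y, so H0 holds and the p-value should be large.
rng = np.random.default_rng(1)
z = rng.normal(size=300)
x = z + 0.5 * rng.normal(size=300)
y = z ** 2 + 0.5 * rng.normal(size=300)
print(local_perm_test(x, y, z, n_perm=100))
```

The design point worth noting is the permutation scheme: a full permutation of $X$ would also destroy the dependence between $X$ and $Z$ and thus misrepresent the null hypothesis, whereas restricting swaps to nearest neighbors in $Z$ approximately preserves $P(X|Z)$ while breaking any residual $X$-$Y$ association, which is what lets the test generate the correct null distribution.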