GPU Acceleration for Information-theoretic Constraint-based Causal Discovery
Proceedings of The KDD'22 Workshop on Causal Discovery, PMLR 185:30-60, 2022.
The discovery of causal relationships from observational data is an omnipresent task in data science. In real-world scenarios, observational data is often high-dimensional, and functional causal relationships can be nonlinear. To handle nonlinear relationships within constraint-based causal discovery, appropriate conditional independence tests (CI-tests) become necessary, e.g., non-parametric information-theory-based CI-tests. Both high- dimensional data and CI-tests for nonlinear relationships pose computational challenges. Existing work proposes parallel processing on Graphics Processing Units (GPUs) to address the computational demand resulting from high-dimensional data, in the case of discrete data or linear relationships. We extend this idea to cover CI-tests for nonlinear relationships in our work. Therefore, we develop GPUCMIknn, a GPU-accelerated version of an existing CI-test, which builds upon conditional mutual information (CMI) combined with a local permutation scheme. Further, we propose a version of the PC algorithm, called GPUCMIknn-Parallel, to process multiple instances of GPUCMIknn on the GPU in parallel. Experiments show that the performance of GPUCMIknn is mainly affected by the number of k-nearest-neighbors (knn) within the CMI estimation. Depending on the chosen number of knn, the achieved speedup of GPUCMIknn ranges between factors of 2.3 to 352. In causal discovery, our method GPUCMIknn-Parallel outperforms a single-threaded CPU version by factors of up to 1 000, a multi-threaded CPU version using eight cores by factors of up to 240, and a naive GPU version by up to a factor 3.