GPU Acceleration for Information-theoretic Constraint-based Causal Discovery

Christopher Hagedorn, Constantin Lange, Johannes Huegle, Rainer Schlosser
Proceedings of The KDD'22 Workshop on Causal Discovery, PMLR 185:30-60, 2022.

Abstract

The discovery of causal relationships from observational data is an omnipresent task in data science. In real-world scenarios, observational data is often high-dimensional, and functional causal relationships can be nonlinear. To handle nonlinear relationships within constraint-based causal discovery, appropriate conditional independence tests (CI-tests) become necessary, e.g., non-parametric information-theory-based CI-tests. Both high-dimensional data and CI-tests for nonlinear relationships pose computational challenges. Existing work proposes parallel processing on Graphics Processing Units (GPUs) to address the computational demand resulting from high-dimensional data, in the case of discrete data or linear relationships. In our work, we extend this idea to cover CI-tests for nonlinear relationships. To this end, we develop GPUCMIknn, a GPU-accelerated version of an existing CI-test, which builds upon conditional mutual information (CMI) combined with a local permutation scheme. Further, we propose a version of the PC algorithm, called GPUCMIknn-Parallel, which processes multiple instances of GPUCMIknn on the GPU in parallel. Experiments show that the performance of GPUCMIknn is mainly affected by the number of k-nearest-neighbors (knn) within the CMI estimation. Depending on the chosen number of knn, the achieved speedup of GPUCMIknn ranges from a factor of 2.3 to a factor of 352. In causal discovery, our method GPUCMIknn-Parallel outperforms a single-threaded CPU version by factors of up to 1,000, a multi-threaded CPU version using eight cores by factors of up to 240, and a naive GPU version by up to a factor of 3.
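
The CI-test that GPUCMIknn accelerates estimates the conditional mutual information I(X;Y|Z) with a k-nearest-neighbor estimator and obtains a p-value from a local permutation scheme that shuffles X only among samples with similar values of Z. The following Python sketch illustrates these two ingredients on the CPU; it is a simplified illustration under assumed estimator details (a Frenzel-Pompe-style estimate, max-norm neighborhoods, draw-with-replacement local shuffles) and not the authors' GPU implementation.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def cmi_knn(x, y, z, k=10):
    # Frenzel-Pompe-style kNN estimate of I(X;Y|Z); assumes continuous data
    # without duplicate samples. x and y are 1-D arrays, z is 1-D or 2-D.
    xyz = np.column_stack([x, y, z])
    # Distance (max-norm) to the k-th nearest neighbor in the joint space.
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]

    def count(sub):
        # Number of points strictly within eps_i around sample i in a subspace.
        tree = cKDTree(sub)
        return np.array([len(tree.query_ball_point(pt, r - 1e-12, p=np.inf))
                         for pt, r in zip(sub, eps)])

    n_xz = count(np.column_stack([x, z]))
    n_yz = count(np.column_stack([y, z]))
    n_z = count(np.column_stack([z]))
    return digamma(k) + np.mean(digamma(n_z) - digamma(n_xz) - digamma(n_yz))


def local_permutation_test(x, y, z, k=10, k_perm=5, n_perm=200, seed=0):
    # p-value for H0: X independent of Y given Z. The null distribution is
    # generated by replacing each x_i with the x-value of one of its k_perm
    # nearest neighbors in Z, which preserves the dependence of X on Z.
    rng = np.random.default_rng(seed)
    observed = cmi_knn(x, y, z, k)
    z2d = np.column_stack([z])
    neighbors = cKDTree(z2d).query(z2d, k=k_perm, p=np.inf)[1]
    null = np.empty(n_perm)
    for b in range(n_perm):
        pick = rng.integers(0, k_perm, size=len(x))
        idx = neighbors[np.arange(len(x)), pick]
        null[b] = cmi_knn(x[idx], y, z, k)
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

In this sketch, local_permutation_test(x, y, z) returns a small p-value when X and Y remain dependent after conditioning on Z; the n_perm permuted CMI estimates are exactly the kind of independent, data-parallel work that lends itself to GPU execution.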

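GPUCMIknn-Parallel exploits the structure of the PC algorithm's adjacency search: within one level, i.e., for a fixed conditioning-set size, the individual CI-tests are mutually independent and can therefore be executed concurrently, on the GPU in the paper's case. The sketch below shows such level-wise batching on the CPU, with a process pool standing in for the GPU; the skeleton search is a simplified textbook PC variant, and the function and parameter names are illustrative, not the paper's API.

from itertools import combinations
from concurrent.futures import ProcessPoolExecutor


def pc_skeleton(data, ci_test, alpha=0.05, max_level=3, workers=8):
    # data: (n_samples, n_vars) array; ci_test(data, i, j, cond) -> p-value
    # and must be a picklable top-level function. Returns the undirected
    # skeleton as a set of variable-index pairs (i, j) with i < j.
    n_vars = data.shape[1]
    adj = {(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)}
    for level in range(max_level + 1):
        # Collect all CI-tests of this level; they are independent of each
        # other, so they can be dispatched as one parallel batch.
        jobs = []
        for (i, j) in sorted(adj):
            others = [k for k in range(n_vars)
                      if k not in (i, j) and (min(i, k), max(i, k)) in adj]
            for cond in combinations(others, level):
                jobs.append((i, j, cond))
        if not jobs:
            break
        with ProcessPoolExecutor(max_workers=workers) as pool:
            pvals = list(pool.map(ci_test,
                                  [data] * len(jobs),
                                  [i for i, _, _ in jobs],
                                  [j for _, j, _ in jobs],
                                  [c for _, _, c in jobs]))
        # Remove every edge for which some conditioning set yields independence.
        for (i, j, _), p in zip(jobs, pvals):
            if p > alpha:
                adj.discard((i, j))
    return adj

A thin adapter that slices columns i and j and the conditioning columns out of data and calls the permutation test sketched above would serve as ci_test here; in GPUCMIknn-Parallel the corresponding batch of tests is instead processed by many GPUCMIknn instances running on the GPU.
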
Cite this Paper


BibTeX
@InProceedings{pmlr-v185-hagedorn22a,
  title     = {GPU Acceleration for Information-theoretic Constraint-based Causal Discovery},
  author    = {Hagedorn, Christopher and Lange, Constantin and Huegle, Johannes and Schlosser, Rainer},
  booktitle = {Proceedings of The KDD'22 Workshop on Causal Discovery},
  pages     = {30--60},
  year      = {2022},
  editor    = {Le, Thuc Duy and Liu, Lin and Kıcıman, Emre and Triantafyllou, Sofia and Liu, Huan},
  volume    = {185},
  series    = {Proceedings of Machine Learning Research},
  month     = {15 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v185/hagedorn22a/hagedorn22a.pdf},
  url       = {https://proceedings.mlr.press/v185/hagedorn22a.html}
}
APA
Hagedorn, C., Lange, C., Huegle, J. & Schlosser, R. (2022). GPU Acceleration for Information-theoretic Constraint-based Causal Discovery. Proceedings of The KDD'22 Workshop on Causal Discovery, in Proceedings of Machine Learning Research 185:30-60. Available from https://proceedings.mlr.press/v185/hagedorn22a.html.
