Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data

Christopher Schmidt, Johannes Huegle, Philipp Bode, Matthias Uflacker
Proceedings of Machine Learning Research, PMLR 104:59-77, 2019.

Abstract

In the context of high-dimensional data state-of-the-art methods for constraint-based causal structure learning, such as the PC algorithm, are limited in their application through their worst case exponential computational complexity. To address the resulting long execution time, several parallel extensions have been developed to exploit modern multi-core systems. These extensions apply a static distribution of tasks to the execution units to achieve paral- lelism, which introduces the problem of load imbalance. In our work, we propose a parallel implementation that follows a dynamic task distribution in order to avoid situations of load imbalance and improve the execution time. On the basis of an experimental evaluation on real-world high dimensional datasets, we show that our implementation has a better load balancing compared to an existing parallel implementation in the context of multivariate normal distributed data. For datasets that introduce load imbalance, our dynamic task distribution approach outperforms existing static approaches by factors up to 2.4. Overall, we increase the speed up from factors of up to 27, for the static approach, to factors of up to 39 for the dynamic approach, when scaling to 80 cores compared to a non-parallel execution.

Cite this Paper


BibTeX
@InProceedings{pmlr-v104-schmidt19a, title = {Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data}, author = {Schmidt, Christopher and Huegle, Johannes and Bode, Philipp and Uflacker, Matthias}, booktitle = {Proceedings of Machine Learning Research}, pages = {59--77}, year = {2019}, editor = {}, volume = {104}, series = {Proceedings of Machine Learning Research}, month = {05 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v104/schmidt19a/schmidt19a.pdf}, url = {https://proceedings.mlr.press/v104/schmidt19a.html}, abstract = {In the context of high-dimensional data state-of-the-art methods for constraint-based causal structure learning, such as the PC algorithm, are limited in their application through their worst case exponential computational complexity. To address the resulting long execution time, several parallel extensions have been developed to exploit modern multi-core systems. These extensions apply a static distribution of tasks to the execution units to achieve paral- lelism, which introduces the problem of load imbalance. In our work, we propose a parallel implementation that follows a dynamic task distribution in order to avoid situations of load imbalance and improve the execution time. On the basis of an experimental evaluation on real-world high dimensional datasets, we show that our implementation has a better load balancing compared to an existing parallel implementation in the context of multivariate normal distributed data. For datasets that introduce load imbalance, our dynamic task distribution approach outperforms existing static approaches by factors up to 2.4. Overall, we increase the speed up from factors of up to 27, for the static approach, to factors of up to 39 for the dynamic approach, when scaling to 80 cores compared to a non-parallel execution.} }
Endnote
%0 Conference Paper %T Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data %A Christopher Schmidt %A Johannes Huegle %A Philipp Bode %A Matthias Uflacker %B Proceedings of Machine Learning Research %C Proceedings of Machine Learning Research %D 2019 %E %F pmlr-v104-schmidt19a %I PMLR %P 59--77 %U https://proceedings.mlr.press/v104/schmidt19a.html %V 104 %X In the context of high-dimensional data state-of-the-art methods for constraint-based causal structure learning, such as the PC algorithm, are limited in their application through their worst case exponential computational complexity. To address the resulting long execution time, several parallel extensions have been developed to exploit modern multi-core systems. These extensions apply a static distribution of tasks to the execution units to achieve paral- lelism, which introduces the problem of load imbalance. In our work, we propose a parallel implementation that follows a dynamic task distribution in order to avoid situations of load imbalance and improve the execution time. On the basis of an experimental evaluation on real-world high dimensional datasets, we show that our implementation has a better load balancing compared to an existing parallel implementation in the context of multivariate normal distributed data. For datasets that introduce load imbalance, our dynamic task distribution approach outperforms existing static approaches by factors up to 2.4. Overall, we increase the speed up from factors of up to 27, for the static approach, to factors of up to 39 for the dynamic approach, when scaling to 80 cores compared to a non-parallel execution.
APA
Schmidt, C., Huegle, J., Bode, P. & Uflacker, M.. (2019). Load-Balanced Parallel Constraint-Based Causal Structure Learning on Multi-Core Systems for High-Dimensional Data. Proceedings of Machine Learning Research, in Proceedings of Machine Learning Research 104:59-77 Available from https://proceedings.mlr.press/v104/schmidt19a.html.

Related Material