AutoCD: Automated Machine Learning for Causal Discovery Algorithms

Gerlise Chan, Tom Claassen, Holger H. Hoos, Tom Heskes, Mitra Baratchi
Proceedings of The 12th International Conference on Probabilistic Graphical Models, PMLR 246:106-132, 2024.

Abstract

This paper studies automated machine learning (AutoML) for causal discovery, the process of uncovering cause-and-effect relationships within data. Causal discovery is an unsupervised learning problem, as the target (the underlying ground truth causal model) is typically unknown. Therefore, the loss functions commonly used as an optimisation objective in AutoML systems developed for supervised learning problems are not applicable. We propose AutoCD, the first AutoML system utilising Bayesian optimisation based on a search space of causal discovery algorithms. In designing AutoCD, we study and compare the applicability of two different loss functions and post-hoc corrections. Additionally, based on the analysis of the performance of AutoCD, we propose an improved version called AutoCD_PC by warm-starting the search from the PC algorithm. Results from our experiments on datasets simulated from 45 graphical models demonstrate that AutoCD_PC performs better than the baselines by ranking the highest (avg. rank 3.69) compared to the best causal tuning baseline (avg. rank 5.21) and the best fine-tuned individual algorithm (avg. rank 4.36).

Cite this Paper


BibTeX
@InProceedings{pmlr-v246-chan24a,
  title = {{AutoCD}: Automated Machine Learning for Causal Discovery Algorithms},
  author = {Chan, Gerlise and Claassen, Tom and Hoos, Holger H. and Heskes, Tom and Baratchi, Mitra},
  booktitle = {Proceedings of The 12th International Conference on Probabilistic Graphical Models},
  pages = {106--132},
  year = {2024},
  editor = {Kwisthout, Johan and Renooij, Silja},
  volume = {246},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v246/main/assets/chan24a/chan24a.pdf},
  url = {https://proceedings.mlr.press/v246/chan24a.html},
  abstract = {This paper studies automated machine learning (AutoML) for causal discovery, the process of uncovering cause-and-effect relationships within data. Causal discovery is an unsupervised learning problem, as the target (the underlying ground truth causal model) is typically unknown. Therefore, the loss functions commonly used as an optimisation objective in AutoML systems developed for supervised learning problems are not applicable. We propose AutoCD, the first AutoML system utilising Bayesian optimisation based on a search space of causal discovery algorithms. In designing AutoCD, we study and compare the applicability of two different loss functions and post-hoc corrections. Additionally, based on the analysis of the performance of AutoCD, we propose an improved version called AutoCD_PC by warm-starting the search from the PC algorithm. Results from our experiments on datasets simulated from 45 graphical models demonstrate that AutoCD_PC performs better than the baselines by ranking the highest (avg. rank 3.69) compared to the best causal tuning baseline (avg. rank 5.21) and the best fine-tuned individual algorithm (avg. rank 4.36).}
}
Endnote
%0 Conference Paper
%T AutoCD: Automated Machine Learning for Causal Discovery Algorithms
%A Gerlise Chan
%A Tom Claassen
%A Holger H. Hoos
%A Tom Heskes
%A Mitra Baratchi
%B Proceedings of The 12th International Conference on Probabilistic Graphical Models
%C Proceedings of Machine Learning Research
%D 2024
%E Johan Kwisthout
%E Silja Renooij
%F pmlr-v246-chan24a
%I PMLR
%P 106--132
%U https://proceedings.mlr.press/v246/chan24a.html
%V 246
%X This paper studies automated machine learning (AutoML) for causal discovery, the process of uncovering cause-and-effect relationships within data. Causal discovery is an unsupervised learning problem, as the target (the underlying ground truth causal model) is typically unknown. Therefore, the loss functions commonly used as an optimisation objective in AutoML systems developed for supervised learning problems are not applicable. We propose AutoCD, the first AutoML system utilising Bayesian optimisation based on a search space of causal discovery algorithms. In designing AutoCD, we study and compare the applicability of two different loss functions and post-hoc corrections. Additionally, based on the analysis of the performance of AutoCD, we propose an improved version called AutoCD_PC by warm-starting the search from the PC algorithm. Results from our experiments on datasets simulated from 45 graphical models demonstrate that AutoCD_PC performs better than the baselines by ranking the highest (avg. rank 3.69) compared to the best causal tuning baseline (avg. rank 5.21) and the best fine-tuned individual algorithm (avg. rank 4.36).
APA
Chan, G., Claassen, T., Hoos, H. H., Heskes, T., & Baratchi, M. (2024). AutoCD: Automated Machine Learning for Causal Discovery Algorithms. Proceedings of The 12th International Conference on Probabilistic Graphical Models, in Proceedings of Machine Learning Research 246:106-132. Available from https://proceedings.mlr.press/v246/chan24a.html.
