Exploring One Million Machine Learning Pipelines: A Benchmarking Study

Edesio Alcobaça, Andre Carlos Ponce de Leon Ferreira De Carvalho
Proceedings of the Fourth International Conference on Automated Machine Learning, PMLR 293:22/1-34, 2025.

Abstract

Machine learning solutions are strongly affected by the hyperparameter values of their algorithms. This has motivated a large number of recent research projects on hyperparameter tuning, which have proposed several highly diverse tuning approaches. Rather than proposing a new approach or identifying the most effective hyperparameter tuning approach, this paper looks for good machine learning solutions by exploring machine learning pipelines. To this end, it benchmarks pipelines, focusing on the interaction between feature preprocessing techniques and classification models. The study evaluates the effectiveness of pipeline combinations, identifying both high-performing and underperforming ones. Additionally, it provides meta-knowledge datasets free of optimization selection bias to foster research contributions in meta-learning and accelerate the development of meta-models. The findings provide insights into the most effective preprocessing and modeling combinations, guiding practitioners and researchers in their selection processes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v293-alcobaca25a,
  title = {Exploring One Million Machine Learning Pipelines: A Benchmarking Study},
  author = {Alcoba\c{c}a, Edesio and Carvalho, Andre Carlos Ponce de Leon Ferreira De},
  booktitle = {Proceedings of the Fourth International Conference on Automated Machine Learning},
  pages = {22/1--34},
  year = {2025},
  editor = {Akoglu, Leman and Doerr, Carola and van Rijn, Jan N. and Garnett, Roman and Gardner, Jacob R.},
  volume = {293},
  series = {Proceedings of Machine Learning Research},
  month = {08--11 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v293/main/assets/alcobaca25a/alcobaca25a.pdf},
  url = {https://proceedings.mlr.press/v293/alcobaca25a.html},
  abstract = {Machine learning solutions are largely affected by the values of the hyperparameters of their algorithms. This has motivated a large number of recent research projects on hyperparameter tuning, with the proposal of several, and highly diverse, tuning approaches. Rather than proposing a new approach or identifying the most effective hyperparameter tuning approach, this paper looks for good machine learning solutions by exploring machine learning pipelines. For such, it benchmarks pipelines focusing on the interaction between feature preprocessing techniques and classification models. The study evaluates the effectiveness of pipeline combinations, identifying high-performing and underperforming combinations. Additionally, it provides meta-knowledge datasets without any optimization selection bias to foster research contributions in meta-learning, accelerating the development of meta-models. The findings provide insights into the most effective preprocessing and modeling combination, guiding practitioners and researchers in their selection processes.}
}
Endnote
%0 Conference Paper
%T Exploring One Million Machine Learning Pipelines: A Benchmarking Study
%A Edesio Alcobaça
%A Andre Carlos Ponce de Leon Ferreira De Carvalho
%B Proceedings of the Fourth International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Leman Akoglu
%E Carola Doerr
%E Jan N. van Rijn
%E Roman Garnett
%E Jacob R. Gardner
%F pmlr-v293-alcobaca25a
%I PMLR
%P 22/1--34
%U https://proceedings.mlr.press/v293/alcobaca25a.html
%V 293
%X Machine learning solutions are largely affected by the values of the hyperparameters of their algorithms. This has motivated a large number of recent research projects on hyperparameter tuning, with the proposal of several, and highly diverse, tuning approaches. Rather than proposing a new approach or identifying the most effective hyperparameter tuning approach, this paper looks for good machine learning solutions by exploring machine learning pipelines. For such, it benchmarks pipelines focusing on the interaction between feature preprocessing techniques and classification models. The study evaluates the effectiveness of pipeline combinations, identifying high-performing and underperforming combinations. Additionally, it provides meta-knowledge datasets without any optimization selection bias to foster research contributions in meta-learning, accelerating the development of meta-models. The findings provide insights into the most effective preprocessing and modeling combination, guiding practitioners and researchers in their selection processes.
APA
Alcobaça, E. & Carvalho, A.C.P.d.L.F.D. (2025). Exploring One Million Machine Learning Pipelines: A Benchmarking Study. Proceedings of the Fourth International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 293:22/1-34. Available from https://proceedings.mlr.press/v293/alcobaca25a.html.
