ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments

Iordanis Fostiropoulos, Laurent Itti
Proceedings of the Second International Conference on Automated Machine Learning, PMLR 224:19/1-15, 2023.

Abstract

Understanding the efficacy of a method requires ablation experiments. Current Machine Learning (ML) workflows emphasize the vertical scaling of large models with paradigms such as ‘data-parallelism’ or ‘model-parallelism’. As a consequence, there is a lack of methods for horizontal scaling of multiple experimental trials. Horizontal scaling is labor intensive when different tools are used for different experiment stages, such as for hyper-parameter optimization, distributed execution, or the consolidation of artifacts. We identify that errors in earlier stages of experimentation propagate to the analysis. Based on our observations, experimental results, and the current literature, we provide recommendations on best practices to prevent errors. To reduce the effort required to perform an accurate analysis and address common errors when scaling the execution of multiple experiments, we introduce ABLATOR. Our framework uses a stateful experiment design paradigm that provides experiment persistence and is robust to errors. Our actionable analysis artifacts are automatically produced by the experiment state and reduce the time to evaluate a hypothesis. We evaluate ABLATOR with ablation studies on a Transformer model, ‘Tablator’, where we study the effect of 6 architectural components, 8 model hyperparameters, 3 training hyperparameters, and 4 dataset preprocessing methodologies on 11 tabular datasets. We performed the largest ablation experiment for tabular data on Transformer models to date, evaluating 2,337 models in total. Finally, we open source ABLATOR; \url{https://github.com/fostiropoulos/ablator}

Cite this Paper


BibTeX
@InProceedings{pmlr-v224-fostiropoulos23a,
  title     = {ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments},
  author    = {Fostiropoulos, Iordanis and Itti, Laurent},
  booktitle = {Proceedings of the Second International Conference on Automated Machine Learning},
  pages     = {19/1--15},
  year      = {2023},
  editor    = {Faust, Aleksandra and Garnett, Roman and White, Colin and Hutter, Frank and Gardner, Jacob R.},
  volume    = {224},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v224/fostiropoulos23a/fostiropoulos23a.pdf},
  url       = {https://proceedings.mlr.press/v224/fostiropoulos23a.html},
  abstract  = {Understanding the efficacy of a method requires ablation experiments. Current Machine Learning (ML) workflows emphasize the vertical scaling of large models with paradigms such as ‘data-parallelism’ or ‘model-parallelism’. As a consequence, there is a lack of methods for horizontal scaling of multiple experimental trials. Horizontal scaling is labor intensive when different tools are used for different experiment stages, such as for hyper-parameter optimization, distributed execution, or the consolidation of artifacts. We identify that errors in earlier stages of experimentation propagate to the analysis. Based on our observations, experimental results, and the current literature, we provide recommendations on best practices to prevent errors. To reduce the effort required to perform an accurate analysis and address common errors when scaling the execution of multiple experiments, we introduce ABLATOR. Our framework uses a stateful experiment design paradigm that provides experiment persistence and is robust to errors. Our actionable analysis artifacts are automatically produced by the experiment state and reduce the time to evaluate a hypothesis. We evaluate ABLATOR with ablation studies on a Transformer model, ‘Tablator’, where we study the effect of 6 architectural components, 8 model hyperparameters, 3 training hyperparameters, and 4 dataset preprocessing methodologies on 11 tabular datasets. We performed the largest ablation experiment for tabular data on Transformer models to date, evaluating 2,337 models in total. Finally, we open source ABLATOR; \url{https://github.com/fostiropoulos/ablator}}
}
Endnote
%0 Conference Paper
%T ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments
%A Iordanis Fostiropoulos
%A Laurent Itti
%B Proceedings of the Second International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Aleksandra Faust
%E Roman Garnett
%E Colin White
%E Frank Hutter
%E Jacob R. Gardner
%F pmlr-v224-fostiropoulos23a
%I PMLR
%P 19/1--15
%U https://proceedings.mlr.press/v224/fostiropoulos23a.html
%V 224
%X Understanding the efficacy of a method requires ablation experiments. Current Machine Learning (ML) workflows emphasize the vertical scaling of large models with paradigms such as ‘data-parallelism’ or ‘model-parallelism’. As a consequence, there is a lack of methods for horizontal scaling of multiple experimental trials. Horizontal scaling is labor intensive when different tools are used for different experiment stages, such as for hyper-parameter optimization, distributed execution, or the consolidation of artifacts. We identify that errors in earlier stages of experimentation propagate to the analysis. Based on our observations, experimental results, and the current literature, we provide recommendations on best practices to prevent errors. To reduce the effort required to perform an accurate analysis and address common errors when scaling the execution of multiple experiments, we introduce ABLATOR. Our framework uses a stateful experiment design paradigm that provides experiment persistence and is robust to errors. Our actionable analysis artifacts are automatically produced by the experiment state and reduce the time to evaluate a hypothesis. We evaluate ABLATOR with ablation studies on a Transformer model, ‘Tablator’, where we study the effect of 6 architectural components, 8 model hyperparameters, 3 training hyperparameters, and 4 dataset preprocessing methodologies on 11 tabular datasets. We performed the largest ablation experiment for tabular data on Transformer models to date, evaluating 2,337 models in total. Finally, we open source ABLATOR; \url{https://github.com/fostiropoulos/ablator}
APA
Fostiropoulos, I. & Itti, L. (2023). ABLATOR: Robust Horizontal-Scaling of Machine Learning Ablation Experiments. Proceedings of the Second International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 224:19/1-15. Available from https://proceedings.mlr.press/v224/fostiropoulos23a.html.
