Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods

Marcel Aach, Eray Inanc, Rakesh Sarma, Morris Riedel, Andreas Lintermann
Proceedings of the Second International Conference on Automated Machine Learning, PMLR 224:12/1-17, 2023.

Abstract

The field of NAS has benefited significantly from the increased availability of parallel compute resources, as optimization algorithms typically require sampling and evaluating hundreds of model configurations. To make use of these resources, the most commonly used early stopping-based NAS methods are designed to run multiple trials in parallel. At the same time, the training time of individual model configurations can also be reduced, e.g., by employing data-parallel training on multiple GPUs. This paper investigates the optimal allocation of a fixed number of parallel workers for conducting NAS. In practice, users must decide whether the computational resources are primarily used to assign more workers to the training of individual trials or to increase the number of trials executed in parallel. The first option accelerates individual trials (exploitation) but reduces the parallelism of the NAS loop, whereas with the second option, individual trials run longer but a larger number of trials is processed simultaneously in the NAS loop (exploration). Our study encompasses both small- and large-scale scenarios, from tuning models in parallel on a single GPU, to data-parallel training on up to 16 GPUs, to measuring the scalability of NAS on up to 64 GPUs. Our empirical results with the HyperBand, Asynchronous Successive Halving, and Bayesian Optimization HyperBand methods offer valuable insights for users seeking to run NAS on both small and large computational budgets. By selecting the appropriate number of parallel evaluations, the NAS process can be accelerated by factors of approximately 2–5 while preserving test set accuracy, compared to non-optimal resource allocations.
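
To make the trade-off concrete, the following back-of-the-envelope sketch (Python) estimates how a fixed pool of GPUs can be split between the two options. All numbers in it (total GPU budget, trial count, per-epoch time, and the sub-linear scaling exponent) are illustrative assumptions, not measurements or the cost model from the paper; the sketch only shows that assigning more GPUs to each trial reduces the number of trials that can run concurrently, and vice versa.

import math

# Illustrative (hypothetical) cost model for splitting a fixed pool of GPUs
# between parallel NAS trials and data-parallel training within each trial.
# The scaling exponent ALPHA and all timing constants are assumptions for
# this sketch, not values reported in the paper.

TOTAL_GPUS = 64          # fixed worker budget
NUM_TRIALS = 256         # configurations sampled by the NAS method
EPOCHS_PER_TRIAL = 10    # average budget per trial after early stopping
T_EPOCH_1GPU = 60.0      # assumed seconds per epoch on a single GPU
ALPHA = 0.8              # assumed sub-linear data-parallel scaling: speedup(g) = g**ALPHA


def wall_clock(gpus_per_trial: int) -> float:
    """Estimated NAS wall-clock time (s) for a given per-trial GPU allocation."""
    concurrent_trials = TOTAL_GPUS // gpus_per_trial       # exploration width
    waves = math.ceil(NUM_TRIALS / concurrent_trials)      # sequential batches of trials
    epoch_time = T_EPOCH_1GPU / (gpus_per_trial ** ALPHA)  # exploitation speed-up
    return waves * EPOCHS_PER_TRIAL * epoch_time


if __name__ == "__main__":
    for g in (1, 2, 4, 8, 16):
        print(f"{g:2d} GPU(s)/trial -> {TOTAL_GPUS // g:2d} parallel trials, "
              f"est. wall clock {wall_clock(g) / 3600:.2f} h")

Running the script prints one line per allocation (1, 2, 4, 8, or 16 GPUs per trial), making the exploration-versus-exploitation split explicit for an assumed 64-GPU budget.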

Cite this Paper


BibTeX
@InProceedings{pmlr-v224-aach23a,
  title     = {Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods},
  author    = {Aach, Marcel and Inanc, Eray and Sarma, Rakesh and Riedel, Morris and Lintermann, Andreas},
  booktitle = {Proceedings of the Second International Conference on Automated Machine Learning},
  pages     = {12/1--17},
  year      = {2023},
  editor    = {Faust, Aleksandra and Garnett, Roman and White, Colin and Hutter, Frank and Gardner, Jacob R.},
  volume    = {224},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v224/aach23a/aach23a.pdf},
  url       = {https://proceedings.mlr.press/v224/aach23a.html},
  abstract  = {The field of NAS has been significantly benefiting from the increased availability of parallel compute resources, as optimization algorithms typically require sampling and evaluating hundreds of model configurations. Consequently, to make use of these resources, the most commonly used early stopping-based NAS methods are suitable for running multiple trials in parallel. At the same time, also the training time of single model configurations can be reduced, e.g., by employing data-parallel training using multiple GPUs. This paper investigates the optimal allocation of a fixed amount of parallel workers for conducting NAS. In practice, users have to decide if the computational resources are primarily used to assign more workers to the training of individual trials or to increase the number of trials executed in parallel. The first option accelerates the speed of the individual trials (exploitation) but reduces the parallelism of the NAS loop, whereas for the second option, the runtime of the trials is longer but a larger number of simultaneously processed trials in the NAS loop is achieved (exploration). Our study encompasses both large- and small-scale scenarios, including tuning models in parallel on a single GPU, with data-parallel training on up to 16 GPUs, and measuring the scalability of NAS on up to 64 GPUs. Our empirical results using the HyperBand, Asynchronous Successive Halving, and Bayesian Optimization HyperBand methods offer valuable insights for users seeking to run NAS on both small and large computational budgets. By selecting the appropriate number of parallel evaluations, the NAS process can be accelerated by factors of ${\approx}$2–5 while preserving the test set accuracy compared to non-optimal resource allocations.}
}
Endnote
%0 Conference Paper
%T Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods
%A Marcel Aach
%A Eray Inanc
%A Rakesh Sarma
%A Morris Riedel
%A Andreas Lintermann
%B Proceedings of the Second International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Aleksandra Faust
%E Roman Garnett
%E Colin White
%E Frank Hutter
%E Jacob R. Gardner
%F pmlr-v224-aach23a
%I PMLR
%P 12/1--17
%U https://proceedings.mlr.press/v224/aach23a.html
%V 224
%X The field of NAS has been significantly benefiting from the increased availability of parallel compute resources, as optimization algorithms typically require sampling and evaluating hundreds of model configurations. Consequently, to make use of these resources, the most commonly used early stopping-based NAS methods are suitable for running multiple trials in parallel. At the same time, also the training time of single model configurations can be reduced, e.g., by employing data-parallel training using multiple GPUs. This paper investigates the optimal allocation of a fixed amount of parallel workers for conducting NAS. In practice, users have to decide if the computational resources are primarily used to assign more workers to the training of individual trials or to increase the number of trials executed in parallel. The first option accelerates the speed of the individual trials (exploitation) but reduces the parallelism of the NAS loop, whereas for the second option, the runtime of the trials is longer but a larger number of simultaneously processed trials in the NAS loop is achieved (exploration). Our study encompasses both large- and small-scale scenarios, including tuning models in parallel on a single GPU, with data-parallel training on up to 16 GPUs, and measuring the scalability of NAS on up to 64 GPUs. Our empirical results using the HyperBand, Asynchronous Successive Halving, and Bayesian Optimization HyperBand methods offer valuable insights for users seeking to run NAS on both small and large computational budgets. By selecting the appropriate number of parallel evaluations, the NAS process can be accelerated by factors of ${\approx}$2–5 while preserving the test set accuracy compared to non-optimal resource allocations.
APA
Aach, M., Inanc, E., Sarma, R., Riedel, M. & Lintermann, A. (2023). Optimal Resource Allocation for Early Stopping-based Neural Architecture Search Methods. Proceedings of the Second International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 224:12/1-17. Available from https://proceedings.mlr.press/v224/aach23a.html.
