Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization

Timur Carstensen, Neeratyoy Mallik, Frank Hutter, Martin Rapp
Proceedings of the Fourth International Conference on Automated Machine Learning, PMLR 293:4/1-24, 2025.

Abstract

As model sizes grow, finding efficient and cost-effective hyperparameter optimization (HPO) methods becomes increasingly crucial for deep learning pipelines. While multi-fidelity HPO (MF-HPO) trades off the computational resources required for DL training against lower-fidelity estimations, existing fidelity sources often fail under tight compute and memory constraints. We propose a novel fidelity source: the number of layers that are trained or frozen during training. For deep networks, this approach offers significant compute and memory savings while preserving rank correlations between hyperparameters at low fidelities compared to full model training. We demonstrate this in our empirical evaluation across MLPs, ResNets, and Transformers, and additionally analyze the utility of frozen layers as a fidelity when using GPU resources as a fidelity in HPO, as well as in combined MF-HPO with other fidelity sources. This contribution opens new applications for MF-HPO with hardware resources as fidelity and creates opportunities for improved algorithms navigating joint fidelity spaces.
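
A minimal sketch of the core idea as the abstract describes it: the fidelity is the number of layers left trainable, with earlier layers frozen so they need no gradient buffers or optimizer state. The toy MLP, the block granularity, and the set_trainable_tail helper below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: "number of trainable layers" as a fidelity knob.
import torch
import torch.nn as nn

def set_trainable_tail(blocks: nn.ModuleList, n_trainable: int) -> None:
    """Freeze all blocks except the last `n_trainable` ones by disabling grads."""
    cutoff = len(blocks) - n_trainable
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i >= cutoff

class MLP(nn.Module):
    """Toy MLP whose depth-wise blocks act as the fidelity units (assumption)."""
    def __init__(self, width: int = 64, depth: int = 4) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(32 if i == 0 else width, width), nn.ReLU())
             for i in range(depth)]
        )
        self.head = nn.Linear(width, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.head(x)

# Low-fidelity evaluation of one hyperparameter configuration: train only the
# last 2 of 4 blocks (plus the head); frozen blocks still run in the forward
# pass but skip backward computation and optimizer state.
model = MLP()
set_trainable_tail(model.blocks, n_trainable=2)

# Passing only trainable parameters keeps the optimizer state small as well.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

In a multi-fidelity optimizer such as successive halving, n_trainable would be increased as a configuration survives to higher fidelities, analogously to epochs or dataset fractions.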

Cite this Paper


BibTeX
@InProceedings{pmlr-v293-carstensen25a,
  title     = {Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization},
  author    = {Carstensen, Timur and Mallik, Neeratyoy and Hutter, Frank and Rapp, Martin},
  booktitle = {Proceedings of the Fourth International Conference on Automated Machine Learning},
  pages     = {4/1--24},
  year      = {2025},
  editor    = {Akoglu, Leman and Doerr, Carola and van Rijn, Jan N. and Garnett, Roman and Gardner, Jacob R.},
  volume    = {293},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v293/main/assets/carstensen25a/carstensen25a.pdf},
  url       = {https://proceedings.mlr.press/v293/carstensen25a.html},
  abstract  = {As model sizes grow, finding efficient and cost-effective hyperparameter optimization (HPO) methods becomes increasingly crucial for deep learning pipelines. While multi-fidelity HPO (MF-HPO) trades off the computational resources required for DL training against lower-fidelity estimations, existing fidelity sources often fail under tight compute and memory constraints. We propose a novel fidelity source: the number of layers that are trained or frozen during training. For deep networks, this approach offers significant compute and memory savings while preserving rank correlations between hyperparameters at low fidelities compared to full model training. We demonstrate this in our empirical evaluation across MLPs, ResNets, and Transformers, and additionally analyze the utility of frozen layers as a fidelity when using GPU resources as a fidelity in HPO, as well as in combined MF-HPO with other fidelity sources. This contribution opens new applications for MF-HPO with hardware resources as fidelity and creates opportunities for improved algorithms navigating joint fidelity spaces.}
}
Endnote
%0 Conference Paper
%T Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization
%A Timur Carstensen
%A Neeratyoy Mallik
%A Frank Hutter
%A Martin Rapp
%B Proceedings of the Fourth International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Leman Akoglu
%E Carola Doerr
%E Jan N. van Rijn
%E Roman Garnett
%E Jacob R. Gardner
%F pmlr-v293-carstensen25a
%I PMLR
%P 4/1--24
%U https://proceedings.mlr.press/v293/carstensen25a.html
%V 293
%X As model sizes grow, finding efficient and cost-effective hyperparameter optimization (HPO) methods becomes increasingly crucial for deep learning pipelines. While multi-fidelity HPO (MF-HPO) trades off the computational resources required for DL training against lower-fidelity estimations, existing fidelity sources often fail under tight compute and memory constraints. We propose a novel fidelity source: the number of layers that are trained or frozen during training. For deep networks, this approach offers significant compute and memory savings while preserving rank correlations between hyperparameters at low fidelities compared to full model training. We demonstrate this in our empirical evaluation across MLPs, ResNets, and Transformers, and additionally analyze the utility of frozen layers as a fidelity when using GPU resources as a fidelity in HPO, as well as in combined MF-HPO with other fidelity sources. This contribution opens new applications for MF-HPO with hardware resources as fidelity and creates opportunities for improved algorithms navigating joint fidelity spaces.
APA
Carstensen, T., Mallik, N., Hutter, F. & Rapp, M. (2025). Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization. Proceedings of the Fourth International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 293:4/1-24. Available from https://proceedings.mlr.press/v293/carstensen25a.html.
