Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

Jian Wu, Saul Toscano-Palmerin, Peter I. Frazier, Andrew Gordon Wilson
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:788-798, 2020.

Abstract

Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives — for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations — values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.
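
As background, a minimal sketch in our own notation (not reproduced from the paper) of the standard knowledge-gradient acquisition that the trace-aware variant generalizes: it values a candidate point by the expected one-step improvement in the best posterior estimate of the objective, and cost-aware multi-fidelity variants commonly normalize this gain by evaluation cost.

% Background sketch in our own notation (\mathcal{A}: hyperparameter domain,
% \mu_n: Gaussian-process posterior mean after n evaluations); this is the
% standard knowledge gradient for minimization, not the paper's trace-aware formula.
\[
  \mathrm{KG}_n(x) \;=\; \min_{x' \in \mathcal{A}} \mu_n(x')
    \;-\; \mathbb{E}_n\!\left[ \min_{x' \in \mathcal{A}} \mu_{n+1}(x') \,\middle|\, x_{n+1} = x \right]
\]
% Cost-aware multi-fidelity variants typically score a (point, fidelity) pair
% (x, s) by expected gain per unit cost, e.g. \mathrm{KG}_n(x, s) / c(x, s); per the
% abstract, the trace-aware knowledge gradient additionally exploits the trace of
% lower-fidelity observations obtained on the way to fidelity s.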

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-wu20a,
  title     = {Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning},
  author    = {Wu, Jian and Toscano-Palmerin, Saul and Frazier, Peter I. and Wilson, Andrew Gordon},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {788--798},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/wu20a/wu20a.pdf},
  url       = {https://proceedings.mlr.press/v115/wu20a.html}
}
Endnote
%0 Conference Paper
%T Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning
%A Jian Wu
%A Saul Toscano-Palmerin
%A Peter I. Frazier
%A Andrew Gordon Wilson
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-wu20a
%I PMLR
%P 788--798
%U https://proceedings.mlr.press/v115/wu20a.html
%V 115
APA
Wu, J., Toscano-Palmerin, S., Frazier, P.I. & Wilson, A.G. (2020). Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:788-798. Available from https://proceedings.mlr.press/v115/wu20a.html.

Related Material

Download PDF: http://proceedings.mlr.press/v115/wu20a/wu20a.pdf