Input Warping for Bayesian Optimization of Non-Stationary Functions

Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1674-1682, 2014.

Abstract

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space", to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.
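For concreteness, the per-dimension warping the abstract describes is the Beta cumulative distribution function applied to inputs rescaled to the unit interval, with the shape parameters treated as additional hyperparameters of the Gaussian process. A sketch of the transformation:

```latex
% Per-dimension Beta CDF warping of an input x_d in [0,1]; the shape
% parameters alpha_d, beta_d are learned jointly with the GP hyperparameters.
\[
  w_d(x_d) = \mathrm{BetaCDF}(x_d;\,\alpha_d,\beta_d)
           = \int_0^{x_d} \frac{u^{\alpha_d-1}\,(1-u)^{\beta_d-1}}{B(\alpha_d,\beta_d)}\,\mathrm{d}u
\]
```

The sketch below is illustrative rather than the authors' implementation: it composes such a warping with an ordinary stationary squared-exponential kernel. The function names and the fixed alpha/beta values are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import beta
from scipy.spatial.distance import cdist

def warp(X, alphas, betas):
    """Map each column of X (inputs rescaled to [0,1]) through a Beta CDF."""
    return np.column_stack(
        [beta.cdf(X[:, d], alphas[d], betas[d]) for d in range(X.shape[1])]
    )

def se_kernel(A, B, lengthscale=0.2, amplitude=1.0):
    """Stationary squared-exponential kernel evaluated on (warped) inputs."""
    sq = cdist(A, B, "sqeuclidean")
    return amplitude * np.exp(-0.5 * sq / lengthscale**2)

# With alpha = 1 and beta > 1 the warping is concave, expanding the region
# near zero much like the manual "optimize in log-space" transformation.
# The shape parameters are fixed here for illustration, whereas the paper
# infers them alongside the other GP hyperparameters.
X = np.random.rand(5, 2)                       # five points in the unit square
Z = warp(X, alphas=[1.0, 1.0], betas=[5.0, 5.0])
K = se_kernel(Z, Z)                            # stationary kernel on warped inputs
```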

Cite this Paper

BibTeX
@InProceedings{pmlr-v32-snoek14,
  title     = {Input Warping for Bayesian Optimization of Non-Stationary Functions},
  author    = {Snoek, Jasper and Swersky, Kevin and Zemel, Rich and Adams, Ryan},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages     = {1674--1682},
  year      = {2014},
  editor    = {Xing, Eric P. and Jebara, Tony},
  volume    = {32},
  number    = {2},
  series    = {Proceedings of Machine Learning Research},
  address   = {Beijing, China},
  month     = {22--24 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v32/snoek14.pdf},
  url       = {https://proceedings.mlr.press/v32/snoek14.html}
}
Endnote
%0 Conference Paper
%T Input Warping for Bayesian Optimization of Non-Stationary Functions
%A Jasper Snoek
%A Kevin Swersky
%A Rich Zemel
%A Ryan Adams
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-snoek14
%I PMLR
%P 1674--1682
%U https://proceedings.mlr.press/v32/snoek14.html
%V 32
%N 2
RIS
TY - CPAPER
TI - Input Warping for Bayesian Optimization of Non-Stationary Functions
AU - Jasper Snoek
AU - Kevin Swersky
AU - Rich Zemel
AU - Ryan Adams
BT - Proceedings of the 31st International Conference on Machine Learning
DA - 2014/06/18
ED - Eric P. Xing
ED - Tony Jebara
ID - pmlr-v32-snoek14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 32
IS - 2
SP - 1674
EP - 1682
L1 - http://proceedings.mlr.press/v32/snoek14.pdf
UR - https://proceedings.mlr.press/v32/snoek14.html
ER -
APA
Snoek, J., Swersky, K., Zemel, R. & Adams, R. (2014). Input Warping for Bayesian Optimization of Non-Stationary Functions. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1674-1682. Available from https://proceedings.mlr.press/v32/snoek14.html.