Input Warping for Bayesian Optimization of Non-Stationary Functions

Jasper Snoek; Kevin Swersky; Rich Zemel; Ryan Adams

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1674-1682, 2014.

Abstract

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space", to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.
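The Beta-CDF input warping described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It is a minimal, standard-library-only illustration that makes several assumptions:

- each input dimension has been rescaled to [0, 1];
- the Beta CDF (the regularized incomplete beta function) is evaluated with the textbook continued-fraction expansion rather than a library routine;
- the warping parameters are fixed by hand, whereas the paper learns them;
- a plain squared-exponential kernel stands in for the stationary covariance.

The names `beta_cdf`, `warp_inputs`, and `warped_sq_exp_kernel` are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function.

    Modified Lentz evaluation, as in the textbook betacf routine; it converges
    quickly when x < (a + 1) / (a + b + 2). Returns the last iterate if
    max_iter is reached without meeting the tolerance.
    """
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """CDF of a Beta(a, b) distribution, i.e. the regularized incomplete beta I_x(a, b)."""
    if a <= 0.0 or b <= 0.0:
        raise ValueError("Beta shape parameters must be positive")
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor x^a (1 - x)^b / B(a, b), computed in log space for stability.
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def warp_inputs(x, warp_params):
    """Hypothetical helper: warp each coordinate of a unit-cube point by its own Beta CDF.

    warp_params is a list of (alpha, beta) pairs, one per input dimension.
    """
    if len(x) != len(warp_params):
        raise ValueError("need one (alpha, beta) pair per input dimension")
    return [beta_cdf(xi, a, b) for xi, (a, b) in zip(x, warp_params)]


def warped_sq_exp_kernel(x1, x2, warp_params, length_scales):
    """Hypothetical helper: a stationary squared-exponential kernel on warped inputs.

    The kernel itself is stationary; any non-stationarity in the original
    coordinates comes entirely from the warping.
    """
    w1 = warp_inputs(x1, warp_params)
    w2 = warp_inputs(x2, warp_params)
    r2 = sum(((u - v) / ell) ** 2 for u, v, ell in zip(w1, w2, length_scales))
    return math.exp(-0.5 * r2)


if __name__ == "__main__":
    # alpha < 1, beta = 1 gives x**alpha, which stretches the region near 0.
    params, ells = [(0.5, 1.0)], [0.2]
    k_low = warped_sq_exp_kernel([0.01], [0.11], params, ells)
    k_high = warped_sq_exp_kernel([0.89], [0.99], params, ells)
    print(f"correlation near 0: {k_low:.3f}, near 1: {k_high:.3f}")
```

With alpha < 1 and beta = 1, the Beta CDF reduces to x**alpha. This stretches the region near 0, so two points 0.1 apart near the lower bound are less correlated than two points 0.1 apart near the upper bound. The effect qualitatively resembles the manual "log-space" transformation mentioned in the abstract, but the shape is governed by parameters that can be learned. For any alpha, beta > 0 the Beta CDF is strictly increasing on [0, 1], so each warping is a bijection of the unit interval.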

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-snoek14,
  title = 	 {Input Warping for Bayesian Optimization of Non-Stationary Functions},
  author = 	 {Snoek, Jasper and Swersky, Kevin and Zemel, Rich and Adams, Ryan},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1674--1682},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number = 	 {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/snoek14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/snoek14.html},
  abstract = 	 {Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.  The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization.  Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model.  One of the most frequently occurring of these is the class of non-stationary functions.  The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space", to mitigate the effects of spatially-varying length scale.  We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function.  We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.}
}

EndNote

%0 Conference Paper
%T Input Warping for Bayesian Optimization of Non-Stationary Functions
%A Jasper Snoek
%A Kevin Swersky
%A Rich Zemel
%A Ryan Adams
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-snoek14
%I PMLR
%P 1674--1682
%U https://proceedings.mlr.press/v32/snoek14.html
%V 32
%N 2
%X Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.  The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization.  Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model.  One of the most frequently occurring of these is the class of non-stationary functions.  The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space", to mitigate the effects of spatially-varying length scale.  We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function.  We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.

RIS


TY  - CPAPER
TI  - Input Warping for Bayesian Optimization of Non-Stationary Functions
AU  - Jasper Snoek
AU  - Kevin Swersky
AU  - Rich Zemel
AU  - Ryan Adams
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-snoek14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1674
EP  - 1682
L1  - http://proceedings.mlr.press/v32/snoek14.pdf
UR  - https://proceedings.mlr.press/v32/snoek14.html
AB  - Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.  The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization.  Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model.  One of the most frequently occurring of these is the class of non-stationary functions.  The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space", to mitigate the effects of spatially-varying length scale.  We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function.  We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.
ER  -

APA


Snoek, J., Swersky, K., Zemel, R. & Adams, R. (2014). Input Warping for Bayesian Optimization of Non-Stationary Functions. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1674-1682. Available from https://proceedings.mlr.press/v32/snoek14.html.