Safe Exploration for Optimization with Gaussian Processes

Yanan Sui, Alkis Gotovos, Joel Burdick, Andreas Krause
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:997-1005, 2015.

Abstract

We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified "safety" threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-sui15, title = {Safe Exploration for Optimization with Gaussian Processes}, author = {Sui, Yanan and Gotovos, Alkis and Burdick, Joel and Krause, Andreas}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {997--1005}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/sui15.pdf}, url = { http://proceedings.mlr.press/v37/sui15.html }, abstract = {We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified "safety" threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation.} }
Endnote
%0 Conference Paper %T Safe Exploration for Optimization with Gaussian Processes %A Yanan Sui %A Alkis Gotovos %A Joel Burdick %A Andreas Krause %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-sui15 %I PMLR %P 997--1005 %U http://proceedings.mlr.press/v37/sui15.html %V 37 %X We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified "safety" threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation.
RIS
TY - CPAPER TI - Safe Exploration for Optimization with Gaussian Processes AU - Yanan Sui AU - Alkis Gotovos AU - Joel Burdick AU - Andreas Krause BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-sui15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 997 EP - 1005 L1 - http://proceedings.mlr.press/v37/sui15.pdf UR - http://proceedings.mlr.press/v37/sui15.html AB - We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified "safety" threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation. ER -
APA
Sui, Y., Gotovos, A., Burdick, J. & Krause, A.. (2015). Safe Exploration for Optimization with Gaussian Processes. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:997-1005 Available from http://proceedings.mlr.press/v37/sui15.html .

Related Material