Regret Bound for Safe Gaussian Process Bandit Optimization

Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:158-159, 2020.

Abstract

Many applications require a learner to make sequential decisions given uncertainty regarding both the system’s payoff function and safety constraints. When learning algorithms are used in safety-critical systems, it is paramount that the learner’s actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the system’s unknown payoff and constraint functions are sampled from Gaussian Processes (GPs). We develop a safe variant of GP-UCB, the algorithm proposed by Srinivas et al. (2010), which we call SGP-UCB, with the modifications necessary to respect safety constraints at every round. Our most important contribution is to derive the first sub-linear regret bounds for this problem.
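The abstract does not specify SGP-UCB beyond its high-level idea, so the following is only a rough illustrative sketch of that general recipe: maintain one GP posterior for the payoff and one for the constraint, then maximize an optimistic upper confidence bound on the payoff over the actions a pessimistic lower confidence bound on the constraint certifies as safe. All function names, the RBF kernel choice, and the constant `beta` are assumptions for illustration, not the paper's actual construction.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-3, ls=0.5):
    # Standard GP posterior mean and standard deviation at candidate points.
    K = rbf(X_obs, X_obs, ls) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs, ls)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs
    var = np.diag(rbf(X_cand, X_cand, ls) - Ks @ Kinv @ Ks.T)
    return mu, np.sqrt(np.maximum(var, 0.0))

def safe_ucb_step(X_f, y_f, X_g, y_g, X_cand, h, beta=2.0):
    # Payoff GP: optimistic upper confidence bound (UCB).
    mu_f, sd_f = gp_posterior(X_f, y_f, X_cand)
    # Constraint GP: pessimistic lower confidence bound, used as a
    # safety certificate (action is kept only if LCB >= threshold h).
    mu_g, sd_g = gp_posterior(X_g, y_g, X_cand)
    safe = (mu_g - beta * sd_g) >= h
    if not safe.any():
        raise RuntimeError("no certified-safe candidate action")
    ucb = mu_f + beta * sd_f
    ucb[~safe] = -np.inf          # never pick an uncertified action
    return int(np.argmax(ucb))

# Toy 1-D example: payoff observed at x=0 and x=1; the midpoint x=0.5 has
# the highest optimistic payoff but too much constraint uncertainty to be
# certified safe, so the rule falls back to the best certified point.
X = np.array([[0.0], [1.0]])
choice = safe_ucb_step(X, np.array([0.0, 1.0]),
                       X, np.array([1.0, 1.0]),
                       np.array([[0.0], [0.5], [1.0]]), h=0.0)
print(choice)
```

Note how the safety filter deliberately uses the *lower* confidence bound: an action is only eligible once the learner is confident the constraint holds, which is exactly the kind of modification the abstract says is needed to respect safety at every round.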

Cite this Paper


BibTeX
@InProceedings{pmlr-v120-amani20a,
  title     = {Regret Bound for Safe Gaussian Process Bandit Optimization},
  author    = {Amani, Sanae and Alizadeh, Mahnoosh and Thrampoulidis, Christos},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages     = {158--159},
  year      = {2020},
  editor    = {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume    = {120},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v120/amani20a/amani20a.pdf},
  url       = {https://proceedings.mlr.press/v120/amani20a.html},
  abstract  = {Many applications require a learner to make sequential decisions given uncertainty regarding both the system’s payoff function and safety constraints. When learning algorithms are used in safety-critical systems, it is paramount that the learner’s actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the system’s unknown payoff and constraint functions are sampled from Gaussian Processes (GPs). We develop a safe variant of the proposed algorithm by Srinivas et al. (2010), GP-UCB, called SGP-UCB, with necessary modifications to respect safety constraints at every round. Our most important contribution is to derive the first sub-linear regret bounds for this problem.}
}
Endnote
%0 Conference Paper
%T Regret Bound for Safe Gaussian Process Bandit Optimization
%A Sanae Amani
%A Mahnoosh Alizadeh
%A Christos Thrampoulidis
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-amani20a
%I PMLR
%P 158--159
%U https://proceedings.mlr.press/v120/amani20a.html
%V 120
%X Many applications require a learner to make sequential decisions given uncertainty regarding both the system’s payoff function and safety constraints. When learning algorithms are used in safety-critical systems, it is paramount that the learner’s actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the system’s unknown payoff and constraint functions are sampled from Gaussian Processes (GPs). We develop a safe variant of the proposed algorithm by Srinivas et al. (2010), GP-UCB, called SGP-UCB, with necessary modifications to respect safety constraints at every round. Our most important contribution is to derive the first sub-linear regret bounds for this problem.
APA
Amani, S., Alizadeh, M. & Thrampoulidis, C. (2020). Regret Bound for Safe Gaussian Process Bandit Optimization. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:158-159. Available from https://proceedings.mlr.press/v120/amani20a.html.