Lenient Regret and Good-Action Identification in Gaussian Process Bandits

Xu Cai, Selwyn Gomes, Jonathan Scarlett
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1183-1192, 2021.

Abstract

In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is “good enough”. On the theoretical side, we study various lenient regret notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual O(√T) term (with time horizon T) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can typically find a good action faster than standard optimization-based approaches.
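To make the lenient regret notion concrete, the following is a minimal illustrative sketch (not from the paper) contrasting standard cumulative regret with a lenient variant in which any action whose gap to the maximum is within a tolerance eps incurs zero penalty; all function names and values here are hypothetical.

```python
import numpy as np

def standard_regret(f_values, f_max):
    """Cumulative regret: sum of gaps f_max - f(x_t) over all rounds."""
    return float(np.sum(f_max - np.asarray(f_values)))

def lenient_regret(f_values, f_max, eps):
    """Lenient regret: near-optimal actions (gap <= eps) incur zero penalty."""
    gaps = f_max - np.asarray(f_values)
    return float(np.sum(np.where(gaps <= eps, 0.0, gaps)))

# Example: one near-optimal pull and one poor pull.
vals = [0.95, 0.5]
print(standard_regret(vals, f_max=1.0))          # 0.55
print(lenient_regret(vals, f_max=1.0, eps=0.1))  # 0.5
```

Under this definition, an algorithm that quickly settles on any "good enough" action stops accumulating lenient regret entirely, which is what allows the bounds in the paper to avoid the usual O(√T) growth.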

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-cai21c,
  title     = {Lenient Regret and Good-Action Identification in Gaussian Process Bandits},
  author    = {Cai, Xu and Gomes, Selwyn and Scarlett, Jonathan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {1183--1192},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cai21c/cai21c.pdf},
  url       = {https://proceedings.mlr.press/v139/cai21c.html},
  abstract  = {In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is “good enough”. On the theoretical side, we study various {\em lenient regret} notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can typically find a good action faster than standard optimization-based approaches.}
}
Endnote
%0 Conference Paper
%T Lenient Regret and Good-Action Identification in Gaussian Process Bandits
%A Xu Cai
%A Selwyn Gomes
%A Jonathan Scarlett
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cai21c
%I PMLR
%P 1183--1192
%U https://proceedings.mlr.press/v139/cai21c.html
%V 139
%X In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is “good enough”. On the theoretical side, we study various lenient regret notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can typically find a good action faster than standard optimization-based approaches.
APA
Cai, X., Gomes, S. & Scarlett, J. (2021). Lenient Regret and Good-Action Identification in Gaussian Process Bandits. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1183-1192. Available from https://proceedings.mlr.press/v139/cai21c.html.