On Information Gain and Regret Bounds in Gaussian Process Bandits

Sattar Vakili, Kia Khezeli, Victor Picheny
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:82-90, 2021.

Abstract

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function f from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when f is a sample from a Gaussian process (GP)) and a frequentist setting (when f lives in a reproducing kernel Hilbert space). These regret bounds often rely on the maximal information gain γT between T observations and the underlying GP (surrogate) model. We provide general bounds on γT based on the decay rate of the eigenvalues of the GP kernel; specialising them to commonly used kernels improves the existing bounds on γT, and consequently the regret bounds that rely on γT, in numerous settings. For the Matérn family of kernels, where lower bounds on γT and on the frequentist regret are known, our results close a large polynomial-in-T gap between the upper and lower bounds (up to factors logarithmic in T).
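For reference, the maximal information gain appearing in these bounds is defined as $\gamma_T = \max_{A \subset \mathcal{X}, |A| = T} I(y_A; f) = \max_A \frac{1}{2} \log\det(I + \sigma^{-2} K_A)$, where $K_A$ is the kernel matrix over the T points in A and $\sigma^2$ is the observation-noise variance. The Python sketch below is illustrative and not taken from the paper: the RBF kernel, lengthscale, candidate grid, and greedy search are assumptions of this note. It evaluates the information gain of a point set and approximates γT greedily over a finite grid, which is justified up to a constant factor by the submodularity of the information gain.

import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X1, X2, lengthscale=0.2):
    # Squared-exponential (RBF) kernel matrix between two point sets.
    sq_dists = cdist(X1, X2, metric="sqeuclidean")
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def information_gain(X, noise_var=0.01, kernel=rbf_kernel):
    # I(y_A; f) = 1/2 * log det(I + sigma^{-2} K_A) for the set A given by the rows of X.
    K = kernel(X, X)
    _, logdet = np.linalg.slogdet(np.eye(X.shape[0]) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma(candidates, T, noise_var=0.01, kernel=rbf_kernel):
    # Greedy approximation of gamma_T over a finite candidate grid: at each step,
    # add the candidate that maximises the information gain of the selected set.
    # Submodularity of the information gain guarantees a (1 - 1/e) approximation.
    chosen, best_gain = [], 0.0
    for _ in range(T):
        step_best, best_idx = -np.inf, None
        for i in range(candidates.shape[0]):
            gain = information_gain(candidates[chosen + [i]], noise_var, kernel)
            if gain > step_best:
                step_best, best_idx = gain, i
        chosen.append(best_idx)
        best_gain = step_best
    return best_gain

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50).reshape(-1, 1)  # 1-D candidate grid on [0, 1]
    for T in (5, 10, 20):
        print(T, greedy_gamma(grid, T))

Replacing rbf_kernel with a Matérn kernel makes it possible to compare the empirical growth of the greedy estimate with the kernel-specific bounds discussed in the paper.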

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-vakili21a,
  title     = {On Information Gain and Regret Bounds in Gaussian Process Bandits},
  author    = {Vakili, Sattar and Khezeli, Kia and Picheny, Victor},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {82--90},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/vakili21a/vakili21a.pdf},
  url       = {https://proceedings.mlr.press/v130/vakili21a.html},
  abstract  = {Consider the sequential optimization of an expensive to evaluate and possibly non-convex objective function $f$ from noisy feedback, that can be considered as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian (when $f$ is a sample from a Gaussian process (GP)) and a frequentist (when $f$ lives in a reproducing kernel Hilbert space) setting. The regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$ under numerous settings. For the Matérn family of kernels, where the lower bounds on $\gamma_T$, and regret under the frequentist setting, are known, our results close a huge polynomial in $T$ gap between the upper and lower bounds (up to logarithmic in $T$ factors).}
}
APA
Vakili, S., Khezeli, K. & Picheny, V. (2021). On Information Gain and Regret Bounds in Gaussian Process Bandits. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:82-90. Available from https://proceedings.mlr.press/v130/vakili21a.html.
