Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:273-280, 2010.

Abstract

Bandit algorithms are concerned with trading off exploration against exploitation: a number of options are available, but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution over arms is modeled by a Gaussian process and the observed rewards are noise-free. Our main result bounds the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, under benign assumptions about the covariance function defining the Gaussian process. We complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general our upper bounds are loose by at most a logarithmic factor.
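To make the setting concrete, the sketch below illustrates the noiseless Gaussian process bandit problem and the regret quantity the paper bounds. It is not the paper's algorithm or analysis: the squared-exponential covariance, the finite grid of arms, and the uniform-exploration player are illustrative assumptions chosen here only to show how regret against the best arm in hindsight is computed.

import numpy as np

# Minimal sketch of the noiseless GP bandit setting (assumed choices:
# squared-exponential kernel, 50 arms on a grid, naive uniform player).
rng = np.random.default_rng(0)

arms = np.linspace(0.0, 1.0, 50)
lengthscale = 0.2
K = np.exp(-0.5 * (arms[:, None] - arms[None, :]) ** 2 / lengthscale ** 2)

# Draw one reward function from the GP prior; rewards are observed noise-free.
f = rng.multivariate_normal(np.zeros(len(arms)), K + 1e-10 * np.eye(len(arms)))

T = 30                                   # horizon
played = rng.choice(len(arms), size=T)   # placeholder strategy: pick arms uniformly at random
rewards = f[played]

# Regret relative to the a posteriori optimal strategy of always playing the best arm.
regret = T * f.max() - rewards.sum()
print(f"cumulative regret of the uniform player over T={T}: {regret:.3f}")

Any actual bandit strategy (e.g. one exploiting the GP posterior) would replace the uniform player above; the paper's bounds concern how small this regret can be made given assumptions on the covariance function.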

Cite this Paper


BibTeX
@InProceedings{pmlr-v9-grunewalder10a,
  title     = {Regret Bounds for Gaussian Process Bandit Problems},
  author    = {Grünewälder, Steffen and Audibert, Jean-Yves and Opper, Manfred and Shawe-Taylor, John},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {273--280},
  year      = {2010},
  editor    = {Teh, Yee Whye and Titterington, Mike},
  volume    = {9},
  series    = {Proceedings of Machine Learning Research},
  address   = {Chia Laguna Resort, Sardinia, Italy},
  month     = {13--15 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf},
  url       = {https://proceedings.mlr.press/v9/grunewalder10a.html}
}
Endnote
%0 Conference Paper
%T Regret Bounds for Gaussian Process Bandit Problems
%A Steffen Grünewälder
%A Jean-Yves Audibert
%A Manfred Opper
%A John Shawe-Taylor
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington
%F pmlr-v9-grunewalder10a
%I PMLR
%P 273--280
%U https://proceedings.mlr.press/v9/grunewalder10a.html
%V 9
RIS
TY - CPAPER
TI - Regret Bounds for Gaussian Process Bandit Problems
AU - Steffen Grünewälder
AU - Jean-Yves Audibert
AU - Manfred Opper
AU - John Shawe-Taylor
BT - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA - 2010/03/31
ED - Yee Whye Teh
ED - Mike Titterington
ID - pmlr-v9-grunewalder10a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 9
SP - 273
EP - 280
L1 - http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf
UR - https://proceedings.mlr.press/v9/grunewalder10a.html
ER -
APA
Grünewälder, S., Audibert, J.-Y., Opper, M. & Shawe-Taylor, J. (2010). Regret Bounds for Gaussian Process Bandit Problems. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:273-280. Available from https://proceedings.mlr.press/v9/grunewalder10a.html.
