Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder; Jean–Yves Audibert; Manfred Opper; John Shawe–Taylor

Regret Bounds for Gaussian Process Bandit Problems

Steffen Grünewälder, Jean–Yves Audibert, Manfred Opper, John Shawe–Taylor

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:273-280, 2010.

Abstract

Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.

Cite this Paper

BibTeX


@InProceedings{pmlr-v9-grunewalder10a,
  title = 	 {Regret Bounds for Gaussian Process Bandit Problems},
  author = 	 {Grünewälder, Steffen and Audibert, Jean–Yves and Opper, Manfred and Shawe–Taylor, John},
  booktitle = 	 {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {273--280},
  year = 	 {2010},
  editor = 	 {Teh, Yee Whye and Titterington, Mike},
  volume = 	 {9},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Chia Laguna Resort, Sardinia, Italy},
  month = 	 {13--15 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf},
  url = 	 {https://proceedings.mlr.press/v9/grunewalder10a.html},
  abstract = 	 {Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.}
}

Endnote

%0 Conference Paper
%T Regret Bounds for Gaussian Process Bandit Problems
%A Steffen Grünewälder
%A Jean–Yves Audibert
%A Manfred Opper
%A John Shawe–Taylor
%B Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2010
%E Yee Whye Teh
%E Mike Titterington	
%F pmlr-v9-grunewalder10a
%I PMLR
%P 273--280
%U https://proceedings.mlr.press/v9/grunewalder10a.html
%V 9
%X Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.

RIS


TY  - CPAPER
TI  - Regret Bounds for Gaussian Process Bandit Problems
AU  - Steffen Grünewälder
AU  - Jean–Yves Audibert
AU  - Manfred Opper
AU  - John Shawe–Taylor
BT  - Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics
DA  - 2010/03/31
ED  - Yee Whye Teh
ED  - Mike Titterington	
ID  - pmlr-v9-grunewalder10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 9
SP  - 273
EP  - 280
L1  - http://proceedings.mlr.press/v9/grunewalder10a/grunewalder10a.pdf
UR  - https://proceedings.mlr.press/v9/grunewalder10a.html
AB  - Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for arms is modeled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
ER  -

APA


Grünewälder, S., Audibert, J., Opper, M. & Shawe–Taylor, J.. (2010). Regret Bounds for Gaussian Process Bandit Problems. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 9:273-280 Available from https://proceedings.mlr.press/v9/grunewalder10a.html.

Related Material

Download PDF