Weighted Gaussian Process Bandits for Non-stationary Environments

Yuntian Deng, Xingyu Zhou, Baekjin Kim, Ambuj Tewari, Abhishek Gupta, Ness Shroff
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:6909-6932, 2022.

Abstract

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-deng22b, title = { Weighted Gaussian Process Bandits for Non-stationary Environments }, author = {Deng, Yuntian and Zhou, Xingyu and Kim, Baekjin and Tewari, Ambuj and Gupta, Abhishek and Shroff, Ness}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {6909--6932}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/deng22b/deng22b.pdf}, url = {https://proceedings.mlr.press/v151/deng22b.html}, abstract = { In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods. } }
Endnote
%0 Conference Paper %T Weighted Gaussian Process Bandits for Non-stationary Environments %A Yuntian Deng %A Xingyu Zhou %A Baekjin Kim %A Ambuj Tewari %A Abhishek Gupta %A Ness Shroff %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-deng22b %I PMLR %P 6909--6932 %U https://proceedings.mlr.press/v151/deng22b.html %V 151 %X In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.
APA
Deng, Y., Zhou, X., Kim, B., Tewari, A., Gupta, A. & Shroff, N.. (2022). Weighted Gaussian Process Bandits for Non-stationary Environments . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:6909-6932 Available from https://proceedings.mlr.press/v151/deng22b.html.

Related Material