An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3048-3085, 2023.
Abstract
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on non-stationary kernel bandits. Moreover, when applied to non-stationary linear bandits by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound for the extension using neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.
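To illustrate the restart-on-change-detection idea mentioned in the abstract, here is a minimal sketch of a kernel bandit that discards its collected data whenever an observation falls far outside the current confidence band. It is a simplified illustration, not the paper's algorithm: it uses a plain kernel-UCB selection rule on a fixed 1-D arm grid rather than the optimization-based randomized strategies, and the RBF lengthscale, confidence width `beta`, `noise` level, and `change_threshold` are assumed parameters chosen for the example.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=0.2):
    """RBF kernel matrix between two 1-D arrays of arms."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


class RestartKernelUCB:
    """Illustrative restart-based kernel bandit (not the paper's exact algorithm)."""

    def __init__(self, arms, beta=2.0, noise=0.1, change_threshold=4.0):
        self.arms = np.asarray(arms, dtype=float)  # candidate arms on a 1-D grid
        self.beta = beta                           # width of the confidence band
        self.noise = noise                         # assumed observation-noise std
        self.change_threshold = change_threshold   # restart trigger, in band widths
        self.restart()

    def restart(self):
        """Discard all past observations and start a new stationary block."""
        self.X, self.y = [], []

    def _posterior(self, queries):
        """Kernel ridge regression mean and std at the query arms."""
        if not self.X:
            return np.zeros(len(queries)), np.ones(len(queries))
        X, y = np.array(self.X), np.array(self.y)
        K = rbf_kernel(X, X) + self.noise ** 2 * np.eye(len(X))
        K_inv = np.linalg.inv(K)
        k_star = rbf_kernel(queries, X)
        mean = k_star @ K_inv @ y
        var = 1.0 - np.einsum("ij,jk,ik->i", k_star, K_inv, k_star)
        return mean, np.sqrt(np.clip(var, 1e-12, None))

    def select_arm(self):
        """Pick the arm maximizing the upper confidence bound."""
        mean, std = self._posterior(self.arms)
        return int(np.argmax(mean + self.beta * std))

    def update(self, arm_idx, reward):
        """Record the reward; restart if it lies far outside the current band."""
        mean, std = self._posterior(self.arms[arm_idx:arm_idx + 1])
        if self.X and abs(reward - mean[0]) > self.change_threshold * (std[0] + self.noise):
            self.restart()  # treat the surprise as a change in the reward function
        self.X.append(self.arms[arm_idx])
        self.y.append(reward)


# Toy run: the reward function flips sign halfway through the horizon.
rng = np.random.default_rng(0)
arms = np.linspace(0.0, 1.0, 50)
agent = RestartKernelUCB(arms)
for t in range(200):
    f = (lambda x: np.sin(6 * x)) if t < 100 else (lambda x: -np.sin(6 * x))
    i = agent.select_arm()
    agent.update(i, f(arms[i]) + 0.1 * rng.standard_normal())
```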