An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

Kihyuk Hong, Yuhang Li, Ambuj Tewari
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3048-3085, 2023.

Abstract

We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on non-stationary kernel bandits. Moreover, when applied to the non-stationary linear bandits by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.
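
To make the abstract's high-level description concrete, the sketch below shows a generic restart-on-change kernel bandit loop: a kernel ridge posterior summarizes past observations, actions are drawn from a randomized strategy that trades off exploitation (posterior mean) against exploration (posterior uncertainty), and the history is discarded when a large prediction error suggests the reward function has changed. This is only an illustrative, assumption-laden sketch, not the paper's algorithm; the RBF kernel, the softmax-style sampling rule, the cold-start rule, and the fixed restart_threshold are placeholder choices.

import numpy as np

# Illustrative sketch only: a generic restart-on-change kernel bandit loop.
# It is NOT the paper's algorithm; kernel, exploration rule, and detection
# threshold below are placeholder choices.

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel between row sets X and Y.
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d / (2 * lengthscale**2))

def posterior(X_hist, y_hist, X_cand, reg=1.0):
    # Kernel ridge posterior mean and uncertainty at candidate actions.
    K = rbf_kernel(X_hist, X_hist) + reg * np.eye(len(X_hist))
    k = rbf_kernel(X_hist, X_cand)
    K_inv = np.linalg.inv(K)
    mean = k.T @ K_inv @ y_hist
    var = 1.0 - np.sum(k * (K_inv @ k), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

rng = np.random.default_rng(0)
X_cand = rng.uniform(-1, 1, size=(50, 2))    # candidate actions
X_hist, y_hist = [], []
restart_threshold = 1.0                       # placeholder detection threshold

for t in range(200):
    if len(X_hist) < 5:
        x = X_cand[rng.integers(len(X_cand))]              # cold-start exploration
    else:
        mean, std = posterior(np.array(X_hist), np.array(y_hist), X_cand)
        probs = np.exp(mean + std); probs /= probs.sum()   # randomized strategy
        x = X_cand[rng.choice(len(X_cand), p=probs)]
    # Simulated non-stationarity: the reward function changes abruptly at t = 100.
    f = (lambda z: np.sin(3 * z[0])) if t < 100 else (lambda z: -np.cos(3 * z[1]))
    y = f(x) + 0.1 * rng.standard_normal()
    # Restart (forget history) when the prediction error signals a change.
    if len(X_hist) >= 5:
        pred, _ = posterior(np.array(X_hist), np.array(y_hist), x[None, :])
        if abs(pred[0] - y) > restart_threshold:
            X_hist, y_hist = [], []
    X_hist.append(x); y_hist.append(y)

In the paper itself, the randomized strategy is obtained by solving an optimization problem that balances exploration and exploitation, and the restart test is designed so that no prior knowledge of the degree of non-stationarity is required; the sketch only mirrors that overall structure.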

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-hong23b,
  title = {An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge},
  author = {Hong, Kihyuk and Li, Yuhang and Tewari, Ambuj},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {3048--3085},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/hong23b/hong23b.pdf},
  url = {https://proceedings.mlr.press/v206/hong23b.html},
  abstract = {We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on non-stationary kernel bandits. Moreover, when applied to the non-stationary linear bandits by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.}
}
Endnote
%0 Conference Paper
%T An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge
%A Kihyuk Hong
%A Yuhang Li
%A Ambuj Tewari
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-hong23b
%I PMLR
%P 3048--3085
%U https://proceedings.mlr.press/v206/hong23b.html
%V 206
%X We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on non-stationary kernel bandits. Moreover, when applied to the non-stationary linear bandits by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.
APA
Hong, K., Li, Y. & Tewari, A. (2023). An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3048-3085. Available from https://proceedings.mlr.press/v206/hong23b.html.
