Delayed Feedback in Kernel Bandits

Sattar Vakili; Danyal Ahmed; Alberto Bernacchia; Ciara Pike-Burke

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:34779-34792, 2023.

Abstract

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}\left(\sqrt{\Gamma_k(T) T}+\mathbb{E}[\tau]\right)$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}\left(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T)\right)$ reported in (Verma et al., 2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.
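
To make the comparison of the two bounds concrete, here is a short worked calculation. It is only a sketch under an illustrative assumption that does not come from the paper's statement: suppose the information gain grows polynomially, $\Gamma_k(T) = T^{\alpha}$ for some hypothetical exponent $\alpha \in (0,1)$, and ignore logarithmic factors. The leading terms of the two bounds then become

$$\sqrt{\Gamma_k(T)\,T} = T^{(1+\alpha)/2}, \qquad \Gamma_k(T)\sqrt{T} = T^{\alpha + 1/2},$$

and the delay terms become $\mathbb{E}[\tau]$ and $\mathbb{E}[\tau]\,T^{\alpha}$ respectively. Under this assumption the first exponent satisfies $(1+\alpha)/2 < 1$ for every $\alpha < 1$, so the new bound stays sublinear in $T$. The second exponent satisfies $\alpha + 1/2 \ge 1$ whenever $\alpha \ge 1/2$, so in that regime the earlier bound is no better than linear in $T$. This is the sense in which near-linear information gain trivializes the existing results.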

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-vakili23a,
  title = 	 {Delayed Feedback in Kernel Bandits},
  author =       {Vakili, Sattar and Ahmed, Danyal and Bernacchia, Alberto and Pike-Burke, Ciara},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {34779--34792},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/vakili23a/vakili23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/vakili23a.html},
  abstract = 	 {Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}\left(\sqrt{\Gamma_k(T) T}+\mathbb{E}[\tau]\right)$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}\left(\Gamma_k(T)\sqrt{ T}+\mathbb{E}[\tau]\Gamma_k(T)\right)$ reported in (Verma et al., 2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.}
}

Endnote

%0 Conference Paper
%T Delayed Feedback in Kernel Bandits
%A Sattar Vakili
%A Danyal Ahmed
%A Alberto Bernacchia
%A Ciara Pike-Burke
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-vakili23a
%I PMLR
%P 34779--34792
%U https://proceedings.mlr.press/v202/vakili23a.html
%V 202
%X Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}\left(\sqrt{\Gamma_k(T) T}+\mathbb{E}[\tau]\right)$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}\left(\Gamma_k(T)\sqrt{ T}+\mathbb{E}[\tau]\Gamma_k(T)\right)$ reported in (Verma et al., 2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.

APA


Vakili, S., Ahmed, D., Bernacchia, A. & Pike-Burke, C. (2023). Delayed Feedback in Kernel Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:34779-34792. Available from https://proceedings.mlr.press/v202/vakili23a.html.
