[edit]
Quantum Kernelized Bandits
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1640-1657, 2024.
Abstract
We consider the quantum kernelized bandit problem, where the player observes information of rewards through quantum circuits termed the quantum reward oracle, and the mean reward function belongs to a reproducing kernel Hilbert space (RKHS). We propose a UCB-type algorithm that utilizes the quantum Monte Carlo (QMC) method and provide regret bounds in terms of the decay rate of eigenvalues of the Mercer operator of the kernel. Our algorithm achieves $\widetilde{O}\left( T^{\frac{3}{1 + \beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ and $\widetilde{O} \left( \log^{3(1 + \beta_e^{-1})/2} (T) \log\left(\frac{1 }{\delta} \right) \right)$ cumulative regret bounds with probability at least $1-\delta$ if the kernel has a $\beta_p$-polynomial eigendecay and $\beta_e$-exponential eigendecay, respectively. In particular, in the case of the exponential eigendecay, our regret bounds exponentially improve that of classical algorithms. Moreover, our results indicate that our regret bound is better than the lower bound in the classical kernelized bandit problem if the rate of decay is sufficiently fast.