Quantum Kernelized Bandits

Yasunari Hikima, Kazunori Murao, Sho Takemori, Yuhei Umeda
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1640-1657, 2024.

Abstract

We consider the quantum kernelized bandit problem, where the player observes information of rewards through quantum circuits termed the quantum reward oracle, and the mean reward function belongs to a reproducing kernel Hilbert space (RKHS). We propose a UCB-type algorithm that utilizes the quantum Monte Carlo (QMC) method and provide regret bounds in terms of the decay rate of eigenvalues of the Mercer operator of the kernel. Our algorithm achieves $\widetilde{O}\left( T^{\frac{3}{1 + \beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ and $\widetilde{O} \left( \log^{3(1 + \beta_e^{-1})/2} (T) \log\left(\frac{1 }{\delta} \right) \right)$ cumulative regret bounds with probability at least $1-\delta$ if the kernel has a $\beta_p$-polynomial eigendecay and $\beta_e$-exponential eigendecay, respectively. In particular, in the case of the exponential eigendecay, our regret bounds exponentially improve that of classical algorithms. Moreover, our results indicate that our regret bound is better than the lower bound in the classical kernelized bandit problem if the rate of decay is sufficiently fast.

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-hikima24a, title = {Quantum Kernelized Bandits}, author = {Hikima, Yasunari and Murao, Kazunori and Takemori, Sho and Umeda, Yuhei}, booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence}, pages = {1640--1657}, year = {2024}, editor = {Kiyavash, Negar and Mooij, Joris M.}, volume = {244}, series = {Proceedings of Machine Learning Research}, month = {15--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/hikima24a/hikima24a.pdf}, url = {https://proceedings.mlr.press/v244/hikima24a.html}, abstract = {We consider the quantum kernelized bandit problem, where the player observes information of rewards through quantum circuits termed the quantum reward oracle, and the mean reward function belongs to a reproducing kernel Hilbert space (RKHS). We propose a UCB-type algorithm that utilizes the quantum Monte Carlo (QMC) method and provide regret bounds in terms of the decay rate of eigenvalues of the Mercer operator of the kernel. Our algorithm achieves $\widetilde{O}\left( T^{\frac{3}{1 + \beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ and $\widetilde{O} \left( \log^{3(1 + \beta_e^{-1})/2} (T) \log\left(\frac{1 }{\delta} \right) \right)$ cumulative regret bounds with probability at least $1-\delta$ if the kernel has a $\beta_p$-polynomial eigendecay and $\beta_e$-exponential eigendecay, respectively. In particular, in the case of the exponential eigendecay, our regret bounds exponentially improve that of classical algorithms. Moreover, our results indicate that our regret bound is better than the lower bound in the classical kernelized bandit problem if the rate of decay is sufficiently fast.} }
Endnote
%0 Conference Paper %T Quantum Kernelized Bandits %A Yasunari Hikima %A Kazunori Murao %A Sho Takemori %A Yuhei Umeda %B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2024 %E Negar Kiyavash %E Joris M. Mooij %F pmlr-v244-hikima24a %I PMLR %P 1640--1657 %U https://proceedings.mlr.press/v244/hikima24a.html %V 244 %X We consider the quantum kernelized bandit problem, where the player observes information of rewards through quantum circuits termed the quantum reward oracle, and the mean reward function belongs to a reproducing kernel Hilbert space (RKHS). We propose a UCB-type algorithm that utilizes the quantum Monte Carlo (QMC) method and provide regret bounds in terms of the decay rate of eigenvalues of the Mercer operator of the kernel. Our algorithm achieves $\widetilde{O}\left( T^{\frac{3}{1 + \beta_p}} \log\left(\frac{1}{\delta} \right)\right)$ and $\widetilde{O} \left( \log^{3(1 + \beta_e^{-1})/2} (T) \log\left(\frac{1 }{\delta} \right) \right)$ cumulative regret bounds with probability at least $1-\delta$ if the kernel has a $\beta_p$-polynomial eigendecay and $\beta_e$-exponential eigendecay, respectively. In particular, in the case of the exponential eigendecay, our regret bounds exponentially improve that of classical algorithms. Moreover, our results indicate that our regret bound is better than the lower bound in the classical kernelized bandit problem if the rate of decay is sufficiently fast.
APA
Hikima, Y., Murao, K., Takemori, S. & Umeda, Y.. (2024). Quantum Kernelized Bandits. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:1640-1657 Available from https://proceedings.mlr.press/v244/hikima24a.html.

Related Material