Differentially Private Kernelized Contextual Bandits

Nikola Pavlovic, Sudeep Salgia, Qing Zhao
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4618-4626, 2025.

Abstract

We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space (RKHS). We study this problem under the additional constraint of joint differential privacy, where the agent needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts and rewards. We propose a novel algorithm that improves upon the state of the art and achieves an error rate of $\mathcal{O}\left(\sqrt{\dfrac{\gamma_T}{T}} + \dfrac{\gamma_T}{T \varepsilon}\right)$ after $T$ queries for a large class of kernel families, where $\gamma_T$ represents the effective dimensionality of the kernel and $\varepsilon > 0$ is the privacy parameter. Our results are based on a novel estimator for the reward function that simultaneously enjoys high utility and low sensitivity to observed rewards and contexts, which is crucial for obtaining improved performance.
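The stated rate splits into a non-private learning term $\sqrt{\gamma_T/T}$ and a privacy term $\gamma_T/(T\varepsilon)$. A minimal numeric sketch below evaluates both terms; the choice $\gamma_T \approx (\log T)^{d+1}$ is an illustrative assumption (the known scaling of the effective dimension for a squared-exponential kernel in $d$ dimensions), not a quantity from this paper.

```python
import math

def error_bound_terms(T, eps, gamma_T):
    """Return the two terms of the O(sqrt(gamma_T/T) + gamma_T/(T*eps)) rate."""
    learning_term = math.sqrt(gamma_T / T)       # non-private statistical error
    privacy_term = gamma_T / (T * eps)           # cost of joint differential privacy
    return learning_term, privacy_term

# Assumed gamma_T ~ (log T)^(d+1), e.g. a squared-exponential kernel in d dims.
T, eps, d = 10_000, 1.0, 2
gamma_T = math.log(T) ** (d + 1)
learning, privacy = error_bound_terms(T, eps, gamma_T)
print(f"learning term: {learning:.4f}, privacy term: {privacy:.4f}")
```

Because the privacy term decays as $1/T$ while the learning term decays only as $1/\sqrt{T}$, for any fixed $\varepsilon$ the non-private term dominates once $T$ is large, i.e. privacy comes at a vanishing relative cost.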

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-pavlovic25a,
  title = {Differentially Private Kernelized Contextual Bandits},
  author = {Pavlovic, Nikola and Salgia, Sudeep and Zhao, Qing},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {4618--4626},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/pavlovic25a/pavlovic25a.pdf},
  url = {https://proceedings.mlr.press/v258/pavlovic25a.html},
  abstract = {We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space (RKHS). We study this problem under the additional constraint of joint differential privacy, where the agent needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts and rewards. We propose a novel algorithm that improves upon the state of the art and achieves an error rate of $\mathcal{O}\left(\sqrt{\dfrac{\gamma_T}{T}} + \dfrac{\gamma_T}{T \varepsilon}\right)$ after $T$ queries for a large class of kernel families, where $\gamma_T$ represents the effective dimensionality of the kernel and $\varepsilon > 0$ is the privacy parameter. Our results are based on a novel estimator for the reward function that simultaneously enjoys high utility and low sensitivity to observed rewards and contexts, which is crucial for obtaining improved performance.}
}
Endnote
%0 Conference Paper
%T Differentially Private Kernelized Contextual Bandits
%A Nikola Pavlovic
%A Sudeep Salgia
%A Qing Zhao
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-pavlovic25a
%I PMLR
%P 4618--4626
%U https://proceedings.mlr.press/v258/pavlovic25a.html
%V 258
%X We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space (RKHS). We study this problem under the additional constraint of joint differential privacy, where the agent needs to ensure that the sequence of query points is differentially private with respect to both the sequence of contexts and rewards. We propose a novel algorithm that improves upon the state of the art and achieves an error rate of $\mathcal{O}\left(\sqrt{\dfrac{\gamma_T}{T}} + \dfrac{\gamma_T}{T \varepsilon}\right)$ after $T$ queries for a large class of kernel families, where $\gamma_T$ represents the effective dimensionality of the kernel and $\varepsilon > 0$ is the privacy parameter. Our results are based on a novel estimator for the reward function that simultaneously enjoys high utility and low sensitivity to observed rewards and contexts, which is crucial for obtaining improved performance.
APA
Pavlovic, N., Salgia, S. & Zhao, Q. (2025). Differentially Private Kernelized Contextual Bandits. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4618-4626. Available from https://proceedings.mlr.press/v258/pavlovic25a.html.

Related Material