Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24403-24423, 2025.

Abstract

We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-hu25k, title = {Connecting Thompson Sampling and {UCB}: Towards More Efficient Trade-offs Between Privacy and Regret}, author = {Hu, Bingshan and Huang, Zhiming and Zhang, Tianyue H. and L\'{e}cuyer, Mathias and Hegde, Nidhi}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {24403--24423}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hu25k/hu25k.pdf}, url = {https://proceedings.mlr.press/v267/hu25k.html}, abstract = {We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.} }
Endnote
%0 Conference Paper %T Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret %A Bingshan Hu %A Zhiming Huang %A Tianyue H. Zhang %A Mathias Lécuyer %A Nidhi Hegde %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-hu25k %I PMLR %P 24403--24423 %U https://proceedings.mlr.press/v267/hu25k.html %V 267 %X We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.
APA
Hu, B., Huang, Z., Zhang, T.H., Lécuyer, M. & Hegde, N.. (2025). Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:24403-24423 Available from https://proceedings.mlr.press/v267/hu25k.html.

Related Material