[edit]
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24403-24423, 2025.
Abstract
We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.