AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:723-734, 2026.
Abstract
We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding $\mathcal{N}(0, \sigma^2 I)$ noise to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d} \cdot \log T / \sqrt{\rho})$, which is logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS’s advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation.
Keywords: Differential Privacy, Thompson Sampling, Contextual Bandits, Privacy Amplification, zCDP
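The noise-as-uncertainty claim can be checked numerically. The sketch below is not the authors' implementation. It assumes the standard linear Thompson Sampling setup in which $A$ is a fixed, symmetric positive-definite design matrix, $b$ is the reward-weighted context sum, the point estimate is $A^{-1}(b + \eta)$ with $\eta \sim \mathcal{N}(0, \sigma^2 I)$, and the sample is then drawn with covariance $v^2 A^{-1}$. The function names and example values are hypothetical. Under those assumptions, the Monte Carlo covariance of the draws should approach $v^2 A^{-1} + \sigma^2 A^{-2}$.

```python
import numpy as np


def closed_form_covariance(A, v, sigma):
    """Covariance stated in the abstract: v^2 A^{-1} + sigma^2 A^{-2}."""
    A_inv = np.linalg.inv(A)
    return v**2 * A_inv + sigma**2 * (A_inv @ A_inv)


def empirical_covariance(A, b, v, sigma, n_draws=200_000, seed=0):
    """Monte Carlo covariance of Thompson samples drawn from a noisy b.

    Assumption (not from the source): A is symmetric positive definite
    and is held fixed, so only b is perturbed.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    A_inv = np.linalg.inv(A)

    # Gaussian mechanism: perturb b with N(0, sigma^2 I), one row per draw.
    noise = rng.normal(0.0, sigma, size=(n_draws, d))
    theta_hat = (b + noise) @ A_inv.T  # row i is A^{-1} (b + noise_i)

    # Thompson step: add N(0, v^2 A^{-1}) around each noisy estimate.
    chol = np.linalg.cholesky(v**2 * A_inv)
    samples = theta_hat + rng.standard_normal((n_draws, d)) @ chol.T

    return np.cov(samples, rowvar=False)


if __name__ == "__main__":
    # Hypothetical toy values chosen only for illustration.
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, -2.0])
    print("closed form:\n", closed_form_covariance(A, v=0.5, sigma=2.0))
    print("empirical:\n", empirical_covariance(A, b, v=0.5, sigma=2.0))
```

The two terms add because the privacy noise and the Thompson draw are independent in this sketch: the first contributes $A^{-1} (\sigma^2 I) A^{-1} = \sigma^2 A^{-2}$ and the second contributes $v^2 A^{-1}$.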