AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification

Mohammadreza Riyazat; Eranga Ukwatta

Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:723-734, 2026.

Abstract

We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding $N(0, \sigma^2 I)$ noise to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d} \cdot \log T / \sqrt{\rho})$, logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS’s advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation.

Keywords: Differential Privacy, Thompson Sampling, Contextual Bandits, Privacy Amplification, zCDP
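The covariance identity stated in the abstract can be checked numerically. The sketch below is not the authors' implementation: it assumes the standard linear Thompson Sampling setup in which A is a symmetric positive definite design matrix, the posterior mean is A^{-1} b, and the sample is drawn with covariance v^2 A^{-1}. The function names, the ridge term, and all numeric values are hypothetical choices for illustration only.

```python
import numpy as np


def sampling_covariance(A, v, sigma):
    """Closed form v^2 A^{-1} + sigma^2 A^{-2} for symmetric A."""
    A_inv = np.linalg.inv(A)
    return v**2 * A_inv + sigma**2 * (A_inv @ A_inv)


def private_ts_samples(A, b, v, sigma, n_samples, rng):
    """Draw Thompson samples when b is perturbed by N(0, sigma^2 I) noise.

    Each sample uses an independent noise draw, so the empirical covariance
    of the output reflects both the posterior and the privacy noise.
    """
    d = A.shape[0]
    A_inv = np.linalg.inv(A)
    # Privacy noise added to b (assumed isotropic Gaussian, as in the abstract).
    eta = rng.normal(0.0, sigma, size=(n_samples, d))
    # Noisy posterior means A^{-1}(b + eta); A_inv is symmetric, so
    # right-multiplying row vectors by A_inv is equivalent.
    means = (b + eta) @ A_inv
    # Posterior sampling noise with covariance v^2 A^{-1}.
    L = np.linalg.cholesky(v**2 * A_inv)
    z = rng.normal(size=(n_samples, d))
    return means + z @ L.T


# Hypothetical toy problem: 50 random contexts in 3 dimensions.
rng = np.random.default_rng(0)
d, v, sigma = 3, 0.5, 2.0
X = rng.normal(size=(50, d))
A = np.eye(d) + X.T @ X          # ridge-regularised design matrix
b = X.T @ rng.normal(size=50)    # reward-weighted context sum

samples = private_ts_samples(A, b, v, sigma, n_samples=200_000, rng=rng)
empirical = np.cov(samples, rowvar=False)
closed_form = sampling_covariance(A, v, sigma)
print("max abs deviation:", np.max(np.abs(empirical - closed_form)))
```

The privacy noise enters through A^{-1}, so its contribution sigma^2 A^{-2} shrinks faster than the posterior term v^2 A^{-1} as A grows, which is consistent with the abstract's reading of the noise as extra, structured uncertainty.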

Cite this Paper

BibTeX

@InProceedings{pmlr-v318-riyazat26a,
  title = 	 {AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification},
  author =       {Riyazat, Mohammadreza and Ukwatta, Eranga},
  booktitle = 	 {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = 	 {723--734},
  year = 	 {2026},
  editor = 	 {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = 	 {318},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--29 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v318/main/assets/riyazat26a/riyazat26a.pdf},
  url = 	 {https://proceedings.mlr.press/v318/riyazat26a.html},
  abstract = 	 {We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding $N(0, \sigma^2 I)$ noise to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d} \cdot \log T / \sqrt{\rho})$, logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS’s advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation. Keywords: Differential Privacy, Thompson Sampling, Contextual Bandits, Privacy Amplification, zCDP}
}

Endnote

%0 Conference Paper
%T AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification
%A Mohammadreza Riyazat
%A Eranga Ukwatta
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-riyazat26a
%I PMLR
%P 723--734
%U https://proceedings.mlr.press/v318/riyazat26a.html
%V 318
%X We present AdaPrivate-TS, a differentially private contextual bandit algorithm that combines Thompson Sampling with batched zCDP composition. Our key insight is that differential privacy noise inflates the posterior covariance in a structured way: adding $N(0, \sigma^2 I)$ noise to $b$ yields sampling covariance $v^2 A^{-1} + \sigma^2 A^{-2}$, which Thompson Sampling interprets as increased uncertainty rather than pure corruption. Under event-level privacy (protecting individual interactions) with stochastic contexts, we prove that the privacy cost is only $O(\sqrt{d} \cdot \log T / \sqrt{\rho})$, logarithmic in $T$, because parallel composition amortizes noise across batches. Additionally, we explore privacy amplification via Poisson subsampling, which can reduce effective noise at stringent privacy budgets. Experiments on synthetic and real-world datasets demonstrate: (1) AdaPrivate-TS achieves 93-99% of non-private performance at $\varepsilon \in [0.5, 5]$, outperforming UCB by 0.5-3.7% and up to 18% with tuned adaptive exploration at extreme $\varepsilon$; (2) privacy amplification provides additional 2-5% gains at low $\varepsilon$; (3) on MovieLens and Jester, AdaPrivate-TS achieves the best overall performance among event-level baselines, dominating at $\varepsilon \geq 2$; (4) under DP-SVD private features, TS’s advantage over UCB grows to +11%, confirming noise-as-uncertainty is not limited to reward privacy. We provide rigorous proofs for privacy guarantees under interactive zCDP composition and comprehensive evaluation including convergence curves, 12-seed CIs, and DP-SVD feature ablation. Keywords: Differential Privacy, Thompson Sampling, Contextual Bandits, Privacy Amplification, zCDP

APA

Riyazat, M. & Ukwatta, E. (2026). AdaPrivate-TS: Private Thompson Sampling for Contextual Bandits with Privacy Amplification. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:723-734. Available from https://proceedings.mlr.press/v318/riyazat26a.html.
