Thompson Sampling Itself is Differentially Private

Tingting Ou, Rachel Cummings, Marco Avella
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:1576-1584, 2024.

Abstract

In this work we first show that the classical Thompson sampling algorithm for multi-armed bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications, such as pre-pulling all arms a fixed number of times or increasing the sampling variance, can provide tighter privacy guarantees. We again provide privacy guarantees that now depend on the new parameters introduced by these modifications, allowing the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm and show how the new parameters impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes and demonstrate that tuning the new parameters substantially improves the privacy-regret tradeoff.
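The two modifications mentioned above are straightforward to picture in code. Below is a minimal Python sketch of Gaussian Thompson sampling with both tweaks: a forced pre-pull phase in which every arm is played a fixed number of times, and an inflated posterior sampling variance. The Gaussian prior and likelihood, the parameter names `pre_pulls` and `variance_scale`, and the reward interface are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def thompson_sampling(reward_fn, n_arms, horizon,
                      pre_pulls=0, variance_scale=1.0, rng=None):
    """Gaussian Thompson sampling with two privacy-motivated tweaks.

    Assumes (for illustration) a N(0, 1) prior on each arm's mean and a
    unit-variance Gaussian reward likelihood, so the posterior for arm i
    after n pulls with reward sum s is N(s / (n + 1), 1 / (n + 1)).
    """
    rng = rng if rng is not None else np.random.default_rng()
    counts = np.zeros(n_arms)  # number of times each arm was pulled
    sums = np.zeros(n_arms)    # cumulative reward per arm
    pulls = []

    for t in range(horizon):
        if t < pre_pulls * n_arms:
            arm = t % n_arms  # modification 1: fixed round-robin pre-pulls
        else:
            post_mean = sums / (counts + 1.0)
            # Modification 2: inflate the sampling variance by variance_scale.
            post_var = variance_scale / (counts + 1.0)
            arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls

# Example: three Bernoulli arms; arm 2 is best.
means = [0.2, 0.5, 0.8]
rng = np.random.default_rng(0)
draw = lambda arm: float(rng.random() < means[arm])
pulls = thompson_sampling(draw, n_arms=3, horizon=1000,
                          pre_pulls=5, variance_scale=2.0, rng=rng)
```

With `pre_pulls=0` and `variance_scale=1.0` this reduces to classical Thompson sampling, which the paper shows is already differentially private; by basic composition, a per-round guarantee of $\epsilon$ yields at most $T\epsilon$ over $T$ rounds. Larger `pre_pulls` or `variance_scale` trade regret for tighter per-round privacy, in the spirit of the tuning the abstract describes.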

Cite this Paper

BibTeX
@InProceedings{pmlr-v238-ou24a,
  title     = {Thompson Sampling Itself is Differentially Private},
  author    = {Ou, Tingting and Cummings, Rachel and Avella, Marco},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {1576--1584},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/ou24a/ou24a.pdf},
  url       = {https://proceedings.mlr.press/v238/ou24a.html}
}
Endnote
%0 Conference Paper
%T Thompson Sampling Itself is Differentially Private
%A Tingting Ou
%A Rachel Cummings
%A Marco Avella
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-ou24a
%I PMLR
%P 1576--1584
%U https://proceedings.mlr.press/v238/ou24a.html
%V 238
APA
Ou, T., Cummings, R. & Avella, M. (2024). Thompson Sampling Itself is Differentially Private. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:1576-1584. Available from https://proceedings.mlr.press/v238/ou24a.html.