Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits

Bingshan Hu, Nidhi Hegde
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:844-852, 2022.

Abstract

We address differentially private stochastic bandits. We present two (near-)optimal Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in achieving optimality is the principle of optimism in the face of uncertainty: we reshape the posterior distribution in an optimistic way compared to non-private Thompson Sampling. Our DP-TS achieves a $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \log \left(\frac{\log(T)}{\epsilon \cdot \Delta_j} \right) \right)$ regret bound, where $\mathcal{A}$ is the arm set, $\Delta_j$ is the sub-optimality gap of sub-optimal arm $j$, and $\epsilon$ is the privacy parameter. Our Lazy-DP-TS removes the extra $\log$ factor by dropping observations; its regret is $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \right)$, which matches the regret lower bound. Additionally, we conduct experiments comparing the empirical performance of our proposed algorithms with existing optimal algorithms for differentially private stochastic bandits.
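For intuition, note how the bounds interpolate between the private and non-private regimes: when $\epsilon \ge \Delta_j$, the term $\frac{\log(T)}{\min\{\epsilon, \Delta_j\}}$ reduces to the familiar non-private rate $\frac{\log(T)}{\Delta_j}$, while for $\epsilon < \Delta_j$ the privacy cost $\frac{\log(T)}{\epsilon}$ dominates. The sketch below is a generic, assumption-laden illustration of the setting, not the paper's DP-TS or Lazy-DP-TS: it privatizes each arm's reward sum with the Laplace mechanism and applies a crude optimistic clamp to the posterior sample. Adding fresh noise every round does not by itself give an end-to-end $\epsilon$-DP guarantee over the whole run (noise must be allocated more carefully, e.g. the paper's Lazy-DP-TS drops observations); the function name and all implementation details here are hypothetical.

# A minimal, generic sketch of Thompson Sampling with Laplace-noised statistics
# for Bernoulli bandits. NOT the paper's DP-TS / Lazy-DP-TS; illustrative only.
import numpy as np

def private_thompson_sampling(true_means, T, epsilon, seed=None):
    rng = np.random.default_rng(seed)
    K = len(true_means)
    pulls = np.zeros(K)          # number of times each arm was played
    reward_sums = np.zeros(K)    # raw reward sums, kept internally

    regret = 0.0
    best_mean = max(true_means)
    for t in range(T):
        samples = np.zeros(K)
        for j in range(K):
            if pulls[j] == 0:
                samples[j] = 1.0  # force at least one pull of every arm
                continue
            # Laplace mechanism: rewards lie in [0, 1], so the sum has
            # sensitivity 1 and Lap(1/epsilon) noise gives epsilon-DP for a
            # single release (composition over rounds is not handled here).
            noisy_sum = reward_sums[j] + rng.laplace(scale=1.0 / epsilon)
            noisy_mean = float(np.clip(noisy_sum / pulls[j], 0.0, 1.0))
            # Beta posterior built from the privatized statistics.
            a = noisy_mean * pulls[j] + 1.0
            b = (1.0 - noisy_mean) * pulls[j] + 1.0
            theta = rng.beta(a, b)
            # Crude optimism: never sample below the noisy empirical mean
            # (a stand-in for the paper's optimistically reshaped posterior).
            samples[j] = max(theta, noisy_mean)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += best_mean - true_means[arm]
    return regret

# Example: 3 arms, horizon 10,000, privacy parameter epsilon = 0.5.
print(private_thompson_sampling([0.9, 0.8, 0.5], T=10_000, epsilon=0.5))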

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-hu22a,
  title     = {Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits},
  author    = {Hu, Bingshan and Hegde, Nidhi},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages     = {844--852},
  year      = {2022},
  editor    = {Cussens, James and Zhang, Kun},
  volume    = {180},
  series    = {Proceedings of Machine Learning Research},
  month     = {01--05 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v180/hu22a/hu22a.pdf},
  url       = {https://proceedings.mlr.press/v180/hu22a.html},
  abstract  = {We address differentially private stochastic bandits. We present two (near-)optimal Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in achieving optimality is the principle of optimism in the face of uncertainty: we reshape the posterior distribution in an optimistic way compared to non-private Thompson Sampling. Our DP-TS achieves a $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \log \left(\frac{\log(T)}{\epsilon \cdot \Delta_j} \right) \right)$ regret bound, where $\mathcal{A}$ is the arm set, $\Delta_j$ is the sub-optimality gap of sub-optimal arm $j$, and $\epsilon$ is the privacy parameter. Our Lazy-DP-TS removes the extra $\log$ factor by dropping observations; its regret is $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \right)$, which matches the regret lower bound. Additionally, we conduct experiments comparing the empirical performance of our proposed algorithms with existing optimal algorithms for differentially private stochastic bandits.}
}
APA
Hu, B. & Hegde, N. (2022). Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:844-852. Available from https://proceedings.mlr.press/v180/hu22a.html.
