Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:844-852, 2022.
Abstract
We address differentially private stochastic bandits. We present two (near)-optimal Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in achieving optimality is the principle of optimism in the face of uncertainty: we reshape the posterior distribution optimistically compared to non-private Thompson Sampling. Our DP-TS achieves a $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \log \left(\frac{\log(T)}{\epsilon \cdot \Delta_j} \right) \right)$ regret bound, where $\mathcal{A}$ is the arm set, $\Delta_j$ is the sub-optimality gap of a sub-optimal arm $j$, and $\epsilon$ is the privacy parameter. Our Lazy-DP-TS removes the extra $\log$ factor by dropping some observations. The regret of Lazy-DP-TS is $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \right)$, which matches the regret lower bound. Additionally, we conduct experiments to compare the empirical performance of our proposed algorithms with the existing optimal algorithms for differentially private stochastic bandits.
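To make the "optimistically reshaped posterior" idea concrete, below is a minimal illustrative sketch of a differentially private Thompson-Sampling-style loop. It is not the paper's DP-TS or Lazy-DP-TS: it assumes Bernoulli rewards, privatizes each arm's empirical mean with Laplace noise, and inflates the posterior width to cover both sampling and privacy noise, while ignoring the careful noise re-use / observation-dropping that the actual algorithms employ to control privacy composition. All function and parameter names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_ts_sketch(arm_means, T=10_000, eps=1.0):
    """Illustrative DP Thompson-Sampling-style loop (NOT the paper's exact algorithm).

    Assumptions of this sketch:
      - Bernoulli rewards in [0, 1].
      - Each arm's empirical mean is privatized with fresh Laplace noise of scale
        1 / (eps * n_j); the real algorithms manage noise more carefully so that
        repeated releases do not degrade the privacy guarantee.
      - Sampling uses a Gaussian posterior around the noisy mean whose width is
        widened ("optimistic reshaping") to account for both the sampling noise
        and the added privacy noise.
    """
    K = len(arm_means)
    counts = np.zeros(K)   # number of pulls per arm
    sums = np.zeros(K)     # cumulative reward per arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, T + 1):
        samples = np.empty(K)
        for j in range(K):
            if counts[j] == 0:
                samples[j] = np.inf          # pull every arm at least once
                continue
            n = counts[j]
            noisy_mean = sums[j] / n + rng.laplace(scale=1.0 / (eps * n))
            # Optimistically widened posterior: standard deviation covers the
            # ~1/sqrt(n) sampling uncertainty plus the ~1/(eps*n) privacy noise.
            std = np.sqrt(2.0 / n) + 2.0 / (eps * n)
            samples[j] = rng.normal(noisy_mean, std)
        j = int(np.argmax(samples))
        reward = float(rng.random() < arm_means[j])
        counts[j] += 1
        sums[j] += reward
        regret += best - arm_means[j]
    return regret

# Example usage: three Bernoulli arms, privacy parameter eps = 0.5.
print(dp_ts_sketch([0.9, 0.8, 0.5], T=5_000, eps=0.5))
```

The widened posterior plays the role of optimism in the face of uncertainty: a sub-optimal arm whose noisy estimate happens to look good is still explored enough to be ruled out, which is what drives the $\min\{\epsilon, \Delta_j\}$ dependence in the regret bounds above.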