Near-optimal Thompson sampling-based algorithms for differentially private stochastic bandits
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:844-852, 2022.
Abstract
We address differentially private stochastic bandits. We present two (near)-optimal Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in achieving optimality is the principle of optimism in the face of uncertainty: we reshape the posterior distribution optimistically compared to non-private Thompson Sampling. Our DP-TS achieves a $\sum_{j \in \mathcal{A}: \Delta_j > 0} O\left(\frac{\log(T)}{\min\{\epsilon, \Delta_j\}} \log\left(\frac{\log(T)}{\epsilon \Delta_j}\right)\right)$ regret bound, where $\mathcal{A}$ is the arm set, $\Delta_j$ is the sub-optimality gap of a sub-optimal arm $j$, and $\epsilon$ is the privacy parameter. Our Lazy-DP-TS removes the extra $\log$ factor by dropping observations. The regret of Lazy-DP-TS is $\sum_{j \in \mathcal{A}: \Delta_j > 0} O\left(\frac{\log(T)}{\min\{\epsilon, \Delta_j\}}\right)$, which matches the regret lower bound. Additionally, we conduct experiments to compare the empirical performance of our proposed algorithms with that of existing optimal algorithms for differentially private stochastic bandits.
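To make the high-level mechanism concrete, the following is a minimal Python sketch of Thompson Sampling for Bernoulli bandits in which each arm's success count is privatised with Laplace noise before it parameterises the Beta posterior. The reward means, the placement of the noise, and the absence of privacy-budget accounting are simplifying assumptions for illustration only; the paper's DP-TS and Lazy-DP-TS additionally reshape the posterior optimistically and control how often private statistics are released so that the overall privacy cost stays at $\epsilon$.

    import numpy as np

    def dp_ts_sketch(means, T, epsilon, seed=0):
        """Illustrative (not the paper's) DP-flavoured Thompson Sampling:
        Beta posteriors are built on Laplace-noised success counts."""
        rng = np.random.default_rng(seed)
        K = len(means)
        pulls = np.zeros(K, dtype=int)       # N_j(t): pulls of arm j
        successes = np.zeros(K, dtype=int)   # raw (non-private) success counts
        regret = 0.0
        best = max(means)

        for t in range(T):
            # Privatise each arm's success count with Laplace(1/epsilon) noise,
            # then clip so the Beta parameters remain valid.
            noisy = successes + rng.laplace(scale=1.0 / epsilon, size=K)
            noisy = np.clip(noisy, 0, pulls)

            # Thompson step: sample from each arm's (noisy) Beta posterior.
            theta = rng.beta(noisy + 1, pulls - noisy + 1)
            j = int(np.argmax(theta))

            reward = rng.random() < means[j]  # Bernoulli reward
            pulls[j] += 1
            successes[j] += int(reward)
            regret += best - means[j]

        return regret

    # Example: 3 arms, horizon 10,000, privacy parameter epsilon = 0.5.
    print(dp_ts_sketch([0.9, 0.8, 0.7], T=10_000, epsilon=0.5))

The sketch omits the key design choices that give the algorithms in the paper their guarantees (optimistic posterior reshaping, and, for Lazy-DP-TS, dropping observations so each reward influences only a bounded number of released statistics); it is intended only to show where the privacy noise enters a Thompson Sampling update.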