An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5579-5588, 2019.
Abstract
We present a provably optimal differentially private algorithm for the stochastic multi-armed bandit problem, as opposed to the private analogue of the UCB algorithm (Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016), which does not meet the recently discovered lower bound of $\Omega\left(\frac{K\log(T)}{\epsilon}\right)$ (Shariff and Sheffet, 2018). Our construction is based on a different algorithm, Successive Elimination (Even-Dar et al., 2002), which repeatedly pulls all remaining arms until an arm is found to be suboptimal and is then eliminated. In order to devise a private analogue of Successive Elimination, we revisit the problem of a private stopping rule, which takes as input a stream of i.i.d. samples from an unknown distribution and returns a multiplicative $(1 \pm \alpha)$-approximation of the distribution's mean, and we prove the optimality of our private stopping rule. We then present the private Successive Elimination algorithm, which meets both the non-private lower bound (Lai and Robbins, 1985) and the above-mentioned private lower bound. We also empirically compare the performance of our algorithm with that of the private UCB algorithm.
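To make the elimination scheme concrete, the following is a minimal sketch of the non-private Successive Elimination baseline the abstract refers to (not the paper's private variant): every surviving arm is pulled once per round, and an arm is dropped when its upper confidence bound falls below the best surviving arm's lower confidence bound. The function name, the Bernoulli-arm simulation, and the particular Hoeffding-style confidence radius are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def successive_elimination(means, horizon, delta=0.05):
    """Sketch of (non-private) Successive Elimination on simulated
    Bernoulli arms. `means` are the true arm means (an assumption for
    the simulation); `delta` is the allowed failure probability."""
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    pulls = 0
    while pulls < horizon and len(active) > 1:
        # One round: pull every arm that has not been eliminated.
        for a in list(active):
            sums[a] += 1.0 if random.random() < means[a] else 0.0
            counts[a] += 1
            pulls += 1
        # Hoeffding-style confidence radius (one illustrative choice).
        def radius(a):
            return math.sqrt(math.log(4 * k * counts[a] ** 2 / delta)
                             / (2 * counts[a]))
        # Eliminate arms whose UCB is below the best arm's LCB.
        best_lcb = max(sums[a] / counts[a] - radius(a) for a in active)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= best_lcb]
    return active

random.seed(0)
surviving = successive_elimination([0.9, 0.5, 0.4], horizon=20000)
print(surviving)
```

The paper's contribution replaces the empirical-mean comparisons above with a differentially private stopping rule, so that each elimination decision is made on privately estimated means while still matching both the non-private and private regret lower bounds.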