The Price of Differential Privacy for Online Learning
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:32-40, 2017.
Abstract
We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal O(√T) regret bounds. In the full-information setting, our results demonstrate that ϵ-differential privacy may be ensured for free: in particular, the regret bounds scale as O(√T + 1/ϵ). For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of O(√T/ϵ), while the previously best known regret bound was O(T^{2/3}/ϵ).
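To make the setting concrete, here is a minimal illustrative sketch of ϵ-differentially private full-information online linear optimization over the probability simplex. It privatizes a follow-the-regularized-leader (exponential-weights) update by adding Laplace noise to the accumulated loss vector. This is an assumption-laden toy, not the paper's algorithm: the function name `private_ftrl`, the per-round noise scheme, and the parameters `epsilon` and `eta` are all illustrative choices, and naive per-round noising does not by itself achieve the O(√T + 1/ϵ) bound stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_ftrl(losses, epsilon, eta):
    """Toy sketch (NOT the paper's method): exponential-weights over the
    probability simplex, run on a Laplace-noised cumulative loss vector.

    losses  : (T, d) array, per-round linear loss vectors in [0, 1]
    epsilon : privacy parameter controlling the Laplace noise scale
    eta     : learning rate of the exponential-weights update
    Returns the regret against the best fixed action in hindsight.
    """
    T, d = losses.shape
    cum = np.zeros(d)      # true accumulated losses (kept internal)
    total = 0.0            # learner's cumulative loss
    for t in range(T):
        # Play on a noised view of the cumulative losses.
        noisy = cum + rng.laplace(scale=1.0 / epsilon, size=d)
        w = np.exp(-eta * noisy)
        x = w / w.sum()    # point on the simplex
        total += losses[t] @ x
        cum += losses[t]
    best = losses.sum(axis=0).min()
    return total - best
```

A sharper sketch would replace the independent per-round noise with a tree-based aggregation of partial sums, so that each cumulative loss estimate carries only polylogarithmic noise while the whole loss sequence stays ϵ-differentially private.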