Open Problem: What is the Complexity of Joint Differential Privacy in Linear Contextual Bandits?
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:5306-5311, 2024.
Abstract
Contextual bandits serve as a theoretical framework to design recommender systems, which often rely on user-sensitive data, making privacy a critical concern. However, a significant gap remains between the known upper and lower bounds on the regret achievable in linear contextual bandits under Joint Differential Privacy (JDP), which is a popular privacy definition used in this setting. In particular, the best known regret upper bound is $O\left(d\sqrt{T\log(T)} + d^{3/4}\sqrt{T\log(1/\delta)}/\sqrt{\epsilon}\right)$, while the lower bound is $\Omega\left(\sqrt{dT\log(K)} + d/(\epsilon+\delta)\right)$. We discuss recent progress on this problem, from both the algorithm-design and lower-bound sides, and pose the open questions.
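To make the gap concrete, here is a minimal side-by-side restatement of the two bounds in LaTeX (a sketch: constants and lower-order terms are elided, and the grouping of the logarithmic factors follows the reading of the bounds above):

\[
\underbrace{O\!\left(d\sqrt{T\log(T)} + \frac{d^{3/4}\sqrt{T\log(1/\delta)}}{\sqrt{\epsilon}}\right)}_{\text{upper bound: privacy term grows as }\sqrt{T}}
\quad\text{vs.}\quad
\underbrace{\Omega\!\left(\sqrt{dT\log(K)} + \frac{d}{\epsilon+\delta}\right)}_{\text{lower bound: privacy term constant in }T}
\]

In particular, for fixed $d$, $\epsilon$, and $\delta$, the two privacy terms differ by a multiplicative factor of order $\sqrt{T}$.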