# Differentially Private Algorithms for Efficient Online Matroid Optimization

*Proceedings of The 2nd Conference on Lifelong Learning Agents*, PMLR 232:66-88, 2023.

#### Abstract

A matroid bandit is the online version of combinatorial optimization on a matroid, in which the learner chooses $K$ actions out of a set of $L$ actions such that the chosen actions form a basis of the matroid. Many real-world applications, such as recommendation systems, can be modeled as matroid bandits. In such learning problems, the revealed data may involve sensitive user information, so privacy considerations are crucial. We propose two simple and practical differentially private algorithms for matroid bandits built upon the ideas of Upper Confidence Bound and Thompson Sampling. The key idea behind our first algorithm, Differentially Private Upper Confidence Bound for Matroid Bandits (DPUCB-MAT), is to construct differentially private upper confidence bounds. The second algorithm, Differentially Private Thompson Sampling for Matroid Bandits (DPTS-MAT), is based on the idea of drawing random samples from differentially private posterior distributions. Both algorithms achieve $O\left( L\ln(n)/\Delta + LK\ln(n) \min\left\{K, \ln(n) \right\}/\varepsilon \right)$ regret bounds, where $\Delta$ denotes the mean reward gap and $\varepsilon$ is the required privacy parameter. Our derived regret bounds rely on novel technical arguments that deeply explore the special structure of matroids. We show a novel way to construct ordered pairs between the played actions and the optimal actions, which enables decomposing a matroid bandit problem into $K$ stochastic multi-armed bandit problems. Finally, we conduct experiments to demonstrate the empirical performance of our proposed learning algorithms on both synthetic and real-world movie-recommendation datasets.
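To make the two ideas in the abstract concrete, the following is a minimal, hypothetical sketch of one round of a DPUCB-MAT-style learner: per-arm empirical means are privatized with the Laplace mechanism, given a UCB-style confidence radius widened to absorb the added noise, and a basis is then chosen greedily through a matroid independence oracle. The noise scale, the extra `priv` width, and the uniform-matroid oracle used in the example are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def dp_ucb_indices(reward_sums, pull_counts, t, epsilon, rng):
    # Empirical means privatized via the Laplace mechanism (epsilon-DP
    # per arm under an illustrative sensitivity assumption), plus a
    # standard UCB radius and an extra term absorbing the Laplace noise.
    means = reward_sums / pull_counts
    noise = rng.laplace(scale=1.0 / (epsilon * pull_counts))
    conf = np.sqrt(2.0 * np.log(t) / pull_counts)
    priv = np.log(t) / (epsilon * pull_counts)
    return means + noise + conf + priv

def greedy_basis(scores, is_independent, K):
    # Greedy over the matroid: scan arms in decreasing score, adding an
    # arm whenever the chosen set stays independent, until rank K.
    chosen = []
    for arm in np.argsort(-scores):
        if len(chosen) == K:
            break
        if is_independent(chosen + [int(arm)]):
            chosen.append(int(arm))
    return chosen

# One example round with L = 6 arms and a uniform matroid of rank K = 3
# (any set of at most 3 arms is independent) — a toy oracle for illustration.
rng = np.random.default_rng(0)
L, K, epsilon, t = 6, 3, 1.0, 100
reward_sums = rng.uniform(1.0, 5.0, size=L)   # synthetic statistics
pull_counts = np.full(L, 10.0)
scores = dp_ucb_indices(reward_sums, pull_counts, t, epsilon, rng)
basis = greedy_basis(scores, lambda s: len(s) <= K, K)
print(len(basis))  # 3
```

For a non-uniform matroid (e.g. a partition matroid in a recommendation setting), only the `is_independent` oracle changes; the greedy loop is unchanged.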