Private Reinforcement Learning with PAC and Regret Guarantees

Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Steven Wu
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9754-9764, 2020.

Abstract

Motivated by high-stakes decision-making domains like personalized medicine, where user information is inherently sensitive, we design privacy-preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives their own set of outputs (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter appears only in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-vietri20a,
  title     = {Private Reinforcement Learning with {PAC} and Regret Guarantees},
  author    = {Vietri, Giuseppe and Balle, Borja and Krishnamurthy, Akshay and Wu, Steven},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9754--9764},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/vietri20a/vietri20a.pdf},
  url       = {http://proceedings.mlr.press/v119/vietri20a.html},
  abstract  = {Motivated by high-stakes decision-making domains like personalized medicine, where user information is inherently sensitive, we design privacy-preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives their own set of outputs (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter appears only in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.}
}
Endnote
%0 Conference Paper
%T Private Reinforcement Learning with PAC and Regret Guarantees
%A Giuseppe Vietri
%A Borja Balle
%A Akshay Krishnamurthy
%A Steven Wu
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-vietri20a
%I PMLR
%P 9754--9764
%U http://proceedings.mlr.press/v119/vietri20a.html
%V 119
%X Motivated by high-stakes decision-making domains like personalized medicine, where user information is inherently sensitive, we design privacy-preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives their own set of outputs (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter appears only in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
APA
Vietri, G., Balle, B., Krishnamurthy, A. & Wu, S. (2020). Private Reinforcement Learning with PAC and Regret Guarantees. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9754-9764. Available from http://proceedings.mlr.press/v119/vietri20a.html.