Improved Regret for Differentially Private Exploration in Linear MDP

Dung Daniel T Ngo, Giuseppe Vietri, Steven Wu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:16529-16552, 2022.

Abstract

We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem by Luyo et al. (2021) achieves a regret rate whose dependence on the number of episodes K is O(K^{3/5}). We provide a private algorithm with an improved regret rate that achieves the optimal O(√K) dependence on the number of episodes. The key ingredient in our stronger regret guarantee is adaptivity in the policy update schedule: an update occurs only when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and performs only O(log(K)) updates, which greatly reduces the amount of privacy noise. Finally, in the most prevalent privacy regime, where the privacy parameter ε is a constant, our algorithm incurs negligible privacy cost: compared with existing non-private regret bounds, the additional regret due to privacy appears only in lower-order terms.
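The adaptive update schedule can be pictured with a small sketch. The Python snippet below is a minimal illustration only: it assumes a determinant-growth trigger on the feature Gram matrix as the "sufficient change" test and a Gaussian mechanism for the noisy releases; the factor eta, the noise scale sigma, and all names are assumptions for illustration, not the paper's exact algorithm.

# Minimal sketch (assumed construction): recompute the policy, and hence add
# privacy noise, only when the data has changed enough. The determinant-growth
# trigger and Gaussian noise scale below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 10          # feature dimension (assumed)
K = 10_000      # number of episodes (assumed)
eta = 2.0       # update when det of the Gram matrix grows by this factor (assumed)
sigma = 1.0     # Gaussian noise scale per private release (assumed)

Lambda = np.eye(d)                            # running regularized Gram matrix of features
logdet_last = np.linalg.slogdet(Lambda)[1]    # log-det at the last policy update
num_updates = 0

def private_release(stats: np.ndarray) -> np.ndarray:
    """Release a noisy, symmetrized copy of the sufficient statistics."""
    noise = rng.normal(0.0, sigma, size=stats.shape)
    return stats + (noise + noise.T) / 2

for k in range(K):
    phi = rng.normal(size=d)          # stand-in for an observed feature vector
    Lambda += np.outer(phi, phi)

    # Adaptive schedule: update only when det(Lambda) has grown by a factor of
    # eta since the last update, so the number of updates (and noisy releases)
    # stays logarithmic in K rather than linear.
    logdet = np.linalg.slogdet(Lambda)[1]
    if logdet > logdet_last + np.log(eta):
        noisy_stats = private_release(Lambda)
        logdet_last = logdet
        num_updates += 1
        # ... recompute the exploration policy from noisy_stats ...

print(f"episodes: {K}, policy updates: {num_updates}")

Because each update requires the determinant to grow by a constant factor, the sketch performs on the order of d·log(K) updates over K episodes, which is the sense in which the schedule keeps both the switching cost and the accumulated privacy noise small.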

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-ngo22a,
  title     = {Improved Regret for Differentially Private Exploration in Linear {MDP}},
  author    = {Ngo, Dung Daniel T and Vietri, Giuseppe and Wu, Steven},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {16529--16552},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/ngo22a/ngo22a.pdf},
  url       = {https://proceedings.mlr.press/v162/ngo22a.html}
}
APA
Ngo, D.D.T., Vietri, G. & Wu, S. (2022). Improved Regret for Differentially Private Exploration in Linear MDP. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:16529-16552. Available from https://proceedings.mlr.press/v162/ngo22a.html.
