Differentially Private No-regret Exploration in Adversarial Markov Decision Processes

Shaojie Bai, Lanting Zeng, Chengcheng Zhao, Xiaoming Duan, Mohammad Sadegh Talebi, Peng Cheng, Jiming Chen
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:235-272, 2024.

Abstract

We study learning adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in non-stationary and even adversarial scenarios, where protecting users’ sensitive information is vital. We first propose two efficient frameworks for adversarial MDPs, spanning the full-information and bandit settings. Within each framework, we consider both Joint DP (JDP), where a central agent is trusted to protect the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. Then, we design novel privacy mechanisms to privatize the stochastic transitions and adversarial losses. By instantiating these privacy mechanisms to satisfy the JDP and LDP requirements, we obtain near-optimal regret guarantees for both frameworks. To our knowledge, these are the first algorithms to tackle the challenge of private learning in adversarial MDPs.
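The paper's own privacy mechanisms are not reproduced on this page, but the idea of privatizing adversarial losses on the user side (the LDP setting) can be illustrated with a standard DP primitive. The sketch below is a minimal example of the classic Laplace mechanism applied to a bounded per-episode loss vector; the function name `privatize_losses` and all parameters are hypothetical, and this is not the mechanism proposed in the paper.

```python
# Illustrative sketch only: the standard Laplace mechanism for privatizing
# bounded per-step losses on the user side, in the spirit of LDP. This is
# NOT the paper's mechanism; names and parameters are hypothetical.
import numpy as np


def privatize_losses(losses, epsilon, loss_bound=1.0, rng=None):
    """Return a noisy report of a user's loss vector.

    Each entry of `losses` is assumed to lie in [0, loss_bound], so the
    L1 sensitivity of reporting a single entry is `loss_bound`. Adding
    Laplace noise with scale loss_bound/epsilon makes each individual
    entry's release epsilon-DP; releasing all H entries of an episode
    costs H * epsilon by basic composition (scale the noise up by H to
    get epsilon overall for the full vector).
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    scale = loss_bound / epsilon
    return losses + rng.laplace(loc=0.0, scale=scale, size=losses.shape)


# Example: a user observes losses over an H = 5 step episode and sends
# only the privatized version to the (untrusted) learner.
true_losses = np.array([0.2, 0.9, 0.4, 0.0, 0.7])
noisy_losses = privatize_losses(true_losses, epsilon=1.0)
print(noisy_losses)
```

Under LDP the noise is injected before the data ever leaves the user, which is why LDP regret bounds are typically worse than their JDP counterparts, where a trusted central agent can add noise to aggregated statistics instead.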

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-bai24a,
  title     = {Differentially Private No-regret Exploration in Adversarial Markov Decision Processes},
  author    = {Bai, Shaojie and Zeng, Lanting and Zhao, Chengcheng and Duan, Xiaoming and Sadegh Talebi, Mohammad and Cheng, Peng and Chen, Jiming},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages     = {235--272},
  year      = {2024},
  editor    = {Kiyavash, Negar and Mooij, Joris M.},
  volume    = {244},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/bai24a/bai24a.pdf},
  url       = {https://proceedings.mlr.press/v244/bai24a.html},
  abstract  = {We study learning adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in non-stationary and even adversarial scenarios, where protecting users’ sensitive information is vital. We first propose two efficient frameworks for adversarial MDPs, spanning the full-information and bandit settings. Within each framework, we consider both Joint DP (JDP), where a central agent is trusted to protect the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. Then, we design novel privacy mechanisms to privatize the stochastic transitions and adversarial losses. By instantiating these privacy mechanisms to satisfy the JDP and LDP requirements, we obtain near-optimal regret guarantees for both frameworks. To our knowledge, these are the first algorithms to tackle the challenge of private learning in adversarial MDPs.}
}
Endnote
%0 Conference Paper
%T Differentially Private No-regret Exploration in Adversarial Markov Decision Processes
%A Shaojie Bai
%A Lanting Zeng
%A Chengcheng Zhao
%A Xiaoming Duan
%A Mohammad Sadegh Talebi
%A Peng Cheng
%A Jiming Chen
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-bai24a
%I PMLR
%P 235--272
%U https://proceedings.mlr.press/v244/bai24a.html
%V 244
%X We study learning adversarial Markov decision processes (MDPs) in the episodic setting under the constraint of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in non-stationary and even adversarial scenarios, where protecting users’ sensitive information is vital. We first propose two efficient frameworks for adversarial MDPs, spanning the full-information and bandit settings. Within each framework, we consider both Joint DP (JDP), where a central agent is trusted to protect the sensitive data, and Local DP (LDP), where the information is protected directly on the user side. Then, we design novel privacy mechanisms to privatize the stochastic transitions and adversarial losses. By instantiating these privacy mechanisms to satisfy the JDP and LDP requirements, we obtain near-optimal regret guarantees for both frameworks. To our knowledge, these are the first algorithms to tackle the challenge of private learning in adversarial MDPs.
APA
Bai, S., Zeng, L., Zhao, C., Duan, X., Sadegh Talebi, M., Cheng, P. & Chen, J. (2024). Differentially Private No-regret Exploration in Adversarial Markov Decision Processes. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:235-272. Available from https://proceedings.mlr.press/v244/bai24a.html.