Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients

Chris J Cundy, Rishi Desai, Stefano Ermon
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2809-2817, 2024.

Abstract

As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
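To make the setting concrete, the following is a minimal sketch of the kind of objective described above; the notation (u for the sensitive state variables, a_{1:T} for the sequence of actions, and \lambda for the regularization weight) is assumed here for illustration and is not taken from the paper itself.

\max_{\theta} \; \mathbb{E}_{\tau \sim \pi_{\theta}}\!\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] \;-\; \lambda \, I\!\left(u ;\, a_{1:T}\right)

Here I(u; a_{1:T}) denotes the mutual information between the sensitive variables and the actions under the trajectory distribution induced by the policy \pi_{\theta}; setting \lambda = 0 recovers the standard policy-gradient objective.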

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-j-cundy24a,
  title     = {Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients},
  author    = {J Cundy, Chris and Desai, Rishi and Ermon, Stefano},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2809--2817},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/j-cundy24a/j-cundy24a.pdf},
  url       = {https://proceedings.mlr.press/v238/j-cundy24a.html},
  abstract  = {As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.}
}
Endnote
%0 Conference Paper
%T Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
%A Chris J Cundy
%A Rishi Desai
%A Stefano Ermon
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-j-cundy24a
%I PMLR
%P 2809--2817
%U https://proceedings.mlr.press/v238/j-cundy24a.html
%V 238
%X As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
APA
J Cundy, C., Desai, R. & Ermon, S. (2024). Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2809-2817. Available from https://proceedings.mlr.press/v238/j-cundy24a.html.