Certified Data Removal from Machine Learning Models

Chuan Guo; Tom Goldstein; Awni Hannun; Laurens Van Der Maaten

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3832-3842, 2020.

Abstract

Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-guo20c,
  title = 	 {Certified Data Removal from Machine Learning Models},
  author =       {Guo, Chuan and Goldstein, Tom and Hannun, Awni and Van Der Maaten, Laurens},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {3832--3842},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/guo20c/guo20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/guo20c.html},
  abstract = 	 {Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.}
}

Endnote

%0 Conference Paper
%T Certified Data Removal from Machine Learning Models
%A Chuan Guo
%A Tom Goldstein
%A Awni Hannun
%A Laurens Van Der Maaten
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-guo20c
%I PMLR
%P 3832--3842
%U https://proceedings.mlr.press/v119/guo20c.html
%V 119
%X Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.

APA

Guo, C., Goldstein, T., Hannun, A., & Van Der Maaten, L. (2020). Certified Data Removal from Machine Learning Models. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3832-3842. Available from https://proceedings.mlr.press/v119/guo20c.html.
