Certified Data Removal from Machine Learning Models

Chuan Guo, Tom Goldstein, Awni Hannun, Laurens Van Der Maaten
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3832-3842, 2020.

Abstract

Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
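The abstract describes removing a training point's influence without retraining from scratch. The sketch below illustrates that idea under several assumptions:

- The model is L2-regularized logistic regression with labels y in {-1, +1} and objective L(w; D) = sum_i log(1 + exp(-y_i w·x_i)) + (λn/2)·||w||².
- The update is a single Newton step from the full-data optimum toward the optimum on the remaining data. This is one standard way to approximate retraining. It is not taken from this page.
- The helper names (`solve`, `grad_hess`, `train`, `newton_remove`) are hypothetical and chosen only for illustration.

The sketch adds no randomness, so on its own it does not provide the indistinguishability guarantee described above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting (small dense A).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def grad_hess(w, data, lam):
    # Gradient and Hessian of L(w; D) = sum_i log(1 + exp(-y_i w.x_i)) + (lam * n / 2) * ||w||^2.
    d, n = len(w), len(data)
    g = [lam * n * wj for wj in w]
    H = [[lam * n if j == k else 0.0 for k in range(d)] for j in range(d)]
    for x, y in data:
        s = sigmoid(y * dot(w, x))
        for j in range(d):
            g[j] -= y * (1.0 - s) * x[j]
            for k in range(d):
                H[j][k] += s * (1.0 - s) * x[j] * x[k]
    return g, H


def train(data, lam, d, steps=25):
    # Minimize L(w; D) with plain Newton iterations starting from w = 0.
    w = [0.0] * d
    for _ in range(steps):
        g, H = grad_hess(w, data, lam)
        step = solve(H, g)
        w = [wj - sj for wj, sj in zip(w, step)]
    return w


def newton_remove(w, data, idx, lam):
    # Hypothetical helper: one Newton step from the full-data optimum w toward the
    # optimum on D' = D without data[idx]. No noise is added, so nothing is certified.
    x_rm, y_rm = data[idx]
    rest = data[:idx] + data[idx + 1:]
    s = sigmoid(y_rm * dot(w, x_rm))
    # Delta = lam * w + gradient of the removed point's loss at w.
    delta = [lam * wj - y_rm * (1.0 - s) * xj for wj, xj in zip(w, x_rm)]
    _, H = grad_hess(w, rest, lam)  # Hessian of L(.; D'); its regularizer uses lam * (n - 1)
    step = solve(H, delta)
    return [wj + sj for wj, sj in zip(w, step)]


# Usage on synthetic, non-separable data (labels in {-1, +1}).
random.seed(0)
d, lam = 3, 0.1
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(40):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    y = 1 if dot(w_true, x) + random.gauss(0.0, 1.0) > 0 else -1
    data.append((x, y))

w_full = train(data, lam, d)
w_approx = newton_remove(w_full, data, 0, lam)
w_retrain = train(data[1:], lam, d)
err_newton = max(abs(a - b) for a, b in zip(w_approx, w_retrain))
err_stale = max(abs(a - b) for a, b in zip(w_full, w_retrain))
print(err_newton, err_stale)
```

Here `err_newton` measures how far the one-step update lands from full retraining on the remaining 39 points. `err_stale` is the gap if the model were left untouched. A removal mechanism with a formal guarantee would also need to bound that residual gap and mask it, which this sketch does not attempt.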

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-guo20c,
  title     = {Certified Data Removal from Machine Learning Models},
  author    = {Guo, Chuan and Goldstein, Tom and Hannun, Awni and Van Der Maaten, Laurens},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {3832--3842},
  year      = {2020},
  editor    = {Daumé, III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/guo20c/guo20c.pdf},
  url       = {https://proceedings.mlr.press/v119/guo20c.html},
  abstract  = {Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.}
}
Endnote
%0 Conference Paper
%T Certified Data Removal from Machine Learning Models
%A Chuan Guo
%A Tom Goldstein
%A Awni Hannun
%A Laurens Van Der Maaten
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-guo20c
%I PMLR
%P 3832--3842
%U https://proceedings.mlr.press/v119/guo20c.html
%V 119
%X Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
APA
Guo, C., Goldstein, T., Hannun, A., & Van Der Maaten, L. (2020). Certified Data Removal from Machine Learning Models. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3832-3842. Available from https://proceedings.mlr.press/v119/guo20c.html.
