Forget Unlearning: Towards True Data-Deletion in Machine Learning

Rishav Chourasia, Neil Shah
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6028-6073, 2023.

Abstract

Unlearning algorithms aim to remove deleted data’s influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in the literature are flawed and don’t protect the privacy of deleted records. We show that when people delete their data as a function of published models, records in a database become interdependent, so even retraining a fresh model after deletion of a record doesn’t ensure its privacy. Second, unlearning algorithms that cache partial computations to speed up the processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an optimal, computationally efficient, and sound machine unlearning algorithm based on noisy gradient descent.
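The abstract's core mechanism can be illustrated with a minimal sketch: noisy gradient descent adds Gaussian noise to each update, masking any single record's influence, so after a deletion one can continue training briefly on the remaining data instead of retraining from scratch. This is an illustrative toy on a least-squares loss, not the paper's actual algorithm or its privacy accounting; all names and parameters here (`noisy_gd`, `lr`, `sigma`, step counts) are hypothetical choices for the example.

```python
import numpy as np

def noisy_gd(data, w, steps, lr=0.1, sigma=0.05, rng=None):
    """Noisy gradient descent on a least-squares loss.

    Gaussian noise injected into each step (as in DP-style training)
    limits how much the final iterate reveals about any one record.
    """
    rng = rng or np.random.default_rng(0)
    X, y = data
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n          # full-batch gradient
        w = w - lr * (grad + sigma * rng.normal(size=w.shape))
    return w

# Synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Initial training on the full dataset.
w = noisy_gd((X, y), np.zeros(3), steps=200, rng=rng)

# Record 0 requests deletion: "unlearn" by continuing noisy GD
# on the remaining data from the current weights (a short
# fine-tune), rather than retraining from scratch.
X2, y2 = X[1:], y[1:]
w_del = noisy_gd((X2, y2), w, steps=20, rng=rng)
```

The key cost saving is that the post-deletion phase runs for far fewer steps than full retraining while the injected noise keeps the released model's distribution close to that of a fresh retrain on the reduced dataset.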

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chourasia23a,
  title     = {Forget Unlearning: Towards True Data-Deletion in Machine Learning},
  author    = {Chourasia, Rishav and Shah, Neil},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {6028--6073},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chourasia23a/chourasia23a.pdf},
  url       = {https://proceedings.mlr.press/v202/chourasia23a.html},
  abstract  = {Unlearning algorithms aim to remove deleted data’s influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in literature are flawed and don’t protect the privacy of deleted records. We show that when people delete their data as a function of published models, records in a database become interdependent. So, even retraining a fresh model after deletion of a record doesn’t ensure its privacy. Secondly, unlearning algorithms that cache partial computations to speed up the processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an optimal, computationally efficient, and sound machine unlearning algorithm based on noisy gradient descent.}
}
Endnote
%0 Conference Paper
%T Forget Unlearning: Towards True Data-Deletion in Machine Learning
%A Rishav Chourasia
%A Neil Shah
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chourasia23a
%I PMLR
%P 6028--6073
%U https://proceedings.mlr.press/v202/chourasia23a.html
%V 202
%X Unlearning algorithms aim to remove deleted data’s influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in literature are flawed and don’t protect the privacy of deleted records. We show that when people delete their data as a function of published models, records in a database become interdependent. So, even retraining a fresh model after deletion of a record doesn’t ensure its privacy. Secondly, unlearning algorithms that cache partial computations to speed up the processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an optimal, computationally efficient, and sound machine unlearning algorithm based on noisy gradient descent.
APA
Chourasia, R. & Shah, N. (2023). Forget Unlearning: Towards True Data-Deletion in Machine Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:6028-6073. Available from https://proceedings.mlr.press/v202/chourasia23a.html.