A Certified Unlearning Approach without Access to Source Data

Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:3071-3090, 2025.

Abstract

With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal without access to the original training data samples. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still meaningful privacy guarantees. Scaling the noise in this way ensures strong guarantees on the model’s behavior post-unlearning while maintaining its overall utility. We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings.
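
The abstract describes the mechanism only at a high level, so the following is a minimal hypothetical sketch of the idea, not the paper's algorithm: it assumes a regularized logistic-regression model, uses a generic influence-function removal step, and all names here (unlearn_without_source, dist_estimate, base_sigma, kappa) are invented for illustration. The one ingredient taken from the abstract is that the Gaussian noise scale is calibrated to an estimate of the statistical distance between the surrogate and source distributions.

import numpy as np

def avg_grad(theta, X, y, lam):
    # Average gradient of an L2-regularized logistic loss (assumed model class).
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y) + lam * theta

def avg_hessian(theta, X, lam):
    # Average Hessian of the same loss.
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / len(X) + lam * np.eye(X.shape[1])

def unlearn_without_source(theta, X_forget, y_forget, X_surr, y_surr,
                           n_train, lam, dist_estimate,
                           base_sigma=0.01, kappa=1.0, rng=None):
    # One first-order (influence-function) removal step. Curvature is
    # estimated from the surrogate set because the source data is
    # unavailable; the noise scale grows with the estimated
    # surrogate-to-source distance, so a poorer surrogate yields a
    # noisier, more conservative model.
    if rng is None:
        rng = np.random.default_rng()
    H = avg_hessian(theta, X_surr, lam)           # curvature from surrogate data
    g = avg_grad(theta, X_forget, y_forget, lam)  # influence of the forget set
    m = len(y_forget)
    theta_new = theta + (m / n_train) * np.linalg.solve(H, g)
    sigma = base_sigma + kappa * dist_estimate    # distance-aware noise calibration
    return theta_new + rng.normal(0.0, sigma, size=theta.shape)

Under this template, certifying the removal amounts to bounding the residual influence of the forgotten points and choosing base_sigma and kappa so that the added Gaussian noise masks it; the paper's actual bounds and calibration rules should be taken from the full text.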

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-basaran25a,
  title     = {A Certified Unlearning Approach without Access to Source Data},
  author    = {Basaran, Umit Yigit and Ahmed, Sk Miraj and Roy-Chowdhury, Amit and Guler, Basak},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {3071--3090},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/basaran25a/basaran25a.pdf},
  url       = {https://proceedings.mlr.press/v267/basaran25a.html}
}
Endnote
%0 Conference Paper
%T A Certified Unlearning Approach without Access to Source Data
%A Umit Yigit Basaran
%A Sk Miraj Ahmed
%A Amit Roy-Chowdhury
%A Basak Guler
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-basaran25a
%I PMLR
%P 3071--3090
%U https://proceedings.mlr.press/v267/basaran25a.html
%V 267
APA
Basaran, U.Y., Ahmed, S.M., Roy-Chowdhury, A., & Guler, B. (2025). A Certified Unlearning Approach without Access to Source Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:3071-3090. Available from https://proceedings.mlr.press/v267/basaran25a.html.