Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

Shanglun Feng, Florian Tramèr
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13326-13364, 2024.

Abstract

Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model’s weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples with guaranteed success. We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply-chain attack on machine learning privacy.
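
The reconstruction attack summarized above builds on the basic fact that gradient updates encode training inputs. The toy numpy sketch below is our own illustration, not the paper's actual backdoor construction for transformers: it assumes the victim takes a single SGD step on one private example through a linear layer with squared loss, and that the attacker knows the pretrained weights and later observes the finetuned weights. Under those assumptions, the weight difference is a rank-one outer product whose rows are proportional to the private input, so the example can be read off up to scale.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, lr = 8, 4, 0.1

    W = rng.normal(size=(d_out, d_in))  # "pretrained" weights, known to the attacker
    x = rng.normal(size=d_in)           # a private finetuning example (to be stolen)
    t = rng.normal(size=d_out)          # its hypothetical regression target

    # Victim: one SGD step on the squared loss 0.5 * ||W x - t||^2.
    err = W @ x - t                          # gradient of the loss w.r.t. the layer output
    W_finetuned = W - lr * np.outer(err, x)  # since dL/dW = err x^T

    # Attacker: the difference between pretrained and finetuned weights is a
    # rank-1 outer product, so every nonzero row is proportional to x.
    delta = W - W_finetuned
    row = delta[np.argmax(np.linalg.norm(delta, axis=1))]
    cosine = abs(row @ x) / (np.linalg.norm(row) * np.linalg.norm(x))
    print(f"cosine similarity between recovered row and private x: {cosine:.4f}")  # ~1.0

The paper's contribution is to make this kind of leakage work in realistic settings (many training steps, many examples, transformer architectures) by deliberately corrupting the pretrained weights; the sketch only conveys why observed weight changes can betray individual finetuning samples.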

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-feng24h,
  title     = {Privacy Backdoors: Stealing Data with Corrupted Pretrained Models},
  author    = {Feng, Shanglun and Tram\`{e}r, Florian},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {13326--13364},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/feng24h/feng24h.pdf},
  url       = {https://proceedings.mlr.press/v235/feng24h.html},
  abstract  = {Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model’s weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples, with a guaranteed success! We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply chain attack on machine learning privacy.}
}
Endnote
%0 Conference Paper
%T Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
%A Shanglun Feng
%A Florian Tramèr
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-feng24h
%I PMLR
%P 13326--13364
%U https://proceedings.mlr.press/v235/feng24h.html
%V 235
%X Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. By tampering with a pretrained model’s weights, an attacker can fully compromise the privacy of the finetuning data. We show how to build privacy backdoors for a variety of models, including transformers, which enable an attacker to reconstruct individual finetuning samples, with a guaranteed success! We further show that backdoored models allow for tight privacy attacks on models trained with differential privacy (DP). The common optimistic practice of training DP models with loose privacy guarantees is thus insecure if the model is not trusted. Overall, our work highlights a crucial and overlooked supply chain attack on machine learning privacy.
APA
Feng, S. & Tramèr, F. (2024). Privacy Backdoors: Stealing Data with Corrupted Pretrained Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13326-13364. Available from https://proceedings.mlr.press/v235/feng24h.html.