Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

Florian Tramèr, Gautam Kamath, Nicholas Carlini
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:48453-48467, 2024.

Abstract

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We further scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains. Finally, we observe that reliance on large pretrained models may sacrifice other forms of privacy, since it requires data to be outsourced to a more compute-powerful third party.
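
In practice, the paradigm reviewed here is commonly instantiated by fine-tuning a publicly pretrained model on the sensitive data with DP-SGD, i.e., per-example gradient clipping plus Gaussian noise. Below is a minimal, self-contained sketch of that recipe in PyTorch; the frozen backbone, toy data, and hyperparameters (clip norm, noise multiplier, learning rate) are illustrative assumptions rather than anything from the paper, and the privacy accounting needed to report an (epsilon, delta) guarantee is omitted.

# Minimal sketch (illustrative, not from the paper): DP-SGD fine-tuning of a
# publicly pretrained feature extractor on "sensitive" data.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a non-private model pretrained on public Web data:
# a frozen feature extractor, with only a small linear head fine-tuned privately.
feature_dim, num_classes = 64, 10
backbone = nn.Sequential(nn.Linear(128, feature_dim), nn.ReLU())  # "pretrained", kept frozen
for p in backbone.parameters():
    p.requires_grad_(False)
head = nn.Linear(feature_dim, num_classes)

# Toy stand-in for the sensitive fine-tuning data.
x = torch.randn(256, 128)
y = torch.randint(0, num_classes, (256,))

# Illustrative DP-SGD hyperparameters; privacy accounting is omitted here.
clip_norm = 1.0         # per-example gradient norm bound C
noise_multiplier = 1.0  # Gaussian noise scale sigma (relative to C)
lr, batch_size = 0.1, 32

for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    summed = [torch.zeros_like(p) for p in head.parameters()]

    # Per-example gradients, each clipped to norm at most C, then summed.
    for xi, yi in zip(xb, yb):
        loss = F.cross_entropy(head(backbone(xi.unsqueeze(0))), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, list(head.parameters()))
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale

    # Add noise calibrated to the clipping bound, average, and take an SGD step.
    with torch.no_grad():
        for p, s in zip(head.parameters(), summed):
            noise = noise_multiplier * clip_norm * torch.randn_like(s)
            p -= lr * (s + noise) / len(xb)

The point of the sketch is only to make the reviewed setup concrete: the privacy analysis covers the fine-tuning data, while the backbone's public pretraining data receives no differential privacy guarantee.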

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-tramer24a,
  title     = {Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining},
  author    = {Tram\`{e}r, Florian and Kamath, Gautam and Carlini, Nicholas},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {48453--48467},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/tramer24a/tramer24a.pdf},
  url       = {https://proceedings.mlr.press/v235/tramer24a.html}
}
Endnote
%0 Conference Paper
%T Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
%A Florian Tramèr
%A Gautam Kamath
%A Nicholas Carlini
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-tramer24a
%I PMLR
%P 48453--48467
%U https://proceedings.mlr.press/v235/tramer24a.html
%V 235
APA
Tramèr, F., Kamath, G., & Carlini, N. (2024). Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:48453-48467. Available from https://proceedings.mlr.press/v235/tramer24a.html.
