TAN Without a Burn: Scaling Laws of DP-SGD

Tom Sander, Pierre Stock, Alexandre Sablayrolles
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:29937-29949, 2023.

Abstract

Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Renyi Differential Privacy (RDP) to highlight that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a $+9$ points gain in top-1 accuracy for a privacy budget $\varepsilon=8$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-sander23b, title = {{TAN} Without a Burn: Scaling Laws of {DP}-{SGD}}, author = {Sander, Tom and Stock, Pierre and Sablayrolles, Alexandre}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {29937--29949}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/sander23b/sander23b.pdf}, url = {https://proceedings.mlr.press/v202/sander23b.html}, abstract = {Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Renyi Differential Privacy (RDP) to highlight that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a $+9$ points gain in top-1 accuracy for a privacy budget $\varepsilon=8$.} }
Endnote
%0 Conference Paper %T TAN Without a Burn: Scaling Laws of DP-SGD %A Tom Sander %A Pierre Stock %A Alexandre Sablayrolles %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-sander23b %I PMLR %P 29937--29949 %U https://proceedings.mlr.press/v202/sander23b.html %V 202 %X Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Renyi Differential Privacy (RDP) to highlight that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a $+9$ points gain in top-1 accuracy for a privacy budget $\varepsilon=8$.
APA
Sander, T., Stock, P. & Sablayrolles, A.. (2023). TAN Without a Burn: Scaling Laws of DP-SGD. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:29937-29949 Available from https://proceedings.mlr.press/v202/sander23b.html.

Related Material