Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

Tom Sander, Maxime Sylvestre, Alain Durmus
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3295-3303, 2024.

Abstract

Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) often results in superior test performance compared to larger batches. This implicit bias is attributed to the specific noise structure inherent to SGD. When ensuring Differential Privacy (DP) in DNNs’ training, DP-SGD adds Gaussian noise to the clipped gradients. However, large-batch training still leads to a significant performance decrease, posing a challenge as strong DP guarantees necessitate the use of massive batches. Our study first demonstrates that this phenomenon extends to Noisy-SGD (DP-SGD without clipping), suggesting that the stochasticity, not the clipping, is responsible for this implicit bias, even with additional isotropic Gaussian noise. We then theoretically analyze the solutions obtained with continuous versions of Noisy-SGD for the Linear Least Square and Diagonal Linear Network settings. Our analysis reveals that the additional noise indeed amplifies the implicit bias. It suggests that the performance issues of private training stem from the same underlying principles as SGD, offering hope for improvements in large batch training strategies.
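The contrast the abstract draws between DP-SGD and Noisy-SGD can be sketched in a few lines. The snippet below is an illustrative NumPy sketch, not the paper's code: `dp_sgd_step` implements the standard per-sample clip-then-noise update, while `noisy_sgd_step` drops the clipping and adds isotropic Gaussian noise to the plain minibatch gradient. All function and parameter names are assumptions made for illustration.

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, lr=0.1, clip_norm=1.0,
                noise_std=1.0, rng=None):
    """One DP-SGD step: clip each per-sample gradient to `clip_norm`,
    average, add Gaussian noise scaled by the clipping norm, descend."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Rescale so the gradient's norm is at most clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_std * clip_norm / len(per_sample_grads),
                       size=params.shape)
    return params - lr * (mean_grad + noise)

def noisy_sgd_step(params, batch_grad, lr=0.1, noise_std=1.0, rng=None):
    """Noisy-SGD (DP-SGD without clipping): plain minibatch gradient
    plus additional isotropic Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, noise_std, size=params.shape)
    return params - lr * (batch_grad + noise)
```

With `noise_std=0` both reduce to ordinary (minibatch) SGD, which makes the paper's point concrete: the only difference between the two updates is whether per-sample gradients are clipped before the Gaussian noise is injected.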

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-sander24a,
  title     = {Implicit Bias in Noisy-{SGD}: With Applications to Differentially Private Training},
  author    = {Sander, Tom and Sylvestre, Maxime and Durmus, Alain},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3295--3303},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/sander24a/sander24a.pdf},
  url       = {https://proceedings.mlr.press/v238/sander24a.html},
  abstract  = {Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) often results in superior test performance compared to larger batches. This implicit bias is attributed to the specific noise structure inherent to SGD. When ensuring Differential Privacy (DP) in DNNs’ training, DP-SGD adds Gaussian noise to the clipped gradients. However, large-batch training still leads to a significant performance decrease, posing a challenge as strong DP guarantees necessitate the use of massive batches. Our study first demonstrates that this phenomenon extends to Noisy-SGD (DP-SGD without clipping), suggesting that the stochasticity, not the clipping, is responsible for this implicit bias, even with additional isotropic Gaussian noise. We then theoretically analyze the solutions obtained with continuous versions of Noisy-SGD for the Linear Least Square and Diagonal Linear Network settings. Our analysis reveals that the additional noise indeed amplifies the implicit bias. It suggests that the performance issues of private training stem from the same underlying principles as SGD, offering hope for improvements in large batch training strategies.}
}
Endnote
%0 Conference Paper
%T Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
%A Tom Sander
%A Maxime Sylvestre
%A Alain Durmus
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-sander24a
%I PMLR
%P 3295--3303
%U https://proceedings.mlr.press/v238/sander24a.html
%V 238
%X Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) often results in superior test performance compared to larger batches. This implicit bias is attributed to the specific noise structure inherent to SGD. When ensuring Differential Privacy (DP) in DNNs’ training, DP-SGD adds Gaussian noise to the clipped gradients. However, large-batch training still leads to a significant performance decrease, posing a challenge as strong DP guarantees necessitate the use of massive batches. Our study first demonstrates that this phenomenon extends to Noisy-SGD (DP-SGD without clipping), suggesting that the stochasticity, not the clipping, is responsible for this implicit bias, even with additional isotropic Gaussian noise. We then theoretically analyze the solutions obtained with continuous versions of Noisy-SGD for the Linear Least Square and Diagonal Linear Network settings. Our analysis reveals that the additional noise indeed amplifies the implicit bias. It suggests that the performance issues of private training stem from the same underlying principles as SGD, offering hope for improvements in large batch training strategies.
APA
Sander, T., Sylvestre, M. & Durmus, A. (2024). Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3295-3303. Available from https://proceedings.mlr.press/v238/sander24a.html.