Delving into Differentially Private Transformer

Youlong Ding, Xueyang Wu, Yining Meng, Yonggang Luo, Hao Wang, Weike Pan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11049-11071, 2024.

Abstract

Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to 'reduce' the problem of training a DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such a 'reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.
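For context, the "more basic problem" the abstract reduces to is the standard DP-SGD recipe (per-example gradient clipping followed by calibrated Gaussian noise, as in Abadi et al., 2016). The sketch below illustrates only that generic baseline, not the paper's Phantom Clipping or Re-Attention Mechanism; the function name, arguments, and hyperparameters are illustrative placeholders.

import torch

def dp_sgd_step(model, loss_fn, examples, targets, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each per-example gradient, sum, add Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Naive per-example loop. Efficient methods avoid materializing
    # per-example gradients; the paper's Phantom Clipping targets this
    # cost for Transformers (e.g., with tied embedding layers).
    for x, y in zip(examples, targets):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    # Noise scaled to the clipping norm, then average and take the step.
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * (noise_multiplier * clip_norm)
        p.grad = (s + noise) / len(examples)
    optimizer.step()
    optimizer.zero_grad()

The explicit per-example loop makes visible the inefficiency that motivates efficient gradient-clipping techniques, which is precisely the compatibility issue the abstract identifies for Transformers.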

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ding24g,
  title     = {Delving into Differentially Private Transformer},
  author    = {Ding, Youlong and Wu, Xueyang and Meng, Yining and Luo, Yonggang and Wang, Hao and Pan, Weike},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {11049--11071},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24g/ding24g.pdf},
  url       = {https://proceedings.mlr.press/v235/ding24g.html},
  abstract  = {Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to 'reduce' the problem of training a DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such a 'reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.}
}
Endnote
%0 Conference Paper
%T Delving into Differentially Private Transformer
%A Youlong Ding
%A Xueyang Wu
%A Yining Meng
%A Yonggang Luo
%A Hao Wang
%A Weike Pan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ding24g
%I PMLR
%P 11049--11071
%U https://proceedings.mlr.press/v235/ding24g.html
%V 235
%X Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to 'reduce' the problem of training a DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such a 'reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.
APA
Ding, Y., Wu, X., Meng, Y., Luo, Y., Wang, H. & Pan, W. (2024). Delving into Differentially Private Transformer. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:11049-11071. Available from https://proceedings.mlr.press/v235/ding24g.html.
