Scaling Laws for Differentially Private Language Models

Ryan Mckenna, Yangsibo Huang, Amer Sinha, Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Badih Ghazi, Georgios Kaissis, Ravi Kumar, Ruibo Liu, Da Yu, Chiyuan Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:43375-43398, 2025.

Abstract

Scaling laws have emerged as important components of large language model (LLM) training, as they can predict performance gains from scale and provide guidance on important hyper-parameter choices that would otherwise be expensive to tune. LLMs also rely on large, high-quality training datasets, such as those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently its scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.
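To make the idea of a scaling law concrete, here is a minimal sketch of fitting a simple power law, loss(N) = a * N^(-b), to observations of loss versus model size and extrapolating it to a larger scale. The data points and the single-variable power-law form are illustrative assumptions for this sketch, not the paper's actual functional form, which additionally accounts for the privacy budget and DP training dynamics.

```python
import numpy as np

# Hypothetical (model size, loss) observations; in practice these would
# come from a sweep of training runs at different scales.
model_sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = np.array([4.2, 3.1, 2.4, 1.9])

# Fit L(N) = a * N^(-b) by linear regression in log space:
# log L = log a - b * log N.
slope, log_a = np.polyfit(np.log(model_sizes), np.log(losses), deg=1)
b, a = -slope, np.exp(log_a)

# Extrapolate the fitted law to predict the loss of a larger model.
predicted_loss = a * (1e10) ** (-b)
print(f"L(N) ~ {a:.2f} * N^(-{b:.3f}); "
      f"predicted loss at 1e10 params: {predicted_loss:.2f}")
```

The fit runs in log space because the power law becomes linear there, so ordinary least squares suffices; the extrapolated prediction is what lets practitioners budget compute before committing to an expensive run.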

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-mckenna25a,
  title     = {Scaling Laws for Differentially Private Language Models},
  author    = {Mckenna, Ryan and Huang, Yangsibo and Sinha, Amer and Balle, Borja and Charles, Zachary and Choquette-Choo, Christopher A. and Ghazi, Badih and Kaissis, Georgios and Kumar, Ravi and Liu, Ruibo and Yu, Da and Zhang, Chiyuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {43375--43398},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mckenna25a/mckenna25a.pdf},
  url       = {https://proceedings.mlr.press/v267/mckenna25a.html},
  abstract  = {Scaling laws have emerged as important components of large language model (LLM) training, as they can predict performance gains from scale and provide guidance on important hyper-parameter choices that would otherwise be expensive to tune. LLMs also rely on large, high-quality training datasets, such as those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently its scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.}
}
Endnote
%0 Conference Paper
%T Scaling Laws for Differentially Private Language Models
%A Ryan Mckenna
%A Yangsibo Huang
%A Amer Sinha
%A Borja Balle
%A Zachary Charles
%A Christopher A. Choquette-Choo
%A Badih Ghazi
%A Georgios Kaissis
%A Ravi Kumar
%A Ruibo Liu
%A Da Yu
%A Chiyuan Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mckenna25a
%I PMLR
%P 43375--43398
%U https://proceedings.mlr.press/v267/mckenna25a.html
%V 267
%X Scaling laws have emerged as important components of large language model (LLM) training, as they can predict performance gains from scale and provide guidance on important hyper-parameter choices that would otherwise be expensive to tune. LLMs also rely on large, high-quality training datasets, such as those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently its scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility tradeoffs and the optimal training configurations in many settings.
APA
Mckenna, R., Huang, Y., Sinha, A., Balle, B., Charles, Z., Choquette-Choo, C.A., Ghazi, B., Kaissis, G., Kumar, R., Liu, R., Yu, D. & Zhang, C. (2025). Scaling Laws for Differentially Private Language Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:43375-43398. Available from https://proceedings.mlr.press/v267/mckenna25a.html.
