Leveraging Randomness in Model and Data Partitioning for Privacy Amplification

Andy Dong, Wei-Ning Chen, Ayfer Ozgur
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:13938-13962, 2025.

Abstract

We study how inherent randomness in the training process—where each sample (or client in federated learning) contributes only to a randomly selected portion of training—can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce balanced iteration subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that in certain regimes, this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for nontrivial privacy amplification.
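The abstract contrasts two ways of assigning samples (or clients) to training iterations. The sketch below is an illustrative reconstruction, not code from the paper: `poisson_subsample` implements standard Poisson (i.i.d.) sampling, where each sample joins each iteration independently with probability q, while `balanced_iteration_subsample` implements the balanced scheme described above, where each sample participates in exactly k of the T iterations. All function and parameter names are assumptions chosen for clarity.

```python
import random

def poisson_subsample(n_samples, n_iters, q):
    """Poisson (i.i.d.) sampling: each sample joins each iteration
    independently with probability q, so per-sample participation
    counts are random (Binomial(n_iters, q))."""
    return [
        [i for i in range(n_samples) if random.random() < q]
        for _ in range(n_iters)
    ]

def balanced_iteration_subsample(n_samples, n_iters, k):
    """Balanced iteration subsampling: each sample participates in
    exactly k uniformly chosen iterations out of n_iters, so the
    participation count is fixed rather than random."""
    schedule = [[] for _ in range(n_iters)]
    for i in range(n_samples):
        # Choose, without replacement, the k iterations this sample joins.
        for t in random.sample(range(n_iters), k):
            schedule[t].append(i)
    return schedule
```

Setting q = k / n_iters makes the two schemes match in expected participation per sample, which is the natural point of comparison when asking which yields stronger privacy amplification.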

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-dong25a,
  title     = {Leveraging Randomness in Model and Data Partitioning for Privacy Amplification},
  author    = {Dong, Andy and Chen, Wei-Ning and Ozgur, Ayfer},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {13938--13962},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/dong25a/dong25a.pdf},
  url       = {https://proceedings.mlr.press/v267/dong25a.html},
  abstract  = {We study how inherent randomness in the training process—where each sample (or client in federated learning) contributes only to a randomly selected portion of training—can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce balanced iteration subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that in certain regimes, this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for nontrivial privacy amplification.}
}
Endnote
%0 Conference Paper
%T Leveraging Randomness in Model and Data Partitioning for Privacy Amplification
%A Andy Dong
%A Wei-Ning Chen
%A Ayfer Ozgur
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-dong25a
%I PMLR
%P 13938--13962
%U https://proceedings.mlr.press/v267/dong25a.html
%V 267
%X We study how inherent randomness in the training process—where each sample (or client in federated learning) contributes only to a randomly selected portion of training—can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g. model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce balanced iteration subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that in certain regimes, this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for nontrivial privacy amplification.
APA
Dong, A., Chen, W., & Ozgur, A. (2025). Leveraging Randomness in Model and Data Partitioning for Privacy Amplification. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:13938-13962. Available from https://proceedings.mlr.press/v267/dong25a.html.