Near-Optimal Rates for O(1)-Smooth DP-SCO with a Single Epoch and Large Batches

Christopher A. Choquette-Choo, Arun Ganesh, Abhradeep Guha Thakurta
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:315-348, 2025.

Abstract

In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP (Bassily et al. 2014), and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch (Feldman et al. 2019). However, the batch gradient complexity (i.e., the number of adaptive optimization steps), which is important in applications like federated learning, is less well understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors) in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients: a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
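As a rough illustration of how the ingredients named in the abstract can fit together, the toy sketch below runs a single pass over synthetic least-squares data using $\sqrt{n}$ batches of size $\sqrt{n}$, a SARAH-style recursive gradient estimate with clipping, and a Nesterov-style accelerated update. It is not the paper's Accelerated-DP-SRGD: the clipping norm C, noise multiplier sigma, step size eta, and momentum schedule are all illustrative assumptions, and independent per-step Gaussian noise stands in for the correlated noise that DP continual counting would inject.

```python
import numpy as np

# Hypothetical toy setup: linear regression on synthetic data (not from the paper).
rng = np.random.default_rng(0)
n, p = 4096, 10
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + 0.1 * rng.normal(size=n)

def batch_grad(w, idx):
    """Average gradient of the squared loss over the minibatch idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def clip(g, C):
    """Clip a vector to L2 norm at most C (what enables privacy for non-convex losses)."""
    return g * min(1.0, C / max(np.linalg.norm(g), 1e-12))

# Single epoch: T = sqrt(n) adaptive steps, each on a fresh batch of size sqrt(n).
T = int(np.sqrt(n))
B = n // T
perm = rng.permutation(n)

C = 1.0      # clipping norm (assumed)
sigma = 0.5  # noise multiplier (assumed; would be calibrated to (eps, delta) in practice)
eta = 0.1    # step size (assumed)

w = np.zeros(p)   # iterate
z = w.copy()      # auxiliary iterate for the accelerated update
v = np.zeros(p)   # recursive gradient estimate

for t in range(T):
    idx = perm[t * B:(t + 1) * B]
    if t == 0:
        # Initialize the estimate with a clipped batch gradient.
        v = clip(batch_grad(z, idx), C)
    else:
        # Recursive (SARAH-style) update: add the clipped gradient *difference*
        # between consecutive iterates, evaluated on the fresh batch.
        v = v + clip(batch_grad(z, idx) - batch_grad(z_prev, idx), C)
    # Independent Gaussian noise per step stands in for the correlated noise
    # a DP continual-counting mechanism would add to the running sums.
    v_noisy = v + (sigma * C / B) * rng.normal(size=p)

    # Nesterov-style accelerated step (schematic momentum schedule).
    w_prev, z_prev = w, z
    w = z - eta * v_noisy
    z = w + (t / (t + 3)) * (w - w_prev)

print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```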

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-choquette-choo25a, title = {Near-Optimal Rates for O(1)-Smooth DP-SCO with a Single Epoch and Large Batches}, author = {Choquette-Choo, Christopher A. and Ganesh, Arun and Thakurta, Abhradeep Guha}, booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory}, pages = {315--348}, year = {2025}, editor = {Kamath, Gautam and Loh, Po-Ling}, volume = {272}, series = {Proceedings of Machine Learning Research}, month = {24--27 Feb}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/choquette-choo25a/choquette-choo25a.pdf}, url = {https://proceedings.mlr.press/v272/choquette-choo25a.html}, abstract = {In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP (Bassily et al. 2014), and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch (Feldman et al. 2019). However, the batch gradient complexity (i.e., number of adaptive optimization steps), which is important in applications like federated learning, is less well-understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors), in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients, a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.} }
Endnote
%0 Conference Paper %T Near-Optimal Rates for O(1)-Smooth DP-SCO with a Single Epoch and Large Batches %A Christopher A. Choquette-Choo %A Arun Ganesh %A Abhradeep Guha Thakurta %B Proceedings of The 36th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2025 %E Gautam Kamath %E Po-Ling Loh %F pmlr-v272-choquette-choo25a %I PMLR %P 315--348 %U https://proceedings.mlr.press/v272/choquette-choo25a.html %V 272 %X In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/\delta)}}{\epsilon n}\right)$ under $(\epsilon, \delta)$-DP (Bassily et al. 2014), and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch (Feldman et al. 2019). However, the batch gradient complexity (i.e., number of adaptive optimization steps), which is important in applications like federated learning, is less well-understood. In particular, all prior work on DP-SCO requires $\Omega(n)$ batch gradient steps, multiple epochs, or convexity for privacy. We propose an algorithm, Accelerated-DP-SRGD (stochastic recursive gradient descent), which bypasses the limitations of past work: it achieves the optimal rate for DP-SCO (up to polylog factors), in a single epoch using $\sqrt{n}$ batch gradient steps with batch size $\sqrt{n}$, and can be made private for arbitrary (non-convex) losses via clipping. If the global minimizer is in the constraint set, we can further improve this to $n^{1/4}$ batch gradient steps with batch size $n^{3/4}$. To achieve this, our algorithm combines three key ingredients, a variant of stochastic recursive gradients (SRG), accelerated gradient descent, and correlated noise generation from DP continual counting.
APA
Choquette-Choo, C.A., Ganesh, A. & Thakurta, A.G. (2025). Near-Optimal Rates for O(1)-Smooth DP-SCO with a Single Epoch and Large Batches. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:315-348. Available from https://proceedings.mlr.press/v272/choquette-choo25a.html.
