On the Gradient Complexity of Private Optimization with Private Oracles

Michael Menart; Aleksandar Nikolov

On the Gradient Complexity of Private Optimization with Private Oracles

Michael Menart, Aleksandar Nikolov

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5115-5158, 2026.

Abstract

We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We primarily consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. In fact, the lower bound nearly matches the best known upper bound for general private optimizers in this regime. A consequence of our results is that, in high dimensions, the ubiquitous DP-SGD algorithm necessarily suffers a dimension dependent runtime slowdown and further that DP-SGD is optimal among the subclass of DP optimizers that use private oracles. We further show our lower bound can be strengthened to $\Omega(\min{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} })$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min{\frac{1}{\alpha^2}, n}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. To our knowledge, ours are the first oracle complexity lower bounds to leverage differential privacy beyond the local privacy model. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. If the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-menart26a,
  title = 	 {On the Gradient Complexity of Private Optimization with Private Oracles},
  author =       {Menart, Michael and Nikolov, Aleksandar},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {5115--5158},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/menart26a/menart26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/menart26a.html},
  abstract = 	 {We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We primarily consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. In fact, the lower bound nearly matches the best known upper bound for general private optimizers in this regime. A consequence of our results is that, in high dimensions, the ubiquitous DP-SGD algorithm necessarily suffers a dimension dependent runtime slowdown and further that DP-SGD is optimal among the subclass of DP optimizers that use private oracles. We further show our lower bound can be strengthened to $\Omega(\min{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} })$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min{\frac{1}{\alpha^2}, n}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. To our knowledge, ours are the first oracle complexity lower bounds to leverage differential privacy beyond the local privacy model. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. If the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.}
}

Endnote

%0 Conference Paper
%T On the Gradient Complexity of Private Optimization with Private Oracles
%A Michael Menart
%A Aleksandar Nikolov
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore	
%F pmlr-v336-menart26a
%I PMLR
%P 5115--5158
%U https://proceedings.mlr.press/v336/menart26a.html
%V 336
%X We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We primarily consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. In fact, the lower bound nearly matches the best known upper bound for general private optimizers in this regime. A consequence of our results is that, in high dimensions, the ubiquitous DP-SGD algorithm necessarily suffers a dimension dependent runtime slowdown and further that DP-SGD is optimal among the subclass of DP optimizers that use private oracles. We further show our lower bound can be strengthened to $\Omega(\min{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} })$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min{\frac{1}{\alpha^2}, n}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. To our knowledge, ours are the first oracle complexity lower bounds to leverage differential privacy beyond the local privacy model. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. If the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.

APA

Menart, M. & Nikolov, A.. (2026). On the Gradient Complexity of Private Optimization with Private Oracles. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:5115-5158 Available from https://proceedings.mlr.press/v336/menart26a.html.

Related Material

Download PDF