[edit]
On the Gradient Complexity of Private Optimization with Private Oracles
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:5115-5158, 2026.
Abstract
We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We primarily consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. In fact, the lower bound nearly matches the best known upper bound for general private optimizers in this regime. A consequence of our results is that, in high dimensions, the ubiquitous DP-SGD algorithm necessarily suffers a dimension dependent runtime slowdown and further that DP-SGD is optimal among the subclass of DP optimizers that use private oracles. We further show our lower bound can be strengthened to $\Omega(\min{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} })$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min{\frac{1}{\alpha^2}, n}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. To our knowledge, ours are the first oracle complexity lower bounds to leverage differential privacy beyond the local privacy model. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. If the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.