“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2285-2309, 2023.
Abstract
In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of the model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. The construction relies only on a proper selection of the learning rate to ensure that the standard SGD conditions for $O(1/N)$ convergence hold. The procedure performs well in our empirical evaluations, achieving near-nominal coverage while scaling to problems with up to 20× as many parameters as other SGD-based inference methods handle. We also demonstrate the method’s practical significance by modeling adverse events in emergency general surgery patients, using a novel dataset from the Hospital of the University of Pennsylvania.
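To make the interval construction concrete, here is a minimal sketch (not the authors' released code) for linear regression, assuming $\theta_N$ is the Polyak–Ruppert average of the SGD iterates and that $\gamma$ has already been chosen to satisfy the paper's learning-rate conditions; the function name and setup below are illustrative only:

```python
import numpy as np

def sgd_plus_minus_intervals(X, y, gamma, seed=0):
    """Illustrative sketch (hypothetical helper): single-pass SGD on the
    squared-error loss, returning per-coordinate confidence intervals
    theta_{N,j} +/- 2*sqrt(gamma/N). Assumes theta_N is the averaged
    SGD iterate and gamma satisfies the required step-size conditions."""
    N, p = X.shape
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    iterate_sum = np.zeros(p)
    for i in rng.permutation(N):
        grad = (X[i] @ theta - y[i]) * X[i]  # gradient of 0.5*(x'theta - y)^2
        theta -= gamma * grad                # one SGD step
        iterate_sum += theta
    theta_N = iterate_sum / N                # averaged SGD estimate theta_N
    half_width = 2.0 * np.sqrt(gamma / N)    # "plus/minus the learning rate"
    return theta_N, theta_N - half_width, theta_N + half_width
```

For example, `sgd_plus_minus_intervals(X, y, gamma=1e-2)` would return the point estimate together with the lower and upper interval endpoints for each coordinate; note that the interval half-width depends only on $\gamma$ and $N$, not on any estimated covariance.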