“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD

Jerry Chee, Hwanwoo Kim, Panos Toulis
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2285-2309, 2023.

Abstract

In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for $O(1/n)$ convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals while scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.
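To make the construction concrete, here is a minimal Python sketch (not code from the paper): it runs single-pass SGD with a constant learning rate on synthetic linear-regression data and then forms the intervals $\theta_{N,j} \pm 2\sqrt{\gamma/N}$. The data, the model, and the value $\gamma = 0.01$ are assumptions for illustration; the paper's own procedure for selecting $\gamma$ is what guarantees near-nominal coverage.

```python
import numpy as np

# Minimal sketch of the interval construction, assuming a linear model with
# squared-error loss and a fixed learning rate. The synthetic data, the model,
# and gamma = 0.01 are illustrative choices, not the paper's setup; in the
# paper, gamma must be selected so the standard conditions for O(1/n)
# convergence hold.

rng = np.random.default_rng(0)
N, d = 10_000, 5
theta_star = rng.normal(size=d)            # ground-truth parameters
X = rng.normal(size=(N, d))
y = X @ theta_star + rng.normal(size=N)

gamma = 0.01                               # learning rate (assumed suitable)
theta = np.zeros(d)

# One pass of SGD: a single gradient step per data point.
for i in range(N):
    residual = X[i] @ theta - y[i]
    theta -= gamma * residual * X[i]       # grad of 0.5 * (x'theta - y)^2

# "Plus/minus the learning rate" intervals: theta_{N,j} +/- 2*sqrt(gamma/N).
half_width = 2.0 * np.sqrt(gamma / N)
for j in range(d):
    print(f"theta_{j}: {theta[j]:+.3f}  "
          f"CI = [{theta[j] - half_width:+.3f}, {theta[j] + half_width:+.3f}]")
```

Note that the interval half-width depends only on the learning rate and the sample size, which is what makes the method scale: no Hessian estimation or bootstrap resampling is needed.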

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-chee23a,
  title     = {“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD},
  author    = {Chee, Jerry and Kim, Hwanwoo and Toulis, Panos},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {2285--2309},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/chee23a/chee23a.pdf},
  url       = {https://proceedings.mlr.press/v206/chee23a.html},
  abstract  = {In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for $O(1/n)$ convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals while scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.}
}
Endnote
%0 Conference Paper
%T “Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD
%A Jerry Chee
%A Hwanwoo Kim
%A Panos Toulis
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-chee23a
%I PMLR
%P 2285--2309
%U https://proceedings.mlr.press/v206/chee23a.html
%V 206
%X In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for $O(1/n)$ convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals while scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.
APA
Chee, J., Kim, H., & Toulis, P. (2023). “Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2285-2309. Available from https://proceedings.mlr.press/v206/chee23a.html.
