“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD

Jerry Chee; Hwanwoo Kim; Panos Toulis

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2285-2309, 2023.

Abstract

In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for O(1/n) convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.
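
As a rough illustration of the interval form quoted in the abstract, the following Python sketch runs one pass of SGD on a synthetic least-squares problem and reports $\theta_{N,j} \pm 2\sqrt{\gamma/N}$ for each coordinate. Everything beyond that formula is an assumption made for this example and is not taken from the paper: the function names `sgd_linear` and `plus_minus_interval`, the decaying step size `gamma / n`, the synthetic data, and the value `gamma = 1.0`. The abstract does not state how the learning rate should be selected, so this sketch only shows how the interval is formed once a learning rate has been fixed.

```python
import numpy as np


def sgd_linear(X, y, gamma):
    """One pass of SGD for least squares with step size gamma / n (assumed schedule)."""
    N, p = X.shape
    theta = np.zeros(p)
    for n in range(1, N + 1):
        x_n, y_n = X[n - 1], y[n - 1]
        # Gradient step on the squared-error loss 0.5 * (y_n - x_n . theta)^2.
        theta = theta + (gamma / n) * (y_n - x_n @ theta) * x_n
    return theta


def plus_minus_interval(theta_N, gamma, N):
    """Per-coordinate intervals theta_{N,j} +/- 2 * sqrt(gamma / N)."""
    half_width = 2.0 * np.sqrt(gamma / N)
    return theta_N - half_width, theta_N + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, p = 20_000, 3
    theta_true = np.array([0.5, -0.25, 1.0])   # hypothetical ground truth
    X = rng.standard_normal((N, p))            # synthetic covariates
    y = X @ theta_true + rng.standard_normal(N)

    gamma = 1.0  # illustrative assumption, not the paper's selection rule
    theta_N = sgd_linear(X, y, gamma)
    lower, upper = plus_minus_interval(theta_N, gamma, N)
    for j in range(p):
        print(f"theta_{j}: {theta_N[j]:.4f}  [{lower[j]:.4f}, {upper[j]:.4f}]")
```

The half-width depends only on `gamma` and `N`, so every coordinate gets an interval of the same length. Whether such intervals reach the intended coverage depends on the learning-rate selection that the paper develops, which this sketch does not reproduce.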

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-chee23a,
  title = 	 {“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD},
  author =       {Chee, Jerry and Kim, Hwanwoo and Toulis, Panos},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2285--2309},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/chee23a/chee23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/chee23a.html},
  abstract = 	 {In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for O(1/n) convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.}
}

Endnote

%0 Conference Paper
%T “Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD
%A Jerry Chee
%A Hwanwoo Kim
%A Panos Toulis
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-chee23a
%I PMLR
%P 2285--2309
%U https://proceedings.mlr.press/v206/chee23a.html
%V 206
%X In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: $\theta_{N,j} \pm 2\sqrt{\gamma/N}$, where $\theta_N$ is the SGD estimate of model parameters $\theta$ over $N$ data points, and $\gamma$ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for O(1/n) convergence. The procedure performs well in our empirical evaluations, achieving near-nominal coverage intervals scaling up to 20$\times$ as many parameters as other SGD-based inference methods. We also demonstrate our method’s practical significance on modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania.

APA


Chee, J., Kim, H. & Toulis, P. (2023). “Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2285-2309. Available from https://proceedings.mlr.press/v206/chee23a.html.
