Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

Yoonhyung Lee, Sungdong Lee, Joong-Ho Won
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12423-12454, 2022.

Abstract

Implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature for its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely the proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds for both proxRM and proxPR iterates along with their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of their kind in the ISGD literature.
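
To make the two procedures concrete, here is a minimal sketch, not code from the paper: ISGD run on a least-squares loss, for which the implicit (proximal) update has a well-known closed form, returning either the last iterate (proxRM) or the running Polyak-Ruppert average of the iterates (proxPR). The function name, step-size schedule, and toy data are illustrative assumptions, and the paper's on-line covariance estimators, its main inferential contribution, are not reproduced here.

```python
import numpy as np

def isgd_least_squares(X, y, gamma0=1.0, alpha=0.6, average=True):
    """Implicit SGD for the least-squares loss
    f(theta; x, y) = 0.5 * (y - x @ theta)**2.

    The implicit update theta_n = theta_{n-1} - gamma_n * x_n * (x_n @ theta_n - y_n)
    defines theta_n only implicitly; for this loss it has the closed form below.
    """
    n, p = X.shape
    theta = np.zeros(p)       # proxRM iterate
    theta_bar = np.zeros(p)   # proxPR (Polyak-Ruppert) running average
    for i in range(n):
        x_i, y_i = X[i], y[i]
        gamma = gamma0 / (i + 1) ** alpha   # decaying step size (illustrative)
        resid = x_i @ theta - y_i
        # Closed-form solution of the implicit (proximal) step.
        theta = theta - gamma * resid / (1.0 + gamma * (x_i @ x_i)) * x_i
        # Running mean of the iterates for the averaged estimator.
        theta_bar += (theta - theta_bar) / (i + 1)
    return theta_bar if average else theta

# Toy usage: recover a linear model's coefficients from noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 3))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + 0.1 * rng.standard_normal(5000)
print(isgd_least_squares(X, y))  # close to theta_star
```

For a general smooth convex loss the implicit step has no closed form and typically reduces to a one-dimensional root-finding problem; the least-squares case is used here precisely because it avoids that solver.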

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-lee22f,
  title     = {Statistical inference with implicit {SGD}: proximal Robbins-Monro vs. Polyak-Ruppert},
  author    = {Lee, Yoonhyung and Lee, Sungdong and Won, Joong-Ho},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12423--12454},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/lee22f/lee22f.pdf},
  url       = {https://proceedings.mlr.press/v162/lee22f.html},
  abstract  = {Implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature for its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely the proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds for both proxRM and proxPR iterates along with their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of their kind in the ISGD literature.}
}
Endnote
%0 Conference Paper
%T Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
%A Yoonhyung Lee
%A Sungdong Lee
%A Joong-Ho Won
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-lee22f
%I PMLR
%P 12423--12454
%U https://proceedings.mlr.press/v162/lee22f.html
%V 162
%X Implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature for its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely the proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds for both proxRM and proxPR iterates along with their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of their kind in the ISGD literature.
APA
Lee, Y., Lee, S., & Won, J. (2022). Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12423-12454. Available from https://proceedings.mlr.press/v162/lee22f.html.
