Towards Stability and Optimality in Stochastic Gradient Descent

Panos Toulis; Dustin Tran; Edo Airoldi

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1290-1298, 2016.

Abstract

Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.
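The two ingredients named in the abstract, an implicit update and averaging of the iterates, can be illustrated on a least-squares model. The following is a minimal sketch under stated assumptions, not the authors' reference implementation: the function name `ai_sgd_least_squares`, the learning-rate schedule `gamma1 * n ** (-decay)`, and the default values are hypothetical choices made for illustration. For squared loss the implicit update, theta_new = theta + gamma * (y - x·theta_new) * x, can be solved in closed form, which is what the code uses.

```python
import random


def ai_sgd_least_squares(xs, ys, gamma1=1.0, decay=0.6):
    """Averaged implicit SGD for y = x . theta + noise (illustrative sketch)."""
    d = len(xs[0])
    theta = [0.0] * d      # current implicit SGD iterate
    theta_bar = [0.0] * d  # running average of all iterates so far
    for n, (x, y) in enumerate(zip(xs, ys), start=1):
        gamma = gamma1 * n ** (-decay)  # assumed learning-rate schedule
        residual = y - sum(xj * tj for xj, tj in zip(x, theta))
        # Closed-form solution of the implicit equation for squared loss:
        # the explicit step is shrunk by 1 / (1 + gamma * ||x||^2).
        step = gamma / (1.0 + gamma * sum(xj * xj for xj in x)) * residual
        theta = [tj + step * xj for tj, xj in zip(theta, x)]
        # Incremental update of the mean of the iterates.
        theta_bar = [bj + (tj - bj) / n for bj, tj in zip(theta_bar, theta)]
    return theta_bar


if __name__ == "__main__":
    rng = random.Random(0)
    theta_true = [1.0, -2.0, 0.5, 3.0, -1.0]  # synthetic ground truth
    xs = [[rng.gauss(0.0, 1.0) for _ in theta_true] for _ in range(20000)]
    ys = [sum(a * b for a, b in zip(x, theta_true)) + rng.gauss(0.0, 0.1)
          for x in xs]
    print(ai_sgd_least_squares(xs, ys))
```

The shrinkage factor `1 / (1 + gamma * ||x||^2)` is what keeps the step bounded even when `gamma1` is set too large, while the running average is the quantity returned as the estimate.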

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-toulis16,
  title     = {Towards Stability and Optimality in Stochastic Gradient Descent},
  author    = {Toulis, Panos and Tran, Dustin and Airoldi, Edo},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {1290--1298},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/toulis16.pdf},
  url       = {https://proceedings.mlr.press/v51/toulis16.html},
  abstract = 	 {Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.}
}

Endnote

%0 Conference Paper
%T Towards Stability and Optimality in Stochastic Gradient Descent
%A Panos Toulis
%A Dustin Tran
%A Edo Airoldi
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-toulis16
%I PMLR
%P 1290--1298
%U https://proceedings.mlr.press/v51/toulis16.html
%V 51
%X Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

RIS


TY  - CPAPER
TI  - Towards Stability and Optimality in Stochastic Gradient Descent
AU  - Panos Toulis
AU  - Dustin Tran
AU  - Edo Airoldi
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-toulis16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1290
EP  - 1298
L1  - http://proceedings.mlr.press/v51/toulis16.pdf
UR  - https://proceedings.mlr.press/v51/toulis16.html
AB  - Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.
ER  -

APA


Toulis, P., Tran, D., & Airoldi, E. (2016). Towards Stability and Optimality in Stochastic Gradient Descent. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1290-1298. Available from https://proceedings.mlr.press/v51/toulis16.html.
