Towards Stability and Optimality in Stochastic Gradient Descent

Panos Toulis, Dustin Tran, Edo Airoldi
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1290-1298, 2016.

Abstract

Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-toulis16, title = {Towards Stability and Optimality in Stochastic Gradient Descent}, author = {Toulis, Panos and Tran, Dustin and Airoldi, Edo}, booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics}, pages = {1290--1298}, year = {2016}, editor = {Gretton, Arthur and Robert, Christian C.}, volume = {51}, series = {Proceedings of Machine Learning Research}, address = {Cadiz, Spain}, month = {09--11 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v51/toulis16.pdf}, url = {https://proceedings.mlr.press/v51/toulis16.html}, abstract = {Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.} }
Endnote
%0 Conference Paper %T Towards Stability and Optimality in Stochastic Gradient Descent %A Panos Toulis %A Dustin Tran %A Edo Airoldi %B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2016 %E Arthur Gretton %E Christian C. Robert %F pmlr-v51-toulis16 %I PMLR %P 1290--1298 %U https://proceedings.mlr.press/v51/toulis16.html %V 51 %X Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.
RIS
TY - CPAPER TI - Towards Stability and Optimality in Stochastic Gradient Descent AU - Panos Toulis AU - Dustin Tran AU - Edo Airoldi BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics DA - 2016/05/02 ED - Arthur Gretton ED - Christian C. Robert ID - pmlr-v51-toulis16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 51 SP - 1290 EP - 1298 L1 - http://proceedings.mlr.press/v51/toulis16.pdf UR - https://proceedings.mlr.press/v51/toulis16.html AB - Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates. ER -
APA
Toulis, P., Tran, D. & Airoldi, E.. (2016). Towards Stability and Optimality in Stochastic Gradient Descent. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1290-1298 Available from https://proceedings.mlr.press/v51/toulis16.html.

Related Material