SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Lam Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtarik, Katya Scheinberg, Martin Takac
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3750-3758, 2018.

Abstract

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of the convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.
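The diminishing-learning-rate regime the abstract refers to can be sketched as follows. This is an illustrative toy example (a least-squares objective with arbitrary step-size constants), not code from the paper: the stochastic gradients here are unbounded over the iterate space, so the classical bounded-gradient assumption cannot hold, yet SGD with a diminishing step size still converges.

```python
import numpy as np

# Sketch: plain SGD on a strongly convex least-squares objective
#   F(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2
# with a diminishing learning rate eta_t = eta0 / (1 + beta * t).
# All constants below (eta0, beta, steps, problem sizes) are arbitrary.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                              # noiseless targets for a clean check

def sgd(X, y, eta0=0.05, beta=0.01, steps=5000):
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)                 # sample one component function
        grad = (X[i] @ w - y[i]) * X[i]     # its stochastic gradient
        w -= eta0 / (1.0 + beta * t) * grad # diminishing step size
    return w

w_hat = sgd(X, y)
print(np.linalg.norm(w_hat - w_true))       # distance to the minimizer shrinks toward 0
```

Hogwild! applies the same update rule asynchronously from multiple workers on a shared iterate without locking; the paper's contribution is proving convergence for that setting under this diminishing-step-size regime.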

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-nguyen18c,
  title     = {{SGD} and Hogwild! {C}onvergence Without the Bounded Gradients Assumption},
  author    = {Nguyen, Lam and Nguyen, Phuong Ha and van Dijk, Marten and Richtarik, Peter and Scheinberg, Katya and Takac, Martin},
  pages     = {3750--3758},
  year      = {2018},
  editor    = {Jennifer Dy and Andreas Krause},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  address   = {Stockholmsmässan, Stockholm Sweden},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/nguyen18c/nguyen18c.pdf},
  url       = {http://proceedings.mlr.press/v80/nguyen18c.html},
  abstract  = {Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of the convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.}
}
Endnote
%0 Conference Paper
%T SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
%A Lam Nguyen
%A Phuong Ha Nguyen
%A Marten van Dijk
%A Peter Richtarik
%A Katya Scheinberg
%A Martin Takac
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-nguyen18c
%I PMLR
%J Proceedings of Machine Learning Research
%P 3750--3758
%U http://proceedings.mlr.press
%V 80
%W PMLR
%X Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of the convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.
APA
Nguyen, L., Nguyen, P.H., van Dijk, M., Richtarik, P., Scheinberg, K. & Takac, M. (2018). SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. Proceedings of the 35th International Conference on Machine Learning, in PMLR 80:3750-3758

Related Material