On the Convergence of SARAH and Beyond

Bingcong Li, Meng Ma, Georgios B. Giannakis
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:223-233, 2020.

Abstract

The main theme of this work is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S) for problems formulated as summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; while for more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include the linear convergence of the last iteration for SARAH in strongly convex problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-li20a, title = {On the Convergence of SARAH and Beyond}, author = {Li, Bingcong and Ma, Meng and Giannakis, Georgios B.}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {223--233}, year = {2020}, editor = {Silvia Chiappa and Roberto Calandra}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/li20a/li20a.pdf}, url = { http://proceedings.mlr.press/v108/li20a.html }, abstract = {The main theme of this work is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S) for problems formulated as summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; while for more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include the linear convergence of the last iteration for SARAH in strongly convex problems. } }
Endnote
%0 Conference Paper %T On the Convergence of SARAH and Beyond %A Bingcong Li %A Meng Ma %A Georgios B. Giannakis %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-li20a %I PMLR %P 223--233 %U http://proceedings.mlr.press/v108/li20a.html %V 108 %X The main theme of this work is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S) for problems formulated as summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; while for more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include the linear convergence of the last iteration for SARAH in strongly convex problems.
APA
Li, B., Ma, M. & Giannakis, G.B.. (2020). On the Convergence of SARAH and Beyond. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:223-233 Available from http://proceedings.mlr.press/v108/li20a.html .

Related Material