Nonconvex Variance Reduced Optimization with Arbitrary Sampling


Samuel Horváth, Peter Richtarik
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2781-2789, 2019.

Abstract

We provide the first importance sampling variants of variance reduced algorithms for empirical risk minimization with nonconvex loss functions. In particular, we analyze nonconvex versions of SVRG, SAGA and SARAH. On real datasets, our methods can speed up training by an order of magnitude compared to the state of the art. Moreover, we improve upon the current minibatch analysis of these methods by proposing importance sampling for minibatches in this setting. Surprisingly, in some regimes our approach can lead to a superlinear speedup with respect to the minibatch size, which is not usually present in stochastic optimization. All of the above results follow from a general analysis of the methods that works with arbitrary sampling, i.e., a fully general randomized strategy for selecting the subset of examples sampled in each iteration. Finally, we also perform a novel importance sampling analysis of SARAH in the convex setting.
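To make "importance sampling in a variance reduced method" concrete, here is a minimal Python sketch of an SVRG-style loop on a hypothetical one-dimensional nonconvex toy problem. The toy components, the sampling rule p_i proportional to the smoothness constant L_i, the step size and the loop lengths are all illustrative assumptions, not the paper's exact algorithm or parameters. The point it shows is the 1/(n p_i) reweighting, which keeps the gradient estimate unbiased when examples are sampled non-uniformly.

```python
import math
import random

# Hypothetical toy data: component i is f_i(x) = a_i * (1 - cos(x - b_i)),
# a nonconvex function whose gradient is L_i-Lipschitz with L_i = a_i.
A = [0.1, 0.5, 2.0, 10.0]
B = [0.3, -1.0, 0.7, 2.0]
N = len(A)

def grad_i(i, x):
    """Gradient of the i-th component f_i at x."""
    return A[i] * math.sin(x - B[i])

def full_grad(x):
    """Gradient of the average objective f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(i, x) for i in range(N)) / N

# Assumed importance sampling rule: p_i proportional to L_i (= a_i here).
TOTAL_L = sum(A)
PROBS = [a / TOTAL_L for a in A]

def svrg_estimator(i, x, w, mu):
    """SVRG-style estimate of full_grad(x) from snapshot w, mu = full_grad(w).

    Dividing the difference by n * p_i keeps the estimate unbiased
    under the non-uniform distribution PROBS.
    """
    return (grad_i(i, x) - grad_i(i, w)) / (N * PROBS[i]) + mu

def expected_estimator(x, w):
    """Exact expectation of the estimator over i ~ PROBS (equals full_grad(x))."""
    mu = full_grad(w)
    return sum(PROBS[i] * svrg_estimator(i, x, w, mu) for i in range(N))

def svrg_is(x0, step=0.02, epochs=20, inner=10, seed=0):
    """Plain SVRG loop with importance sampling; returns the final iterate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        w = x                  # snapshot point for this epoch
        mu = full_grad(w)      # full gradient at the snapshot
        for _ in range(inner):
            i = rng.choices(range(N), weights=PROBS)[0]
            x -= step * svrg_estimator(i, x, w, mu)
    return x
```

Here `expected_estimator` checks unbiasedness exactly by summing over all indices rather than by Monte Carlo. With p_i proportional to L_i, each reweighted difference (grad_i(x) - grad_i(w)) / (n p_i) is Lipschitz in x with constant (1/n) sum_j L_j, the average smoothness constant, whereas uniform sampling gives L_i, whose worst case is the maximum; this is the usual intuition behind smoothness-based importance sampling.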
