Proximal Splitting Meets Variance Reduction
Proceedings of Machine Learning Research, PMLR 89:110, 2019.
Abstract
Despite the rise to fame of stochastic variance-reduced methods like SAGA and ProxSVRG, their use in nonsmooth optimization remains limited to a few simple cases. Existing methods require computing the proximal operator of the nonsmooth term at each iteration, which, for complex penalties like the total variation, overlapping group lasso, or trend filtering, is itself an iterative process that becomes infeasible for moderately large problems. In this work we propose and analyze VRTOS, a variance-reduced method to solve problems with an arbitrary number of nonsmooth terms. Like other variance-reduced methods, it requires only one gradient evaluation per iteration and converges with a constant step size, making it ideally suited for large-scale applications. Unlike existing variance-reduced methods, it admits multiple nonsmooth terms whose proximal operators need to be evaluated only once per iteration. We provide a convergence rate analysis for the proposed methods that achieves the same asymptotic rate as their full-gradient variants, and we illustrate their computational advantage on four large-scale datasets.