Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches
Proceedings of Machine Learning Research, PMLR 89:304-312, 2019.
Abstract
Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (\texttt{ACD}) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. While minibatch variants of \texttt{ACD} are more popular and relevant in practice, there is no importance sampling for \texttt{ACD} that outperforms the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design new importance sampling for minibatch \texttt{ACD} which significantly outperforms previous state-of-the-art minibatch \texttt{ACD} in practice. We prove a rate that is at most $\mathcal{O}(\sqrt{\tau})$ times worse than the rate of minibatch \texttt{ACD} with uniform sampling, but can be $\mathcal{O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau=\mathcal{O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch non-accelerated \texttt{CD} as well, achieving improvements on previous best rates.
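To make the minibatch coordinate descent setting concrete, the following is a minimal illustrative sketch (not the paper's accelerated method or its importance sampling): plain minibatch \texttt{CD} on a strongly convex quadratic $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$, where each iteration samples a minibatch of $\tau$ coordinates uniformly at random (one fixed probability law) and exactly minimizes $f$ along each sampled coordinate. All names and the toy problem are assumptions for illustration.

```python
import random

# Illustrative only: non-accelerated minibatch coordinate descent on
# f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
# Each iteration draws a uniform minibatch of tau coordinates (one
# example of a fixed sampling law) and exactly minimizes f along each.

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 5.0]]   # symmetric positive definite (toy example)
b = [1.0, 2.0, 3.0]

def minibatch_cd(A, b, tau=2, iters=500, seed=0):
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in rng.sample(range(n), tau):  # uniform minibatch sampling
            # Exact coordinate minimization: solve df/dx_i = 0 for x_i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

x = minibatch_cd(A, b)
# Residual of the optimality condition A x = b.
residual = max(abs(sum(A[i][j] * x[j] for j in range(len(b))) - b[i])
               for i in range(len(b)))
```

The paper's contribution concerns which sampling law to use: replacing the uniform `rng.sample` draw above with a carefully designed importance sampling (and adding acceleration) is what yields the improved rates in the $\tau \ll n$ regime.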