Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:304-312, 2019.
Abstract
Accelerated coordinate descent is a popular optimization algorithm due to its efficiency on high-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (\texttt{ACD}) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. Although minibatch variants of \texttt{ACD} are more popular and relevant in practice, no importance sampling for \texttt{ACD} was previously known to outperform the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design a new importance sampling for minibatch \texttt{ACD} which significantly outperforms the previous state-of-the-art minibatch \texttt{ACD} in practice. We prove a rate that is at most $\mathcal{O}(\sqrt{\tau})$ times worse than the rate of minibatch \texttt{ACD} with uniform sampling, but can be $\mathcal{O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau=\mathcal{O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated \texttt{CD}, improving on the previous best rates.
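To make the setup concrete, the sketch below illustrates the nonaccelerated minibatch coordinate descent setting the abstract refers to: a minibatch of $\tau$ coordinates is drawn in each iteration from an arbitrary but fixed probability law and only those coordinates are updated. This is not the paper's \texttt{ACD} method or its derived samplings; the least-squares objective, the conservative stepsize $1/(\tau L_i)$, and the $L_i$-proportional importance sampling in the example are illustrative assumptions made here for a self-contained demonstration.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): minibatch coordinate descent on
# f(x) = 0.5 * ||A x - b||^2, updating a random subset of tau coordinates per
# iteration according to a fixed sampling law given by `probs`.
# Assumed (not from the abstract): the conservative stepsize 1 / (tau * L_i),
# with L_i = ||A[:, i]||^2, which is a safe but pessimistic choice for this
# quadratic objective.
def minibatch_cd(A, b, probs, tau, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = (A ** 2).sum(axis=0)             # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    residual = A @ x - b                  # maintained so partial gradients are cheap
    for _ in range(iters):
        # Draw a minibatch of tau distinct coordinates from the fixed law.
        S = rng.choice(n, size=tau, replace=False, p=probs)
        g = A[:, S].T @ residual          # partial derivatives at the current x
        delta = -g / (tau * L[S])         # conservative minibatch stepsizes
        x[S] += delta                     # parallel update of the sampled block
        residual += A[:, S] @ delta       # keep residual = A x - b in sync
    return x

# Example: importance sampling proportional to L_i (one natural heuristic),
# compared against uniform sampling probs = np.full(n, 1/n).
rng = np.random.default_rng(1)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)
L = (A ** 2).sum(axis=0)
x_hat = minibatch_cd(A, b, probs=L / L.sum(), tau=5)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
```

The accelerated method analyzed in the paper additionally maintains momentum sequences and couples the stepsizes to the sampling law, which is what enables the improved rates quoted in the abstract; the sketch above only fixes the sampling-and-update pattern that both the accelerated and nonaccelerated variants share.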