Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches
Proceedings of Machine Learning Research, PMLR 89:304-312, 2019.
Abstract
Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (\texttt{ACD}) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. While minibatch variants of \texttt{ACD} are more popular and relevant in practice, there is no importance sampling for \texttt{ACD} that outperforms the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design new importance sampling for minibatch \texttt{ACD} which significantly outperforms previous state-of-the-art minibatch \texttt{ACD} in practice. We prove a rate that is at most $\mathcal{O}(\sqrt{\tau})$ times worse than the rate of minibatch \texttt{ACD} with uniform sampling, but can be $\mathcal{O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau=\mathcal{O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch non-accelerated \texttt{CD} as well, achieving improvements on previous best rates.
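To make the minibatch coordinate descent setting concrete, the following is a minimal illustrative sketch (not the paper's accelerated method or its importance sampling): plain minibatch \texttt{CD} on a strongly convex quadratic $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$, where each iteration samples a minibatch of $\tau$ coordinates uniformly at random (one fixed probability law) and exactly minimizes $f$ along each sampled coordinate. All names and the toy problem are assumptions for illustration.

```python
import random

# Illustrative only: non-accelerated minibatch coordinate descent on
# f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
# Each iteration draws a uniform minibatch of tau coordinates (one
# example of a fixed sampling law) and exactly minimizes f along each.

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 5.0]]   # symmetric positive definite (toy example)
b = [1.0, 2.0, 3.0]

def minibatch_cd(A, b, tau=2, iters=500, seed=0):
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in rng.sample(range(n), tau):  # uniform minibatch sampling
            # Exact coordinate minimization: solve df/dx_i = 0 for x_i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

x = minibatch_cd(A, b)
# Residual of the optimality condition A x = b.
residual = max(abs(sum(A[i][j] * x[j] for j in range(len(b))) - b[i])
               for i in range(len(b)))
```

The paper's contribution concerns which sampling law to use: replacing the uniform `rng.sample` draw above with a carefully designed importance sampling (and adding acceleration) is what yields the improved rates in the $\tau \ll n$ regime.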