Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:304-312, 2019.
Abstract
Accelerated coordinate descent is a popular optimization algorithm due to its efficiency on high-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (\texttt{ACD}) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. Although minibatch variants of \texttt{ACD} are more popular and relevant in practice, no importance sampling for \texttt{ACD} was previously known to outperform the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design a new importance sampling for minibatch \texttt{ACD} which significantly outperforms the previous state-of-the-art minibatch \texttt{ACD} in practice. We prove a rate that is at most $\mathcal{O}(\sqrt{\tau})$ times worse than the rate of minibatch \texttt{ACD} with uniform sampling, but can be $\mathcal{O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau=\mathcal{O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated \texttt{CD}, improving on the previous best rates.
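To make the setup concrete, the sketch below illustrates the nonaccelerated minibatch coordinate descent setting the abstract refers to: a minibatch of $\tau$ coordinates is drawn in each iteration from an arbitrary but fixed probability law and only those coordinates are updated. This is not the paper's \texttt{ACD} method or its derived samplings; the least-squares objective, the conservative stepsize $1/(\tau L_i)$, and the $L_i$-proportional importance sampling in the example are illustrative assumptions made here for a self-contained demonstration.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): minibatch coordinate descent on
# f(x) = 0.5 * ||A x - b||^2, updating a random subset of tau coordinates per
# iteration according to a fixed sampling law given by `probs`.
# Assumed (not from the abstract): the conservative stepsize 1 / (tau * L_i),
# with L_i = ||A[:, i]||^2, which is a safe but pessimistic choice for this
# quadratic objective.
def minibatch_cd(A, b, probs, tau, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = (A ** 2).sum(axis=0)             # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    residual = A @ x - b                  # maintained so partial gradients are cheap
    for _ in range(iters):
        # Draw a minibatch of tau distinct coordinates from the fixed law.
        S = rng.choice(n, size=tau, replace=False, p=probs)
        g = A[:, S].T @ residual          # partial derivatives at the current x
        delta = -g / (tau * L[S])         # conservative minibatch stepsizes
        x[S] += delta                     # parallel update of the sampled block
        residual += A[:, S] @ delta       # keep residual = A x - b in sync
    return x

# Example: importance sampling proportional to L_i (one natural heuristic),
# compared against uniform sampling probs = np.full(n, 1/n).
rng = np.random.default_rng(1)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)
L = (A ** 2).sum(axis=0)
x_hat = minibatch_cd(A, b, probs=L / L.sum(), tau=5)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
```

The accelerated method analyzed in the paper additionally maintains momentum sequences and couples the stepsizes to the sampling law, which is what enables the improved rates quoted in the abstract; the sketch above only fixes the sampling-and-update pattern that both the accelerated and nonaccelerated variants share.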