Communication-Efficient Distributed Optimization with Quantized Preconditioners

Foivos Alimisis, Peter Davies, Dan Alistarh
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:196-206, 2021.

Abstract

We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton’s method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.
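
To make the high-level idea concrete, below is a minimal, hypothetical sketch (Python/NumPy, not taken from the paper) of one distributed Newton-type step in which every node quantizes both its local gradient and its local Hessian (the preconditioner) before communicating, and a coordinator averages the quantized quantities and applies the preconditioned update. The quantizer here is plain uniform rounding; the paper's actual scheme is a more careful quantization procedure with per-step error control.

import numpy as np

def quantize(x, step=1e-2):
    # Toy quantizer: deterministic rounding to a uniform grid of spacing `step`.
    return np.round(x / step) * step

def quantized_newton_step(x, local_grads, local_hessians, step=1e-2, damping=1e-6):
    # Each node transmits a quantized local gradient and quantized local Hessian;
    # the coordinator averages them and takes a Newton-type (preconditioned) step.
    g = np.mean([quantize(g_i, step) for g_i in local_grads], axis=0)
    H = np.mean([quantize(H_i, step) for H_i in local_hessians], axis=0)
    d = np.linalg.solve(H + damping * np.eye(len(x)), g)  # preconditioned direction
    return x - d

In this toy version each node's per-step message is a quantized d-dimensional gradient plus a quantized d x d Hessian; the question addressed by the paper is how coarsely such quantities can be quantized while still controlling the convergence rate.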

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-alimisis21a,
  title     = {Communication-Efficient Distributed Optimization with Quantized Preconditioners},
  author    = {Alimisis, Foivos and Davies, Peter and Alistarh, Dan},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {196--206},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/alimisis21a/alimisis21a.pdf},
  url       = {https://proceedings.mlr.press/v139/alimisis21a.html},
  abstract  = {We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.}
}
Endnote
%0 Conference Paper
%T Communication-Efficient Distributed Optimization with Quantized Preconditioners
%A Foivos Alimisis
%A Peter Davies
%A Dan Alistarh
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-alimisis21a
%I PMLR
%P 196--206
%U https://proceedings.mlr.press/v139/alimisis21a.html
%V 139
%X We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.
APA
Alimisis, F., Davies, P. & Alistarh, D. (2021). Communication-Efficient Distributed Optimization with Quantized Preconditioners. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:196-206. Available from https://proceedings.mlr.press/v139/alimisis21a.html.
