LocalNewton: Reducing communication rounds for distributed learning

Vipul Gupta, Avishek Ghosh, Michał Dereziński, Rajiv Khanna, Kannan Ramchandran, Michael W. Mahoney
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:632-642, 2021.

Abstract

To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate the models to the master node only once every few (say $L$) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number $L$ of local iterations. We use novel matrix concentration-based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme to choose $L$, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than $60\%$ of the communication rounds (between master and workers) and less than $40\%$ of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.
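The local-averaging scheme described in the abstract can be sketched in a few lines. This is a minimal single-process simulation of the idea, not the authors' implementation: each worker runs $L$ Newton-type iterations on its own data shard, and the master averages the worker models once per communication round. All names (`local_newton`, `A_shards`, etc.) are illustrative, and the loss is plain least squares for simplicity.

```python
import numpy as np

def local_newton(A_shards, b_shards, L=4, rounds=10, step=1.0):
    """Simulate LocalNewton-style local averaging on a least-squares problem."""
    d = A_shards[0].shape[1]
    w = np.zeros(d)                          # model held at the master
    for _ in range(rounds):                  # one communication round
        models = []
        for A, b in zip(A_shards, b_shards):
            w_k = w.copy()                   # worker starts from the master model
            H = A.T @ A                      # local Hessian (least squares)
            for _ in range(L):               # L local second-order iterations
                g = A.T @ (A @ w_k - b)      # local gradient
                w_k -= step * np.linalg.solve(H, g)
            models.append(w_k)
        w = np.mean(models, axis=0)          # master averages the worker models
    return w

# Toy usage: 4 workers whose shards share one underlying linear model.
rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
A_shards = [rng.standard_normal((50, 5)) for _ in range(4)]
b_shards = [A @ w_true for A in A_shards]
w_hat = local_newton(A_shards, b_shards)
```

On this noiseless toy problem each local Newton step solves the shard's quadratic exactly, so the averaged model matches the true model; the paper's contribution is the analysis and adaptive choice of $L$ for the general (non-toy) setting.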

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-gupta21a,
  title     = {LocalNewton: Reducing communication rounds for distributed learning},
  author    = {Gupta, Vipul and Ghosh, Avishek and Derezi\'nski, Micha{\l} and Khanna, Rajiv and Ramchandran, Kannan and Mahoney, Michael W.},
  booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages     = {632--642},
  year      = {2021},
  editor    = {de Campos, Cassio and Maathuis, Marloes H.},
  volume    = {161},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v161/gupta21a/gupta21a.pdf},
  url       = {https://proceedings.mlr.press/v161/gupta21a.html},
  abstract  = {To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate the models to the master node only once every few (say $L$) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number $L$ of local iterations. We use novel matrix concentration based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme to choose $L$, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than $60\%$ of the communication rounds (between master and workers) and less than $40\%$ of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.}
}
Endnote
%0 Conference Paper
%T LocalNewton: Reducing communication rounds for distributed learning
%A Vipul Gupta
%A Avishek Ghosh
%A Michał Dereziński
%A Rajiv Khanna
%A Kannan Ramchandran
%A Michael W. Mahoney
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis
%F pmlr-v161-gupta21a
%I PMLR
%P 632--642
%U https://proceedings.mlr.press/v161/gupta21a.html
%V 161
%X To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the workers run multiple such iterations locally and communicate the models to the master node only once every few (say $L$) iterations. LocalNewton is highly practical since it requires only one hyperparameter, the number $L$ of local iterations. We use novel matrix concentration based techniques to obtain theoretical guarantees for LocalNewton, and we validate them with detailed empirical evaluation. To enhance practicability, we devise an adaptive scheme to choose $L$, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master. Via extensive experiments using several real-world datasets with AWS Lambda workers and an AWS EC2 master, we show that LocalNewton requires fewer than $60\%$ of the communication rounds (between master and workers) and less than $40\%$ of the end-to-end running time, compared to state-of-the-art algorithms, to reach the same training loss.
APA
Gupta, V., Ghosh, A., Dereziński, M., Khanna, R., Ramchandran, K. & Mahoney, M. W. (2021). LocalNewton: Reducing communication rounds for distributed learning. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:632-642. Available from https://proceedings.mlr.press/v161/gupta21a.html.