Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Nhuong Nguyen, Toan Nguyen, Phuong Ha Nguyen, Quoc Tran-Dinh, Lam Nguyen, Marten van Dijk
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1207-1215, 2021.

Abstract

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way, and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes which increase to larger ones in order to reduce communication cost (i.e., the number of interaction rounds with the aggregator). We improve state-of-the-art literature and show O(K^{0.5}) communication rounds for heterogeneous data for strongly convex problems, where K is the total number of gradient computations across all local compute nodes. For our scheme, we prove a tight and novel non-trivial convergence analysis for strongly convex problems for heterogeneous data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
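
The abstract only sketches the scheme at a high level; the short Python sketch below illustrates the general idea of local mini-batch SGD with a linearly increasing mini-batch size and a central aggregator. It is not the authors' algorithm: the synchronous averaging step (the paper analyzes asynchronous, Hogwild!-style updates), the batch-size and step-size schedules, and all names such as local_round and batch_size are illustrative assumptions.

# Minimal sketch (not the paper's exact algorithm): local mini-batch SGD with a
# linearly increasing mini-batch size, aggregated by a central server, for a
# simple least-squares objective on heterogeneous local data sets.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_rounds = 10, 4, 20
eta0 = 0.1                      # initial step size (illustrative choice)

# Heterogeneous local data: each worker draws (A_c, b_c) from its own distribution.
local_data = []
for c in range(n_workers):
    A = rng.normal(loc=c, scale=1.0, size=(200, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.1 * rng.normal(size=200)
    local_data.append((A, b))

def local_round(w, A, b, batch_size, eta):
    """One communication round of mini-batch SGD on a worker's local data."""
    for _ in range(5):          # a few local iterations per round (illustrative)
        idx = rng.integers(0, A.shape[0], size=batch_size)
        grad = A[idx].T @ (A[idx] @ w - b[idx]) / batch_size
        w = w - eta * grad
    return w

w_global = np.zeros(d)          # shared model held by the aggregator
for t in range(n_rounds):
    batch_size = 8 * (t + 1)    # mini-batch size grows linearly with the round index
    eta = eta0 / (t + 1)        # diminishing step size, as used for strongly convex SGD
    updates = [local_round(w_global.copy(), A, b, batch_size, eta)
               for (A, b) in local_data]
    # The aggregator averages the returned local models; this synchronous step is a
    # simplification of the asynchronous, Hogwild!-style aggregation in the paper.
    w_global = np.mean(updates, axis=0)

Because the mini-batch size grows with the round index while the step size shrinks, each worker performs more gradient computations per round as training progresses, which is what lets the total number of communication rounds stay small relative to the total number of gradient computations K.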

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-nguyen21a,
  title     = {Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes},
  author    = {Nguyen, Nhuong and Nguyen, Toan and Nguyen, Phuong Ha and Tran-Dinh, Quoc and Nguyen, Lam and van Dijk, Marten},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1207--1215},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/nguyen21a/nguyen21a.pdf},
  url       = {https://proceedings.mlr.press/v130/nguyen21a.html},
  abstract  = {Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way – and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central “aggregator” which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show O(K^{0.5}) communication rounds for heterogeneous data for strongly convex problems, where K is the total number of gradient computations across all local compute nodes. For our scheme, we prove a tight and novel non-trivial convergence analysis for strongly convex problems for heterogeneous data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.}
}
Endnote
%0 Conference Paper
%T Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
%A Nhuong Nguyen
%A Toan Nguyen
%A Phuong Ha Nguyen
%A Quoc Tran-Dinh
%A Lam Nguyen
%A Marten van Dijk
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-nguyen21a
%I PMLR
%P 1207--1215
%U https://proceedings.mlr.press/v130/nguyen21a.html
%V 130
%X Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way – and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central “aggregator” which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show O(K^{0.5}) communication rounds for heterogeneous data for strongly convex problems, where K is the total number of gradient computations across all local compute nodes. For our scheme, we prove a tight and novel non-trivial convergence analysis for strongly convex problems for heterogeneous data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
APA
Nguyen, N., Nguyen, T., Nguyen, P. H., Tran-Dinh, Q., Nguyen, L. & van Dijk, M. (2021). Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1207-1215. Available from https://proceedings.mlr.press/v130/nguyen21a.html.