Communication-Computation Efficient Gradient Coding

Min Ye, Emmanuel Abbe
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5610-5619, 2018.

Abstract

This paper develops coding techniques to reduce the running time of distributed learning tasks. It characterizes the fundamental tradeoff for computing gradients in terms of three parameters: computation load, straggler tolerance, and communication cost. It further gives an explicit coding scheme that achieves the optimal tradeoff based on recursive polynomial constructions, coding both across data subsets and vector components. As a result, the proposed scheme makes it possible to minimize the running time of gradient computations. Implementations are made on Amazon EC2 clusters using Python with the mpi4py package. Results show that the proposed scheme maintains the same generalization error while reducing the running time by 32% compared to uncoded schemes and 23% compared to prior coded schemes focusing only on stragglers (Tandon et al., ICML 2017).
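
To make the straggler-tolerance idea concrete, below is a minimal toy simulation in the spirit of the fractional-repetition gradient code of Tandon et al. (ICML 2017), the baseline this paper improves on: each data partition is replicated on s+1 workers, each worker sends one summed vector, and the master recovers the full gradient from any n-s responses. This is an illustrative sketch only; the partition sizes, group layout, and least-squares gradients are assumptions for the example, and the recursive polynomial construction of Ye & Abbe, which additionally codes across vector components to cut communication cost, is not reproduced here.

# Toy simulation of straggler-tolerant gradient coding (fractional-repetition
# style, after Tandon et al. 2017).  Not the scheme proposed in this paper.
import numpy as np

rng = np.random.default_rng(0)

n = 6          # number of workers (and data partitions)
s = 2          # number of stragglers to tolerate
d = 4          # dimension of the gradient vector

# Least-squares partial gradients: g_i = X_i^T (X_i w - y_i) for partition i.
X = [rng.standard_normal((5, d)) for _ in range(n)]
y = [rng.standard_normal(5) for _ in range(n)]
w = rng.standard_normal(d)
partial = [X[i].T @ (X[i] @ w - y[i]) for i in range(n)]
full_gradient = sum(partial)

# Fractional repetition: split the n workers into n/(s+1) groups of s+1 workers;
# every worker in group g is assigned the same s+1 partitions (computation load s+1).
groups = [list(range(g * (s + 1), (g + 1) * (s + 1))) for g in range(n // (s + 1))]
assignment = {worker: grp for grp in groups for worker in grp}

# Each worker sends a single vector: the sum of its s+1 partial gradients.
messages = {worker: sum(partial[i] for i in parts) for worker, parts in assignment.items()}

# Any s workers may straggle; each group has s+1 members, so one always survives.
stragglers = set(int(i) for i in rng.choice(n, size=s, replace=False))

# Decoder: take one surviving representative per group and add up their messages.
decoded = np.zeros(d)
for grp in groups:
    rep = next(wk for wk in grp if wk not in stragglers)
    decoded += messages[rep]

assert np.allclose(decoded, full_gradient)
print("recovered the full gradient despite stragglers:", sorted(stragglers))

The scheme of this paper splits each coded message into smaller pieces (coding across vector components), so the per-worker communication cost can be traded off against computation load and straggler tolerance rather than being fixed at one full-length vector.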

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-ye18a,
  title     = {Communication-Computation Efficient Gradient Coding},
  author    = {Ye, Min and Abbe, Emmanuel},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {5610--5619},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/ye18a/ye18a.pdf},
  url       = {https://proceedings.mlr.press/v80/ye18a.html}
}
Endnote
%0 Conference Paper
%T Communication-Computation Efficient Gradient Coding
%A Min Ye
%A Emmanuel Abbe
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-ye18a
%I PMLR
%P 5610--5619
%U https://proceedings.mlr.press/v80/ye18a.html
%V 80
APA
Ye, M. & Abbe, E. (2018). Communication-Computation Efficient Gradient Coding. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5610-5619. Available from https://proceedings.mlr.press/v80/ye18a.html.