Gradient Coding: Avoiding Stragglers in Distributed Learning

Rashish Tandon; Qi Lei; Alexandros G. Dimakis; Nikos Karampatziakis

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3368-3376, 2017.

Abstract

We propose a novel coding-theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in Python (using MPI) to run on Amazon EC2, and show how we compare against baseline approaches in running time and generalization error.
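
To make "replicating data blocks and coding across gradients" concrete, here is a minimal sketch. It assumes three workers and three data blocks, with each worker holding two blocks and sending one linear combination of its two partial gradients, so that any single straggler can be ignored. The encoding matrix `B`, the `DECODE` table, and the helper names are illustrative assumptions; they are not taken from this page or from the authors' MPI implementation.

```python
from itertools import combinations

# Row i holds the coefficients worker i applies to the partial gradients
# (g1, g2, g3). A zero means the worker does not store that data block,
# so each worker stores 2 of the 3 blocks (assumed toy layout).
B = [
    [0.5, 1.0, 0.0],    # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],   # worker 1 sends g2 - g3
    [0.5, 0.0, 1.0],    # worker 2 sends g1/2 + g3
]

# For each pair of surviving workers (i, j): coefficients (a_i, a_j) such that
# a_i * B[i] + a_j * B[j] == (1, 1, 1), i.e. the combination equals g1 + g2 + g3.
DECODE = {
    (0, 1): (2.0, -1.0),
    (0, 2): (1.0, 1.0),
    (1, 2): (1.0, 2.0),
}

def encode(partial_grads):
    """Return the coded message each worker would transmit."""
    dim = len(partial_grads[0])
    return [
        [sum(c * g[k] for c, g in zip(row, partial_grads)) for k in range(dim)]
        for row in B
    ]

def decode(survivors, messages):
    """Combine the messages of any two surviving workers into the full gradient."""
    coeffs = DECODE[tuple(survivors)]
    dim = len(messages[0])
    return [sum(a * m[k] for a, m in zip(coeffs, messages)) for k in range(dim)]

# Toy partial gradients, one per data block (made-up numbers).
partial_grads = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
full_gradient = [sum(col) for col in zip(*partial_grads)]
coded = encode(partial_grads)

# Drop each worker in turn and decode from the remaining two.
recovered = {
    pair: decode(pair, [coded[i] for i in pair])
    for pair in combinations(range(3), 2)
}
```

In this toy setup every entry of `recovered` equals `full_gradient`, so the master can take a synchronous gradient step as soon as any two of the three workers respond. The price is that each worker processes two data blocks instead of one.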

Cite this Paper

BibTeX

@InProceedings{pmlr-v70-tandon17a,
  title = 	 {Gradient Coding: Avoiding Stragglers in Distributed Learning},
  author =       {Rashish Tandon and Qi Lei and Alexandros G. Dimakis and Nikos Karampatziakis},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {3368--3376},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/tandon17a/tandon17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/tandon17a.html},
  abstract = 	 {We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in python (using MPI) to run on Amazon EC2, and show how we compare against baseline approaches in running time and generalization error.}
}

Endnote

%0 Conference Paper
%T Gradient Coding: Avoiding Stragglers in Distributed Learning
%A Rashish Tandon
%A Qi Lei
%A Alexandros G. Dimakis
%A Nikos Karampatziakis
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-tandon17a
%I PMLR
%P 3368--3376
%U https://proceedings.mlr.press/v70/tandon17a.html
%V 70
%X We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for synchronous Gradient Descent. We implement our schemes in python (using MPI) to run on Amazon EC2, and show how we compare against baseline approaches in running time and generalization error.

APA

Tandon, R., Lei, Q., Dimakis, A.G. & Karampatziakis, N. (2017). Gradient Coding: Avoiding Stragglers in Distributed Learning. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3368-3376. Available from https://proceedings.mlr.press/v70/tandon17a.html.
