Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance

Cong Xie; Sanmi Koyejo; Indranil Gupta

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6893-6901, 2019.

Abstract

We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
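
The abstract does not spell out how workers are ranked. As a rough illustration of the ranking-based preference idea only, the following Python sketch assumes the server scores each candidate update by the estimated decrease in loss it produces, minus a penalty on its magnitude, and then averages the highest-scoring candidates. The function name `rank_and_average`, the form of the score, and the parameters `lr`, `rho`, and `num_keep` are assumptions of this example and are not taken from the abstract.

```python
import numpy as np


def rank_and_average(candidates, loss_fn, params, lr, rho, num_keep):
    """Score each candidate update, keep the top num_keep, and average them.

    Hypothetical helper: the score form below is an assumption of this sketch.
    """
    base_loss = loss_fn(params)
    scores = []
    for u in candidates:
        # Estimated loss decrease from applying u, minus a magnitude penalty.
        decrease = base_loss - loss_fn(params - lr * u)
        scores.append(decrease - rho * float(np.dot(u, u)))
    # Indices of the num_keep highest-scoring candidates.
    keep = np.argsort(scores)[::-1][:num_keep]
    aggregated = np.mean([candidates[i] for i in keep], axis=0)
    return aggregated, scores


# Toy setup: quadratic loss 0.5 * ||x||^2, whose true gradient at x is x itself.
def toy_loss(x):
    return 0.5 * float(np.dot(x, x))


x = np.array([1.0, -2.0])
honest = [x.copy(), 1.1 * x, 0.9 * x]   # noisy versions of the true gradient
faulty = [-10.0 * x, -5.0 * x]          # updates pointing the wrong way
aggregated, scores = rank_and_average(
    honest + faulty, toy_loss, x, lr=0.1, rho=0.01, num_keep=3
)
```

In this toy run the two faulty updates increase the loss and have large norms, so they rank below all three honest candidates and are excluded from the average. Because only the ranking matters, this selection works even when faulty candidates outnumber honest ones, provided `num_keep` does not exceed the number of honest candidates.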

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-xie19b,
  title = 	 {Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance},
  author =       {Xie, Cong and Koyejo, Sanmi and Gupta, Indranil},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {6893--6901},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/xie19b/xie19b.pdf},
  url = 	 {https://proceedings.mlr.press/v97/xie19b.html},
  abstract = 	 {We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.}
}

Endnote

%0 Conference Paper
%T Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
%A Cong Xie
%A Sanmi Koyejo
%A Indranil Gupta
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-xie19b
%I PMLR
%P 6893--6901
%U https://proceedings.mlr.press/v97/xie19b.html
%V 97
%X We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.

APA


Xie, C., Koyejo, S. & Gupta, I. (2019). Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6893-6901. Available from https://proceedings.mlr.press/v97/xie19b.html.
