Zeno++: Robust Fully Asynchronous SGD

Cong Xie, Sanmi Koyejo, Indranil Gupta
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10495-10503, 2020.

Abstract

We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure, intended to tolerate Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, now allowing for fully asynchronous updates from anonymous workers, for arbitrarily stale worker updates, and for the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing Byzantine-tolerant asynchronous SGD algorithms.
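The key idea above, scoring each candidate gradient by the estimated descent of the loss it would cause, can be illustrated as a server-side filter. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the descent f(x) - f(x - lr*g) is approximated to first order as lr*<v, g>, where v is a gradient the server computes on a small validation batch; it subtracts a magnitude penalty rho*||g||^2 and accepts the update when the score is at least -lr*eps. The names descent_score, accept and run, and the parameters rho and eps, are hypothetical; the paper's exact score, thresholds, and any gradient normalization may differ.

```python
import numpy as np


def descent_score(v, g, lr, rho):
    """Estimated descent of the validation loss if x <- x - lr * g is applied.

    Assumption: first-order estimate f(x) - f(x - lr*g) ~= lr * <v, g>, with v a
    server-side validation gradient at the current x. rho * ||g||^2 is a
    hypothetical penalty on large-magnitude updates.
    """
    return lr * float(np.dot(v, g)) - rho * float(np.dot(g, g))


def accept(v, g, lr, rho, eps):
    """Accept the candidate only if its estimated descent is not too negative."""
    return descent_score(v, g, lr, rho) >= -lr * eps


def run(steps=200, lr=0.1, rho=0.001, eps=0.1, seed=0):
    """Toy asynchronous loop on f(x) = 0.5 * ||x||^2, whose gradient is x."""
    rng = np.random.default_rng(seed)
    x = np.full(5, 5.0)
    history = [x.copy()]  # past iterates, so honest workers can be stale
    accepted = rejected = 0
    for _ in range(steps):
        # Server's validation gradient at the current iterate (slightly noisy).
        v = x + 0.01 * rng.standard_normal(x.shape)
        if rng.random() < 0.5:
            # Byzantine worker: scaled, sign-flipped gradient.
            g = -10.0 * x
        else:
            # Honest worker: noisy gradient at an iterate 0 to 4 steps old.
            lag = int(rng.integers(0, 5))
            stale = history[max(0, len(history) - 1 - lag)]
            g = stale + 0.1 * rng.standard_normal(x.shape)
        # The filter sees only the gradient, not who sent it or how stale it is.
        if accept(v, g, lr, rho, eps):
            x = x - lr * g
            accepted += 1
        else:
            rejected += 1
        history.append(x.copy())
    return x, accepted, rejected


x, accepted, rejected = run()
print(np.linalg.norm(x), accepted, rejected)
```

In this toy run, roughly half of the arrivals are Byzantine. Because the test needs only the candidate gradient and the server's own validation gradient, it applies unchanged to anonymous and arbitrarily delayed updates, which is the setting the abstract describes.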

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-xie20c,
  title = {Zeno++: Robust Fully Asynchronous {SGD}},
  author = {Xie, Cong and Koyejo, Sanmi and Gupta, Indranil},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {10495--10503},
  year = {2020},
  editor = {Daumé, III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/xie20c/xie20c.pdf},
  url = {https://proceedings.mlr.press/v119/xie20c.html},
  abstract = {We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure, intended to tolerate Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, now allowing for fully asynchronous updates from anonymous workers, for arbitrarily stale worker updates, and for the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing Byzantine-tolerant asynchronous SGD algorithms.}
}
Endnote
%0 Conference Paper
%T Zeno++: Robust Fully Asynchronous SGD
%A Cong Xie
%A Sanmi Koyejo
%A Indranil Gupta
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-xie20c
%I PMLR
%P 10495--10503
%U https://proceedings.mlr.press/v119/xie20c.html
%V 119
%X We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure, intended to tolerate Byzantine failures of workers. In contrast to previous work, Zeno++ removes several unrealistic restrictions on worker-server communication, now allowing for fully asynchronous updates from anonymous workers, for arbitrarily stale worker updates, and for the possibility of an unbounded number of Byzantine workers. The key idea is to estimate the descent of the loss value after the candidate gradient is applied, where large descent values indicate that the update results in optimization progress. We prove the convergence of Zeno++ for non-convex problems under Byzantine failures. Experimental results show that Zeno++ outperforms existing Byzantine-tolerant asynchronous SGD algorithms.
APA
Xie, C., Koyejo, S. & Gupta, I. (2020). Zeno++: Robust Fully Asynchronous SGD. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10495-10503. Available from https://proceedings.mlr.press/v119/xie20c.html.
