Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums

Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, John Stephan
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6246-6283, 2022.

Abstract

Byzantine resilience has emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a. Byzantine) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: resilient averaging at the server and distributed momentum at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, the convergence of many existing techniques can then be demonstrated and compared as direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
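
To make the two operators concrete, here is a minimal sketch (in Python/NumPy, not the authors' implementation) of the pattern the abstract describes: each worker maintains a momentum of its stochastic gradients, and the server combines the received momentums with a resilient averaging rule rather than a plain mean. The coordinate-wise median is only one illustrative choice of resilient aggregator, and the names resam_sgd and grad_oracle, as well as the step-size and momentum parameters, are assumptions made for illustration, not taken from the paper.

import numpy as np

def resilient_average(vectors):
    # Illustrative resilient aggregator: coordinate-wise median of the
    # workers' messages. The framework is agnostic to the specific rule.
    return np.median(np.stack(vectors), axis=0)

def resam_sgd(grad_oracle, theta0, n_workers, byzantine_ids,
              steps=100, lr=0.1, beta=0.9):
    theta = theta0.copy()
    momentums = [np.zeros_like(theta0) for _ in range(n_workers)]
    for _ in range(steps):
        messages = []
        for w in range(n_workers):
            if w in byzantine_ids:
                # A Byzantine worker may send an arbitrary vector.
                messages.append(1e3 * np.random.randn(*theta.shape))
            else:
                g = grad_oracle(theta, w)  # stochastic gradient at theta
                momentums[w] = beta * momentums[w] + (1 - beta) * g
                messages.append(momentums[w])
        # Server-side update using the resilient aggregate of momentums.
        theta = theta - lr * resilient_average(messages)
    return theta

# Toy usage: minimize f(x) = ||x||^2 / 2 from noisy gradients,
# with 3 of 10 workers behaving arbitrarily.
oracle = lambda th, w: th + 0.1 * np.random.randn(*th.shape)
theta = resam_sgd(oracle, np.ones(5), n_workers=10, byzantine_ids={0, 1, 2})

The intuition behind pairing the two operators is that momentum dampens the variance of the honest workers' messages over time, which makes it easier for a resilient aggregator such as the median to separate them from Byzantine ones.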

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-farhadkhani22a,
  title     = {{B}yzantine Machine Learning Made Easy By Resilient Averaging of Momentums},
  author    = {Farhadkhani, Sadegh and Guerraoui, Rachid and Gupta, Nirupam and Pinot, Rafael and Stephan, John},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6246--6283},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/farhadkhani22a/farhadkhani22a.pdf},
  url       = {https://proceedings.mlr.press/v162/farhadkhani22a.html}
}
Endnote
%0 Conference Paper
%T Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums
%A Sadegh Farhadkhani
%A Rachid Guerraoui
%A Nirupam Gupta
%A Rafael Pinot
%A John Stephan
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-farhadkhani22a
%I PMLR
%P 6246--6283
%U https://proceedings.mlr.press/v162/farhadkhani22a.html
%V 162
APA
Farhadkhani, S., Guerraoui, R., Gupta, N., Pinot, R., & Stephan, J. (2022). Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research, 162:6246-6283. Available from https://proceedings.mlr.press/v162/farhadkhani22a.html.