The Hidden Vulnerability of Distributed Learning in Byzantium

El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3521-3530, 2018.

Abstract

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function’s non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of $\Omega(f(d))$, where $f(d)$ increases at least like $\sqrt[p]{d}$. Based on this leeway, we build a simple attack, and experimentally show its strong to utmost effectiveness on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker’s leeway to a narrow $\mathcal{O}(1/\sqrt{d})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
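To make the two-stage aggregation described above more concrete, here is a minimal Python sketch of a Bulyan-style rule, assuming a Krum-like selection rule in the first stage. The function names (krum_score, bulyan), the neighbour-count handling, and the tie-breaking are illustrative simplifications, not the authors’ reference implementation.

import numpy as np

def krum_score(grads, f):
    """Krum-style scores: for each gradient, the sum of squared distances
    to its closest neighbours (smaller means more 'central')."""
    m = len(grads)
    k = max(m - f - 2, 1)  # number of nearest neighbours to sum over (sketch-level choice)
    dists = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = np.empty(m)
    for i in range(m):
        scores[i] = np.sort(dists[i])[1:k + 1].sum()  # skip the zero distance to self
    return scores

def bulyan(grads, f):
    """Bulyan-style two-stage aggregation (illustrative sketch, not the paper's code).

    Stage 1: repeatedly keep the gradient with the best Krum-like score
             until theta = n - 2f gradients have been selected.
    Stage 2: per coordinate, average the beta = theta - 2f values that are
             closest to the coordinate-wise median of the selected gradients.
    """
    grads = [np.asarray(g, dtype=float) for g in grads]
    n = len(grads)
    assert n >= 4 * f + 3, "this sketch assumes n >= 4f + 3 workers"
    theta, beta = n - 2 * f, n - 4 * f

    remaining, selected = list(grads), []
    for _ in range(theta):                        # Stage 1: iterative selection
        best = int(np.argmin(krum_score(remaining, f)))
        selected.append(remaining.pop(best))
    S = np.stack(selected)                        # shape (theta, d)

    med = np.median(S, axis=0)                    # coordinate-wise median of the selected gradients
    closest = np.argsort(np.abs(S - med), axis=0)[:beta]          # beta median-closest values per coordinate
    return np.take_along_axis(S, closest, axis=0).mean(axis=0)    # Stage 2: trimmed coordinate-wise average

For instance, with n = 11 worker gradients of which f = 2 may be Byzantine, bulyan(grads, f=2) selects theta = 7 gradients in the first stage and averages the beta = 3 median-closest values per coordinate in the second; this coordinate-wise trimming is what the paper credits for shrinking the attacker’s leeway toward the $\mathcal{O}(1/\sqrt{d})$ bound stated above.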

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-mhamdi18a,
  title     = {The Hidden Vulnerability of Distributed Learning in {B}yzantium},
  author    = {El Mhamdi, El Mahdi and Guerraoui, Rachid and Rouault, S{\'e}bastien},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {3521--3530},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/mhamdi18a/mhamdi18a.pdf},
  url       = {https://proceedings.mlr.press/v80/mhamdi18a.html}
}
Endnote
%0 Conference Paper
%T The Hidden Vulnerability of Distributed Learning in Byzantium
%A El Mahdi El Mhamdi
%A Rachid Guerraoui
%A Sébastien Rouault
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-mhamdi18a
%I PMLR
%P 3521--3530
%U https://proceedings.mlr.press/v80/mhamdi18a.html
%V 80
APA
El Mhamdi, E.M., Guerraoui, R. & Rouault, S. (2018). The Hidden Vulnerability of Distributed Learning in Byzantium. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3521-3530. Available from https://proceedings.mlr.press/v80/mhamdi18a.html.
