# The Hidden Vulnerability of Distributed Learning in Byzantium

*Proceedings of the 35th International Conference on Machine Learning*, PMLR 80:3521-3530, 2018.

#### Abstract

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending *poisoned* gradients during the training phase. Some of these approaches have been proven *Byzantine-resilient*: they ensure the *convergence* of SGD despite the presence of a minority of adversarial workers. We show in this paper that *convergence is not enough*. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make SGD converge to *ineffective* models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a *margin of poisoning* of $\Omega\left(f(d)\right)$, where $f(d)$ increases at least like $\sqrt{d}$. Based on this *leeway*, we build a simple attack and experimentally show its effectiveness, ranging from strong to total, on CIFAR-10 and MNIST. We introduce *Bulyan*, and prove it significantly reduces the attacker's leeway to a narrow $O(1/\sqrt{d})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence *as if* only non-Byzantine gradients had been used to update the model.
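To make the aggregation idea concrete, below is a minimal sketch of a Bulyan-style aggregation step, assuming Krum as the underlying Byzantine-resilient selection rule: it iteratively selects $n - 2f$ gradients, then averages, per coordinate, the values closest to the coordinate-wise median. Function and variable names (`krum_select`, `bulyan`, `n`, `f`) are illustrative, not taken from the paper, and the code omits the paper's preconditions on $n$ and $f$.

```python
import numpy as np

def krum_select(gradients, f):
    """Return the index of the gradient with the smallest Krum score,
    i.e. the smallest sum of squared distances to its n - f - 2 closest peers."""
    n = len(gradients)
    dists = np.array([[np.sum((g - h) ** 2) for h in gradients] for g in gradients])
    scores = []
    for i in range(n):
        # Sorted distances start with the zero distance to self; keep the n - f - 2 closest peers.
        closest = np.sort(dists[i])[1:n - f - 1]
        scores.append(np.sum(closest))
    return int(np.argmin(scores))

def bulyan(gradients, f):
    """Bulyan-style aggregation: iteratively select theta = n - 2f gradients with Krum,
    then average, per coordinate, the theta - 2f values closest to the median."""
    gradients = [np.asarray(g, dtype=float) for g in gradients]
    n = len(gradients)
    theta = n - 2 * f
    selected, remaining = [], list(gradients)
    for _ in range(theta):
        idx = krum_select(remaining, f)
        selected.append(remaining.pop(idx))
    S = np.stack(selected)                 # shape (theta, d)
    med = np.median(S, axis=0)             # coordinate-wise median
    beta = theta - 2 * f
    # For each coordinate, keep the beta values closest to the median and average them.
    order = np.argsort(np.abs(S - med), axis=0)[:beta]
    return np.mean(np.take_along_axis(S, order, axis=0), axis=0)
```

As a usage sketch, `bulyan([g_1, ..., g_n], f)` would replace the plain gradient average in a distributed SGD step; the coordinate-wise trimming is what shrinks the per-coordinate leeway of a Byzantine worker, in line with the $O(1/\sqrt{d})$ bound stated above.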