Robust Collaborative Learning with Linear Gradient Overhead

Sadegh Farhadkhani; Rachid Guerraoui; Nirupam Gupta; Lê-Nguyên Hoang; Rafael Pinot; John Stephan

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9761-9813, 2023.

Abstract

Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak’s momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
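The abstract describes MoNNA only at a high level: Polyak's momentum of local gradients for local updates, and nearest-neighbor averaging (NNA) for global mixing. The Python sketch below illustrates that two-step structure on a toy problem; it is not the authors' implementation (their code is in the repository linked above). It assumes (i) the exponential-moving-average form of momentum, m ← βm + (1 - β)g; (ii) each honest node takes a local half-step with its momentum and broadcasts it; and (iii) NNA means averaging the n - f received vectors closest to the node's own vector, where f bounds the number of faulty nodes. The functions `nna`, `local_step`, and `simulate`, the quadratic losses, the hyperparameters, and the constant faulty vector are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nna(own: Vector, received: List[Vector], f: int) -> Vector:
    """Nearest-neighbor averaging (assumed form): average the n - f received
    vectors closest to `own`. `received` must include `own` itself."""
    n = len(received)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    # Keep the n - f vectors nearest to this node's own vector, drop the rest.
    nearest = sorted(received, key=lambda v: sq_dist(v, own))[: n - f]
    # Coordinate-wise mean of the kept vectors.
    return [sum(coords) / len(nearest) for coords in zip(*nearest)]


def local_step(model: Vector, momentum: Vector, grad: Vector,
               lr: float, beta: float) -> Tuple[Vector, Vector]:
    """Polyak momentum of the local gradient (EMA form), then a local half-step."""
    new_m = [beta * m + (1 - beta) * g for m, g in zip(momentum, grad)]
    half = [x - lr * m for x, m in zip(model, new_m)]
    return half, new_m


def simulate(rounds: int = 500, lr: float = 0.1, beta: float = 0.9) -> List[Vector]:
    """Toy run with n = 5 nodes, f = 1 faulty: 4 honest nodes minimize
    hypothetical heterogeneous losses 0.5 * ||x - c_i||^2."""
    targets = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    f = 1
    models = [[0.0, 0.0] for _ in targets]
    moms = [[0.0, 0.0] for _ in targets]
    for _ in range(rounds):
        halves = []
        for i, c in enumerate(targets):
            grad = [x - ci for x, ci in zip(models[i], c)]  # exact, noise-free gradient
            half, moms[i] = local_step(models[i], moms[i], grad, lr, beta)
            halves.append(half)
        faulty = [1e6, -1e6]  # the faulty node sends an arbitrary vector
        received = halves + [faulty]
        # Each honest node mixes everything it received with NNA around its own half-step.
        models = [nna(h, received, f) for h in halves]
    return models


if __name__ == "__main__":
    for m in simulate():
        print([round(v, 3) for v in m])  # each honest model ends near [1.0, 1.0]
```

In this toy the faulty vector is far from every honest half-step, so NNA discards it and the honest nodes converge to [1, 1], the minimizer of their average loss; plain averaging of all five vectors would instead be dragged toward the faulty input. An obvious outlier like this says nothing about subtler faulty vectors placed close to honest ones.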

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-farhadkhani23a,
  title     = {Robust Collaborative Learning with Linear Gradient Overhead},
  author    = {Farhadkhani, Sadegh and Guerraoui, Rachid and Gupta, Nirupam and Hoang, L\^{e}-Nguy\^{e}n and Pinot, Rafael and Stephan, John},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {9761--9813},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/farhadkhani23a/farhadkhani23a.pdf},
  url       = {https://proceedings.mlr.press/v202/farhadkhani23a.html},
  abstract  = {Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak’s momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.}
}

Endnote

%0 Conference Paper
%T Robust Collaborative Learning with Linear Gradient Overhead
%A Sadegh Farhadkhani
%A Rachid Guerraoui
%A Nirupam Gupta
%A Lê-Nguyên Hoang
%A Rafael Pinot
%A John Stephan
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-farhadkhani23a
%I PMLR
%P 9761-9813
%U https://proceedings.mlr.press/v202/farhadkhani23a.html
%V 202
%X Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak’s momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.

APA


Farhadkhani, S., Guerraoui, R., Gupta, N., Hoang, L.-N., Pinot, R., & Stephan, J. (2023). Robust Collaborative Learning with Linear Gradient Overhead. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9761-9813. Available from https://proceedings.mlr.press/v202/farhadkhani23a.html.
