The Non-IID Data Quagmire of Decentralized Machine Learning

Kevin Hsieh; Amar Phanishayee; Onur Mutlu; Phillip Gibbons

The Non-IID Data Quagmire of Decentralized Machine Learning

Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip Gibbons

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4387-4398, 2020.

Abstract

Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-hsieh20a,
  title = 	 {The Non-{IID} Data Quagmire of Decentralized Machine Learning},
  author =       {Hsieh, Kevin and Phanishayee, Amar and Mutlu, Onur and Gibbons, Phillip},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {4387--4398},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/hsieh20a/hsieh20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/hsieh20a.html},
  abstract = 	 {Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.}
}

Endnote

%0 Conference Paper
%T The Non-IID Data Quagmire of Decentralized Machine Learning
%A Kevin Hsieh
%A Amar Phanishayee
%A Onur Mutlu
%A Phillip Gibbons
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-hsieh20a
%I PMLR
%P 4387--4398
%U https://proceedings.mlr.press/v119/hsieh20a.html
%V 119
%X Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.

APA


Hsieh, K., Phanishayee, A., Mutlu, O. & Gibbons, P.. (2020). The Non-IID Data Quagmire of Decentralized Machine Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4387-4398 Available from https://proceedings.mlr.press/v119/hsieh20a.html.

The Non-IID Data Quagmire of Decentralized Machine Learning

Abstract

Cite this Paper

Related Material