ResIST: Layer-wise decomposition of ResNets for distributed training

Chen Dun; Cameron R. Wolfe; Christopher M. Jermaine; Anastasios Kyrillidis

Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:610-620, 2022.

Abstract

We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats until convergence. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training. In comparison to common protocols, like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in communication and compute requirements, while being competitive with respect to model performance.
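The round structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each residual block is a single scalar parameter, that blocks are split into disjoint groups (so aggregation reduces to copying updated blocks back), and that local training is a stand-in gradient step toward a fixed target. The names `partition_blocks`, `local_train`, and `resist_round` are hypothetical.

```python
import random


def partition_blocks(num_blocks, num_workers, rng):
    """Randomly assign each residual block index to exactly one worker."""
    indices = list(range(num_blocks))
    rng.shuffle(indices)
    # Round-robin over the shuffled order gives disjoint, near-equal groups.
    return [sorted(indices[w::num_workers]) for w in range(num_workers)]


def local_train(block_params, steps, lr):
    """Stand-in for local SGD (assumption): pull each scalar toward 1.0."""
    trained = dict(block_params)
    for _ in range(steps):
        for idx in trained:
            trained[idx] -= lr * (trained[idx] - 1.0)
    return trained


def resist_round(global_params, num_workers, local_steps, lr, rng):
    """One round: decompose, train sub-models locally, copy updates back."""
    assignment = partition_blocks(len(global_params), num_workers, rng)
    new_params = list(global_params)
    for blocks in assignment:
        # Only this worker's blocks are communicated, never the full model.
        sub_model = {i: global_params[i] for i in blocks}
        updated = local_train(sub_model, local_steps, lr)
        # Groups are disjoint here, so aggregation is a plain write-back.
        for i, value in updated.items():
            new_params[i] = value
    return new_params, assignment


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0] * 8
    for _ in range(3):  # a fresh random decomposition every round
        params, assignment = resist_round(params, 4, 5, 0.1, rng)
    print(assignment)
    print(params)
```

In this sketch each worker receives only `len(global_params) / num_workers` blocks per round, which is the source of the communication and memory savings the abstract refers to. A scheme in which some blocks are shared across workers would need an averaging step instead of the plain write-back.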

Cite this Paper

BibTeX


@InProceedings{pmlr-v180-dun22a,
  title = 	 {ResIST: Layer-wise decomposition of ResNets for distributed training},
  author =       {Dun, Chen and Wolfe, Cameron R. and Jermaine, Christopher M. and Kyrillidis, Anastasios},
  booktitle = 	 {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {610--620},
  year = 	 {2022},
  editor = 	 {Cussens, James and Zhang, Kun},
  volume = 	 {180},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {01--05 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v180/dun22a/dun22a.pdf},
  url = 	 {https://proceedings.mlr.press/v180/dun22a.html},
  abstract = 	 {We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats until convergence. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training. In comparison to common protocols, like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in communication and compute requirements, while being competitive with respect to model performance.}
}

Endnote

%0 Conference Paper
%T ResIST: Layer-wise decomposition of ResNets for distributed training
%A Chen Dun
%A Cameron R. Wolfe
%A Christopher M. Jermaine
%A Anastasios Kyrillidis
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-dun22a
%I PMLR
%P 610--620
%U https://proceedings.mlr.press/v180/dun22a.html
%V 180
%X We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the process repeats until convergence. By construction, per iteration, ResIST communicates only a small portion of network parameters to each machine and never uses the full model during training. Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of the requirements of full-model training. In comparison to common protocols, like data-parallel training and data-parallel training with local SGD, ResIST yields a decrease in communication and compute requirements, while being competitive with respect to model performance.

APA


Dun, C., Wolfe, C. R., Jermaine, C. M., & Kyrillidis, A. (2022). ResIST: Layer-wise decomposition of ResNets for distributed training. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:610-620. Available from https://proceedings.mlr.press/v180/dun22a.html.
