Deep learning with COTS HPC systems

Adam Coates; Brody Huval; Tao Wang; David Wu; Bryan Catanzaro; Andrew Ng

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1337-1345, 2013.

Abstract

Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-coates13,
  title = 	 {Deep learning with COTS HPC systems},
  author = 	 {Coates, Adam and Huval, Brody and Wang, Tao and Wu, David and Catanzaro, Bryan and Ng, Andrew},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {1337--1345},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/coates13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/coates13.html},
  abstract = 	 {Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features.  Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores.  In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.  Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines.  As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.}
}

Endnote

%0 Conference Paper
%T Deep learning with COTS HPC systems
%A Adam Coates
%A Brody Huval
%A Tao Wang
%A David Wu
%A Bryan Catanzaro
%A Andrew Ng
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-coates13
%I PMLR
%P 1337--1345
%U https://proceedings.mlr.press/v28/coates13.html
%V 28
%N 3
%X Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features.  Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores.  In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.  Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines.  As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.

RIS

TY  - CPAPER
TI  - Deep learning with COTS HPC systems
AU  - Adam Coates
AU  - Brody Huval
AU  - Tao Wang
AU  - David Wu
AU  - Bryan Catanzaro
AU  - Andrew Ng
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-coates13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 1337
EP  - 1345
L1  - http://proceedings.mlr.press/v28/coates13.pdf
UR  - https://proceedings.mlr.press/v28/coates13.html
AB  - Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features.  Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores.  In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI.  Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines.  As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.
ER  -

APA

Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., & Ng, A. (2013). Deep learning with COTS HPC systems. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):1337-1345. Available from https://proceedings.mlr.press/v28/coates13.html.
