Deep learning with COTS HPC systems

Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, Andrew Ng
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1337-1345, 2013.

Abstract

Scaling up deep learning algorithms has been shown to improve performance on benchmark tasks and to enable the discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with InfiniBand interconnects and MPI. Our system can train 1-billion-parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. Because this infrastructure is much easier for others to marshal, the approach enables far more widespread research on extremely large neural networks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-coates13,
  title = {Deep learning with {COTS} {HPC} systems},
  author = {Adam Coates and Brody Huval and Tao Wang and David Wu and Bryan Catanzaro and Andrew Ng},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {1337--1345},
  year = {2013},
  editor = {Sanjoy Dasgupta and David McAllester},
  volume = {28},
  number = {3},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/coates13.pdf},
  url = {http://proceedings.mlr.press/v28/coates13.html},
  abstract = {Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.}
}
Endnote
%0 Conference Paper
%T Deep learning with COTS HPC systems
%A Adam Coates
%A Brody Huval
%A Tao Wang
%A David Wu
%A Bryan Catanzaro
%A Andrew Ng
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-coates13
%I PMLR
%J Proceedings of Machine Learning Research
%P 1337--1345
%U http://proceedings.mlr.press
%V 28
%N 3
%W PMLR
%X Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.
RIS
TY  - CPAPER
TI  - Deep learning with COTS HPC systems
AU  - Adam Coates
AU  - Brody Huval
AU  - Tao Wang
AU  - David Wu
AU  - Bryan Catanzaro
AU  - Andrew Ng
BT  - Proceedings of the 30th International Conference on Machine Learning
PY  - 2013/02/13
DA  - 2013/02/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-coates13
PB  - PMLR
SP  - 1337
DP  - PMLR
EP  - 1345
L1  - http://proceedings.mlr.press/v28/coates13.pdf
UR  - http://proceedings.mlr.press/v28/coates13.html
AB  - Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train 1 billion parameter networks on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.
ER  - 
APA
Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., & Ng, A. (2013). Deep learning with COTS HPC systems. Proceedings of the 30th International Conference on Machine Learning, in PMLR 28(3):1337-1345.
