Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks

Zhihao Jia; Sina Lin; Charles R. Qi; Alex Aiken

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2274-2283, 2018.

Abstract

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-jia18a,
  title = 	 {Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks},
  author =       {Jia, Zhihao and Lin, Sina and Qi, Charles R. and Aiken, Alex},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {2274--2283},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/jia18a/jia18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/jia18a.html},
  abstract = 	 {The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.}
}

Endnote

%0 Conference Paper
%T Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks
%A Zhihao Jia
%A Sina Lin
%A Charles R. Qi
%A Alex Aiken
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-jia18a
%I PMLR
%P 2274--2283
%U https://proceedings.mlr.press/v80/jia18a.html
%V 80
%X The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.

APA


Jia, Z., Lin, S., Qi, C.R. & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2274-2283. Available from https://proceedings.mlr.press/v80/jia18a.html.
