Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks

Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2274-2283, 2018.

Abstract

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.
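
To make the idea of layer-wise parallelism concrete, the sketch below (not taken from the paper) casts the per-layer choice as a small search problem over a chain of layers. Each layer independently picks a parallelization configuration, a compute cost is charged per layer, a transition cost is charged between adjacent layers whose configurations differ, and dynamic programming returns the jointly optimal assignment. The configuration names and cost numbers are hypothetical placeholders; the paper's actual cost model, configuration space, and graph-search algorithm are more general than this toy.

# Illustrative sketch of layer-wise parallelism as a search over per-layer
# parallelization strategies on a chain-structured network.
# NOTE: configuration names and costs below are hypothetical stand-ins for
# measured compute and communication costs; this is not the paper's algorithm.

CONFIGS = ["data_parallel", "model_parallel", "single_gpu"]

def compute_cost(layer, config):
    """Hypothetical execution cost of a layer under a given configuration."""
    speedup = {"data_parallel": 4.0, "model_parallel": 3.0, "single_gpu": 1.0}[config]
    return layer["flops"] / speedup

def transfer_cost(prev_layer, prev_config, config):
    """Hypothetical cost of redistributing the previous layer's activations
    when adjacent layers use different parallelization strategies."""
    if prev_config == config:
        return 0.0
    return prev_layer["activation_size"]

def optimize_chain(layers):
    """Dynamic programming over the chain: for each layer and configuration,
    keep the cheapest total cost of any assignment to the prefix so far."""
    best = {c: (compute_cost(layers[0], c), [c]) for c in CONFIGS}
    for i in range(1, len(layers)):
        layer, prev_layer = layers[i], layers[i - 1]
        new_best = {}
        for c in CONFIGS:
            new_best[c] = min(
                (prev_cost + transfer_cost(prev_layer, p, c) + compute_cost(layer, c),
                 prev_assign + [c])
                for p, (prev_cost, prev_assign) in best.items()
            )
        best = new_best
    return min(best.values())

if __name__ == "__main__":
    # A toy three-layer network: two conv-like layers and a dense-like layer.
    toy_net = [
        {"name": "conv1", "flops": 8.0, "activation_size": 1.0},
        {"name": "conv2", "flops": 6.0, "activation_size": 1.0},
        {"name": "fc",    "flops": 2.0, "activation_size": 4.0},
    ]
    total, assignment = optimize_chain(toy_net)
    for layer, config in zip(toy_net, assignment):
        print(f"{layer['name']}: {config}")
    print(f"estimated total cost: {total:.2f}")

On this toy cost model the search may keep adjacent layers on the same strategy to avoid redistribution costs, or switch strategies when a layer's compute savings outweigh the communication penalty, which is the trade-off the paper optimizes jointly across the whole network.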

Cite this Paper

BibTeX
@InProceedings{pmlr-v80-jia18a,
  title     = {Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks},
  author    = {Jia, Zhihao and Lin, Sina and Qi, Charles R. and Aiken, Alex},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {2274--2283},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/jia18a/jia18a.pdf},
  url       = {https://proceedings.mlr.press/v80/jia18a.html},
  abstract  = {The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.}
}
Endnote
%0 Conference Paper
%T Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks
%A Zhihao Jia
%A Sina Lin
%A Charles R. Qi
%A Alex Aiken
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-jia18a
%I PMLR
%P 2274--2283
%U https://proceedings.mlr.press/v80/jia18a.html
%V 80
%X The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.
APA
Jia, Z., Lin, S., Qi, C.R. & Aiken, A. (2018). Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2274-2283. Available from https://proceedings.mlr.press/v80/jia18a.html.
