meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3299-3308, 2017.

Abstract

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1–4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.
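
The top-$k$ sparsification described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single linear layer y = W x with one row of W per output unit, and the function names `topk_sparsify` and `sparse_backward` are hypothetical, chosen only for this example.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a 1-D gradient; zero the rest."""
    # argpartition places the indices of the k largest |grad| values last
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, idx

def sparse_backward(W, x, grad_out, k):
    """Backward pass of y = W @ x using only the top-k entries of grad_out.

    Assumed layout (not taken from the paper): W has shape (n_out, n_in),
    so each kept gradient entry corresponds to one row of W.
    """
    _, idx = topk_sparsify(grad_out, k)
    grad_W = np.zeros_like(W)
    # Only k rows of the weight gradient are computed and non-zero
    grad_W[idx] = np.outer(grad_out[idx], x)
    # The gradient passed to the previous layer also uses only k rows of W
    grad_x = W[idx].T @ grad_out[idx]
    return grad_W, grad_x, idx

W = np.arange(12, dtype=float).reshape(4, 3)
x = np.array([1.0, 2.0, 3.0])
grad_out = np.array([0.1, -2.0, 0.5, 3.0])
grad_W, grad_x, idx = sparse_backward(W, x, grad_out, k=2)
print(sorted(idx.tolist()))  # rows of W that receive an update
```

With this layout, both matrix products touch k rows instead of all n_out rows, which is the "k divided by the vector dimension" cost ratio the abstract refers to.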

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-sun17c,
  title     = {me{P}rop: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting},
  author    = {Xu Sun and Xuancheng Ren and Shuming Ma and Houfeng Wang},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3299--3308},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/sun17c/sun17c.pdf},
  url       = {http://proceedings.mlr.press/v70/sun17c.html},
  abstract  = {We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1–4\% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.}
}
Endnote
%0 Conference Paper
%T meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
%A Xu Sun
%A Xuancheng Ren
%A Shuming Ma
%A Houfeng Wang
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-sun17c
%I PMLR
%P 3299--3308
%U http://proceedings.mlr.press/v70/sun17c.html
%V 70
%X We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1–4\% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given.
APA
Sun, X., Ren, X., Ma, S., & Wang, H. (2017). meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3299-3308. Available from http://proceedings.mlr.press/v70/sun17c.html

Related Material