Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization

Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4486-4495, 2019.

Abstract

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks: for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. Therefore, a given network has many factorizations which change the weights of the network without changing its function. We present a conceptually simple and easy-to-implement method that uses this property, and show that proper factorizations significantly decrease the degradation caused by quantization. We show improvement on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is applicable to other domains such as network pruning, neural network regularization and network interpretability.
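The scaling degree of freedom described above can be illustrated in a few lines of NumPy. The sketch below is ours and is not the paper's equalization procedure: it builds a toy network of two fully connected layers with a ReLU in between (the positive homogeneity of ReLU, ReLU(s*z) = s*ReLU(z) for s > 0, is what makes the rescaling legal), applies arbitrary positive per-channel scale factors, and checks that the rescaled weights compute the same function. Layer sizes and scale ranges are hypothetical choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy network: two fully connected layers with a ReLU in between.
# (Sizes are arbitrary; the same argument applies per output channel of a convolution.)
W1 = rng.standard_normal((16, 8))   # layer 1 weights: 8 inputs -> 16 channels
b1 = rng.standard_normal(16)
W2 = rng.standard_normal((4, 16))   # layer 2 weights: 16 channels -> 4 outputs
b2 = rng.standard_normal(4)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU is positive-homogeneous
    return W2 @ h + b2

# Arbitrary positive per-channel scale factors for layer 1's output channels.
s = rng.uniform(0.1, 10.0, size=16)

# Scale layer 1's output channels (rows and bias) by s, and inversely
# scale the corresponding input channels (columns) of layer 2 by 1/s.
W1_scaled = W1 * s[:, None]
b1_scaled = b1 * s
W2_scaled = W2 / s[None, :]

# The factorized network computes exactly the same function.
x = rng.standard_normal(8)
assert np.allclose(forward(x, W1, b1, W2, b2),
                   forward(x, W1_scaled, b1_scaled, W2_scaled, b2))

The identity above only shows why such a factorization is free, i.e., it changes the weights without changing the network's function; the paper's contribution lies in choosing the scale factors so that weight ranges of adjacent layers are better matched, which reduces the error introduced by quantization.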

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-meller19a,
  title     = {Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization},
  author    = {Meller, Eldad and Finkelstein, Alexander and Almog, Uri and Grobman, Mark},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4486--4495},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/meller19a/meller19a.pdf},
  url       = {https://proceedings.mlr.press/v97/meller19a.html},
  abstract  = {Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. Therefore, a given network has many factorizations which change the weights of the network without changing its function. We present a conceptually simple and easy to implement method that uses this property and show that proper factorizations significantly decrease the degradation caused by quantization. We show improvement on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is applicable to other domains such as network-pruning, neural nets regularization and network interpretability.}
}
Endnote
%0 Conference Paper
%T Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization
%A Eldad Meller
%A Alexander Finkelstein
%A Uri Almog
%A Mark Grobman
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-meller19a
%I PMLR
%P 4486--4495
%U https://proceedings.mlr.press/v97/meller19a.html
%V 97
%X Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. Therefore, a given network has many factorizations which change the weights of the network without changing its function. We present a conceptually simple and easy to implement method that uses this property and show that proper factorizations significantly decrease the degradation caused by quantization. We show improvement on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is applicable to other domains such as network-pruning, neural nets regularization and network interpretability.
APA
Meller, E., Finkelstein, A., Almog, U., & Grobman, M. (2019). Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4486-4495. Available from https://proceedings.mlr.press/v97/meller19a.html.