Rate Distortion For Model Compression: From Theory To Practice

Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2102-2111, 2019.

Abstract

The enormous size of modern deep neural networks makes it challenging to deploy those models in memory- and communication-limited scenarios. Thus, compressing a trained model without a significant loss in performance has become an increasingly important task. Tremendous advances have been made recently, where the main technical building blocks are pruning, quantization, and low-rank factorization. In this paper, we propose principled approaches to improve upon the common heuristics used in those building blocks, by studying the fundamental limit for model compression via rate distortion theory. We prove a lower bound for the rate distortion function for model compression and prove its achievability for linear models. Although this achievable compression scheme is intractable in practice, this analysis motivates a novel objective function for model compression, which can be used to improve classes of model compressors, such as pruning or quantization. Theoretically, we prove that the proposed scheme is optimal for compressing one-hidden-layer ReLU neural networks. Empirically, we show that the proposed scheme improves upon the baseline in the compression-accuracy tradeoff.
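For readers who want the precise object behind the phrase "rate distortion function for model compression," the display below gives the standard textbook definition, with notation adapted to this setting: W denotes the trained weight vector, \hat{W} its compressed reproduction, and d a distortion measure (for example, the expected change in model output). This notation and the choice of d are our illustrative assumptions, not an equation reproduced from the paper.

    R(D) \;=\; \min_{P_{\hat{W} \mid W} \,:\, \mathbb{E}[d(W, \hat{W})] \le D} \; I(W; \hat{W})

In words, R(D) is the minimum number of bits that any (possibly randomized) compressor P_{\hat{W}|W} must use to describe the model while keeping the expected distortion at or below D; the paper's lower bound and achievability results are statements about this function.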

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-gao19c,
  title     = {Rate Distortion For Model {C}ompression: {F}rom Theory To Practice},
  author    = {Gao, Weihao and Liu, Yu-Han and Wang, Chong and Oh, Sewoong},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2102--2111},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/gao19c/gao19c.pdf},
  url       = {https://proceedings.mlr.press/v97/gao19c.html},
  abstract  = {The enormous size of modern deep neural networks makes it challenging to deploy those models in memory- and communication-limited scenarios. Thus, compressing a trained model without a significant loss in performance has become an increasingly important task. Tremendous advances have been made recently, where the main technical building blocks are pruning, quantization, and low-rank factorization. In this paper, we propose principled approaches to improve upon the common heuristics used in those building blocks, by studying the fundamental limit for model compression via rate distortion theory. We prove a lower bound for the rate distortion function for model compression and prove its achievability for linear models. Although this achievable compression scheme is intractable in practice, this analysis motivates a novel objective function for model compression, which can be used to improve classes of model compressors, such as pruning or quantization. Theoretically, we prove that the proposed scheme is optimal for compressing one-hidden-layer ReLU neural networks. Empirically, we show that the proposed scheme improves upon the baseline in the compression-accuracy tradeoff.}
}
Endnote
%0 Conference Paper
%T Rate Distortion For Model Compression: From Theory To Practice
%A Weihao Gao
%A Yu-Han Liu
%A Chong Wang
%A Sewoong Oh
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-gao19c
%I PMLR
%P 2102--2111
%U https://proceedings.mlr.press/v97/gao19c.html
%V 97
%X The enormous size of modern deep neural networks makes it challenging to deploy those models in memory- and communication-limited scenarios. Thus, compressing a trained model without a significant loss in performance has become an increasingly important task. Tremendous advances have been made recently, where the main technical building blocks are pruning, quantization, and low-rank factorization. In this paper, we propose principled approaches to improve upon the common heuristics used in those building blocks, by studying the fundamental limit for model compression via rate distortion theory. We prove a lower bound for the rate distortion function for model compression and prove its achievability for linear models. Although this achievable compression scheme is intractable in practice, this analysis motivates a novel objective function for model compression, which can be used to improve classes of model compressors, such as pruning or quantization. Theoretically, we prove that the proposed scheme is optimal for compressing one-hidden-layer ReLU neural networks. Empirically, we show that the proposed scheme improves upon the baseline in the compression-accuracy tradeoff.
APA
Gao, W., Liu, Y., Wang, C. & Oh, S. (2019). Rate Distortion For Model Compression: From Theory To Practice. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2102-2111. Available from https://proceedings.mlr.press/v97/gao19c.html.
