Fast Lossless Neural Compression with Integer-Only Discrete Flows

Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:22562-22575, 2022.

Abstract

By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consist of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving $10\times$ inference speedup compared to the fastest existing neural compressors, while retaining high compression rates on ImageNet32 and ImageNet64.
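
The building block behind IODF is the additive coupling transform from integer discrete flows, which is exactly invertible on integer data because the learned translation is rounded before it is added. The following minimal NumPy sketch is illustrative only (it is not the authors' implementation; the names IntCoupling and quantize_int8 are assumptions): it combines such a coupling with a symmetric 8-bit weight quantization of the kind the paper builds on, while the fully integer matmuls and binary filter gates of IODF are omitted.

import numpy as np

def quantize_int8(w):
    # Symmetric 8-bit quantization: w ~= scale * q with q an int8 tensor.
    scale = np.abs(w).max() / 127.0 + 1e-12
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

class IntCoupling:
    # z1 = x1, z2 = x2 + round(t(x1)); subtracting the same rounded
    # translation recovers x2 exactly, so the map is lossless on integers.
    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2
        self.q1, self.s1 = quantize_int8(rng.normal(0.0, 0.1, (self.d, hidden)))
        self.q2, self.s2 = quantize_int8(rng.normal(0.0, 0.1, (hidden, dim - self.d)))

    def _translation(self, x1):
        # Tiny MLP with int8-quantized weights; IODF additionally runs the
        # matmuls themselves in integer arithmetic, which is omitted here.
        h = np.maximum(x1.astype(np.float64) @ (self.q1 * self.s1), 0.0)
        return np.round(h @ (self.q2 * self.s2)).astype(np.int64)

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        return np.concatenate([x1, x2 + self._translation(x1)], axis=1)

    def inverse(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        return np.concatenate([z1, z2 - self._translation(z1)], axis=1)

# Round-trip check on integer pixel-like data: the transform is lossless.
layer = IntCoupling(dim=6)
x = np.random.default_rng(1).integers(0, 256, size=(4, 6)).astype(np.int64)
assert np.array_equal(layer.inverse(layer.forward(x)), x)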

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wang22a,
  title     = {Fast Lossless Neural Compression with Integer-Only Discrete Flows},
  author    = {Wang, Siyu and Chen, Jianfei and Li, Chongxuan and Zhu, Jun and Zhang, Bo},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {22562--22575},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wang22a/wang22a.pdf},
  url       = {https://proceedings.mlr.press/v162/wang22a.html},
  abstract  = {By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consist of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving $10\times$ inference speedup compared to the fastest existing neural compressors, while retaining high compression rates on ImageNet32 and ImageNet64.}
}
Endnote
%0 Conference Paper
%T Fast Lossless Neural Compression with Integer-Only Discrete Flows
%A Siyu Wang
%A Jianfei Chen
%A Chongxuan Li
%A Jun Zhu
%A Bo Zhang
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wang22a
%I PMLR
%P 22562--22575
%U https://proceedings.mlr.press/v162/wang22a.html
%V 162
%X By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consist of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving $10\times$ inference speedup compared to the fastest existing neural compressors, while retaining high compression rates on ImageNet32 and ImageNet64.
APA
Wang, S., Chen, J., Li, C., Zhu, J. & Zhang, B. (2022). Fast Lossless Neural Compression with Integer-Only Discrete Flows. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:22562-22575. Available from https://proceedings.mlr.press/v162/wang22a.html.
