FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression

Yuchu Fang, Wenzhong Li, Yao Zeng, Sanglu Lu
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:385-400, 2020.

Abstract

Despite their great success in various fields, modern convolutional neural networks (CNNs) require a huge amount of computation at inference time due to their deep network structures, which prevents them from being deployed on resource-limited devices such as mobile phones and embedded sensors. Recently, filter pruning has been introduced as a promising model compression method to reduce computation cost and storage overhead. However, existing filter pruning approaches are mainly model-based: they rely on empirical models to evaluate the importance of filters and require manually set parameters to guide model compression. In this paper, we observe that CNNs commonly contain a large number of inactive filters, and we introduce the Filter Inactive RatE (FIRE), a novel metric to evaluate the importance of filters in a neural network. Based on FIRE, we develop a learning-based filter pruning strategy called FIREPruning for fast model compression. It adopts a regression model to predict the FIRE value and uses a three-stage pipeline (FIRE prediction, pruning, and fine-tuning) to compress the neural network efficiently. Extensive experiments on widely used CNN models and well-known datasets show that FIREPruning reduces the overall computation cost by up to 86.9% without sacrificing much accuracy, significantly outperforming state-of-the-art model compression methods.
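The abstract does not spell out how FIRE is computed; a plausible reading, consistent with the name "Filter Inactive RatE" and with common practice in filter pruning, is the fraction of inactive (zero, post-ReLU) activations produced by each filter over a batch of inputs. The sketch below is an illustrative approximation under that assumption, not the authors' implementation; the `inactive_rate_per_filter` helper and the 0.9 pruning threshold are hypothetical.

```python
# Illustrative sketch (not the paper's code): score each filter of a conv layer
# by the average fraction of zero post-ReLU activations it produces, then flag
# high-scoring (mostly inactive) filters as pruning candidates.
import torch
import torch.nn as nn

def inactive_rate_per_filter(conv: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Return one score in [0, 1] per output filter: the fraction of zero
    activations after ReLU, averaged over the batch and spatial positions."""
    with torch.no_grad():
        fmap = torch.relu(conv(x))                            # (N, C_out, H, W)
        zero_frac = (fmap == 0).float().mean(dim=(0, 2, 3))   # (C_out,)
    return zero_frac

# Toy usage with a random layer and batch.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)
scores = inactive_rate_per_filter(conv, x)
prune_mask = scores > 0.9   # hypothetical threshold; flagged filters are pruning candidates
print(scores)
print(prune_mask.sum().item(), "filters flagged for pruning")
```

In the paper's pipeline this measured score is what the regression model learns to predict, so that pruning decisions do not require running the full forward pass for every filter.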

Cite this Paper


BibTeX
@InProceedings{pmlr-v129-fang20a,
  title     = {FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression},
  author    = {Fang, Yuchu and Li, Wenzhong and Zeng, Yao and Lu, Sanglu},
  booktitle = {Proceedings of The 12th Asian Conference on Machine Learning},
  pages     = {385--400},
  year      = {2020},
  editor    = {Pan, Sinno Jialin and Sugiyama, Masashi},
  volume    = {129},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--20 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v129/fang20a/fang20a.pdf},
  url       = {https://proceedings.mlr.press/v129/fang20a.html},
  abstract  = {Despite their great success in various fields, modern convolutional neural networks (CNNs) require huge amount of computation in inference due to their deeper network structure, which prevents them from being used in resource-limited devices such as mobile phones and embedded sensors. Recently, filter pruning had been introduced as a promising model compression method to reduce computation cost and storage overhead. However, existing filter pruning approaches are mainly model-based, which rely on empirical model to evaluate the importance of filters and set parameters manually to guide model compression. In this paper, we observe that CNNs commonly consist of large amount of inactive filters, and introduce Filter Inactive RatE (FIRE), a novel metric to evaluate the importance of filters in a neural network. Based on FIRE, we develop a learning based filter pruning strategy called FIREPruning for fast model compression. It adopts a regression model to predict the FIRE value and uses a three stage pipeline (FIRE prediction, pruning, and fine-tuning) to compress the neural network efficiently. Extensive experiments based on widely-used CNN models and well-known datasets show that FIREPruning reduces overall computation cost up to 86.9% without sacrificing too much accuracy, which significantly outperforms the state-of-the-art model compression methods.}
}
Endnote
%0 Conference Paper
%T FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression
%A Yuchu Fang
%A Wenzhong Li
%A Yao Zeng
%A Sanglu Lu
%B Proceedings of The 12th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Sinno Jialin Pan
%E Masashi Sugiyama
%F pmlr-v129-fang20a
%I PMLR
%P 385--400
%U https://proceedings.mlr.press/v129/fang20a.html
%V 129
%X Despite their great success in various fields, modern convolutional neural networks (CNNs) require huge amount of computation in inference due to their deeper network structure, which prevents them from being used in resource-limited devices such as mobile phones and embedded sensors. Recently, filter pruning had been introduced as a promising model compression method to reduce computation cost and storage overhead. However, existing filter pruning approaches are mainly model-based, which rely on empirical model to evaluate the importance of filters and set parameters manually to guide model compression. In this paper, we observe that CNNs commonly consist of large amount of inactive filters, and introduce Filter Inactive RatE (FIRE), a novel metric to evaluate the importance of filters in a neural network. Based on FIRE, we develop a learning based filter pruning strategy called FIREPruning for fast model compression. It adopts a regression model to predict the FIRE value and uses a three stage pipeline (FIRE prediction, pruning, and fine-tuning) to compress the neural network efficiently. Extensive experiments based on widely-used CNN models and well-known datasets show that FIREPruning reduces overall computation cost up to 86.9% without sacrificing too much accuracy, which significantly outperforms the state-of-the-art model compression methods.
APA
Fang, Y., Li, W., Zeng, Y. & Lu, S. (2020). FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression. Proceedings of The 12th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 129:385-400. Available from https://proceedings.mlr.press/v129/fang20a.html.