FIREPruning: Learning-based Filter Pruning for Convolutional Neural Network Compression
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:385-400, 2020.
Despite their great success in various fields, modern convolutional neural networks (CNNs) require huge amount of computation in inference due to their deeper network structure, which prevents them from being used in resource-limited devices such as mobile phones and embedded sensors. Recently, filter pruning had been introduced as a promising model compression method to reduce computation cost and storage overhead. However, existing filter pruning approaches are mainly model-based, which rely on empirical model to evaluate the importance of filters and set parameters manually to guide model compression. In this paper, we observe that CNNs commonly consist of large amount of inactive filters, and introduce Filter Inactive RatE (FIRE), a novel metric to evaluate the importance of filters in a neural network. Based on FIRE, we develop a learning based filter pruning strategy called FIREPruning for fast model compression. It adopts a regression model to predict the FIRE value and uses a three stage pipeline (FIRE prediction, pruning, and fine-tuning) to compress the neural network efficiently. Extensive experiments based on widely-used CNN models and well-known datasets show that FIREPruning reduces overall computation cost up to 86.9% without sacrificing too much accuracy, which significantly outperforms the state-of-the-art model compression methods.