Hierarchical Quantization Algorithm for Deep Learning Network Models
Proceedings of 2024 International Conference on Machine Learning and Intelligent Computing, PMLR 245:265-270, 2024.
Abstract
Deep learning network models have achieved impressive performance in fields such as computer vision, natural language processing, and biomedicine. However, their high computational and storage costs hinder deployment in resource-constrained settings, so challenges remain in bringing deep learning models to real-world applications on such devices. To address this problem, researchers have proposed various quantization algorithms that reduce the computation and storage costs of deep learning models. This paper addresses hierarchical quantization of deep learning models and proposes a simple hierarchical quantization algorithm that aims to effectively reduce the computation and storage requirements of deep learning network models while maintaining their accuracy. To demonstrate the effectiveness of the proposed hierarchical quantization method, we conducted experiments on several classical deep learning models. The experiments show that, compared with traditional quantization algorithms, our approach better preserves model accuracy while reducing storage and computation requirements.
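The abstract does not spell out the algorithm, but the general idea of layer-wise (hierarchical) quantization can be illustrated with a minimal sketch: each layer's weights are uniformly quantized to a per-layer bit width, with more sensitive layers keeping higher precision. The function name, layer names, and bit-width assignment below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def uniform_quantize(weights, num_bits):
    """Uniform affine quantization: map floats to num_bits-bit integers,
    then dequantize back to floats (the usual simulated-quantization view)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    q = np.clip(np.round((weights - w_min) / scale), qmin, qmax)
    return q * scale + w_min  # dequantized approximation of the weights

# Hypothetical per-layer bit widths: a hierarchical scheme might assign
# fewer bits to layers that tolerate quantization error better.
rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(size=(16, 3, 3, 3)),
          "fc": rng.normal(size=(10, 256))}
bit_widths = {"conv1": 8, "fc": 4}

for name, w in layers.items():
    w_q = uniform_quantize(w, bit_widths[name])
    err = np.abs(w - w_q).max()
    print(f"{name}: {bit_widths[name]}-bit, max abs error {err:.4f}")
```

For uniform quantization the worst-case per-weight error is about half a quantization step (scale / 2), which is why the 4-bit layer shows a noticeably larger error than the 8-bit one.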