CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition
Proceedings of The 39th Canadian Conference on Artificial Intelligence, PMLR 318:52-63, 2026.
Abstract
Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8$\times$ with less than a 4% accuracy drop; on COCO, we achieve 2$\times$ compression of YOLOv5s without any accuracy drop and 2$\times$ compression of YOLOv5n with a 2.5% drop.
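
As an illustration of the idea of rank selection as a global search, the following minimal sketch counts parameters for a Tucker-2 factorized convolution and exhaustively searches per-layer rank pairs under a parameter budget. It is not the CompressNAS implementation: the function names, the layer shapes, the candidate ranks, and in particular `toy_score` (a placeholder standing in for the fast accuracy estimator, which the abstract does not specify) are assumptions made for this example.

```python
from itertools import product


def conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization of the same layer:
    1x1 conv (c_in -> r_in), k x k core conv (r_in -> r_out),
    1x1 conv (r_out -> c_out)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def toy_score(layers, ranks):
    """HYPOTHETICAL stand-in for an accuracy estimator: the mean
    fraction of channel rank retained across layers. A real estimator
    would predict task accuracy instead."""
    per_layer = [
        (r_in / c_in + r_out / c_out) / 2
        for (c_in, c_out, _), (r_in, r_out) in zip(layers, ranks)
    ]
    return sum(per_layer) / len(per_layer)


def search(layers, candidates, budget):
    """Exhaustively try every per-layer rank assignment and keep the
    best-scoring one whose total parameter count fits the budget.
    Returns (score, ranks, total_params), or None if nothing fits."""
    best = None
    for ranks in product(candidates, repeat=len(layers)):
        total = sum(
            tucker2_params(c_in, c_out, k, r_in, r_out)
            for (c_in, c_out, k), (r_in, r_out) in zip(layers, ranks)
        )
        if total > budget:
            continue  # violates the memory constraint
        score = toy_score(layers, ranks)
        if best is None or score > best[0]:
            best = (score, ranks, total)
    return best


if __name__ == "__main__":
    # Assumed example layers: (c_in, c_out, kernel_size)
    layers = [(64, 64, 3), (128, 128, 3)]
    candidates = [(16, 16), (32, 32)]
    print(conv_params(64, 64, 3), tucker2_params(64, 64, 3, 16, 16))
    print(search(layers, candidates, budget=20_000))
```

For a 64-to-64-channel 3x3 layer, the dense convolution has 36,864 weights, while the Tucker-2 form with ranks (16, 16) has 4,352. The search is global in the sense that the budget is checked on the sum over all layers, so a higher rank in one layer can be traded against a lower rank in another. Exhaustive enumeration grows exponentially with the number of layers, which is why a cheap estimator matters in practice.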