CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition

Sudhakar Sah; Nikhil Chhabra; Matthieu Durnerin

Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:52-63, 2026.

Abstract

Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8$\times$ with an accuracy drop of less than 4%; on COCO, we achieve 2$\times$ compression of YOLOv5s without any accuracy drop and 2$\times$ compression of YOLOv5n with a 2.5% drop.
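To make the link between rank choice and compression concrete, the sketch below counts the weights of a single convolution before and after a Tucker-2 style factorization. This is an illustrative assumption, not the paper's method: the abstract does not state which Tucker variant, layer shapes, or ranks CompressNAS uses, so the function names, the 256-channel layer, and the ranks of 64 are all hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in: int, c_out: int, k: int, r_in: int, r_out: int) -> int:
    """Weight count after an assumed Tucker-2 factorization.

    The layer is replaced by three convolutions:
      1x1 (c_in -> r_in), k x k (r_in -> r_out), 1x1 (r_out -> c_out).
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def compression_ratio(c_in: int, c_out: int, k: int, r_in: int, r_out: int) -> float:
    """Original weight count divided by factorized weight count."""
    return conv_params(c_in, c_out, k) / tucker2_params(c_in, c_out, k, r_in, r_out)


if __name__ == "__main__":
    # Hypothetical layer: 256 -> 256 channels, 3x3 kernel, ranks (64, 64).
    original = conv_params(256, 256, 3)
    factorized = tucker2_params(256, 256, 3, 64, 64)
    ratio = compression_ratio(256, 256, 3, 64, 64)
    print(f"original={original} factorized={factorized} ratio={ratio:.2f}")
```

Because each layer's ranks change its weight count independently while accuracy depends on all layers jointly, per-layer counts like this one are only the cost side of the global trade-off the abstract describes.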

Cite this Paper

BibTeX

@InProceedings{pmlr-v318-sah26b,
  title = 	 {CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition},
  author =       {Sah, Sudhakar and Chhabra, Nikhil and Durnerin, Matthieu},
  booktitle = 	 {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = 	 {52--63},
  year = 	 {2026},
  editor = 	 {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = 	 {318},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--29 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v318/main/assets/sah26b/sah26b.pdf},
  url = 	 {https://proceedings.mlr.press/v318/sah26b.html},
  abstract = 	 {Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8$\times$ with an accuracy drop of less than 4\%; on COCO, we achieve 2$\times$ compression of YOLOv5s without any accuracy drop and 2$\times$ compression of YOLOv5n with a 2.5\% drop.}
}

Endnote

%0 Conference Paper
%T CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition
%A Sudhakar Sah
%A Nikhil Chhabra
%A Matthieu Durnerin
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-sah26b
%I PMLR
%P 52--63
%U https://proceedings.mlr.press/v318/sah26b.html
%V 318
%X Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight NPUs (Neural Processing Units) due to their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8$\times$ with an accuracy drop of less than 4%; on COCO, we achieve 2$\times$ compression of YOLOv5s without any accuracy drop and 2$\times$ compression of YOLOv5n with a 2.5% drop.

APA

Sah, S., Chhabra, N., & Durnerin, M. (2026). CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:52-63. Available from https://proceedings.mlr.press/v318/sah26b.html.
