QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring

Nikhil Pratap Ghanathe, Steven J E Wilton
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19286-19306, 2025.

Abstract

Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed remotely without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models and comparable performance on larger models, with model sizes 59% smaller than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.
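As a rough illustration of the architecture the abstract describes, the PyTorch sketch below attaches several lightweight output heads at the final exit of a shared backbone and derives an uncertainty score from their averaged softmax in a single forward pass. This is a hypothetical sketch, not the authors' implementation: the network, the names QuteLikeNet and predict_with_uncertainty, the head count, and the entropy-based score are illustrative assumptions, and the paper's distillation-based training of the heads is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuteLikeNet(nn.Module):
    """Illustrative stand-in for a QUTE-style network (hypothetical)."""
    def __init__(self, num_classes=10, num_heads=4):
        super().__init__()
        # Tiny convolutional backbone standing in for the base tinyML model.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Lightweight output blocks at the final exit; in QUTE these are
        # trained with knowledge distilled from early exits to encourage
        # diversity (that training procedure is omitted here).
        self.heads = nn.ModuleList(
            [nn.Linear(16, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x):
        z = self.backbone(x)  # one pass through the shared backbone
        return torch.stack([head(z) for head in self.heads])  # (K, B, C)

@torch.no_grad()
def predict_with_uncertainty(model, x):
    logits = model(x)                          # (K, B, C)
    probs = F.softmax(logits, dim=-1).mean(0)  # ensemble-averaged softmax
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    return probs.argmax(-1), entropy           # prediction, uncertainty score

# Usage: on unlabeled field data, a sustained rise in entropy can flag
# potential accuracy-drop events without access to true labels.
model = QuteLikeNet()
preds, unc = predict_with_uncertainty(model, torch.randn(2, 1, 28, 28))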

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ghanathe25a,
  title     = {{QUTE}: Quantifying Uncertainty in {T}iny{ML} models with Early-exit-assisted ensembles for model-monitoring},
  author    = {Ghanathe, Nikhil Pratap and Wilton, Steven J E},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {19286--19306},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ghanathe25a/ghanathe25a.pdf},
  url       = {https://proceedings.mlr.press/v267/ghanathe25a.html},
  abstract  = {Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed remotely without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models, achieving comparable performance on larger models with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.}
}
Endnote
%0 Conference Paper
%T QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring
%A Nikhil Pratap Ghanathe
%A Steven J E Wilton
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-ghanathe25a
%I PMLR
%P 19286--19306
%U https://proceedings.mlr.press/v267/ghanathe25a.html
%V 267
%X Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed remotely without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models, achieving comparable performance on larger models with 59% smaller model sizes than the closest prior work. When deployed on a microcontroller, QUTE demonstrates a 31% reduction in latency on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior works.
APA
Ghanathe, N.P. & Wilton, S.J.E. (2025). QUTE: Quantifying Uncertainty in TinyML models with Early-exit-assisted ensembles for model-monitoring. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:19286-19306. Available from https://proceedings.mlr.press/v267/ghanathe25a.html.
