TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

Young D. Kwon; Rui Li; Stylianos Venieris; Jagmohan Chauhan; Nicholas Donald Lane; Cecilia Mascolo

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:25812-25843, 2024.

Abstract

On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time ($\textit{e.g.}$ a few hours), or induce substantial accuracy loss ($\geq$10%). In this paper, we propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that $\textit{dynamically}$ selects the layer/channel to update based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.23$\times$ smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms.
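The abstract describes selecting which layers or channels to update under a criterion that weighs user data against the device's memory and compute limits. The sketch below illustrates that general idea only; it is not the authors' algorithm. The `LayerStats` fields, the importance-per-normalised-cost score, and the greedy selection rule are all assumptions made for illustration, and the numbers in the example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStats:
    name: str
    importance: float  # hypothetical task-relevance score estimated from user data
    memory_kb: float   # assumed backward-pass memory cost if this layer is updated
    macs_m: float      # assumed backward-pass compute cost, in millions of MACs


def select_layers(layers, memory_budget_kb, compute_budget_m):
    """Greedily pick layers to update without exceeding either budget.

    Illustrative only: layers are ranked by importance divided by their
    normalised memory-plus-compute cost, then added while both budgets hold.
    """
    total_mem = sum(layer.memory_kb for layer in layers)
    total_macs = sum(layer.macs_m for layer in layers)

    def score(layer):
        # Normalise each cost by the network total so the two terms are comparable.
        cost = layer.memory_kb / total_mem + layer.macs_m / total_macs
        return layer.importance / cost

    chosen, used_mem, used_macs = [], 0.0, 0.0
    for layer in sorted(layers, key=score, reverse=True):
        fits_memory = used_mem + layer.memory_kb <= memory_budget_kb
        fits_compute = used_macs + layer.macs_m <= compute_budget_m
        if fits_memory and fits_compute:
            chosen.append(layer.name)
            used_mem += layer.memory_kb
            used_macs += layer.macs_m
    return chosen


if __name__ == "__main__":
    # Made-up per-layer statistics, purely for demonstration.
    example = [
        LayerStats("conv1", 0.2, 400, 30),
        LayerStats("conv2", 0.5, 300, 20),
        LayerStats("conv3", 0.9, 200, 10),
        LayerStats("fc", 0.8, 50, 1),
    ]
    print(select_layers(example, memory_budget_kb=300, compute_budget_m=15))
```

A greedy ratio rule is used here because it is easy to read. The paper's actual multi-objective criterion and its channel-level selection should be taken from the paper itself.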

Cite this Paper

BibTeX

@InProceedings{pmlr-v235-kwon24c,
  title =        {{T}iny{T}rain: Resource-Aware Task-Adaptive Sparse Training of {DNN}s at the Data-Scarce Edge},
  author =       {Kwon, Young D. and Li, Rui and Venieris, Stylianos and Chauhan, Jagmohan and Lane, Nicholas Donald and Mascolo, Cecilia},
  booktitle =    {Proceedings of the 41st International Conference on Machine Learning},
  pages =        {25812--25843},
  year =         {2024},
  editor =       {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume =       {235},
  series =       {Proceedings of Machine Learning Research},
  month =        {21--27 Jul},
  publisher =    {PMLR},
  pdf =          {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kwon24c/kwon24c.pdf},
  url =          {https://proceedings.mlr.press/v235/kwon24c.html},
  abstract =     {On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time ($\textit{e.g.}$ a few hours), or induce substantial accuracy loss ($\geq$10%). In this paper, we propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that $\textit{dynamically}$ selects the layer/channel to update based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.23$\times$ smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms.}
}

Endnote

%0 Conference Paper
%T TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
%A Young D. Kwon
%A Rui Li
%A Stylianos Venieris
%A Jagmohan Chauhan
%A Nicholas Donald Lane
%A Cecilia Mascolo
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kwon24c
%I PMLR
%P 25812--25843
%U https://proceedings.mlr.press/v235/kwon24c.html
%V 235
%X On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time ($\textit{e.g.}$ a few hours), or induce substantial accuracy loss ($\geq$10%). In this paper, we propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that $\textit{dynamically}$ selects the layer/channel to update based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.23$\times$ smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms.

APA

Kwon, Y.D., Li, R., Venieris, S., Chauhan, J., Lane, N.D. & Mascolo, C. (2024). TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:25812-25843. Available from https://proceedings.mlr.press/v235/kwon24c.html.
