Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales

Jiaze Xu, Shiyu Xia, Xu Yang, Jiaqi Lv, Xin Geng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69806-69818, 2025.

Abstract

Appropriate parameter initialization strategies are essential for reducing the high computational costs of training large pretrained models in various task scenarios. Graph HyperNetwork (GHN), a parameter initialization method, has recently demonstrated strong performance in initializing models. However, GHN still faces several challenges, including limited effectiveness in initializing larger models, poor performance on smaller datasets, and the requirement of task-specific GHN training, where each new task necessitates retraining the GHN model, leading to increased computational and storage overhead. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Task-Aware Learngene (TAL). Briefly, our approach pretrains a TAL model under the guidance of a well-trained model and then performs multi-task tuning to obtain a shared TAL model that enables parameter prediction based on both model architectures and task-specific characteristics. Extensive experiments demonstrate the superiority of TAL: models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% in accuracy across the Decathlon datasets.
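To make the two-stage recipe in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a task-aware hypernetwork that predicts a target model's parameters from an architecture descriptor together with a task embedding. All names, dimensions, and training details here (TaskAwareHyperNet, arch_dim, task_emb, etc.) are illustrative assumptions and do not reflect the authors' actual TAL implementation.

import torch
import torch.nn as nn

class TaskAwareHyperNet(nn.Module):
    # Toy hypernetwork: maps (architecture descriptor, task embedding)
    # to a flat parameter vector used to initialize a target model.
    def __init__(self, arch_dim: int, task_dim: int, max_params: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_dim + task_dim, 256),
            nn.ReLU(),
            nn.Linear(256, max_params),
        )

    def forward(self, arch_desc: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # Condition the prediction on both the architecture and the task.
        return self.net(torch.cat([arch_desc, task_emb], dim=-1))

# Stage 1 (sketch): pretrain the hypernetwork under a well-trained teacher,
# e.g. by penalizing the distance between predicted and teacher weights.
# Stage 2 (sketch): multi-task tuning -- iterate over tasks, condition on
# each task's embedding, and update one shared hypernetwork for all tasks.

hypernet = TaskAwareHyperNet(arch_dim=32, task_dim=16, max_params=10_000)
arch_desc = torch.randn(1, 32)  # encodes target depth, width, etc. (assumed)
task_emb = torch.randn(1, 16)   # encodes task-specific characteristics (assumed)
flat_params = hypernet(arch_desc, task_emb)
# Slices of flat_params would then be reshaped into each layer's weight
# tensors, initializing a target model at the requested scale.

Conditioning one shared predictor on a task embedding is what lets a single model serve new tasks without retraining, which is the overhead the abstract attributes to task-specific GHN training.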

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xu25af,
  title     = {Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales},
  author    = {Xu, Jiaze and Xia, Shiyu and Yang, Xu and Lv, Jiaqi and Geng, Xin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {69806--69818},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25af/xu25af.pdf},
  url       = {https://proceedings.mlr.press/v267/xu25af.html},
  abstract  = {Appropriate parameter initialization strategies are essential for reducing the high computational costs of training large pretrained models in various task scenarios. Graph HyperNetwork (GHN), a parameter initialization method, has recently demonstrated strong performance in initializing models. However, GHN still faces several challenges, including limited effectiveness in initializing larger models, poor performance on smaller datasets, and the requirement of task-specific GHN training, where each new task necessitates retraining the GHN model, leading to increased computational and storage overhead. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Task-Aware Learngene (TAL). Briefly, our approach pretrains a TAL model under the guidance of a well-trained model and then performs multi-task tuning to obtain a shared TAL model that enables parameter prediction based on both model architectures and task-specific characteristics. Extensive experiments demonstrate the superiority of TAL: models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% in accuracy across the Decathlon datasets.}
}
Endnote
%0 Conference Paper
%T Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales
%A Jiaze Xu
%A Shiyu Xia
%A Xu Yang
%A Jiaqi Lv
%A Xin Geng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xu25af
%I PMLR
%P 69806--69818
%U https://proceedings.mlr.press/v267/xu25af.html
%V 267
%X Appropriate parameter initialization strategies are essential for reducing the high computational costs of training large pretrained models in various task scenarios. Graph HyperNetwork (GHN), a parameter initialization method, has recently demonstrated strong performance in initializing models. However, GHN still faces several challenges, including limited effectiveness in initializing larger models, poor performance on smaller datasets, and the requirement of task-specific GHN training, where each new task necessitates retraining the GHN model, leading to increased computational and storage overhead. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Task-Aware Learngene (TAL). Briefly, our approach pretrains a TAL model under the guidance of a well-trained model and then performs multi-task tuning to obtain a shared TAL model that enables parameter prediction based on both model architectures and task-specific characteristics. Extensive experiments demonstrate the superiority of TAL: models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% in accuracy across the Decathlon datasets.
APA
Xu, J., Xia, S., Yang, X., Lv, J. & Geng, X. (2025). Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69806-69818. Available from https://proceedings.mlr.press/v267/xu25af.html.
