Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69806-69818, 2025.
Abstract
Appropriate parameter initialization strategies are essential for reducing the high computational costs of training large pretrained models across diverse task scenarios. Graph HyperNetwork (GHN), a recently proposed parameter prediction method, has demonstrated strong performance in model initialization. However, GHN still faces several challenges: limited effectiveness when initializing larger models, poor performance on smaller datasets, and the requirement of task-specific GHN training, where each new task necessitates retraining the GHN model, leading to increased computational and storage overhead. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Task-Aware Learngene (TAL). Briefly, our approach pretrains a TAL model under the guidance of a well-trained model and then performs multi-task tuning to obtain a shared TAL model that predicts parameters conditioned on both model architectures and task-specific characteristics. Extensive experiments demonstrate the superiority of TAL: models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% accuracy across the Decathlon datasets.
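To make the core idea concrete, the following is a minimal, self-contained sketch of task-conditioned parameter prediction: a shared trunk maps an architecture descriptor together with a task embedding to a hidden code, and an output head decodes that code into initial weights for a target layer. All names, dimensions, and the specific network form here are illustrative assumptions, not the actual TAL architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: architecture descriptor, task embedding, hidden code.
D_ARCH, D_TASK, D_HID = 8, 4, 16
LAYER_SHAPE = (3, 5)  # shape of the single target layer in this toy example

# Shared predictor weights. In TAL these would be the shared,
# multi-task-tuned parameters; here they are fixed random placeholders.
W_trunk = rng.standard_normal((D_HID, D_ARCH + D_TASK)) * 0.1
b_trunk = np.zeros(D_HID)
W_head = rng.standard_normal((int(np.prod(LAYER_SHAPE)), D_HID)) * 0.1

def predict_layer_params(arch_desc, task_emb):
    """Predict initial parameters for the target layer from an
    architecture descriptor and a task embedding (hypothetical API)."""
    z = np.concatenate([arch_desc, task_emb])   # condition on both inputs
    h = np.tanh(W_trunk @ z + b_trunk)          # shared trunk
    return (W_head @ h).reshape(LAYER_SHAPE)    # decode to layer weights

# The same architecture with two different task embeddings yields two
# different initializations from one shared predictor.
arch = rng.standard_normal(D_ARCH)
task_a = rng.standard_normal(D_TASK)
task_b = rng.standard_normal(D_TASK)
w_a = predict_layer_params(arch, task_a)
w_b = predict_layer_params(arch, task_b)
print(w_a.shape)
```

The point of the sketch is the conditioning: because the task embedding enters the predictor's input, one shared model can emit task-aware initializations without retraining per task, which is the overhead the abstract attributes to task-specific GHN training.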