Continual Learners are Incremental Model Generalizers

Jaehong Yoon; Sung Ju Hwang; Yue Cao

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40129-40146, 2023.

Abstract

Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. We find that, in both supervised and unsupervised CL, the transfer quality of representations does not show a noticeable degradation of fine-tuning performance but rather increases gradually. This is because CL models can learn improved task-general features when easily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representation during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks. The model fine-tuned with GLAD achieves competitive performance and can also be used as a good pre-trained model itself. We believe this paper breaks the barriers between pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-yoon23b,
  title     = {Continual Learners are Incremental Model Generalizers},
  author    = {Yoon, Jaehong and Hwang, Sung Ju and Cao, Yue},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {40129--40146},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/yoon23b/yoon23b.pdf},
  url       = {https://proceedings.mlr.press/v202/yoon23b.html},
  abstract = 	 {Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. We find that, in both supervised and unsupervised CL, the transfer quality of representations does not show a noticeable degradation of fine-tuning performance but rather increases gradually. This is because CL models can learn improved task-general features when easily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representation during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks. The model fine-tuned with GLAD achieves competitive performance and can also be used as a good pre-trained model itself. We believe this paper breaks the barriers between pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.}
}

Endnote

%0 Conference Paper
%T Continual Learners are Incremental Model Generalizers
%A Jaehong Yoon
%A Sung Ju Hwang
%A Yue Cao
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-yoon23b
%I PMLR
%P 40129--40146
%U https://proceedings.mlr.press/v202/yoon23b.html
%V 202
%X Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. We find that, in both supervised and unsupervised CL, the transfer quality of representations does not show a noticeable degradation of fine-tuning performance but rather increases gradually. This is because CL models can learn improved task-general features when easily forgetting task-specific knowledge. Based on this observation, we suggest a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representation during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks. The model fine-tuned with GLAD achieves competitive performance and can also be used as a good pre-trained model itself. We believe this paper breaks the barriers between pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.

APA


Yoon, J., Hwang, S.J. & Cao, Y. (2023). Continual Learners are Incremental Model Generalizers. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:40129-40146. Available from https://proceedings.mlr.press/v202/yoon23b.html.
