An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates

Albin Soutif, Simone Magistri, Joost van de Weijer, Andrew D. Bagdanov
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:996-1012, 2025.

Abstract

Broad, open-source availability of large pretrained foundation models on the internet through platforms such as HuggingFace has taken the world of practical deep learning by storm. A classical pipeline for neural network training now typically consists of finetuning these pretrained networks on a small target dataset instead of training from scratch. For large models this can be done even on modest hardware using a low-rank training technique known as Low-Rank Adaptation (LoRA). While low-rank training has already been studied in the continual learning setting, existing works often consider storing the learned adapter alongside the existing model, but rarely attempt to modify the weights of the pretrained model by merging the LoRA with the existing weights after finishing the training of each task. In this article we investigate this setting and study the impact of LoRA rank on forgetting of the pretraining foundation task and on the plasticity and forgetting of subsequent ones. We observe that this rank has an important impact on forgetting of both the pretraining and downstream tasks. We also observe that vision transformers finetuned in this way exhibit a kind of “contextual” forgetting, a behaviour that we do not observe for residual networks and that we believe has not been reported in previous continual learning works.
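
To make the protocol described above concrete, the following is a minimal sketch, assuming a plain PyTorch setup: a frozen pretrained linear layer receives a rank-r LoRA update, which is merged into the base weights after each task before the adapter is re-initialised. The LoRALinear class, the synthetic tasks, and all hyperparameters are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer with a rank-r LoRA update that can be merged into the base weights."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base, self.rank, self.alpha = base, rank, alpha
        for p in self.base.parameters():           # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # y = W x + (alpha / r) * B A x, with only A and B receiving gradients
        return self.base(x) + (self.alpha / self.rank) * (x @ self.A.t() @ self.B.t())

    @torch.no_grad()
    def merge_and_reset(self):
        # Fold the low-rank update into the pretrained weights, then re-initialise
        # the adapter so the next task starts from a fresh rank-r update.
        self.base.weight += (self.alpha / self.rank) * (self.B @ self.A)
        self.A.normal_(std=0.01)
        self.B.zero_()

layer = LoRALinear(nn.Linear(32, 32), rank=4)       # rank is the quantity varied in the paper
for task in range(3):                               # toy stand-ins for downstream tasks
    opt = torch.optim.SGD([layer.A, layer.B], lr=1e-2)
    for _ in range(100):
        x = torch.randn(64, 32)
        loss = (layer(x) - x.roll(task + 1, dims=1)).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    layer.merge_and_reset()    # the pretrained weights permanently absorb this task's update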

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-soutif25a,
  title     = {An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates},
  author    = {Soutif, Albin and Magistri, Simone and Weijer, Joost van de and Bagdanov, Andrew D.},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {996--1012},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/soutif25a/soutif25a.pdf},
  url       = {https://proceedings.mlr.press/v274/soutif25a.html},
  abstract  = {Broad, open-source availability of large pretrained foundation models on the internet through platforms such as HuggingFace has taken the world of practical deep learning by storm. A classical pipeline for neural network training now typically consists of finetuning these pretrained networks on a small target dataset instead of training from scratch. For large models this can be done even on modest hardware using a low-rank training technique known as Low-Rank Adaptation (LoRA). While low-rank training has already been studied in the continual learning setting, existing works often consider storing the learned adapter alongside the existing model, but rarely attempt to modify the weights of the pretrained model by merging the LoRA with the existing weights after finishing the training of each task. In this article we investigate this setting and study the impact of LoRA rank on forgetting of the pretraining foundation task and on the plasticity and forgetting of subsequent ones. We observe that this rank has an important impact on forgetting of both the pretraining and downstream tasks. We also observe that vision transformers finetuned in this way exhibit a kind of “contextual” forgetting, a behaviour that we do not observe for residual networks and that we believe has not been reported in previous continual learning works.}
}
Endnote
%0 Conference Paper
%T An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates
%A Albin Soutif
%A Simone Magistri
%A Joost van de Weijer
%A Andrew D. Bagdanov
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-soutif25a
%I PMLR
%P 996--1012
%U https://proceedings.mlr.press/v274/soutif25a.html
%V 274
%X Broad, open-source availability of large pretrained foundation models on the internet through platforms such as HuggingFace has taken the world of practical deep learning by storm. A classical pipeline for neural network training now typically consists of finetuning these pretrained networks on a small target dataset instead of training from scratch. For large models this can be done even on modest hardware using a low-rank training technique known as Low-Rank Adaptation (LoRA). While low-rank training has already been studied in the continual learning setting, existing works often consider storing the learned adapter alongside the existing model, but rarely attempt to modify the weights of the pretrained model by merging the LoRA with the existing weights after finishing the training of each task. In this article we investigate this setting and study the impact of LoRA rank on forgetting of the pretraining foundation task and on the plasticity and forgetting of subsequent ones. We observe that this rank has an important impact on forgetting of both the pretraining and downstream tasks. We also observe that vision transformers finetuned in this way exhibit a kind of “contextual” forgetting, a behaviour that we do not observe for residual networks and that we believe has not been reported in previous continual learning works.
APA
Soutif, A., Magistri, S., Weijer, J. v. d., & Bagdanov, A. D. (2025). An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:996-1012. Available from https://proceedings.mlr.press/v274/soutif25a.html.
