What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

Gabriele Merlin, Vedant Nanda, Ruchit Rawal, Mariya Toneva
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:601-619, 2023.

Abstract

The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, there is not a clear understanding yet of the reasons for this effect. In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the successes of pretrained models and the changes that a pretrained model undergoes when finetuned on a downstream task.

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-merlin23a,
  title     = {What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation},
  author    = {Merlin, Gabriele and Nanda, Vedant and Rawal, Ruchit and Toneva, Mariya},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {601--619},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/merlin23a/merlin23a.pdf},
  url       = {https://proceedings.mlr.press/v232/merlin23a.html},
  abstract  = {The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, there is not a clear understanding yet of the reasons for this effect. In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the successes of pretrained models and the changes that a pretrained model undergoes when finetuned on a downstream task.}
}
Endnote
%0 Conference Paper
%T What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation
%A Gabriele Merlin
%A Vedant Nanda
%A Ruchit Rawal
%A Mariya Toneva
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-merlin23a
%I PMLR
%P 601--619
%U https://proceedings.mlr.press/v232/merlin23a.html
%V 232
%X The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, there is not a clear understanding yet of the reasons for this effect. In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the successes of pretrained models and the changes that a pretrained model undergoes when finetuned on a downstream task.
APA
Merlin, G., Nanda, V., Rawal, R., & Toneva, M. (2023). What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:601-619. Available from https://proceedings.mlr.press/v232/merlin23a.html.