On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm

Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61854-61884, 2024.

Abstract

The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features of the weight-interpolated model are often approximately equal to the linear interpolation of the features of the two finetuned models at each layer. We provide comprehensive empirical evidence that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights for explaining model merging and editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause of the emergence of CTL, highlighting the role of pretraining.
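
To make the CTL statement concrete: writing f_l(θ) for the layer-l features of a network with weights θ, the abstract's claim reads f_l(α·θ_A + (1−α)·θ_B) ≈ α·f_l(θ_A) + (1−α)·f_l(θ_B) for two models θ_A and θ_B finetuned from the same pretrained checkpoint. The sketch below illustrates how such a check could be run; it is not the authors' code. The tiny MLP, random inputs, and perturbation-based stand-ins for finetuned checkpoints are placeholder assumptions, and cosine similarity is used here only as one convenient measure of the match.

    # Minimal sketch of a Cross-Task Linearity (CTL) check: interpolate the weights of two
    # finetuned models and compare the features of the interpolated model against the
    # interpolation of the two models' features at an intermediate layer.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_model():
        # Placeholder architecture; in practice this would be the shared pretrained backbone.
        return nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )

    torch.manual_seed(0)
    pretrained = make_model()

    # Stand-ins for two models finetuned on different tasks from the same checkpoint:
    # copy the pretrained weights and apply small, independent perturbations.
    model_a, model_b = make_model(), make_model()
    for m in (model_a, model_b):
        m.load_state_dict(pretrained.state_dict())
        with torch.no_grad():
            for p in m.parameters():
                p.add_(0.01 * torch.randn_like(p))

    def interpolate_weights(sd_a, sd_b, alpha):
        # alpha * theta_A + (1 - alpha) * theta_B, applied to every parameter tensor.
        return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

    def layer_features(model, x, layer_idx=2):
        # Run the forward pass up to and including the chosen intermediate layer.
        for i, module in enumerate(model):
            x = module(x)
            if i == layer_idx:
                return x
        return x

    alpha = 0.5
    x = torch.randn(128, 32)  # random inputs; a real check would use task data

    model_interp = make_model()
    model_interp.load_state_dict(
        interpolate_weights(model_a.state_dict(), model_b.state_dict(), alpha)
    )

    with torch.no_grad():
        f_interp = layer_features(model_interp, x)  # features of the weight-interpolated model
        f_mix = alpha * layer_features(model_a, x) + (1 - alpha) * layer_features(model_b, x)
        sim = F.cosine_similarity(f_interp, f_mix, dim=1).mean()

    print(f"mean cosine similarity at alpha={alpha}: {sim.item():.4f}")  # CTL predicts values close to 1

On real finetuned checkpoints, the same comparison would be repeated over several values of alpha and over every layer; CTL corresponds to the two feature sets staying close throughout.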

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-zhou24e,
  title     = {On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm},
  author    = {Zhou, Zhanpeng and Chen, Zijun and Chen, Yilan and Zhang, Bo and Yan, Junchi},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {61854--61884},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhou24e/zhou24e.pdf},
  url       = {https://proceedings.mlr.press/v235/zhou24e.html},
  abstract  = {The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.}
}
Endnote
%0 Conference Paper
%T On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm
%A Zhanpeng Zhou
%A Zijun Chen
%A Yilan Chen
%A Bo Zhang
%A Junchi Yan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhou24e
%I PMLR
%P 61854--61884
%U https://proceedings.mlr.press/v235/zhou24e.html
%V 235
%X The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.
APA
Zhou, Z., Chen, Z., Chen, Y., Zhang, B. & Yan, J. (2024). On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:61854-61884. Available from https://proceedings.mlr.press/v235/zhou24e.html.