On Task Vectors and Gradients

Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D’Inverno, Fabrizio Silvestri, Emanuele Rodolà
Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, PMLR 322:398-417, 2026.

Abstract

Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
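The abstract's central claim can be checked numerically in a toy setting: under full-batch gradient descent, one epoch is a single update, so the task vector (finetuned minus pretrained weights) coincides exactly with the negative gradient scaled by the learning rate. The sketch below is illustrative only and not from the paper's code; the toy loss, variable names, and learning rate are all assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): after one full-batch gradient
# descent step, the task vector theta_1 - theta_0 equals -lr * grad(theta_0).
rng = np.random.default_rng(0)

# Toy linear-regression "task": L(theta) = ||X theta - y||^2 / (2n)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
theta_0 = rng.normal(size=4)  # stands in for the pretrained weights

def grad(theta):
    n = X.shape[0]
    return X.T @ (X @ theta - y) / n  # gradient of the mean squared loss

lr = 0.1
theta_1 = theta_0 - lr * grad(theta_0)  # one epoch of full-batch GD = one step

task_vector = theta_1 - theta_0
# Exact equivalence in the single-epoch, full-batch setting:
assert np.allclose(task_vector, -lr * grad(theta_0))
```

With multiple epochs the equality becomes approximate, which is the second-order error term the paper bounds.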

Cite this Paper


BibTeX
@InProceedings{pmlr-v322-zhou26a,
  title = {On Task Vectors and Gradients},
  author = {Zhou, Luca and Solombrino, Daniele and Crisostomi, Donato and Bucarelli, Maria Sofia and D'Inverno, Giuseppe Alessio and Silvestri, Fabrizio and Rodol\`{a}, Emanuele},
  booktitle = {Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models},
  pages = {398--417},
  year = {2026},
  editor = {Fumero, Marco and Domine, Clementine and L{\"a}hner, Zorah and Cannistraci, Irene and Zhao, Bo and Williams, Alex},
  volume = {322},
  series = {Proceedings of Machine Learning Research},
  month = {06 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v322/main/assets/zhou26a/zhou26a.pdf},
  url = {https://proceedings.mlr.press/v322/zhou26a.html},
  abstract = {Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.}
}
Endnote
%0 Conference Paper
%T On Task Vectors and Gradients
%A Luca Zhou
%A Daniele Solombrino
%A Donato Crisostomi
%A Maria Sofia Bucarelli
%A Giuseppe Alessio D'Inverno
%A Fabrizio Silvestri
%A Emanuele Rodolà
%B Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2026
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Irene Cannistraci
%E Bo Zhao
%E Alex Williams
%F pmlr-v322-zhou26a
%I PMLR
%P 398--417
%U https://proceedings.mlr.press/v322/zhou26a.html
%V 322
%X Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
APA
Zhou, L., Solombrino, D., Crisostomi, D., Bucarelli, M.S., D'Inverno, G.A., Silvestri, F. & Rodolà, E. (2026). On Task Vectors and Gradients. Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 322:398-417. Available from https://proceedings.mlr.press/v322/zhou26a.html.