On Task Vectors and Gradients

Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D’Inverno, Fabrizio Silvestri, Emanuele Rodolà
Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, PMLR 322:398-417, 2026.

Abstract

Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
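The abstract's central claim can be checked numerically in a toy setting: under full-batch gradient descent, one epoch is a single update, so the task vector (finetuned minus pretrained weights) coincides exactly with the negative gradient scaled by the learning rate. The sketch below is illustrative only and not from the paper's code; the toy loss, variable names, and learning rate are all assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): after one full-batch gradient
# descent step, the task vector theta_1 - theta_0 equals -lr * grad(theta_0).
rng = np.random.default_rng(0)

# Toy linear-regression "task": L(theta) = ||X theta - y||^2 / (2n)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
theta_0 = rng.normal(size=4)  # stands in for the pretrained weights

def grad(theta):
    n = X.shape[0]
    return X.T @ (X @ theta - y) / n  # gradient of the mean squared loss

lr = 0.1
theta_1 = theta_0 - lr * grad(theta_0)  # one epoch of full-batch GD = one step

task_vector = theta_1 - theta_0
# Exact equivalence in the single-epoch, full-batch setting:
assert np.allclose(task_vector, -lr * grad(theta_0))
```

With multiple epochs the equality becomes approximate, which is the second-order error term the paper bounds.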

Cite this Paper


BibTeX
@InProceedings{pmlr-v322-zhou26a,
  title = {On Task Vectors and Gradients},
  author = {Zhou, Luca and Solombrino, Daniele and Crisostomi, Donato and Bucarelli, Maria Sofia and D'Inverno, Giuseppe Alessio and Silvestri, Fabrizio and Rodol\`{a}, Emanuele},
  booktitle = {Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models},
  pages = {398--417},
  year = {2026},
  editor = {Fumero, Marco and Domine, Clementine and L{\"a}hner, Zorah and Cannistraci, Irene and Zhao, Bo and Williams, Alex},
  volume = {322},
  series = {Proceedings of Machine Learning Research},
  month = {06 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v322/main/assets/zhou26a/zhou26a.pdf},
  url = {https://proceedings.mlr.press/v322/zhou26a.html},
  abstract = {Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.}
}
Endnote
%0 Conference Paper
%T On Task Vectors and Gradients
%A Luca Zhou
%A Daniele Solombrino
%A Donato Crisostomi
%A Maria Sofia Bucarelli
%A Giuseppe Alessio D'Inverno
%A Fabrizio Silvestri
%A Emanuele Rodolà
%B Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2026
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Irene Cannistraci
%E Bo Zhao
%E Alex Williams
%F pmlr-v322-zhou26a
%I PMLR
%P 398--417
%U https://proceedings.mlr.press/v322/zhou26a.html
%V 322
%X Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
APA
Zhou, L., Solombrino, D., Crisostomi, D., Bucarelli, M.S., D'Inverno, G.A., Silvestri, F. & Rodolà, E. (2026). On Task Vectors and Gradients. Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 322:398-417. Available from https://proceedings.mlr.press/v322/zhou26a.html.