Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting

Sunny Sanyal, Hayden Prairie, Rudrajit Das, Ali Kavis, Sujay Sanghavi
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52922-52957, 2025.

Abstract

Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model’s losses. Specifically, we upweight the easy samples on which the pre-trained model’s loss is low and vice versa to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8$% drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4$% more accuracy on the pre-training datasets.
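The abstract describes the weighting mechanism only at a high level; the sketch below illustrates one plausible reading of it in PyTorch. The exp(-loss / tau) mapping, the temperature tau, the HuggingFace-style .logits interface, and the omission of padding-token masking are illustrative assumptions, not the paper's actual recipe.

```python
# Minimal sketch (not the authors' reference implementation): each fine-tuning
# sample is weighted by a decreasing function of the *pre-trained* model's loss
# on it, so easy samples (low pre-trained loss) are upweighted.
import torch
import torch.nn.functional as F


@torch.no_grad()
def pretrained_sample_weights(pretrained_model, input_ids, labels, tau=1.0):
    """Per-sample weights from the frozen pre-trained model's losses."""
    logits = pretrained_model(input_ids).logits                  # (B, T, V); assumes HF-style output
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"         # (B, T); padding masking omitted
    )
    per_sample = per_token.mean(dim=1)                           # (B,)
    w = torch.exp(-per_sample / tau)                             # low loss -> large weight (assumed form)
    return w / w.sum()                                           # normalize over the batch


def weighted_finetune_loss(model, pretrained_model, input_ids, labels, tau=1.0):
    """Fine-tuning loss with the batch reweighted toward easy samples."""
    w = pretrained_sample_weights(pretrained_model, input_ids, labels, tau)
    logits = model(input_ids).logits
    per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
    per_sample = per_token.mean(dim=1)
    return (w * per_sample).sum()
```

In practice the weights would be computed with the pre-trained model kept frozen, either on the fly per batch (as above) or precomputed once over the fine-tuning set; how the paper normalizes and schedules the weights may differ from this sketch.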

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sanyal25a,
  title     = {Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting},
  author    = {Sanyal, Sunny and Prairie, Hayden and Das, Rudrajit and Kavis, Ali and Sanghavi, Sujay},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {52922--52957},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sanyal25a/sanyal25a.pdf},
  url       = {https://proceedings.mlr.press/v267/sanyal25a.html},
  abstract  = {Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model’s losses. Specifically, we upweight the easy samples on which the pre-trained model’s loss is low and vice versa to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8$% drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4$% more accuracy on the pre-training datasets.}
}
Endnote
%0 Conference Paper
%T Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting
%A Sunny Sanyal
%A Hayden Prairie
%A Rudrajit Das
%A Ali Kavis
%A Sujay Sanghavi
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sanyal25a
%I PMLR
%P 52922--52957
%U https://proceedings.mlr.press/v267/sanyal25a.html
%V 267
%X Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is especially an issue when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data solely based on the pre-trained model’s losses. Specifically, we upweight the easy samples on which the pre-trained model’s loss is low and vice versa to limit the drift from the pre-trained model. Our approach is orthogonal and yet complementary to existing methods; while such methods mostly operate on parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8$% drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4$% more accuracy on the pre-training datasets.
APA
Sanyal, S., Prairie, H., Das, R., Kavis, A. & Sanghavi, S. (2025). Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:52922-52957. Available from https://proceedings.mlr.press/v267/sanyal25a.html.