Hindsight Merging: Diverse Data Generation with Language Models

Veniamin Veselovsky, Benedikt Stroebl, Gianluca Bencomo, Dilip Arumugam, Lisa Schut, Arvind Narayanan, Thomas L. Griffiths
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:4349-4369, 2025.

Abstract

Pre-training a language model equips it with a broad understanding of the world, while fine-tuning refines it into a helpful assistant. However, fine-tuning does not exclusively enhance task-specific behaviors but also suppresses some of the beneficial variability from pre-training. This reduction in diversity is partly due to the optimization process, which theoretically decreases model entropy in exchange for task performance. To counteract this, we introduce hindsight merging, a technique that combines a fine-tuned model with a previous training checkpoint using linear interpolation to restore entropy and improve performance. Hindsight-merged models retain strong instruction-following capabilities and alignment while displaying increased diversity present in the base model. Additionally, this results in improved inference scaling, achieving a consistent 20-50% increase in pass@10 relative to the instruction-tuned model across a coding benchmark and a series of models. Our findings suggest that hindsight merging is an effective strategy for generating diverse generations that follow instructions.
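The core operation the abstract describes is a weight-space linear interpolation between a fine-tuned model and an earlier training checkpoint. The sketch below illustrates that operation for PyTorch-style state dicts; the helper name hindsight_merge and the mixing coefficient alpha are illustrative assumptions, not the paper's exact procedure or coefficient.

```python
import torch

def hindsight_merge(earlier_ckpt: dict, finetuned_ckpt: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two checkpoints of the same architecture.

    alpha = 1.0 keeps the fine-tuned weights, alpha = 0.0 keeps the
    earlier checkpoint; intermediate values trade some task-specific
    behavior for the diversity of the earlier model.
    """
    merged = {}
    for name, ft_param in finetuned_ckpt.items():
        # Parameter names and shapes must match between the two checkpoints.
        merged[name] = alpha * ft_param + (1.0 - alpha) * earlier_ckpt[name]
    return merged

if __name__ == "__main__":
    # Toy example: interpolating two one-parameter "checkpoints".
    earlier = {"w": torch.zeros(2)}
    finetuned = {"w": torch.ones(2)}
    print(hindsight_merge(earlier, finetuned, alpha=0.7))  # {'w': tensor([0.7, 0.7])}
```

In practice the same interpolation would be applied to every tensor in a model's state dict (e.g., via `model.state_dict()` and `model.load_state_dict(...)` in PyTorch), with the earlier checkpoint taken from pre-training or an intermediate stage of fine-tuning.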

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-veselovsky25a,
  title = {Hindsight Merging: Diverse Data Generation with Language Models},
  author = {Veselovsky, Veniamin and Stroebl, Benedikt and Bencomo, Gianluca and Arumugam, Dilip and Schut, Lisa and Narayanan, Arvind and Griffiths, Thomas L.},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages = {4349--4369},
  year = {2025},
  editor = {Chiappa, Silvia and Magliacane, Sara},
  volume = {286},
  series = {Proceedings of Machine Learning Research},
  month = {21--25 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/veselovsky25a/veselovsky25a.pdf},
  url = {https://proceedings.mlr.press/v286/veselovsky25a.html},
  abstract = {Pre-training a language model equips it with a broad understanding of the world, while fine-tuning refines it into a helpful assistant. However, fine-tuning does not exclusively enhance task-specific behaviors but also suppresses some of the beneficial variability from pre-training. This reduction in diversity is partly due to the optimization process, which theoretically decreases model entropy in exchange for task performance. To counteract this, we introduce hindsight merging, a technique that combines a fine-tuned model with a previous training checkpoint using linear interpolation to restore entropy and improve performance. Hindsight-merged models retain strong instruction-following capabilities and alignment while displaying increased diversity present in the base model. Additionally, this results in improved inference scaling, achieving a consistent 20-50% increase in pass@10 relative to the instruction-tuned model across a coding benchmark and a series of models. Our findings suggest that hindsight merging is an effective strategy for generating diverse generations that follow instructions.}
}
Endnote
%0 Conference Paper
%T Hindsight Merging: Diverse Data Generation with Language Models
%A Veniamin Veselovsky
%A Benedikt Stroebl
%A Gianluca Bencomo
%A Dilip Arumugam
%A Lisa Schut
%A Arvind Narayanan
%A Thomas L. Griffiths
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-veselovsky25a
%I PMLR
%P 4349--4369
%U https://proceedings.mlr.press/v286/veselovsky25a.html
%V 286
%X Pre-training a language model equips it with a broad understanding of the world, while fine-tuning refines it into a helpful assistant. However, fine-tuning does not exclusively enhance task-specific behaviors but also suppresses some of the beneficial variability from pre-training. This reduction in diversity is partly due to the optimization process, which theoretically decreases model entropy in exchange for task performance. To counteract this, we introduce hindsight merging, a technique that combines a fine-tuned model with a previous training checkpoint using linear interpolation to restore entropy and improve performance. Hindsight-merged models retain strong instruction-following capabilities and alignment while displaying increased diversity present in the base model. Additionally, this results in improved inference scaling, achieving a consistent 20-50% increase in pass@10 relative to the instruction-tuned model across a coding benchmark and a series of models. Our findings suggest that hindsight merging is an effective strategy for generating diverse generations that follow instructions.
APA
Veselovsky, V., Stroebl, B., Bencomo, G., Arumugam, D., Schut, L., Narayanan, A. & Griffiths, T.L. (2025). Hindsight Merging: Diverse Data Generation with Language Models. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:4349-4369. Available from https://proceedings.mlr.press/v286/veselovsky25a.html.
