Large Language Models to Diffusion Finetuning

Edoardo Cetin, Tianyu Zhao, Yujin Tang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:7035-7057, 2025.

Abstract

We propose a new finetuning method that gives pre-trained large language models (LMs) the ability to scale test-time compute through the diffusion framework. We show that by increasing the number of diffusion steps, our finetuned models achieve monotonically increasing accuracy, translating directly to improved performance across downstream tasks. Furthermore, our finetuned models can expertly answer questions on specific topics by integrating powerful guidance techniques, and can autonomously determine the compute required for a given problem by leveraging adaptive ODE solvers. Our method is applicable to any foundation model pre-trained with cross-entropy and does not modify any of its original weights, fully preserving its strong single-step generation capabilities. We show our method can be more effective than, and is fully compatible with, traditional finetuning and search approaches, introducing an orthogonal new direction to unify the strengths of the autoregressive and diffusion frameworks.
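The abstract describes the mechanism only at a high level. Below is a minimal sketch of what diffusion-style test-time scaling over a frozen LM could look like, assuming a flow-matching-style Euler integrator and classifier-free-style guidance; the interfaces (frozen_lm, diffusion_head, decode) are hypothetical placeholders, not the paper's actual L2D implementation.

import torch

def sample_next_token(frozen_lm, diffusion_head, context_ids,
                      num_steps=8, guidance_scale=1.5):
    """Refine a noisy latent over num_steps Euler steps, then decode one token.

    frozen_lm      -- pre-trained LM; its weights stay untouched (per the abstract)
    diffusion_head -- finetuned module predicting a velocity field (hypothetical)
    """
    hidden = frozen_lm(context_ids)           # frozen contextual features
    x = torch.randn_like(hidden)              # start from pure noise
    dt = 1.0 / num_steps
    for step in range(num_steps):             # more steps = more test-time compute
        t = torch.full((x.shape[0],), step * dt)
        # Classifier-free-style guidance: blend conditional and
        # unconditional velocity estimates.
        v_cond = diffusion_head(x, t, hidden)
        v_uncond = diffusion_head(x, t, None)
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        x = x + dt * v                        # explicit Euler ODE step
    logits = frozen_lm.decode(x)              # project refined latent to vocabulary
    return logits.argmax(dim=-1)

Under these assumptions, raising num_steps trades extra compute for a more refined latent, mirroring the accuracy scaling described above, and an adaptive ODE solver would replace the fixed Euler loop with error-controlled step sizes.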

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-cetin25a,
  title     = {Large Language Models to Diffusion Finetuning},
  author    = {Cetin, Edoardo and Zhao, Tianyu and Tang, Yujin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {7035--7057},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/cetin25a/cetin25a.pdf},
  url       = {https://proceedings.mlr.press/v267/cetin25a.html}
}
Endnote
%0 Conference Paper
%T Large Language Models to Diffusion Finetuning
%A Edoardo Cetin
%A Tianyu Zhao
%A Yujin Tang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-cetin25a
%I PMLR
%P 7035--7057
%U https://proceedings.mlr.press/v267/cetin25a.html
%V 267
APA
Cetin, E., Zhao, T. & Tang, Y. (2025). Large Language Models to Diffusion Finetuning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:7035-7057. Available from https://proceedings.mlr.press/v267/cetin25a.html.
