The Diffusion Duality

Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin T Chiu, Volodymyr Kuleshov
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52584-52619, 2025.

Abstract

Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: https://s-sahoo.github.io/duo
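The abstract's key insight is that uniform-state discrete diffusion emerges from an underlying Gaussian diffusion. A minimal way to see the intended correspondence is to Gaussian-diffuse a one-hot token encoding and take the argmax over the vocabulary: the induced distribution over tokens interpolates between the clean token and the uniform distribution. The sketch below is illustrative only and not from the Duo codebase; the noise schedule, vocabulary size, and function name are assumptions chosen for clarity.

import numpy as np

# Minimal sketch (not the paper's implementation): the argmax of a
# Gaussian-diffused one-hot vector induces a discrete corruption process
# that moves from a point mass on the clean token toward the uniform
# distribution over the vocabulary -- the "uniform-state" behaviour the
# abstract attributes to an underlying Gaussian diffusion.
rng = np.random.default_rng(0)
V = 8                    # vocabulary size (illustrative)
token = 3                # index of the clean token
x0 = np.eye(V)[token]    # one-hot encoding of the clean token

def argmax_of_gaussian_latent(x0, alpha, n_samples=100_000):
    """Sample z ~ N(alpha * x0, (1 - alpha^2) I) and return the
    empirical distribution of argmax(z) over the vocabulary."""
    sigma = np.sqrt(1.0 - alpha ** 2)
    z = alpha * x0 + sigma * rng.standard_normal((n_samples, V))
    counts = np.bincount(z.argmax(axis=1), minlength=V)
    return counts / n_samples

for alpha in [1.0, 0.8, 0.4, 0.0]:
    p = argmax_of_gaussian_latent(x0, alpha)
    print(f"alpha={alpha:.1f}  P(clean token)={p[token]:.3f}  "
          f"P(other token, avg)={p[np.arange(V) != token].mean():.3f}")

As alpha decreases from 1 to 0, the marginal over tokens moves from a point mass on the clean token to (approximately) uniform over the vocabulary. The paper makes this correspondence precise and exploits it to transfer Gaussian-diffusion techniques, namely curriculum learning and consistency distillation, to the discrete setting.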

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sahoo25a,
  title     = {The Diffusion Duality},
  author    = {Sahoo, Subham Sekhar and Deschenaux, Justin and Gokaslan, Aaron and Wang, Guanghan and Chiu, Justin T and Kuleshov, Volodymyr},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {52584--52619},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sahoo25a/sahoo25a.pdf},
  url       = {https://proceedings.mlr.press/v267/sahoo25a.html},
  abstract  = {Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: https://s-sahoo.github.io/duo}
}
Endnote
%0 Conference Paper
%T The Diffusion Duality
%A Subham Sekhar Sahoo
%A Justin Deschenaux
%A Aaron Gokaslan
%A Guanghan Wang
%A Justin T Chiu
%A Volodymyr Kuleshov
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sahoo25a
%I PMLR
%P 52584--52619
%U https://proceedings.mlr.press/v267/sahoo25a.html
%V 267
%X Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: https://s-sahoo.github.io/duo
APA
Sahoo, S.S., Deschenaux, J., Gokaslan, A., Wang, G., Chiu, J.T. & Kuleshov, V. (2025). The Diffusion Duality. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:52584-52619. Available from https://proceedings.mlr.press/v267/sahoo25a.html.
