Diffusion Adversarial Post-Training for One-Step Video Generation

Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37959-37974, 2025.

Abstract

Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model can generate two-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model can generate 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
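The "approximated R1 regularization" mentioned in the abstract can be read as a finite-difference surrogate for the exact R1 gradient penalty: rather than computing higher-order gradients of the discriminator, perturb the real samples with small Gaussian noise and penalize the resulting change in the discriminator's output. The following is a minimal NumPy sketch under that reading; the function name, `sigma`, and the toy discriminator are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def approximated_r1_loss(discriminator, real_samples, sigma=0.01):
    """Finite-difference surrogate for the R1 penalty (illustrative sketch).

    The exact R1 penalty ||grad_x D(x)||^2 requires differentiating through
    the discriminator's gradient. Here we instead perturb the real samples
    with small Gaussian noise and penalize the squared change in the
    discriminator's output, which needs only ordinary forward passes.
    """
    noise = sigma * np.random.randn(*real_samples.shape)
    d_real = discriminator(real_samples)
    d_perturbed = discriminator(real_samples + noise)
    # Squared difference of discriminator outputs, averaged over the batch.
    return np.mean((d_real - d_perturbed) ** 2)
```

With `sigma = 0` the perturbation vanishes and the loss is exactly zero; for small `sigma`, a discriminator with large input gradients around real data incurs a large penalty, which is the regularizing effect the penalty is meant to provide.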

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lin25m,
  title = {Diffusion Adversarial Post-Training for One-Step Video Generation},
  author = {Lin, Shanchuan and Xia, Xin and Ren, Yuxi and Yang, Ceyuan and Xiao, Xuefeng and Jiang, Lu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {37959--37974},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lin25m/lin25m.pdf},
  url = {https://proceedings.mlr.press/v267/lin25m.html},
  abstract = {The diffusion models are widely used for image and video generation, but their iterative generation process is slow and expansive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve the training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model can generate two-second, 1280x720, 24fps videos in real-time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T Diffusion Adversarial Post-Training for One-Step Video Generation
%A Shanchuan Lin
%A Xin Xia
%A Yuxi Ren
%A Ceyuan Yang
%A Xuefeng Xiao
%A Lu Jiang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lin25m
%I PMLR
%P 37959--37974
%U https://proceedings.mlr.press/v267/lin25m.html
%V 267
%X The diffusion models are widely used for image and video generation, but their iterative generation process is slow and expansive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve the training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model can generate two-second, 1280x720, 24fps videos in real-time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
APA
Lin, S., Xia, X., Ren, Y., Yang, C., Xiao, X. & Jiang, L. (2025). Diffusion Adversarial Post-Training for One-Step Video Generation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37959-37974. Available from https://proceedings.mlr.press/v267/lin25m.html.