VCT: Training Consistency Models with Variational Noise Coupling

Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55657-55683, 2025.

Abstract

Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches SoTA performance on ImageNet 64x64 with only two sampling steps. Code is available at https://github.com/sony/vct.

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-silvestri25a,
  title     = {{VCT}: Training Consistency Models with Variational Noise Coupling},
  author    = {Silvestri, Gianluigi and Ambrogioni, Luca and Lai, Chieh-Hsin and Takida, Yuhta and Mitsufuji, Yuki},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55657--55683},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/silvestri25a/silvestri25a.pdf},
  url       = {https://proceedings.mlr.press/v267/silvestri25a.html},
  abstract  = {Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches SoTA performance on ImageNet 64x64 with only two sampling steps. Code is available at https://github.com/sony/vct.}
}
Endnote
%0 Conference Paper
%T VCT: Training Consistency Models with Variational Noise Coupling
%A Gianluigi Silvestri
%A Luca Ambrogioni
%A Chieh-Hsin Lai
%A Yuhta Takida
%A Yuki Mitsufuji
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-silvestri25a
%I PMLR
%P 55657--55683
%U https://proceedings.mlr.press/v267/silvestri25a.html
%V 267
%X Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches SoTA performance on ImageNet 64x64 with only two sampling steps. Code is available at https://github.com/sony/vct.
APA
Silvestri, G., Ambrogioni, L., Lai, C., Takida, Y., &amp; Mitsufuji, Y. (2025). VCT: Training Consistency Models with Variational Noise Coupling. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55657-55683. Available from https://proceedings.mlr.press/v267/silvestri25a.html.