Analyzing and Mitigating Model Collapse in Reflow Methods

Huminhao Zhu, Fangyikang Wang, Tianyu Ding, Qing Qu, Zhihui Zhu
Conference on Parsimony and Learning, PMLR 328:314-340, 2026.

Abstract

Generative models increasingly encounter synthetic data produced by earlier model snapshots, either unintentionally through data contamination or deliberately through self-training procedures such as Reflow. In rectified flow and related diffusion/flow systems, Reflow retrains on model-generated samples to straighten trajectories and accelerate sampling, but repeated self-training can degrade sample quality and diversity. We provide a mechanistic analysis of this failure mode and a principled mitigation strategy. Using a linear denoising autoencoder (DAE) as a tractable surrogate for Reflow-style recursion, we show that under purely synthetic recursive training the end-to-end linear map contracts: its operator norm decays to zero at a geometric rate, reflecting a progressive loss of representational power. We further prove that augmenting each Reflow round with a fixed fraction of real data prevents this degeneration by keeping the operator norm bounded away from zero. Finally, we validate that the qualitative trends implied by the theory are observable in practical Reflow pipelines on toy settings and image benchmarks, and we show that simple real-data–augmented Reflow schemes preserve Reflow’s sampling-speed benefits while maintaining image quality.
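The contraction mechanism described in the abstract can be illustrated with a minimal one-dimensional simulation. This is not the paper's model or notation: `s2` (data variance), `SIGMA2` (noise variance), `w` (the scalar denoiser, standing in for the operator norm of the linear map), and `real_fraction` are illustrative stand-ins chosen here, under the assumption that each round fits the closed-form optimal linear denoiser to the previous round's samples.

```python
# Toy 1-D sketch of recursive (Reflow-style) retraining of a linear denoiser.
# All symbols are illustrative assumptions, not the paper's construction.

SIGMA2 = 0.5    # noise variance in the denoising objective
S2_REAL = 1.0   # variance of the real data distribution

def optimal_denoiser(s2, sigma2=SIGMA2):
    """Closed-form optimal linear (scalar) denoiser for data of variance s2
    under additive Gaussian noise of variance sigma2; always < 1."""
    return s2 / (s2 + sigma2)

def run(rounds, real_fraction=0.0):
    """Each round: fit the optimal denoiser to the current dataset, then
    regenerate training data from the model, optionally mixed with a fixed
    fraction of real data. Returns the per-round model norms w_n."""
    s2, norms = S2_REAL, []
    for _ in range(rounds):
        w = optimal_denoiser(s2)
        norms.append(w)
        synthetic_var = w**2 * (s2 + SIGMA2)   # variance of model samples
        s2 = real_fraction * S2_REAL + (1 - real_fraction) * synthetic_var
    return norms

pure = run(10)                       # purely synthetic recursion
mixed = run(10, real_fraction=0.3)   # fixed 30% real-data augmentation
print(pure[-1], mixed[-1])
```

Under purely synthetic recursion the training variance satisfies `s2_{n+1} = s2_n * w_n < s2_n`, so `w_n` collapses toward zero; with a fixed real fraction `p`, the training variance never drops below `p * S2_REAL`, so `w_n` stays bounded below by `p * S2_REAL / (p * S2_REAL + SIGMA2)`, mirroring the qualitative dichotomy the abstract states.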

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-zhu26a,
  title     = {Analyzing and Mitigating Model Collapse in Reflow Methods},
  author    = {Zhu, Huminhao and Wang, Fangyikang and Ding, Tianyu and Qu, Qing and Zhu, Zhihui},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {314--340},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/zhu26a/zhu26a.pdf},
  url       = {https://proceedings.mlr.press/v328/zhu26a.html},
  abstract  = {Generative models increasingly encounter synthetic data produced by earlier model snapshots, either unintentionally through data contamination or deliberately through self-training procedures such as Reflow. In rectified flow and related diffusion/flow systems, Reflow retrains on model-generated samples to straighten trajectories and accelerate sampling, but repeated self-training can degrade sample quality and diversity. We provide a mechanistic analysis of this failure mode and a principled mitigation strategy. Using a linear denoising autoencoder (DAE) as a tractable surrogate for Reflow-style recursion, we show that under purely synthetic recursive training the end-to-end linear map contracts: its operator norm decays to zero at a geometric rate, reflecting a progressive loss of representational power. We further prove that augmenting each Reflow round with a fixed fraction of real data prevents this degeneration by keeping the operator norm bounded away from zero. Finally, we validate that the qualitative trends implied by the theory are observable in practical Reflow pipelines on toy settings and image benchmarks, and we show that simple real-data–augmented Reflow schemes preserve Reflow’s sampling-speed benefits while maintaining image quality.}
}
Endnote
%0 Conference Paper
%T Analyzing and Mitigating Model Collapse in Reflow Methods
%A Huminhao Zhu
%A Fangyikang Wang
%A Tianyu Ding
%A Qing Qu
%A Zhihui Zhu
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-zhu26a
%I PMLR
%P 314--340
%U https://proceedings.mlr.press/v328/zhu26a.html
%V 328
%X Generative models increasingly encounter synthetic data produced by earlier model snapshots, either unintentionally through data contamination or deliberately through self-training procedures such as Reflow. In rectified flow and related diffusion/flow systems, Reflow retrains on model-generated samples to straighten trajectories and accelerate sampling, but repeated self-training can degrade sample quality and diversity. We provide a mechanistic analysis of this failure mode and a principled mitigation strategy. Using a linear denoising autoencoder (DAE) as a tractable surrogate for Reflow-style recursion, we show that under purely synthetic recursive training the end-to-end linear map contracts: its operator norm decays to zero at a geometric rate, reflecting a progressive loss of representational power. We further prove that augmenting each Reflow round with a fixed fraction of real data prevents this degeneration by keeping the operator norm bounded away from zero. Finally, we validate that the qualitative trends implied by the theory are observable in practical Reflow pipelines on toy settings and image benchmarks, and we show that simple real-data–augmented Reflow schemes preserve Reflow’s sampling-speed benefits while maintaining image quality.
APA
Zhu, H., Wang, F., Ding, T., Qu, Q. & Zhu, Z. (2026). Analyzing and Mitigating Model Collapse in Reflow Methods. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:314-340. Available from https://proceedings.mlr.press/v328/zhu26a.html.