Improving Generation Quality of Long-Tailed Diffusion via Disentangled Latent Representations

Esther Rodriguez, Monica Welfert, Samuel McDowell, Nathan Stromberg, Julian Antolin Camarena, Lalitha Sankar
Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, PMLR 322:199-211, 2026.

Abstract

Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle—producing low-diversity and lower-quality samples for underrepresented (tail) classes. While this degradation is well-documented, its underlying cause remains poorly understood. In this work, we investigate the behavior of diffusion models trained on long-tailed datasets and identify a key issue: the latent representations (from the bottleneck layer of the U-Net) for tail class subspaces exhibit significant overlap with those of head classes, leading to feature borrowing and poor generation quality. Importantly, we show that this is not merely due to limited data per class, but that the relative class imbalance significantly contributes to this phenomenon. To address this, we propose COntrastive Regularization for Aligning Latents (CORAL), a contrastive latent alignment framework that leverages supervised contrastive losses to encourage well-separated latent class representations. Experiments demonstrate that CORAL significantly improves both the diversity and visual quality of samples generated for tail classes relative to state-of-the-art methods.
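
The abstract does not spell out implementation details, but "supervised contrastive losses" presumably refers to a SupCon-style objective (Khosla et al., 2020) applied to the U-Net bottleneck features. Below is a minimal sketch under that assumption: the loss is computed on spatially pooled, L2-normalized bottleneck activations, with same-class samples in the batch treated as positives. The function name, pooling scheme, and temperature are illustrative, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def supcon_latent_loss(latents, labels, temperature=0.1):
    # latents: (B, C, H, W) U-Net bottleneck activations; labels: (B,) class ids.
    # Pool spatial dimensions and L2-normalize so dot products are cosine similarities.
    z = F.normalize(latents.flatten(2).mean(dim=2), dim=1)         # (B, C)

    sim = z @ z.T / temperature                                    # (B, B) pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))                # exclude self-pairs

    # Log-softmax over all other samples in the batch (the SupCon denominator).
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Positives: other samples sharing the anchor's class label.
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    num_pos = pos_mask.sum(dim=1)

    # Negative mean log-probability of positives per anchor; skip anchors with no positives.
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / num_pos.clamp(min=1)
    return per_anchor[num_pos > 0].mean()

# Hypothetical training step: add the term to the standard denoising objective
# with a tunable weight (names here are illustrative, not from the paper):
# loss = denoising_loss + coral_weight * supcon_latent_loss(bottleneck, labels)

During training, such a regularizer pulls same-class latents together and pushes different-class latents apart, which is the latent separation the abstract credits for CORAL's improved tail-class generation.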

Cite this Paper


BibTeX
@InProceedings{pmlr-v322-rodriguez26a,
  title = {Improving Generation Quality of Long-Tailed Diffusion via Disentangled Latent Representations},
  author = {Rodriguez, Esther and Welfert, Monica and McDowell, Samuel and Stromberg, Nathan and Camarena, Julian Antolin and Sankar, Lalitha},
  booktitle = {Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models},
  pages = {199--211},
  year = {2026},
  editor = {Fumero, Marco and Domine, Clementine and L{\"a}hner, Zorah and Cannistraci, Irene and Zhao, Bo and Williams, Alex},
  volume = {322},
  series = {Proceedings of Machine Learning Research},
  month = {06 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v322/main/assets/rodriguez26a/rodriguez26a.pdf},
  url = {https://proceedings.mlr.press/v322/rodriguez26a.html},
  abstract = {Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle—producing low-diversity and lower-quality samples for underrepresented (tail) classes. While this degradation is well-documented, its underlying cause remains poorly understood. In this work, we investigate the behavior of diffusion models trained on long-tailed datasets and identify a key issue: the latent representations (from the bottleneck layer of the U-Net) for tail class subspaces exhibit significant overlap with those of head classes, leading to feature borrowing and poor generation quality. Importantly, we show that this is not merely due to limited data per class, but that the relative class imbalance significantly contributes to this phenomenon. To address this, we propose COntrastive Regularization for Aligning Latents (CORAL), a contrastive latent alignment framework that leverages supervised contrastive losses to encourage well-separated latent class representations. Experiments demonstrate that CORAL significantly improves both the diversity and visual quality of samples generated for tail classes relative to state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T Improving Generation Quality of Long-Tailed Diffusion via Disentangled Latent Representations
%A Esther Rodriguez
%A Monica Welfert
%A Samuel McDowell
%A Nathan Stromberg
%A Julian Antolin Camarena
%A Lalitha Sankar
%B Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2026
%E Marco Fumero
%E Clementine Domine
%E Zorah Lähner
%E Irene Cannistraci
%E Bo Zhao
%E Alex Williams
%F pmlr-v322-rodriguez26a
%I PMLR
%P 199--211
%U https://proceedings.mlr.press/v322/rodriguez26a.html
%V 322
%X Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle—producing low-diversity and lower-quality samples for underrepresented (tail) classes. While this degradation is well-documented, its underlying cause remains poorly understood. In this work, we investigate the behavior of diffusion models trained on long-tailed datasets and identify a key issue: the latent representations (from the bottleneck layer of the U-Net) for tail class subspaces exhibit significant overlap with those of head classes, leading to feature borrowing and poor generation quality. Importantly, we show that this is not merely due to limited data per class, but that the relative class imbalance significantly contributes to this phenomenon. To address this, we propose COntrastive Regularization for Aligning Latents (CORAL), a contrastive latent alignment framework that leverages supervised contrastive losses to encourage well-separated latent class representations. Experiments demonstrate that CORAL significantly improves both the diversity and visual quality of samples generated for tail classes relative to state-of-the-art methods.
APA
Rodriguez, E., Welfert, M., McDowell, S., Stromberg, N., Camarena, J.A. & Sankar, L. (2026). Improving Generation Quality of Long-Tailed Diffusion via Disentangled Latent Representations. Proceedings of UniReps: the Third Edition of the Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 322:199-211. Available from https://proceedings.mlr.press/v322/rodriguez26a.html.