Compositional Generalization via Forced Rendering of Disentangled Latents

Qiyao Liang, Daoyuan Qian, Liu Ziyin, Ila R Fiete
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:37370-37395, 2025.

Abstract

Composition—the ability to generate myriad variations from finite means—is believed to underlie powerful generalization. However, compositional generalization remains a key challenge for deep learning. A widely held assumption is that learning disentangled (factorized) representations naturally supports this kind of extrapolation. Yet, empirical results are mixed, with many generative models failing to recognize and compose factors to generate out-of-distribution (OOD) samples. In this work, we investigate a controlled 2D Gaussian "bump" generation task with fully disentangled $(x,y)$ inputs, demonstrating that standard generative architectures still fail in OOD regions when trained on partial data, because they re-entangle latent representations in subsequent layers. By examining the model's learned kernels and manifold geometry, we show that this failure reflects a "memorization" strategy for generation via data superposition rather than via composition of the true factorized features. We show that when models are forced—through architectural modifications with regularization or curated training data—to render the disentangled latents into the full-dimensional representational (pixel) space, they can be highly data-efficient and effective at composing in OOD regions. These findings underscore that disentangled latents in an abstract representation are insufficient, and show that if models can represent disentangled factors directly in the output representational space, they can achieve robust compositional generalization.
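
As a concrete illustration of the task setup described above, below is a minimal sketch of the 2D Gaussian "bump" generation task with disentangled (x, y) inputs and a held-out OOD region; the grid size, bump width, and the particular held-out quadrant are illustrative assumptions, not the paper's exact configuration.

import numpy as np

GRID = 32     # image side length (assumed, for illustration)
SIGMA = 1.5   # bump width (assumed, for illustration)

def render_bump(x, y, grid=GRID, sigma=SIGMA):
    """Render one isotropic Gaussian bump centered at (x, y) in pixel space."""
    xs, ys = np.meshgrid(np.arange(grid), np.arange(grid), indexing="xy")
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def make_split(grid=GRID, holdout=0.5):
    """Hold out one corner region of (x, y) combinations as OOD: every
    individual x and y value appears in training, but not jointly there,
    mimicking the paper's partial-data setup (exact split is assumed)."""
    train, ood = [], []
    for x in range(grid):
        for y in range(grid):
            sample = ((x, y), render_bump(x, y))
            if x >= holdout * grid and y >= holdout * grid:
                ood.append(sample)    # compositionally novel (x, y) pairs
            else:
                train.append(sample)  # in-distribution combinations
    return train, ood

train_set, ood_set = make_split()
print(len(train_set), len(ood_set))  # 768 training vs. 256 held-out OOD samples

A compositional model would generalize to the held-out corner by combining the x- and y-factors it has seen separately; a model that superposes memorized training bumps would not.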

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-liang25n,
  title     = {Compositional Generalization via Forced Rendering of Disentangled Latents},
  author    = {Liang, Qiyao and Qian, Daoyuan and Ziyin, Liu and Fiete, Ila R},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {37370--37395},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liang25n/liang25n.pdf},
  url       = {https://proceedings.mlr.press/v267/liang25n.html}
}
APA
Liang, Q., Qian, D., Ziyin, L., & Fiete, I. R. (2025). Compositional Generalization via Forced Rendering of Disentangled Latents. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:37370-37395. Available from https://proceedings.mlr.press/v267/liang25n.html.
