Disentangled 3D Scene Generation with Layout Learning

Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A Efros, Aleksander Holynski
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:12547-12559, 2024.

Abstract

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs—each representing its own object—along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
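The abstract describes a joint optimization over per-object NeRFs and learnable layouts, supervised only by a pretrained text-to-image prior. The sketch below illustrates that structure under simplifying assumptions; it is not the authors' implementation. `TinyObjectField`, `Layout`, `composite_density`, and `score_distillation_loss` are hypothetical stand-ins for a real NeRF, the rigid per-object transforms, volumetric compositing, and score-distillation guidance from the image generator.

```python
# Minimal sketch (not the paper's code): K per-object fields are composited under N
# learnable layouts, and renders are pushed toward the image prior (SDS-style loss).
import torch
import torch.nn as nn

class TinyObjectField(nn.Module):
    """Stand-in for one object's NeRF: maps 3D points to (density, rgb)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))  # sigma + rgb

    def forward(self, points):  # points: (..., 3)
        out = self.mlp(points)
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

class Layout(nn.Module):
    """One layout: a learnable rigid transform (translation + yaw) per object."""
    def __init__(self, num_objects):
        super().__init__()
        self.translation = nn.Parameter(torch.zeros(num_objects, 3))
        self.yaw = nn.Parameter(torch.zeros(num_objects))

    def apply(self, points, k):
        # Move world-space query points into object k's canonical frame.
        c, s = torch.cos(self.yaw[k]), torch.sin(self.yaw[k])
        zero, one = torch.zeros_like(c), torch.ones_like(c)
        rot = torch.stack([torch.stack([c, -s, zero]),
                           torch.stack([s,  c, zero]),
                           torch.stack([zero, zero, one])])
        return (points - self.translation[k]) @ rot

K, N = 3, 2  # number of objects and number of layouts (both assumed values)
objects = nn.ModuleList(TinyObjectField() for _ in range(K))
layouts = nn.ModuleList(Layout(K) for _ in range(N))
opt = torch.optim.Adam([*objects.parameters(), *layouts.parameters()], lr=1e-3)

def composite_density(points, layout):
    """Composite the scene by summing per-object densities (simplified compositing)."""
    sigmas = [objects[k](layout.apply(points, k))[0] for k in range(K)]
    return torch.stack(sigmas).sum(0)

def score_distillation_loss(density):
    """Placeholder for guidance from the pretrained text-to-image model."""
    return density.mean()  # dummy objective so the sketch runs end to end

for step in range(10):
    layout = layouts[torch.randint(N, (1,)).item()]  # sample one layout per step
    points = torch.rand(1024, 3) * 2 - 1             # stand-in for points along camera rays
    loss = score_distillation_loss(composite_density(points, layout))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because every composited layout is scored against the same image prior, gradients only favor groupings of density that remain a plausible scene under rearrangement, which is the disentanglement signal the abstract refers to.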

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-epstein24a,
  title     = {Disentangled 3{D} Scene Generation with Layout Learning},
  author    = {Epstein, Dave and Poole, Ben and Mildenhall, Ben and Efros, Alexei A and Holynski, Aleksander},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {12547--12559},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/epstein24a/epstein24a.pdf},
  url       = {https://proceedings.mlr.press/v235/epstein24a.html},
  abstract  = {We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs—each representing its own object—along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.}
}
Endnote
%0 Conference Paper
%T Disentangled 3D Scene Generation with Layout Learning
%A Dave Epstein
%A Ben Poole
%A Ben Mildenhall
%A Alexei A Efros
%A Aleksander Holynski
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-epstein24a
%I PMLR
%P 12547--12559
%U https://proceedings.mlr.press/v235/epstein24a.html
%V 235
%X We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs—each representing its own object—along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
APA
Epstein, D., Poole, B., Mildenhall, B., Efros, A.A. & Holynski, A. (2024). Disentangled 3D Scene Generation with Layout Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:12547-12559. Available from https://proceedings.mlr.press/v235/epstein24a.html.
