SceneWeaver: Text-Driven Scene Generation with Geometry-aware Gaussian Splatting

Xiaolu Hou, Mingcheng Li, Jiawei Chen, Dingkang Yang, Ziyun Qian, Lihua Zhang
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:1288-1303, 2025.

Abstract

With the widespread use of virtual reality applications, 3D scene generation has become a challenging new research frontier. 3D scenes have highly complex structures, so it is crucial to ensure that the output is dense, coherent, and includes all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators, but they often lack rich geometric constraint information within the scene, leading to geometric distortion in the generated results. Therefore, we propose a two-stage geometry-aware progressive scene generation framework, SceneWeaver, which creates diverse, high-quality 3D scenes from text or image inputs. In the first stage, we introduce a multi-level depth refinement mechanism combined with image inpainting and point cloud updating strategies to construct a high-quality initial point cloud. In the second stage, 3D Gaussians are initialized based on the point cloud and continuously optimized. To address the challenge of insufficient geometric constraints in the Gaussian Splatting optimization process, we utilize the rich appearance and geometry information within the scene to perform a geometry-aware optimization, resulting in high-quality scene generation results. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.
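The abstract gives no implementation details. As a rough illustration of the first-stage idea of turning an image with estimated depth into points for an initial point cloud, a standard pinhole back-projection might look like the sketch below. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are assumptions made for illustration; this is not the authors' code, and it omits the depth refinement, inpainting, and Gaussian optimization steps.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift each pixel with positive depth to a 3D point in camera coordinates.

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map (hypothetical values); one pixel has no valid depth.
toy_depth = [[2.0, 2.0],
             [0.0, 4.0]]
pts = backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(len(pts), pts[0])
```

In a pipeline of the kind the abstract describes, points like these would presumably be accumulated across views and later used as initial centers for 3D Gaussians; that wiring is left out here because the abstract does not specify it.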

Cite this Paper


BibTeX
@InProceedings{pmlr-v260-hou25a,
  title = {{SceneWeaver}: {T}ext-Driven Scene Generation with Geometry-aware Gaussian Splatting},
  author = {Hou, Xiaolu and Li, Mingcheng and Chen, Jiawei and Yang, Dingkang and Qian, Ziyun and Zhang, Lihua},
  booktitle = {Proceedings of the 16th Asian Conference on Machine Learning},
  pages = {1288--1303},
  year = {2025},
  editor = {Nguyen, Vu and Lin, Hsuan-Tien},
  volume = {260},
  series = {Proceedings of Machine Learning Research},
  month = {05--08 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v260/main/assets/hou25a/hou25a.pdf},
  url = {https://proceedings.mlr.press/v260/hou25a.html},
  abstract = {With the widespread use of virtual reality applications, 3D scene generation has become a challenging new research frontier. 3D scenes have highly complex structures, so it is crucial to ensure that the output is dense, coherent, and includes all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators, but they often lack rich geometric constraint information within the scene, leading to geometric distortion in the generated results. Therefore, we propose a two-stage geometry-aware progressive scene generation framework, SceneWeaver, which creates diverse, high-quality 3D scenes from text or image inputs. In the first stage, we introduce a multi-level depth refinement mechanism combined with image inpainting and point cloud updating strategies to construct a high-quality initial point cloud. In the second stage, 3D Gaussians are initialized based on the point cloud and continuously optimized. To address the challenge of insufficient geometric constraints in the Gaussian Splatting optimization process, we utilize the rich appearance and geometry information within the scene to perform a geometry-aware optimization, resulting in high-quality scene generation results. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.}
}
Endnote
%0 Conference Paper
%T SceneWeaver: Text-Driven Scene Generation with Geometry-aware Gaussian Splatting
%A Xiaolu Hou
%A Mingcheng Li
%A Jiawei Chen
%A Dingkang Yang
%A Ziyun Qian
%A Lihua Zhang
%B Proceedings of the 16th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Vu Nguyen
%E Hsuan-Tien Lin
%F pmlr-v260-hou25a
%I PMLR
%P 1288--1303
%U https://proceedings.mlr.press/v260/hou25a.html
%V 260
%X With the widespread use of virtual reality applications, 3D scene generation has become a challenging new research frontier. 3D scenes have highly complex structures, so it is crucial to ensure that the output is dense, coherent, and includes all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators, but they often lack rich geometric constraint information within the scene, leading to geometric distortion in the generated results. Therefore, we propose a two-stage geometry-aware progressive scene generation framework, SceneWeaver, which creates diverse, high-quality 3D scenes from text or image inputs. In the first stage, we introduce a multi-level depth refinement mechanism combined with image inpainting and point cloud updating strategies to construct a high-quality initial point cloud. In the second stage, 3D Gaussians are initialized based on the point cloud and continuously optimized. To address the challenge of insufficient geometric constraints in the Gaussian Splatting optimization process, we utilize the rich appearance and geometry information within the scene to perform a geometry-aware optimization, resulting in high-quality scene generation results. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.
APA
Hou, X., Li, M., Chen, J., Yang, D., Qian, Z., & Zhang, L. (2025). SceneWeaver: Text-Driven Scene Generation with Geometry-aware Gaussian Splatting. Proceedings of the 16th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 260:1288-1303. Available from https://proceedings.mlr.press/v260/hou25a.html.