SceneWeaver: Text-Driven Scene Generation with Geometry-aware Gaussian Splatting
Proceedings of the 16th Asian Conference on Machine Learning, PMLR 260:1288-1303, 2025.
Abstract
With the widespread adoption of virtual reality applications, 3D scene generation has become a challenging new research frontier. Because 3D scenes are highly complex in structure, it is crucial that the generated output be dense, coherent, and complete. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators, but they often fail to exploit the rich geometric constraints within the scene, which leads to geometric distortion in the generated results. We therefore propose SceneWeaver, a two-stage geometry-aware progressive scene generation framework that creates diverse, high-quality 3D scenes from text or image inputs. In the first stage, we introduce a multi-level depth refinement mechanism, combined with image inpainting and point cloud updating strategies, to construct a high-quality initial point cloud. In the second stage, 3D Gaussians are initialized from this point cloud and continuously optimized. To address the lack of geometric constraints in standard Gaussian Splatting optimization, we exploit the scene's rich appearance and geometry information to perform geometry-aware optimization, yielding high-quality scene generation results. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework over several baselines.
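To make the stage-one point cloud updating step more concrete, the sketch below shows one common way such a step can be realized: back-projecting a refined depth map into world-space points under a standard pinhole camera model. This is only an illustrative assumption based on the pipeline described in the abstract; the function name `unproject_depth` and the intrinsics/pose values are placeholders, not the paper's actual implementation.

```python
import numpy as np

def unproject_depth(depth, K, c2w):
    """Back-project a per-view depth map into world-space 3D points.

    depth: (H, W) z-depth in the camera frame
    K:     (3, 3) pinhole intrinsics
    c2w:   (4, 4) camera-to-world pose
    Returns an (H*W, 3) array of world-space points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                    # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)   # homogeneous pixels
    dirs = pix @ np.linalg.inv(K).T                                   # camera-frame rays (z = 1)
    pts_cam = dirs * depth.reshape(-1, 1)                             # scale rays by depth
    pts_hom = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=1)  # homogeneous 3D points
    return (pts_hom @ c2w.T)[:, :3]                                   # transform to world frame

# Toy usage: lift one synthetic view into 3D, as a new view's points would be
# fused into the growing point cloud. All values here are placeholders.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
c2w = np.eye(4)                        # identity camera pose
depth = np.full((480, 640), 2.0)       # stand-in for a refined depth map
cloud = unproject_depth(depth, K, c2w)
print(cloud.shape)                     # (307200, 3)
```

In a progressive pipeline of the kind the abstract describes, points produced this way from each newly inpainted view would be appended to the existing cloud, which in turn initializes the 3D Gaussians optimized in stage two.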