Retrieval-Augmented Score Distillation for Text-to-3D Generation

Junyoung Seo; Susung Hong; Wooseok Jang; Inès Hyeonsu Kim; Min-Seop Kwak; Doyup Lee; Seungryong Kim

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44190-44211, 2024.

Abstract

Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to inconsistent 3D geometry. Recently, with the release of large-scale multi-view datasets, fine-tuning the diffusion model on these datasets has become a mainstream solution to the 3D inconsistency problem. However, this approach is confronted with fundamental difficulties regarding the limited quality and diversity of 3D data compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both the expressiveness of 2D diffusion models and the geometric consistency of 3D assets can be fully leveraged by employing semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model’s 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. The project page is available at https://ku-cvlab.github.io/ReDream/.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-seo24a,
  title = 	 {Retrieval-Augmented Score Distillation for Text-to-3{D} Generation},
  author =       {Seo, Junyoung and Hong, Susung and Jang, Wooseok and Kim, In\`{e}s Hyeonsu and Kwak, Min-Seop and Lee, Doyup and Kim, Seungryong},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {44190--44211},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/seo24a/seo24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/seo24a.html},
  abstract = 	 {Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to inconsistent 3D geometry. Recently, with the release of large-scale multi-view datasets, fine-tuning the diffusion model on these datasets has become a mainstream solution to the 3D inconsistency problem. However, this approach is confronted with fundamental difficulties regarding the limited quality and diversity of 3D data compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both the expressiveness of 2D diffusion models and the geometric consistency of 3D assets can be fully leveraged by employing semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model’s 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. The project page is available at https://ku-cvlab.github.io/ReDream/.}
}

Endnote

%0 Conference Paper
%T Retrieval-Augmented Score Distillation for Text-to-3D Generation
%A Junyoung Seo
%A Susung Hong
%A Wooseok Jang
%A Inès Hyeonsu Kim
%A Min-Seop Kwak
%A Doyup Lee
%A Seungryong Kim
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-seo24a
%I PMLR
%P 44190--44211
%U https://proceedings.mlr.press/v235/seo24a.html
%V 235
%X Text-to-3D generation has achieved significant success by incorporating powerful 2D diffusion models, but insufficient 3D prior knowledge also leads to inconsistent 3D geometry. Recently, with the release of large-scale multi-view datasets, fine-tuning the diffusion model on these datasets has become a mainstream solution to the 3D inconsistency problem. However, this approach is confronted with fundamental difficulties regarding the limited quality and diversity of 3D data compared with 2D data. To sidestep these trade-offs, we explore a retrieval-augmented approach tailored for score distillation, dubbed ReDream. We postulate that both the expressiveness of 2D diffusion models and the geometric consistency of 3D assets can be fully leveraged by employing semantically relevant assets directly within the optimization process. To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. We leverage the retrieved asset to incorporate its geometric prior in the variational objective and adapt the diffusion model’s 2D prior toward view consistency, achieving drastic improvements in both geometry and fidelity of generated scenes. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency. The project page is available at https://ku-cvlab.github.io/ReDream/.

APA


Seo, J., Hong, S., Jang, W., Kim, I.H., Kwak, M., Lee, D. & Kim, S. (2024). Retrieval-Augmented Score Distillation for Text-to-3D Generation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44190-44211. Available from https://proceedings.mlr.press/v235/seo24a.html.
