NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion

Jiatao Gu, Alex Trevithick, Kai-En Lin, Joshua M. Susskind, Christian Theobalt, Lingjie Liu, Ravi Ramamoorthi
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11808-11826, 2023.

Abstract

Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets including ShapeNet, ABO, and Clevr3D.
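The abstract describes how existing approaches condition a NeRF on local image features: points sampled along a target-view ray are projected onto the input image plane, 2D features are sampled there and decoded into density and color, and the result is volume-rendered. Image-conditioned models in the style of pixelNeRF work this way. The NumPy sketch below illustrates that baseline pipeline under simplifying assumptions. It uses a pinhole camera whose frame coincides with the world frame, a random array standing in for a CNN feature map, and a hypothetical linear decoder (`w_sigma`, `w_rgb`) in place of a learned MLP. All function names are illustrative. It is not the NerfDiff implementation, and it omits the conditional diffusion model and the NeRF-guided distillation step entirely.

```python
import numpy as np


def project_points(points, K, R, t):
    """Pinhole projection of (N, 3) world points into the input view.

    Assumes the world-to-camera convention x_cam = R @ x_world + t.
    Returns (N, 2) pixel coordinates (x, y) and (N,) camera-space depths.
    """
    cam = points @ R.T + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]


def bilinear_sample(feat, uv):
    """Bilinearly sample an (H, W, C) feature map at continuous pixel coords."""
    H, W, _ = feat.shape
    # Clamp to the image border; a real model would mask out-of-view points instead.
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv


def volume_render(sigma, rgb, deltas):
    """Standard NeRF alpha compositing of per-sample density and color along one ray."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    return weights @ rgb, weights


def render_ray(origin, direction, near, far, n_samples, feat, K, R, t, w_sigma, w_rgb):
    """Render one target-view ray by projecting its samples into the input image."""
    z = np.linspace(near, far, n_samples)
    points = origin + z[:, None] * direction
    uv, _ = project_points(points, K, R, t)
    f = bilinear_sample(feat, uv)             # (n_samples, C) aggregated 2D features
    # Hypothetical linear "decoder" standing in for a learned MLP.
    sigma = np.logaddexp(0.0, f @ w_sigma)    # softplus keeps density non-negative
    rgb = 1.0 / (1.0 + np.exp(-(f @ w_rgb)))  # sigmoid keeps color in [0, 1]
    deltas = np.append(np.diff(z), 1e10)      # last interval treated as infinite
    return volume_render(sigma, rgb, deltas)


# Toy usage: world frame == input-camera frame; all values are illustrative.
rng = np.random.default_rng(0)
H, W, C = 32, 32, 8
feat = rng.normal(size=(H, W, C))            # stand-in for a CNN feature map
K = np.array([[40.0, 0.0, 16.0], [0.0, 40.0, 16.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
w_sigma, w_rgb = rng.normal(size=C), rng.normal(size=(C, 3))
origin = np.array([0.2, 0.0, 0.0])           # hypothetical target-view camera center
direction = np.array([-0.1, 0.05, 1.0])
direction /= np.linalg.norm(direction)
color, weights = render_ray(origin, direction, 0.5, 2.0, 32, feat, K, R, t, w_sigma, w_rgb)
print(color, weights.sum())
```

One way to read the occlusion problem through this sketch: every sample on a target ray can only read features at the pixel where it projects in the input image, so samples lying behind a visible surface receive that surface's features and carry no direct evidence about the hidden geometry. NerfDiff's CDM-generated virtual views are intended to supply the information that this projection cannot recover.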

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-gu23a,
  title = {{N}erf{D}iff: Single-image View Synthesis with {N}e{RF}-guided Distillation from 3{D}-aware Diffusion},
  author = {Gu, Jiatao and Trevithick, Alex and Lin, Kai-En and Susskind, Joshua M. and Theobalt, Christian and Liu, Lingjie and Ramamoorthi, Ravi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {11808--11826},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/gu23a/gu23a.pdf},
  url = {https://proceedings.mlr.press/v202/gu23a.html},
  abstract = {Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets including ShapeNet, ABO, and Clevr3D.}
}
Endnote
%0 Conference Paper
%T NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
%A Jiatao Gu
%A Alex Trevithick
%A Kai-En Lin
%A Joshua M. Susskind
%A Christian Theobalt
%A Lingjie Liu
%A Ravi Ramamoorthi
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-gu23a
%I PMLR
%P 11808--11826
%U https://proceedings.mlr.press/v202/gu23a.html
%V 202
%X Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets including ShapeNet, ABO, and Clevr3D.
APA
Gu, J., Trevithick, A., Lin, K.-E., Susskind, J. M., Theobalt, C., Liu, L., & Ramamoorthi, R. (2023). NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:11808-11826. Available from https://proceedings.mlr.press/v202/gu23a.html.
