Sharf: Shape-conditioned Radiance Fields from a Single View

Konstantinos Rematas, Ricardo Martin-Brualla, Vittorio Ferrari
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8948-8958, 2021.

Abstract

We present a method for estimating neural scene representations of objects from only a single image. The core of our method is the estimation of a geometric scaffold for the object, which guides the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape and then renders it to an image, with the object's appearance controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. This explicit disentanglement of shape and appearance allows our model to be fine-tuned from a single image. We can then render new views in a geometrically consistent manner that faithfully represent the input object. Additionally, our method generalizes to images outside the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object's 3D shape. Several experiments demonstrate the effectiveness of our approach on both synthetic and real images.
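To make the generative process concrete, the following is a minimal toy sketch (not the paper's implementation) of the two-latent-code idea: one latent code maps to a voxelized shape, a second controls appearance during rendering, and at test time both codes are optimized to fit a single target image. All names (`shape_net`, `render`, the linear "networks" `W_shape`/`W_app`, the grid and latent sizes) are invented for illustration; the real method uses learned neural networks and a volumetric radiance-field renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins for the paper's two networks: a shape network
# mapping a latent code to a voxel occupancy grid, and an appearance-
# conditioned "renderer" producing a tiny image. Both are linear here.
W_shape = rng.normal(size=(8, 64))   # shape latent (8,) -> 4x4x4 voxels
W_app = rng.normal(size=(8, 16))     # appearance latent (8,) -> 16-pixel shading

def shape_net(z_shape):
    """Map a shape latent to a voxelized occupancy grid in [0, 1]."""
    logits = W_shape.T @ z_shape
    return (1.0 / (1.0 + np.exp(-logits))).reshape(4, 4, 4)

def render(voxels, z_app):
    """Toy renderer: project occupancy along one axis, modulate by appearance."""
    silhouette = voxels.mean(axis=0).ravel()   # 16-dim "image"
    return silhouette * (W_app.T @ z_app)

# A synthetic single-view target, made from unknown ground-truth latents.
target = render(shape_net(rng.normal(size=8)), rng.normal(size=8))

def loss(z):
    """Photometric error of the rendering against the single target view."""
    return np.sum((render(shape_net(z[:8]), z[8:]) - target) ** 2)

# Test-time fitting: jointly optimize both latent codes by gradient descent
# (finite-difference gradients keep the sketch dependency-free).
z = rng.normal(size=16)
initial_loss = loss(z)
for _ in range(300):
    grad = np.array([(loss(z + 1e-4 * e) - loss(z - 1e-4 * e)) / 2e-4
                     for e in np.eye(16)])
    z -= 1e-3 * grad
```

The key design point the sketch mirrors is the disentanglement: `z[:8]` only influences geometry (the voxel scaffold), `z[8:]` only influences appearance, so each can be fitted or fine-tuned independently from a single view.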

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-rematas21a,
  title     = {Sharf: Shape-conditioned Radiance Fields from a Single View},
  author    = {Rematas, Konstantinos and Martin-Brualla, Ricardo and Ferrari, Vittorio},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8948--8958},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/rematas21a/rematas21a.pdf},
  url       = {https://proceedings.mlr.press/v139/rematas21a.html},
  abstract  = {We present a method for estimating neural scenes representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape, and then renders it to an image, with the object appearance being controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. The explicit disentanglement of shape and appearance allows our model to be fine-tuned given a single image. We can then render new views in a geometrically consistent manner and they represent faithfully the input object. Additionally, our method is able to generalize to images outside of the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object’s 3D shape. We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.}
}
Endnote
%0 Conference Paper
%T Sharf: Shape-conditioned Radiance Fields from a Single View
%A Konstantinos Rematas
%A Ricardo Martin-Brualla
%A Vittorio Ferrari
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-rematas21a
%I PMLR
%P 8948--8958
%U https://proceedings.mlr.press/v139/rematas21a.html
%V 139
%X We present a method for estimating neural scenes representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape, and then renders it to an image, with the object appearance being controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. The explicit disentanglement of shape and appearance allows our model to be fine-tuned given a single image. We can then render new views in a geometrically consistent manner and they represent faithfully the input object. Additionally, our method is able to generalize to images outside of the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object’s 3D shape. We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.
APA
Rematas, K., Martin-Brualla, R. & Ferrari, V. (2021). Sharf: Shape-conditioned Radiance Fields from a Single View. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8948-8958. Available from https://proceedings.mlr.press/v139/rematas21a.html.