Given the RGB-D input, we first extract geometric and texture pixel features using two encoders. Then, we construct a continuous surface representation upon the discrete surface features. Next, we introduce a two-stage paradigm to learn generalizable geometric and texture priors, optimized via multiple objectives. Finally, the learnt priors can be further optimized on a specific scene to obtain a high-fidelity reconstruction.
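For illustration, below is a minimal PyTorch sketch of the feature-extraction step: two 2D encoders produce per-pixel geometric and texture features, which can then be queried at continuous image locations to form a continuous surface representation. The module names, layer sizes, and shapes are assumptions for readability, not the exact released implementation.

```python
# Sketch only: two pixel-feature encoders plus continuous feature sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoders(nn.Module):
    """Geometry features from the depth map, texture features from the RGB image."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.geom = nn.Sequential(nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(feat_dim, feat_dim, 3, padding=1))
        self.tex = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(feat_dim, feat_dim, 3, padding=1))

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W), depth: (B, 1, H, W)
        return self.geom(depth), self.tex(rgb)

def sample_pixel_features(feat_map, uv):
    """Bilinearly sample discrete pixel features at continuous image coordinates.

    feat_map: (B, C, H, W); uv: (B, N, 2) in [-1, 1].
    Returns: (B, N, C), one feature vector per query point.
    """
    grid = uv.unsqueeze(2)                                      # (B, N, 1, 2)
    out = F.grid_sample(feat_map, grid, align_corners=True)     # (B, C, N, 1)
    return out.squeeze(-1).permute(0, 2, 1)                     # (B, N, C)
```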
Here we show per-scene reconstruction results on ScanNet. With the learnt NFP, we achieve high-fidelity reconstruction with photo-realistic texture within 20 minutes.
Zoom into the model viewer by scrolling. You can toggle the “Single Sided” option in the Model Inspector (press the I key) to enable back-face culling (see through walls). Select “Matcap” to inspect the geometry without textures.
ManhattanSDF* | MonoSDF* | Ours (NFP)
---|---|---
Note that ManhattanSDF* and MonoSDF* are trained under the same setting as ours, i.e., with ground-truth depths.
We additionally show feed-forward reconstruction results without any optimization, which further demonstrate the generalizability of our learnt priors. Compared with existing works that require time-consuming per-scene optimization, our method achieves comparable results in around 10 seconds. Feed-forward reconstruction results are shown in the left column, while the per-scene optimization results are shown in the right column as a reference.
The video is captured from the living room of Isaac Deutsch@NVIDIA. We would like to thank Isaac Deutsch for sharing this data.
Given a single RGB-D image, our approach can also generate nearby views via the learnt neural priors. The following results are generated from the input images shown below.
Given an RGB-D image, we propose to decompose the NFPs into a geometric neural prior and a texture neural prior, and construct the signed distance fields and the radiance fields from a continuous surface representation.
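A hedged sketch of this decomposition is shown below: one decoder maps a query point and its interpolated geometric surface feature to a signed distance, and another maps the point, a view direction, and its texture feature to radiance. The decoder architectures and dimensions here are illustrative assumptions.

```python
# Sketch only: geometric prior -> SDF decoder, texture prior -> radiance decoder.
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        # Input: 3D query point + local geometric surface feature.
        self.net = nn.Sequential(nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, xyz, geom_feat):
        return self.net(torch.cat([xyz, geom_feat], dim=-1))   # signed distance

class RadianceDecoder(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        # Input: 3D query point + view direction + local texture surface feature.
        self.net = nn.Sequential(nn.Linear(3 + 3 + feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, xyz, view_dir, tex_feat):
        return self.net(torch.cat([xyz, view_dir, tex_feat], dim=-1))  # RGB
```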
With the pretrained NFPs, we can achieve reconstruction in a single feed-forward step without any optimization. To obtain a high-fidelity reconstruction, we further optimize the priors along with the pretrained decoders.
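The per-scene refinement stage can be summarized by the short loop below: starting from the pretrained priors, the priors and the pretrained decoders are jointly optimized against the scene's RGB-D frames. The `render_fn` callable, the batch format, loss weights, and step counts are placeholders, not the exact objectives used in the paper.

```python
# Sketch only: joint fine-tuning of priors and pretrained decoders on one scene.
import torch

def finetune_on_scene(prior_modules, render_fn, batches, steps=2000, lr=1e-4):
    """prior_modules: iterable of nn.Module (priors + pretrained decoders).
    render_fn(batch) -> (pred_rgb, pred_depth); each batch carries 'rgb' and 'depth'.
    """
    params = [p for m in prior_modules for p in m.parameters()]
    optim = torch.optim.Adam(params, lr=lr)
    for _, batch in zip(range(steps), batches):
        pred_rgb, pred_depth = render_fn(batch)
        # Photometric loss on rendered color plus L1 loss on rendered depth.
        loss = ((pred_rgb - batch['rgb']) ** 2).mean() + \
               (pred_depth - batch['depth']).abs().mean()
        optim.zero_grad()
        loss.backward()
        optim.step()
```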
@article{}