TL;DR We use a D2-DRDF model to predict a 3D implicit function from a single input image. Unlike other methods, D2-DRDF does not depend on mesh supervision during training and can directly operate with raw RGB-D data obtained from scene captures.
Sample results obtained from previously unseen images sourced from the Matterport3D (top) and OmniData (bottom) datasets. D2-DRDF exhibits the capability to reconstruct hidden sections of the floor and the rear portion of the couch.
Abstract |
We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data.
At test time, our system maps a previously unseen RGB image to a 3D reconstruction of a scene via implicit functions.
While implicit functions for 3D reconstruction have often been tied to meshes, we show that we can train one using only a set of posed RGBD images.
This setting may help 3D reconstruction unlock the sea of accelerometer+RGBD data that is coming with new phones.
Our system, D2-DRDF, can match and sometimes outperform current methods that use mesh supervision and shows better robustness to sparse data.
Overview |
In this paper, we propose a method for reconstructing 3D scenes, including occluded regions, from previously unseen RGB images. To train our approach, we use posed RGB and depth data. Our model represents the 3D scene using the Directed Ray Distance Function (DRDF).
In this work, we demonstrate how to geometrically supervise this DRDF function by leveraging partial observations obtained from auxiliary views. We summarize our key insights as follows:
Approach |
We present an approach to train a model to predict 3D from single images.
Our model is supervised with posed RGBD data. Below we highlight the key ideas.
Learning from Auxiliary Views.
For each red ray originating from the reference camera (R), we extract depth information from an auxiliary image view for points along the ray. Views (a), (b), and (c) on the right capture distinct occluded segments along the ray, providing valuable free-space information. This information enables the creation of penalty functions to train the DRDF function.
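As a rough illustration of how an auxiliary view can yield free-space information, the sketch below projects 3D points sampled along a reference ray into one auxiliary camera and compares their depth against that view's depth map. Everything here is an assumption made for illustration and not the authors' implementation: the function name `label_points_in_aux_view`, the pinhole model with intrinsics `K`, the world-to-camera convention `x_cam = R @ x_world + t`, and the tolerance `eps` are all hypothetical.

```python
import numpy as np

def label_points_in_aux_view(points_world, K, R, t, depth_map, eps=0.05):
    """Label 3D points along a reference ray using one auxiliary depth map.

    Assumed conventions (not from the source): x_cam = R @ x_world + t,
    depth_map stores z-depth in scene units, and pixels index as (row, col).
    """
    points_world = np.asarray(points_world, dtype=float)
    cam = points_world @ R.T + t             # world -> auxiliary camera frame
    z = cam[:, 2]
    labels = np.full(len(cam), "outside", dtype=object)

    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)      # avoid dividing by zero or negatives
    pix = cam @ K.T                          # homogeneous pixel coordinates
    col = np.round(pix[:, 0] / safe_z).astype(int)
    row = np.round(pix[:, 1] / safe_z).astype(int)

    h, w = depth_map.shape
    valid = in_front & (col >= 0) & (col < w) & (row >= 0) & (row < h)
    observed = depth_map[row[valid], col[valid]]
    zv = z[valid]
    # Closer than the observed surface: the auxiliary view saw through this point.
    # Farther than the observed surface: the point is hidden from this view.
    labels[valid] = np.where(zv < observed - eps, "free",
                     np.where(zv > observed + eps, "occluded", "surface"))
    return labels
```

Consecutive points labeled "free" along a ray form the free-space segments discussed next; points labeled "occluded" or "outside" carry no information from this particular view.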
Segment Types.
When the ray from the reference camera is seen by an auxiliary view, there are segments of free space. Depending on how these segments start and end, they place different constraints on the DRDF. Here we show a segment that starts with a disocclusion and ends with an intersection. The space between the s and e events is unoccupied, and we convert this information into a penalty function that is used to train the model.
The next section shows an interactive demo of this penalty plot for different segment types. |
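To make the idea of a free-space penalty concrete, here is a minimal sketch. It relies on the DRDF property stated on this page (a prediction \(d\) at position \(z\) implies an intersection at \(z + d\)) and penalizes predictions whose implied intersection falls inside a segment \([s, e]\) known to be empty. The function name `freespace_penalty` and the specific penalty shape (distance from the implied intersection to the nearest segment endpoint) are assumptions chosen for illustration. This sketch also ignores the distinction between segment types, which the actual method takes into account.

```python
import numpy as np

def freespace_penalty(z, d, s, e):
    """Penalty for a predicted DRDF value d at ray position z.

    Hypothetical penalty: if the implied intersection z + d lies inside the
    free-space segment [s, e], return its distance to the nearest endpoint;
    otherwise return 0. Inputs broadcast like regular NumPy arrays.
    """
    hit = np.asarray(z, dtype=float) + np.asarray(d, dtype=float)
    inside_depth = np.minimum(hit - s, e - hit)   # > 0 only strictly inside (s, e)
    return np.clip(inside_depth, 0.0, None)

# A penalty grid over ray positions (columns) and DRDF values (rows),
# analogous in spirit to a penalty heat map.
zs = np.linspace(0.0, 4.0, 5)
ds = np.linspace(-2.0, 2.0, 5)
heat = freespace_penalty(zs[None, :], ds[:, None], s=1.0, e=3.0)
```

Under this simplified penalty, a prediction is free of cost as soon as its implied intersection lands on or outside the segment boundaries.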
Method Overview. During training, for a given ray from the reference view, we use auxiliary views to determine the free-space segments along the ray. Then, for each 3D point on this ray, our network predicts the DRDF value and we compute the associated penalty. During inference, our network predicts the DRDF for points within the frustum of a single input image. For more information on the DRDF function, you can visit this link.
Interactive Demo for Segment Penalty Plots |
Below, we demonstrate how different segments along the ray impose distinct penalty functions on predicted DRDF values. On the X-axis, we interactively select the positions of Intersection (I) and Occlusion (O) events along the ray. The placement of these events determines which DRDF values are penalized as inconsistent with the observed intersection or occlusion. We generate a heat map that shows the penalty associated with a DRDF value (Y-axis) at a specific point along the ray (X-axis). Regions in dark red indicate a high penalty, while light grey regions indicate a low penalty.
The key property of the DRDF is that for any point along the ray, \(z \in [0, Z]\), if \(\mathrm{DRDF}(z) = d\) then there is an intersection at point \(z + d\) on the ray. All the penalty segments below are implications of this equation. |
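The property can be checked numerically on a toy ray. The sketch below assumes the DRDF at \(z\) is the signed offset to the nearest intersection along the ray (positive if that intersection lies ahead, negative if behind). This definition is consistent with the property stated above but is an assumption here, and the helper name `drdf` and the example intersection positions are hypothetical.

```python
import numpy as np

def drdf(z, intersections):
    """Signed offset from each ray position z to its nearest intersection.

    Assumes `intersections` is a non-empty list of positions along the ray.
    Ties between two equally near intersections resolve to the earlier one.
    """
    inter = np.asarray(intersections, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    offsets = inter[None, :] - z[:, None]          # shape (num_points, num_hits)
    nearest = np.argmin(np.abs(offsets), axis=1)
    return offsets[np.arange(len(z)), nearest]

# Toy ray with surfaces at 2.0 and 5.0 (made-up values).
hits = [2.0, 5.0]
z = np.array([1.0, 3.0, 4.0])
d = drdf(z, hits)

# Key property: z + d always lands on an intersection.
assert np.all(np.isin(z + d, hits))
```

Because \(z + d\) is always an intersection, any predicted value whose implied intersection lies in observed free space can be flagged as inconsistent, which is what the penalty segments encode.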
Overview Video |
Paper |
Code: coming soon... |
[GitHub] |
Acknowledgements |