3D-RelNet: Joint Object and Relational Network for 3D Prediction

Nilesh Kulkarni¹

Ishan Misra²

Shubham Tulsiani²

Abhinav Gupta¹

¹Carnegie Mellon University

²Facebook AI Research

We study the problem of layout estimation in 3D by reasoning about relationships between objects. Given an image and object detection boxes, we first predict the 3D pose (translation, rotation, scale) of each object and the relative pose between each pair of objects. We combine these predictions and ensure consistent relationships between objects to predict the final 3D pose of each object. (b) Output: An example result of our method that takes as input the 2D image and generates the 3D layout.

We propose an approach to predict the 3D shape and pose of the objects present in a scene. Existing learning-based methods that pursue this goal make independent predictions per object and do not leverage the relationships among them. We argue that reasoning about these relationships is crucial, and present an approach to incorporate them in a 3D prediction framework. In addition to independent per-object predictions, we predict pairwise relations in the form of relative 3D pose, and demonstrate that these can be easily incorporated to improve object-level estimates. We report performance across different datasets (SUNCG, NYUv2), and show that our approach significantly improves over independent prediction approaches while also outperforming alternative implicit reasoning methods.
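To illustrate how pairwise relative predictions might be combined with per-object estimates, here is a minimal sketch for the translation component only. It is not the released implementation. It assumes that the relative translation for a pair (i, j) is the difference t_j - t_i in a shared frame, and that the combination is posed as a linear least-squares problem. The function name `fuse_translations` and the `weight` parameter are hypothetical and do not come from the paper or its code.

```python
import numpy as np


def fuse_translations(unary, relative, weight=1.0):
    """Fuse per-object and pairwise translation predictions (illustrative only).

    unary:    (N, 3) array of independent per-object translation predictions.
    relative: dict mapping (i, j) -> length-3 prediction of t_j - t_i
              (this definition of relative translation is an assumption).
    weight:   hypothetical scalar trading off pairwise against unary terms.

    Returns an (N, 3) array minimizing
        sum_i ||t_i - unary_i||^2 + weight^2 * sum_ij ||(t_j - t_i) - relative_ij||^2.
    """
    unary = np.asarray(unary, dtype=float)
    n = unary.shape[0]
    rows, targets = [], []

    # Unary terms: each object should stay close to its own prediction.
    for i in range(n):
        row = np.zeros(n)
        row[i] = 1.0
        rows.append(row)
        targets.append(unary[i])

    # Pairwise terms: the difference t_j - t_i should match the predicted relation.
    for (i, j), rel in relative.items():
        row = np.zeros(n)
        row[j] = weight
        row[i] = -weight
        rows.append(row)
        targets.append(weight * np.asarray(rel, dtype=float))

    A = np.stack(rows)      # shape (N + P, N)
    b = np.stack(targets)   # shape (N + P, 3), one column per coordinate axis
    fused, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fused


if __name__ == "__main__":
    # Two objects whose independent estimates disagree with the predicted relation.
    unary = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    relative = {(0, 1): [2.0, 0.0, 0.0]}
    print(fuse_translations(unary, relative))
```

When the relative predictions already agree with the per-object ones, the fused result equals the per-object input. When they disagree, the solution moves each object so that the pairwise offset lies between the two sources of evidence. Rotation and scale would need their own parameterizations and are left out of this sketch.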

Paper

Kulkarni, Misra, Tulsiani, Gupta.

3D-RelNet: Joint Object and Relational Network for 3D Prediction.

[pdf]

[Bibtex]

Code

[GitHub]

Acknowledgements

When this work was done, IM was at CMU and ST was at UC Berkeley. We thank Saurabh Gupta for his help with setting up the evaluation for the NYUv2 dataset. This webpage template was borrowed from some colorful folks.