A new method for converting a single RGB-D input image into a 3D photo has been proposed by a team of researchers from Virginia Tech, National Tsing Hua University and Facebook.

It is based on a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view.

The research team uses a Layered Depth Image with explicit pixel connectivity as the underlying representation and presents a learning-based inpainting model that synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines.
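
To make the idea of a Layered Depth Image with explicit pixel connectivity more concrete, here is a minimal Python sketch. It is not the researchers' implementation: the class names, the `ldi_from_rgbd` helper, and the `max_depth_gap` threshold are all hypothetical, chosen only for illustration. The sketch assumes each image location can hold several pixels (layers), that each pixel stores direct links to at most one neighbor per cardinal direction, and that links are simply left out across large depth jumps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Color = Tuple[int, int, int]

# Offsets for the four cardinal directions and their opposites.
DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}
OPPOSITE = {"left": "right", "right": "left", "up": "down", "down": "up"}


@dataclass(eq=False)  # identity-based equality avoids recursing through neighbor links
class LDIPixel:
    x: int
    y: int
    color: Color
    depth: float
    # Explicit connectivity: at most one neighbor per direction (None means no link).
    neighbors: Dict[str, Optional["LDIPixel"]] = field(
        default_factory=lambda: {d: None for d in DIRECTIONS}
    )


class LayeredDepthImage:
    """Hypothetical container: every (x, y) location holds zero or more pixels."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.layers: Dict[Tuple[int, int], List[LDIPixel]] = {
            (x, y): [] for y in range(height) for x in range(width)
        }

    def add_pixel(self, x: int, y: int, color: Color, depth: float) -> LDIPixel:
        pixel = LDIPixel(x, y, color, depth)
        self.layers[(x, y)].append(pixel)
        return pixel

    @staticmethod
    def connect(a: LDIPixel, b: LDIPixel, direction: str) -> None:
        # Link a -> b in `direction` and b -> a in the opposite direction.
        a.neighbors[direction] = b
        b.neighbors[OPPOSITE[direction]] = a


def ldi_from_rgbd(colors, depths, max_depth_gap: float = 0.5) -> LayeredDepthImage:
    """Build a single-layer LDI from an RGB-D image given as nested lists.

    Assumption for illustration: adjacent pixels are linked unless their
    depths differ by more than `max_depth_gap` (an arbitrary threshold).
    """
    height, width = len(depths), len(depths[0])
    ldi = LayeredDepthImage(width, height)
    grid = [
        [ldi.add_pixel(x, y, colors[y][x], depths[y][x]) for x in range(width)]
        for y in range(height)
    ]
    for y in range(height):
        for x in range(width):
            # Visiting only "right" and "down" covers every adjacent pair once.
            for direction in ("right", "down"):
                dx, dy = DIRECTIONS[direction]
                nx, ny = x + dx, y + dy
                if nx < width and ny < height:
                    if abs(depths[y][x] - depths[ny][nx]) <= max_depth_gap:
                        ldi.connect(grid[y][x], grid[ny][nx], direction)
    return ldi
```

In a structure like this, inpainted content would be stored by calling `add_pixel` a second time at an occluded location, giving that location an extra layer behind the visible one, and then linking the new pixel to its neighbors with `connect`.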

“We validate the effectiveness of our method on a wide range of challenging everyday scenes and show less artifacts compared with the state of the arts,” the researchers stated.

Specifically, the team uses a deep learning-based image inpainting model that can synthesize color and depth structures in regions occluded in the original view.

The method is based on a standard CNN, and the resulting images showed fewer artifacts from the conversion process than those created with previous state-of-the-art approaches.

The model was trained on an NVIDIA V100 GPU with the cuDNN-accelerated PyTorch deep learning framework. It can be trained on any image dataset without the need for annotated data. For this project, the team used the MS COCO dataset together with the pretrained MegaDepth model, first published by Cornell University researchers in 2018.

Project source code is available on GitHub.

