Deep Scene-scale Material Estimation from Multi-view Indoor Captures

Computers & Graphics (2022)

Abstract

The movie and video game industries have adopted photogrammetry as a way to create digital 3D assets from multiple photographs of a real-world scene. However, photogrammetry algorithms typically output an RGB texture atlas of the scene that only serves as visual guidance for skilled artists to create material maps suitable for physically-based rendering. We present a learning-based approach that automatically produces digital assets ready for physically-based rendering, by estimating approximate material maps from multi-view captures of indoor scenes, which are then mapped onto retopologized geometry. We base our approach on a material estimation Convolutional Neural Network (CNN) that we execute on each input image. We leverage the view-dependent visual cues provided by the multiple observations of the scene by gathering, for each pixel of a given image, the color of the corresponding point in the other images. This image-space CNN provides us with an ensemble of predictions, which we merge in texture space as the last step of our approach. Our results demonstrate that the recovered assets can be directly used for physically-based rendering and editing of real indoor scenes from any viewpoint and under novel lighting. Our method generates approximate material maps in a fraction of the time required by the closest previous solutions.
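To make the multi-view gathering step more concrete, here is a minimal Python/NumPy sketch of how, for each pixel of a reference view, the colors of the corresponding scene point can be collected from the other views. All function and parameter names, the pinhole camera conventions, and the nearest-neighbour sampling are illustrative assumptions, not the paper's actual code; occlusion handling is omitted for brevity.

```python
# Hypothetical sketch: gather, per pixel of a reference view, the colors
# observed at the corresponding 3D point in the other captured views.
import numpy as np

def gather_multiview_colors(depth_ref, K_ref, pose_ref, images, Ks, poses):
    """depth_ref : (H, W) depth map of the reference view
    K_ref     : (3, 3) intrinsics of the reference view
    pose_ref  : (4, 4) camera-to-world transform of the reference view
    images    : list of (Hi, Wi, 3) images of the other views
    Ks, poses : intrinsics and camera-to-world transforms of those views
    Returns an (H, W, 3 * len(images)) array of gathered colors."""
    H, W = depth_ref.shape
    # Back-project every reference pixel to world space using its depth.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    cam_pts = (np.linalg.inv(K_ref) @ pix.T) * depth_ref.reshape(1, -1)
    world_pts = pose_ref @ np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])

    gathered = []
    for img, K, pose in zip(images, Ks, poses):
        Hi, Wi, _ = img.shape
        # Project the world points into this view and sample its colors.
        cam = np.linalg.inv(pose) @ world_pts
        proj = K @ cam[:3]
        z = proj[2]
        x = np.clip(np.round(proj[0] / np.maximum(z, 1e-8)).astype(int), 0, Wi - 1)
        y = np.clip(np.round(proj[1] / np.maximum(z, 1e-8)).astype(int), 0, Hi - 1)
        colors = img[y, x].astype(np.float64)   # nearest-neighbour sampling
        colors[z <= 0] = 0.0                    # discard points behind the camera
        gathered.append(colors.reshape(H, W, 3))
    # Ready to concatenate with the reference image as extra CNN input channels.
    return np.concatenate(gathered, axis=-1)
```

In this sketch, the gathered colors are simply stacked as additional input channels for the per-view material CNN; whether the network ingests them this way or through another mechanism is not specified here.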

Summary

Teaser

We present a method that takes multiple photographs of a scene as input (a) and predicts surface materials in the form of material maps corresponding to each input view. Merging these image-space predictions in texture space yields a texture atlas (b) that can be mapped onto retopologized geometry to produce digital 3D assets ready for full physically-based rendering, i.e., rendering from any viewpoint with modified or added lights and objects. For example, in (c) the golden statuette and the white mug have been added to the scene, the lighting has been modified, and the result is rendered from a viewpoint that is not part of the input.
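As a rough illustration of the texture-space merging step mentioned above, the sketch below averages, for each texel of the atlas, the per-view predictions that observe it. The function name, the precomputed texel-to-pixel mapping, the visibility masks, and the uniform averaging are all hypothetical assumptions for illustration; they are not taken from the paper.

```python
# Hypothetical sketch: merge per-view material predictions into a texture atlas
# by averaging, per texel, the predictions of the views that see it.
import numpy as np

def merge_predictions_to_atlas(per_view_maps, texel_uv, texel_visible, atlas_res):
    """per_view_maps : (V, H, W, C) material maps predicted for each input view
    texel_uv      : (V, T, 2) integer pixel coordinates of each texel in each view
    texel_visible : (V, T) boolean visibility of each texel in each view
    atlas_res     : (Ha, Wa) atlas resolution, with T == Ha * Wa texels
                    enumerated in row-major order."""
    V, H, W, C = per_view_maps.shape
    T = texel_uv.shape[1]
    accum = np.zeros((T, C))
    weight = np.zeros((T, 1))
    for v in range(V):
        vis = texel_visible[v]
        x = texel_uv[v, vis, 0]
        y = texel_uv[v, vis, 1]
        accum[vis] += per_view_maps[v, y, x]   # accumulate this view's prediction
        weight[vis] += 1.0                     # count contributing views per texel
    texels = accum / np.maximum(weight, 1.0)   # unseen texels stay at zero
    return texels.reshape(*atlas_res, C)
```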

Supplementary Video

BibTeX

@Article{PRBD22,
  author       = "Prakash, Siddhant and Rainer, Gilles and Bousseau, Adrien and Drettakis, George",
  title        = "Deep Scene-scale Material Estimation from Multi-view Indoor Captures",
  journal      = "Computers \& Graphics",
  volume       = "109",
  pages        = "15--29",
  month        = "October",
  year         = "2022",
  doi          = "10.1016/j.cag.2022.09.010",
  url          = "https://www.sciencedirect.com/science/article/pii/S0097849322001789"
}

Acknowledgments and Funding

This research was funded by the ERC Advanced grant FUNGRAPH No 788065. The authors are grateful to the Inria Sophia Antipolis-Méditerranée "Nef" computation cluster and to the OPAL infrastructure from Université Côte d'Azur for providing resources and support. We thank the associate editor and the anonymous reviewers for their insightful comments, which helped improve the manuscript. We also thank B. Bitterli for the scenes in his rendering repository. The authors further thank Felix Hähnlein and Emilie Yu for their helpful discussions and support during the project, and especially thank the 3D artist Stefania Kousoula for mesh refinement and retopology.