Abstract
Humans are able to interpret an object's 3D shape from a single image. Changing the lighting or the object's BRDF may change the image dramatically, yet the perceived shape tends to remain the same. Humans can also perceive 3D shape from line drawings, even though such images correspond to no realistic physical process. There have been many attempts to give similar capabilities to computer vision systems, and this has led to multiple specialized systems such as "shape from shading" (assuming Lambertian surfaces), "shape from texture" (assuming a stationary texture process), and "shape from line drawings" (assuming lines are drawn at specific places). However, a unified solution has been lacking. Here, we take a step toward unification. We have built a computer vision system that can be trained to estimate 3D shape from multiple types of image data. The system operates by matching localized image patches with shape candidates from a training database of exemplars. We construct a graph of compatibility relationships between nearby patches and infer a globally consistent interpretation using loopy belief propagation. A major difficulty in applying an example-based approach to shape interpretation is the combinatorial explosion of shape possibilities that occurs at occluding contours. We introduce a new shape patch representation that allows for flexible matching of overlapping patches, avoiding the combinatorial explosion by letting patches explain only the parts of the image they best fit. Our method can thus interpret objects with multiple layers of depth and self-occlusion, and, because it is data-driven, it can interpret these objects whether they are rendered with matte surfaces, glossy surfaces, textured surfaces, or line drawings. In tests, our system proposes shape interpretations that are almost always qualitatively correct.
Meeting abstract presented at VSS 2012
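To make the inference step concrete, the following is a minimal sketch, not the authors' implementation, of how max-product loopy belief propagation can select one shape exemplar per image patch from a graph of pairwise compatibilities. All names, graph structure, and scores here are illustrative: each node holds log match scores for its candidate exemplars (the unary term), and each edge holds a log-compatibility table measuring how well two neighboring candidates agree in their overlap region (the pairwise term).

```python
import numpy as np

def loopy_bp(unary, edges, pairwise, n_iters=20):
    """Max-product loopy belief propagation on a patch-compatibility graph.

    unary:    dict node -> (K,) array of log match scores for K candidates
    edges:    list of (u, v) neighbor pairs
    pairwise: dict (u, v) -> (Ku, Kv) array of log compatibilities
    Returns a dict node -> index of the selected candidate exemplar.
    """
    # m[(u, v)][k] is the log message from u to v about v's candidate k.
    msgs = {}
    neighbors = {n: [] for n in unary}
    for u, v in edges:
        msgs[(u, v)] = np.zeros(len(unary[v]))
        msgs[(v, u)] = np.zeros(len(unary[u]))
        neighbors[u].append(v)
        neighbors[v].append(u)

    for _ in range(n_iters):
        new_msgs = {}
        for (u, v) in msgs:
            # Belief at u, excluding the incoming message from v.
            b = unary[u].copy()
            for w in neighbors[u]:
                if w != v:
                    b += msgs[(w, u)]
            # Orient the pairwise table as (u's candidate, v's candidate).
            P = pairwise[(u, v)] if (u, v) in pairwise else pairwise[(v, u)].T
            m = np.max(b[:, None] + P, axis=0)  # max over u's candidates
            new_msgs[(u, v)] = m - m.max()      # normalize for stability
        msgs = new_msgs

    # Read out the maximizing candidate at each node from its final belief.
    labels = {}
    for n in unary:
        b = unary[n].copy()
        for w in neighbors[n]:
            b += msgs[(w, n)]
        labels[n] = int(np.argmax(b))
    return labels

# Toy example: two overlapping patches, two candidate shapes each.
# Patch 0 weakly prefers candidate 0, patch 1 prefers candidate 1, but the
# pairwise term strongly rewards agreement, so both settle on candidate 1.
unary = {0: np.log([0.6, 0.4]), 1: np.log([0.3, 0.7])}
pairwise = {(0, 1): np.log([[0.9, 0.1], [0.1, 0.9]])}
print(loopy_bp(unary, [(0, 1)], pairwise))  # {0: 1, 1: 1}
```

On this two-node graph the procedure is exact (the graph is a tree); on the loopy grid of overlapping patches described in the abstract, the same message updates are iterated to approximate the globally most compatible assignment.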