Abstract
We describe a training-based method, applicable to a range of low-level vision problems, for estimating scenes from images. The scene quantities to be estimated might be projected object velocities, surface shapes and reflectance patterns, or missing high-frequency details. The input image data can be a single picture or several frames from a sequence. We first generate a synthetic world of scenes and their corresponding rendered images, storing roughly 100,000 pairs of corresponding local image and scene patches in a training database. Given an image to analyze, for each local image patch we find in the training database a set of candidate scene interpretations consistent with the local image data. We model the probabilistic relationship between neighboring scene patches as a Markov network, assigning one node to each scene patch. To specify the compatibility between scene candidates at neighboring nodes, we use overlapping scene patches and assign compatibilities based on the agreement between scene values in the region of overlap. Finally, we use Bayesian belief propagation, a local, message-passing algorithm, to infer the best scene interpretation at each image patch. The training data provides candidate scene interpretations at each location; belief propagation integrates this local evidence into a single, best scene interpretation. We illustrate this approach on the problem of super-resolution (estimating missing high-frequency details from a low-resolution image), showing good results on natural images. For the motion estimation problem in a “blobs world”, we show figure/ground discrimination, solution of the aperture problem, and filling-in, all arising from the same probabilistic machinery.
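The inference step described above (candidate scene patches at each node, compatibilities computed from agreement in overlapping scene regions, then belief propagation) can be illustrated with a toy sketch. The code below is an illustrative max-product belief propagation on a chain of patch nodes; the patch sizes, the single-column overlap, the Gaussian compatibility with width `sigma`, and the likelihood values are all invented for this example and are far simpler than the paper's actual setup.

```python
import math

def overlap_compat(cand_a, cand_b, sigma=1.0):
    # Compatibility of neighboring scene candidates: agreement of the
    # scene values in the region of overlap. As a toy stand-in, the
    # right column of cand_a is taken to overlap the left column of
    # cand_b (one-pixel overlap; the paper uses wider overlaps).
    d2 = sum((ra[-1] - rb[0]) ** 2 for ra, rb in zip(cand_a, cand_b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def max_product_bp(candidates, evidence, n_iters=10, sigma=1.0):
    """Max-product belief propagation on a chain of scene-patch nodes.

    candidates[i][k] is the k-th candidate scene patch (list of rows)
    at node i; evidence[i][k] is its local likelihood given the image.
    Returns the index of the selected candidate at each node.
    """
    n = len(candidates)
    # psi[i][a][b]: compatibility between candidate a at node i and
    # candidate b at node i+1, from the overlap region.
    psi = [[[overlap_compat(a, b, sigma) for b in candidates[i + 1]]
            for a in candidates[i]] for i in range(n - 1)]
    # msg_r[i]: message node i -> i+1;  msg_l[i]: message node i+1 -> i.
    msg_r = [[1.0] * len(candidates[i + 1]) for i in range(n - 1)]
    msg_l = [[1.0] * len(candidates[i]) for i in range(n - 1)]
    for _ in range(n_iters):
        for i in range(n - 1):
            left_in = msg_r[i - 1] if i > 0 else [1.0] * len(candidates[i])
            msg_r[i] = [max(evidence[i][a] * left_in[a] * psi[i][a][b]
                            for a in range(len(candidates[i])))
                        for b in range(len(candidates[i + 1]))]
            right_in = (msg_l[i + 1] if i + 1 < n - 1
                        else [1.0] * len(candidates[i + 1]))
            msg_l[i] = [max(evidence[i + 1][b] * right_in[b] * psi[i][a][b]
                            for b in range(len(candidates[i + 1])))
                        for a in range(len(candidates[i]))]
    best = []
    for i in range(n):
        left_in = msg_r[i - 1] if i > 0 else [1.0] * len(candidates[i])
        right_in = msg_l[i] if i < n - 1 else [1.0] * len(candidates[i])
        belief = [evidence[i][k] * left_in[k] * right_in[k]
                  for k in range(len(candidates[i]))]
        best.append(belief.index(max(belief)))
    return best

# Toy "filling-in" demo: three patch nodes, each with two candidate
# scene patches (all zeros vs. all ones). Only node 0 has informative
# local evidence; propagation fills in the two ambiguous nodes.
zeros = [[0.0, 0.0], [0.0, 0.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
cands = [[zeros, ones]] * 3
ev = [[0.05, 0.95], [0.5, 0.5], [0.5, 0.5]]
print(max_product_bp(cands, ev))  # -> [1, 1, 1]
```

The demo mirrors the filling-in behavior mentioned for the motion example: nodes with ambiguous local evidence adopt the interpretation favored by their neighbors through the overlap-based compatibilities.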