Abstract
A human observer perceives the Mona Lisa neither as paint on a flat canvas nor as a strange shape with uniform albedo, but instead as a familiar woman with light skin and dark hair. This suggests a statistical formulation of the problem, where certain shapes and albedos are more likely than others. We address the problem of recovering the most likely albedo and shape that explain a single image, using techniques from natural image statistics, but applied separately to the albedo and geometry of a scene.
Our technique is based on multiscale generalizations of the Roth & Black Field of Experts (FOE), which was originally proposed for image denoising. The FOE consists of a filter bank and a set of “experts,” each of which models the heavy-tailed response to one of the filters. We present a multiscale optimization algorithm that searches over the space of shapes and albedos such that the likelihoods of the FOE models are maximized while the target image is exactly reconstructed. We find that multiscale priors, representations, and optimization are central to the success of our approach.
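To make the Field of Experts idea concrete, the following is a minimal single-scale sketch of an FOE-style energy (negative log-likelihood up to a constant). It assumes Student-t experts of the form used by Roth & Black, with one exponent per filter. The two derivative filters and the exponents in the usage below are illustrative placeholders, not the learned filter bank or the multiscale priors used in this work, and the helper names `correlate2d_valid` and `foe_energy` are hypothetical.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation of `image` with `kernel`."""
    kh, kw = kernel.shape
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


def foe_energy(image, filters, alphas):
    """Sum of Student-t expert energies over all filters and positions.

    Each expert contributes alpha * log(1 + 0.5 * r^2) for a filter
    response r, which penalizes large responses only logarithmically
    (a heavy-tailed model). Lower energy means a more likely image.
    """
    energy = 0.0
    for kernel, alpha in zip(filters, alphas):
        response = correlate2d_valid(image, kernel)
        energy += alpha * np.sum(np.log1p(0.5 * response ** 2))
    return float(energy)


# Illustrative (not learned) filter bank: horizontal and vertical differences.
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
alphas = [1.0, 1.0]  # assumed expert exponents, chosen arbitrarily

flat = np.ones((8, 8))
noisy = flat + 0.5 * np.random.default_rng(0).standard_normal((8, 8))

print(foe_energy(flat, filters, alphas))
print(foe_energy(noisy, filters, alphas))
```

Under this sketch a constant image has zero energy, because every filter response vanishes, while a noisy image has strictly positive energy. In the setting described above, such a prior would be evaluated separately on the albedo and on the shape, and at multiple scales.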
Our approach can be considered a generalization of “shape from shading” algorithms that is robust to variable albedo, or a generalization of “intrinsic image” algorithms that explicitly recovers shape, rather than simply recovering shading. Our technique solves a superset of these two problems, and outperforms the previous best individual algorithms (such as those based on Retinex) on problems such as recovering shape and albedo from images of terrain and faces, on the MIT Intrinsic Images dataset.
Our results provide a normative model of how humans may solve this fundamental vision task, and suggest that other work in natural image statistics may aid our understanding of shape and albedo interpretation.