Abstract
The Retinex model of lightness perception relies on assumptions about the statistics of the “intrinsic images.” Reflectance is assumed to be piecewise constant, and illumination to be smoothly varying. This works for so-called Mondrians, but illumination is not reliably smooth in natural scenes. Consider an image of crumpled paper, where numerous light-dark edges are caused by shading. Retinex classifies these edges as paint, but humans are not fooled. To discover more sophisticated statistics, we have used supervised learning on natural and synthetic scenes for which we have ground truth. Given an image patch, the machine learns the probable constraints on the intrinsic images that produced it. As with Retinex, the estimation occurs not in the pixel domain but in another domain, such as the gradient domain. This system can learn to distinguish features that arise from different sources by taking account of the neighborhood in which each feature occurs. For instance, an edge-like feature near a corner-like feature is more likely to be due to reflectance than one on an extended contour. This non-linear processing is used to estimate linear constraints, and the intrinsic images are then retrieved with a pseudoinverse. The resulting system gives improved performance over prior systems, and suggests a variety of statistical rules that might be exploited by the human visual system.
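The final step mentioned above, retrieving an intrinsic image from estimated gradient-domain constraints via a pseudoinverse, can be sketched as follows. This is a minimal illustration, not the paper's code: the `grad_operators` and `reconstruct_from_gradients` helpers are hypothetical names, and dense matrices are used only because the toy image is tiny.

```python
import numpy as np

def grad_operators(h, w):
    # Forward-difference derivative operators acting on a flattened
    # h-by-w image (illustrative assumption; real systems use sparse
    # matrices or FFT-based Poisson solvers for efficiency).
    n = h * w
    Dx = np.zeros((h * (w - 1), n))
    Dy = np.zeros(((h - 1) * w, n))
    k = 0
    for i in range(h):
        for j in range(w - 1):
            Dx[k, i * w + j] = -1.0
            Dx[k, i * w + j + 1] = 1.0
            k += 1
    k = 0
    for i in range(h - 1):
        for j in range(w):
            Dy[k, i * w + j] = -1.0
            Dy[k, (i + 1) * w + j] = 1.0
            k += 1
    return Dx, Dy

def reconstruct_from_gradients(gx, gy):
    # Stack the horizontal and vertical gradient constraints into one
    # overdetermined linear system and solve it with a pseudoinverse
    # (least squares). The answer is unique only up to an additive
    # constant, since the derivative operators annihilate constants.
    h, w = gy.shape[0] + 1, gx.shape[1] + 1
    Dx, Dy = grad_operators(h, w)
    A = np.vstack([Dx, Dy])
    b = np.concatenate([gx.ravel(), gy.ravel()])
    x = np.linalg.pinv(A) @ b
    return x.reshape(h, w)

# Example: differentiate a small image, then recover it (up to a constant).
img = np.arange(16.0).reshape(4, 4)
gx = np.diff(img, axis=1)   # shape (4, 3)
gy = np.diff(img, axis=0)   # shape (3, 4)
rec = reconstruct_from_gradients(gx, gy)
```

In the full system, the gradient fields fed to this solver would be the classified gradients (reflectance-attributed or shading-attributed), rather than the raw image gradients used in this toy example.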
Supported by NTT, NSF, and ONR/MURI.