Abstract
The goal of low-level vision is to estimate an underlying scene (e.g., shape or reflectance) from an observed image. Real-world images and scenes can be very complex, conventionally requiring high-dimensional representations that are difficult both to estimate and to store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe complex scene configurations. The scene recipe is a formula describing how to transform local image information into the desired scene quantities. In many situations, scene and image are closely related, and it is possible to find such a functional relationship.
Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from local image data (in some cases, after a point non-linearity). This representation can have appealing properties, such as slow variation over space and scale. We show how to exploit the slow variation over scale by first learning the recipes relating image to shape at low spatial frequencies, then applying those recipes at high spatial frequencies to infer high-resolution shape, improving the initial shape estimate. Shape recipes implicitly contain information about lighting and materials, and they may also be useful for material segmentation.
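The following is a minimal sketch of the shape-recipe idea under simplifying assumptions: a single linear recipe (least-squares regression from local image patches to bandpassed shape values), learned at a coarse scale and reused at a finer scale. The arrays and function names are illustrative, not the paper's implementation.

```python
import numpy as np

def extract_patches(band, size=5):
    """Collect all size x size local patches of a bandpassed image as row vectors."""
    h, w = band.shape
    patches = []
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            patches.append(band[y:y + size, x:x + size].ravel())
    return np.array(patches)

def learn_recipe(image_band, shape_band, size=5):
    """Regression coefficients mapping local image data to bandpassed shape."""
    X = extract_patches(image_band, size)          # local image data
    r = size // 2
    y = shape_band[r:-r, r:-r].ravel()             # shape value at each patch center
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def apply_recipe(image_band, coeffs, size=5):
    """Predict bandpassed shape from a (possibly finer-scale) image band."""
    X = extract_patches(image_band, size)
    h, w = image_band.shape
    return (X @ coeffs).reshape(h - size + 1, w - size + 1)

# Learn the recipe at low spatial frequencies, then apply it at higher
# spatial frequencies, exploiting the slow variation of recipes over scale.
rng = np.random.default_rng(0)
coarse_image, coarse_shape = rng.standard_normal((2, 32, 32))  # placeholder data
fine_image = rng.standard_normal((64, 64))
recipe = learn_recipe(coarse_image, coarse_shape)
fine_shape_estimate = apply_recipe(fine_image, recipe)
```

In practice the recipes would be learned per subband and per local region, and the predicted high-frequency shape would be combined with the initial low-resolution shape estimate.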
These scene representations always require that the image be available in order for the scene recipe to compute the shape or other scene quantity. In that sense, they are consistent with theories that suggest that the visual system uses the world as a visual memory, not storing in the brain what can be obtained by looking.