Abstract
Results from several laboratories suggest that human object recognition generalizes only to a limited extent across changes in viewpoint for 3D objects. For a given viewpoint, however, recognition is remarkably robust against degradations in image quality (such as those caused by blurring) and against appearance changes caused by illumination variations. The question we ask in this work is: what kind of internal representations can support such recognition performance? We suggest that a candidate answer may be found in the response properties of early visual neurons. Based on available neurophysiological evidence, we develop a scheme that conceptualizes early visual neurons as rapidly saturating contrast edge detectors with large supports. This idealization leads to a representation scheme in which objects are encoded as sets of qualitative image measurements over coarse image regions. The use of qualitative measurements not only reduces the computational complexity of the problem, but also renders the representations invariant to sensor noise and to significant changes in object appearance. We develop our ideas in the context of an ecologically important task: detecting variously illuminated human faces. Our approach uses qualitative photometric measurements to construct a face signature that is largely invariant to illumination changes. We have tested a computer implementation of this scheme on a large database of real images containing frontal faces and have found the results to be encouraging (70% hit rate and 10% false alarm rate). Importantly, additional tests with other object domains (full human bodies, cars and natural scenes) suggest that the representation scheme may be a relatively general object encoding strategy. We have also devised learning routines that automatically extract qualitative object signatures from example sets.
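To make the notion of a qualitative photometric measurement concrete, the following Python sketch encodes an image as the signs of the brightness differences between pairs of coarse regions. It is an illustration under stated assumptions, not the implementation tested in this work: the regular grid of regions, the `tolerance` parameter, and the function names `region_means`, `qualitative_signature` and `agreement` are all hypothetical choices made for the example.

```python
from itertools import combinations


def region_means(image, rows, cols):
    """Average brightness over a coarse rows x cols grid (assumed region layout)."""
    height, width = len(image), len(image[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * height // rows, (r + 1) * height // rows
            c0, c1 = c * width // cols, (c + 1) * width // cols
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            means.append(sum(block) / len(block))
    return means


def qualitative_signature(image, rows=2, cols=2, tolerance=0.0):
    """For every pair of regions, keep only which one is brighter: -1, 0 or +1.

    Discarding the magnitude of the difference mimics a contrast detector
    that saturates rapidly. `tolerance` is a hypothetical dead zone within
    which two regions are treated as equally bright.
    """
    means = region_means(image, rows, cols)
    signature = []
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        if abs(diff) <= tolerance:
            signature.append(0)
        else:
            signature.append(1 if diff > 0 else -1)
    return tuple(signature)


def agreement(sig_a, sig_b):
    """Fraction of pairwise relations on which two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Toy 4x4 "image" and a dimmed copy (positive gain change plus an offset).
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [90, 90, 50, 50],
    [90, 90, 50, 50],
]
dimmed = [[0.5 * p + 5 for p in row] for row in image]

same = agreement(qualitative_signature(image), qualitative_signature(dimmed))
```

Because a gain change with a positive factor and a constant offset preserve the ordering of region averages, the two signatures in this toy case agree on every pair. Real illumination changes are not purely of this form, so the sketch shows only the simplest instance of the invariance discussed above.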