Abstract
A central question in human visual object recognition is whether processing passes through a bottleneck in which key features must be detected before whole objects are recognized. If visual recognition is predominantly a bottom-up process, detection of local features should precede higher-level processing. In contrast, if recognition relies on holistic processing, then high-level information exerts top-down control over the detection of features.
To address this question, we carried out psychophysics experiments in which 12 subjects were asked to detect partial and whole face images embedded in visual noise. Control images contained only Gaussian random noise of the same variance. To equate the amount of information available in whole and partial images, we adjusted the noise variance according to the revealed area so that both partial and whole images contained the same total contrast energy (Pelli et al., 2003). The features revealed in the partial images were selected with a computational model (Ullman et al., 2002) that finds category-specific features carrying high mutual information about the category and occurring with high likelihood in images of that category.
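The two quantitative ingredients of the design, energy equating and feature selection, can be illustrated schematically. The sketch below is a minimal illustration, not the experimental code: it assumes that the quantity being equated is the expected contrast energy (sum of squared contrasts) of the noise over the revealed area, and it scores candidate fragments only by the mutual information between a binary "fragment detected" variable and the face/non-face label; function and variable names are hypothetical.

```python
import numpy as np

def scale_noise_variance(revealed_fraction, whole_image_variance):
    """Hypothetical equating rule: the expected contrast energy of zero-mean
    Gaussian noise over N pixels is N * variance, so revealing only a fraction
    f of the area can be compensated by scaling the variance by 1 / f."""
    return whole_image_variance / revealed_fraction

def mutual_information_bits(fragment_present, is_face):
    """Hypothetical fragment-selection score: mutual information (in bits)
    between binary fragment detection and the category label, estimated from
    co-occurrence counts over a labeled training set."""
    joint = np.zeros((2, 2))
    for x, y in zip(fragment_present, is_face):
        joint[int(x), int(y)] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of fragment detection
    py = joint.sum(axis=0, keepdims=True)   # marginal of category label
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px * py))
    return np.nansum(terms)                 # zero-probability cells contribute 0
```

In this reading, the partial-image noise level follows directly from the revealed area, and fragments would be ranked by the mutual-information score; the actual model of Ullman et al. (2002) additionally requires the fragment to occur with high likelihood within the category.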
We compared detection performance on whole images, on partial images containing 1, 2, or 3 features, and on partial images in which the locations of the features were spatially rearranged. Detection accuracy increased progressively as more features were revealed, despite the progressively reduced local saliency of the individual features. Further, detection accuracy was significantly higher for whole images than for any of the partial images. Preliminary data suggest that the normal configuration of features yields better detection performance than the rearranged configuration. Overall, these data imply that the detection of local features is not the processing bottleneck for face perception, and that even a task as simple as face detection is influenced by holistic information.