On average, subjects' identification accuracy was markedly worse for the isolated parts (55%) than for the face composites (73%), t(10) = 4.2, p = 0.002, even though the face backgrounds were identical for the target and foil.
Figure 4a shows identification performance for each of the three features (eyes, nose, and mouth) and for each target–foil combination of the exemplars, such as eyes1 versus eyes2 (indicated by e1e2), with the feature either in isolation or in the identical face context. All nine combinations showed an identification advantage for the composite faces over the face parts in isolation, confirmed by a paired t test: t(8) = 4.7, p < 0.002. This perceptual effect was reflected in the image-based similarity analysis (Figure 4b). The Euclidean distance between the Gabor feature vectors of the target and foil was also larger for the part in face context than for the part in isolation for every feature and exemplar, t(8) = 4.3, p < 0.003.
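The two computations used above, a Euclidean distance between feature vectors and a paired t test across the nine target–foil combinations, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are assumptions, and the distance values are hypothetical placeholders rather than data from the study.

```python
import math
import statistics


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def paired_t(xs, ys):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical target-foil distances for nine combinations
# (made-up numbers, NOT the values reported in the paper).
composite = [5.0, 6.0, 7.0, 5.5, 6.5, 7.5, 6.0, 7.0, 8.0]
isolated = [4.0, 4.5, 6.5, 4.0, 6.0, 6.0, 5.5, 5.0, 7.5]

t_stat, df = paired_t(composite, isolated)
print(f"t({df}) = {t_stat:.2f}")
```

With nine matched pairs the test has 8 degrees of freedom, which matches the t(8) values reported for the nine combinations.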
Because the composite faces were made by placing different feature exemplars in the same spatial configuration, a pixel-intensity-based representation would yield exactly the same distances between parts and between whole faces. Why did the Euclidean distance of the Gabor representation yield the advantage of the composite over the isolated parts? We suggest that the advantage of the composite arises from the overlap of the receptive fields of face neurons, as illustrated in Figure 2. The interaction between local features, such as those arising from the eyes and nose, and the contextual face background created additional visual features, which were picked up by the kernels covering those areas, especially those with larger receptive fields. To test this hypothesis, we assessed the proportion of the variance of the whole-face advantage that was predictable from the largest-RF components (with the lowest SF of 8 cycles/face) versus the smallest-RF components (with the highest SF of 32 cycles/face) of the Gabor features of each face, separately. The results are shown in
Figure 5. The mean distances between the isolated parts were significantly lower than those between composite faces in both components, both ts(8) > 4, p < 0.005. However, the mean difference in the distances between the parts and wholes (e.g., distance between isolated Mouth 1 and Mouth 2, minus the distance between composite faces with Mouth 1 and Mouth 2) was 19 for the high-SF, small-RF components and 56 for the low-SF, large-RF components, a sizable difference confirmed as reliable by a post hoc t test: t(8) = 3.6,
p < 0.01. The identification accuracy was better correlated with the distance measurement in the large-RF (low-SF) band (r = 0.66, p < 0.003) than in the small-RF (high-SF) band (r = 0.51, p < 0.03). This result suggests that the configural effect was mediated to a larger extent by neural encoding with larger receptive fields, lower spatial frequencies, or a combination of the two.
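The argument that a pixel representation gives identical part and whole distances, whereas a Gabor representation need not, can be illustrated with a one-dimensional toy example. The sketch below is an assumption-laden stand-in for the actual model: the "images" are short hypothetical lists, the features are magnitudes of complex Gabor responses, and the two (frequency, sigma) bands are arbitrary illustrative choices, not the 8 and 32 cycles/face bands of the study. It shows only that the two Gabor distances differ once kernels overlap both the part and its background. It does not reproduce the direction or size of the reported effect.

```python
import cmath
import math


def gabor_magnitudes(signal, centers, bands):
    """Magnitudes of complex Gabor responses, ordered band by band."""
    feats = []
    for freq, sigma in bands:
        for c in centers:
            resp = 0j
            for x, value in enumerate(signal):
                envelope = math.exp(-((x - c) ** 2) / (2 * sigma ** 2))
                carrier = cmath.exp(2j * math.pi * freq * (x - c))
                resp += value * envelope * carrier
            feats.append(abs(resp))  # magnitude: a nonlinear step
    return feats


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embed(part, background, start):
    """Paste a part into a copy of the background at index `start`."""
    out = list(background)
    out[start:start + len(part)] = part
    return out


# Hypothetical 1-D "images" (illustrative values only).
BACKGROUND = [0.2, 0.7, 0.4, 0.9, 0.3, 0.9, 0.0, 0.0,
              0.0, 0.0, 0.5, 0.1, 0.6, 0.8, 0.3, 0.5]
BLANK = [0.0] * len(BACKGROUND)
TARGET_PART = [1.0, 0.0, 1.0, 0.0]
FOIL_PART = [0.0, 1.0, 0.0, 1.0]
START = 6
CENTERS = [4, 8, 12]
# (cycles per sample, envelope sigma): low SF with large RF, high SF with small RF.
BANDS = [(0.125, 4.0), (0.25, 2.0)]

iso_t, iso_f = embed(TARGET_PART, BLANK, START), embed(FOIL_PART, BLANK, START)
comp_t = embed(TARGET_PART, BACKGROUND, START)
comp_f = embed(FOIL_PART, BACKGROUND, START)

# Pixel distances: the shared background cancels, so the two are identical.
pixel_iso = euclidean(iso_t, iso_f)
pixel_comp = euclidean(comp_t, comp_f)

# Gabor-magnitude distances: kernels overlapping the background mix it
# with the part before the magnitude is taken, so it no longer cancels.
gabor_iso = euclidean(gabor_magnitudes(iso_t, CENTERS, BANDS),
                      gabor_magnitudes(iso_f, CENTERS, BANDS))
gabor_comp = euclidean(gabor_magnitudes(comp_t, CENTERS, BANDS),
                       gabor_magnitudes(comp_f, CENTERS, BANDS))

print("pixel:", pixel_iso, pixel_comp)
print("gabor:", gabor_iso, gabor_comp)
```

The magnitude step is what matters here: a purely linear filter output would cancel the shared background just as pixel values do. Because `gabor_magnitudes` returns features band by band, slicing its output by band gives the per-band distances that a large-RF versus small-RF comparison would need.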