Abstract
Face recognition plays a crucial role in our daily social lives. Considerable research suggests that faces constitute a special category of stimuli that engages more holistic processing than other object categories. However, it remains unclear why holistic encoding strategies may be preferred for face recognition. Does holistic processing arise from the more extensive training that people acquire with faces as compared to other objects, or might it also depend on the inherent visual structure of faces? To explore these questions, we trained convolutional neural networks (CNNs) to discriminate faces or subordinate-level objects (i.e., dogs or birds) and compared their performance under image manipulations known to impair holistic/configural processing. Specifically, we evaluated how CNN performance was affected by misalignment of the top and bottom halves of the image, random scrambling of local chunks of the image, and stimulus inversion. We found that face-trained CNNs were more severely impaired by all three image manipulations than object-trained CNNs. We further evaluated CNNs trained with a developmentally motivated sequence of blurry faces followed by progressively clearer ones (Jang & Tong, JoV, 2021), which revealed greater effects of holistic processing than were found for CNNs trained with clear faces only. By contrast, object-trained CNNs were not affected by blurry-to-clear image training. Finally, we investigated the performance of CNNs trained with heterogeneous face views ranging from front view to 90-degree profile. Even with the greater variability of the training and test images, the same pronounced effects of holistic face processing were replicated. In addition, a spatial frequency analysis revealed that face-trained CNNs preferred much lower spatial frequencies than the subordinate-level object-trained CNNs.
Taken together, our findings provide novel computational evidence that the inherent visual structure of facial images can support more holistic processing than other subordinate-level object categories.
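
For readers who want a concrete picture of the three image manipulations named above (misalignment of the top and bottom halves, random scrambling of local chunks, and inversion), the following is a minimal NumPy sketch. It is an illustration only, not the code used in this study: the function names, the zero padding, the horizontal `shift`, and the `grid` size are all assumptions chosen for clarity.

```python
import numpy as np


def invert(img):
    """Stimulus inversion: flip the image upside down."""
    return img[::-1].copy()


def misalign(img, shift):
    """Offset the bottom half horizontally by `shift` pixels (shift >= 0).

    The canvas is widened by `shift` and padded with zeros (an assumed
    choice) so that no pixels from either half are lost.
    """
    h, w = img.shape[:2]
    out = np.zeros((h, w + shift) + img.shape[2:], dtype=img.dtype)
    out[: h // 2, :w] = img[: h // 2]        # top half stays in place
    out[h // 2 :, shift:] = img[h // 2 :]    # bottom half moves right
    return out


def scramble(img, grid, rng):
    """Randomly permute the cells of a grid x grid partition of the image."""
    h, w = img.shape[:2]
    bh, bw = h // grid, w // grid
    img = img[: bh * grid, : bw * grid]      # crop so the grid divides evenly
    blocks = [
        img[r * bh : (r + 1) * bh, c * bw : (c + 1) * bw]
        for r in range(grid)
        for c in range(grid)
    ]
    order = rng.permutation(len(blocks))
    rows = [
        np.concatenate([blocks[order[r * grid + c]] for c in range(grid)], axis=1)
        for r in range(grid)
    ]
    return np.concatenate(rows, axis=0)
```

Under these assumptions, a trained network's accuracy on `invert(img)`, `misalign(img, shift)`, or `scramble(img, grid, rng)` can be compared with its accuracy on the intact `img`; a larger drop indicates greater reliance on the global configuration of the image.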