Abstract
Human face recognition is highly accurate and exhibits a number of distinctive, well-documented behavioral and neural "signatures", such as the face-inversion effect, the other-race effect, and neural specialization for faces. How does this remarkable ability arise in development? Is experience with faces required, and if so, what kind of experience? We cannot straightforwardly manipulate visual experience during human development, but we can ask what is possible in machines. Here, I will present our work testing whether convolutional neural networks (CNNs) optimized on different tasks with varying visual experience capture key aspects of human face perception. We found that only face-trained CNNs, not object-trained or untrained ones, achieved human-level performance on face recognition and exhibited behavioral signatures of human face perception. Moreover, these signatures emerged only in CNNs trained for face identification, not in CNNs that were matched in the amount of face experience but trained on a face detection task. Critically, similar to human visual cortex, CNNs trained on both face and object recognition spontaneously segregated into distinct subsystems for each. These results indicate that humanlike face perception abilities and neural characteristics can emerge in machines, and could in principle arise in humans (through development, evolution, or both), after extensive training on real-world face recognition without face-specific predispositions, but that experience with objects alone is not sufficient. I will conclude by discussing how this computational approach offers novel ways to illuminate how and why visual recognition works the way it does.