Abstract
Face-recognition Deep Convolutional Neural Networks (DCNNs) show excellent generalization under variations in face appearance, such as changes in pose, illumination, or expression. To what extent does this view-invariant representation depend on training with faces, and can it also emerge following training with non-face objects? To examine the emergence of a view-invariant representation across the network’s layers, we measured the representational similarity of different head views across different identities in DCNNs trained with faces or with objects. We found a similar, view-selective representation in the lower layers of both the face and object networks. A view-invariant representation emerged at higher layers of the face-trained network, but not of the object-trained network, which was view-selective across all its layers. To examine whether these representations depend on the facial information that humans use to extract view-invariant information from faces, we examined the sensitivity of the face and object networks to a subset of facial features that remain invariant across head views. This subset of facial features was also shown to be critical for human face recognition. Lower layers of the face network and all layers of the object network were not sensitive to this subset of critical, view-invariant features, whereas higher layers of the face network were sensitive to them. We conclude that a face-trained DCNN, but not an object-trained DCNN, displays a hierarchical process of extracting view-invariant facial features, similar to humans. These findings imply that invariant face recognition depends on experience with faces, during which the system learns to extract these invariant features, and they demonstrate the advantage of separate neural systems for faces and objects.
These results may generate predictions for neurophysiological studies aimed at discovering the type of facial information used throughout the hierarchy of the face- and object-processing systems.
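The layer-wise comparison described above can be illustrated with a toy representational-similarity measure. The following sketch is only a schematic illustration of the idea, not the paper's actual analysis: the function name, the index definition (mean same-identity/cross-view similarity minus mean different-identity similarity), and the randomly generated "activations" are all hypothetical stand-ins for real layer activations from face- or object-trained DCNNs.

```python
import numpy as np

rng = np.random.default_rng(0)


def view_invariance_index(acts):
    """Toy index of view invariance for one layer (hypothetical measure).

    acts: dict mapping (identity, view) -> activation vector.
    Returns mean cosine similarity of same-identity/cross-view pairs
    minus mean similarity of different-identity pairs; higher values
    indicate a more view-invariant representation.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    keys = list(acts)
    same, diff = [], []
    for i, (id1, v1) in enumerate(keys):
        for id2, v2 in keys[i + 1:]:
            s = cos(acts[(id1, v1)], acts[(id2, v2)])
            if id1 == id2 and v1 != v2:
                same.append(s)   # same face, different head view
            elif id1 != id2:
                diff.append(s)   # different faces
    return np.mean(same) - np.mean(diff)


# Toy data standing in for layer activations (5 identities x 3 views).
n_ids, n_views, dim = 5, 3, 64

# "View-invariant" layer: activations cluster by identity, not view.
id_codes = rng.normal(size=(n_ids, dim))
invariant_layer = {(i, v): id_codes[i] + 0.1 * rng.normal(size=dim)
                   for i in range(n_ids) for v in range(n_views)}

# "View-selective" layer: activations cluster by head view, not identity.
view_codes = rng.normal(size=(n_views, dim))
selective_layer = {(i, v): view_codes[v] + 0.1 * rng.normal(size=dim)
                   for i in range(n_ids) for v in range(n_views)}

print(view_invariance_index(invariant_layer) >
      view_invariance_index(selective_layer))
```

In this toy setting the index is high for the view-invariant layer and low (or negative) for the view-selective one, mirroring the qualitative pattern the abstract reports for higher layers of the face network versus all layers of the object network.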