Abstract
Deep neural networks (DNNs) perform well at identifying faces from diverse sets of images. However, the details of the processing within such networks as they perform invariant face categorization remain unclear. Here we study the development of categorical representations across the layers of a DNN trained to categorize facial identity. We generated a data set of 26,250,000 face images using a photorealistic 3D generative model, comprising 2,000 unique facial identities (balanced across gender and two ethnicities) rendered with different ages and emotional expressions, and across multiple orientations, illuminations, scalings and translations. We trained a 10-layer ResNet on 70% of these images to recognize the 2,000 individual identities. We tested with 10,000 images (not used for training) and obtained 99% correct identification. To understand how categorization develops within the network, we computed, for each layer, a 10,000 x 10,000 Representational Dissimilarity Matrix (RDM) from the correlation between network activations for each pair of test images. We then compared the RDM derived at each layer with a discrete categorical RDM model for each of the main category factors entering stimulus generation. We found that the early layers of the network most strongly represent the orientation of the face, together with ethnicity. In later layers, invariance to orientation develops, and representation of the identity factors gender, ethnicity and age increases. No layer shows a strong representation of illumination. Prior to identity readout in layer 10, ethnicity is the most strongly represented category, with representation peaking in the middle layers (5, 6, 7) before decreasing in the uppermost layers. By tightly controlling the categorical sources of variance in the image set used to train a DNN, we can derive an understanding of the implicit categorizations that it achieves at each layer. Our results also shed light on the computational complexity of different facial categorizations.
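For concreteness, the per-layer RDM analysis can be sketched as follows. This is a minimal illustration rather than the authors' code: we assume layer activations are available as an (n_images, n_features) NumPy array, read "correlation between network activations" as a 1 - Pearson-correlation dissimilarity, and compare a layer RDM against a discrete categorical model RDM via Spearman correlation of their off-diagonal entries (a common choice in representational similarity analysis; the abstract does not specify the comparison statistic). All function and variable names here are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

def compute_rdm(activations):
    """RDM as 1 - Pearson correlation between the activation
    patterns (rows) for every pair of images."""
    return 1.0 - np.corrcoef(activations)

def categorical_rdm(labels):
    """Discrete categorical model RDM: 0 for image pairs sharing a
    category label (e.g. same ethnicity), 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def rdm_agreement(rdm_a, rdm_b):
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    va = squareform(rdm_a, checks=False)  # upper triangle as a vector
    vb = squareform(rdm_b, checks=False)
    return spearmanr(va, vb).correlation

# Hypothetical usage: layer_activations[k] is an (n_images, n_features)
# array of layer-k responses to the test images, and ethnicity holds the
# corresponding category labels for one stimulus-generation factor.
# model = categorical_rdm(ethnicity)
# for k, acts in enumerate(layer_activations):
#     print(k, rdm_agreement(compute_rdm(acts), model))
```

Repeating the loop above with a model RDM for each factor (orientation, ethnicity, gender, age, illumination) yields the per-layer representation profiles summarized in the abstract.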
Meeting abstract presented at VSS 2018