Abstract
Deep neural networks (DNNs) have been very effective in identifying human faces from 2D images, reaching human-level performance. However, little is known about how they do it. In fact, their complexity makes their mechanisms about as opaque as those of the brain. Here, unlike previous research that generally treats DNNs as black boxes, we use rigorous psychophysical methods to better understand the representations and mechanisms underlying the categorization behavior of DNNs. We trained a state-of-the-art 10-layer ResNet to recognize 2,000 human identities generated from a 3D face model in which we controlled age (25, 45, 65 years), emotional expression (happy, surprise, fear, disgust, anger, sad, neutral), gender (male, female), two facial orientation axes, X and Y (each with 5 levels from -30 to +30 deg), and vertical and horizontal illumination (each with 5 levels from -30 to +30), plus random scaling and translation of the resulting 26,250,000 2D images (see S1). At training, we used two image-similarity conditions (most similar vs. most different, defined using subsets of the face generation parameters) to test generalization of identity across the full set of face generation parameters. We found catastrophic (i.e., not graceful) degradation of performance in the most-different condition, particularly when the factors of orientation and illumination were combined. To understand the visual information the network learned to represent and identify faces, we applied the Bubbles procedure (Gosselin & Schyns, 2001) at testing. We found striking differences between the features that the network represents and those typically used by humans. To our knowledge, this is the first time that a rigorous psychophysical approach controlling the dimensions of face variance has been applied to better understand the behavior and information coding of complex DNNs. Our results reveal fundamental differences between the categorization mechanisms and representations of DNNs and those of the human brain.
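As a rough illustration of the stimulus set and analysis described above, the sketch below enumerates the generation parameters reported in the abstract and verifies that their full factorial combination yields 26,250,000 images (gender is treated here as folded into the 2,000 identities), and shows a minimal Bubbles-style mask of randomly placed Gaussian apertures in the spirit of Gosselin & Schyns (2001). This is an illustrative sketch, not the authors' code: the equal spacing of the 5 levels and all function and variable names are assumptions.

```python
import numpy as np

# Face-generation parameters reported in the abstract; level spacing is assumed.
identities = range(2000)                       # 2,000 identities (gender folded into identity)
ages = [25, 45, 65]                            # years
expressions = ["happy", "surprise", "fear", "disgust", "anger", "sad", "neutral"]
orientation_x = [-30, -15, 0, 15, 30]          # deg (assumed equally spaced)
orientation_y = [-30, -15, 0, 15, 30]          # deg
illum_vertical = [-30, -15, 0, 15, 30]
illum_horizontal = [-30, -15, 0, 15, 30]

n_images = (len(identities) * len(ages) * len(expressions)
            * len(orientation_x) * len(orientation_y)
            * len(illum_vertical) * len(illum_horizontal))
print(n_images)  # 26,250,000, before random scaling and translation


def bubbles_mask(height, width, n_bubbles=10, sigma=15.0, seed=None):
    """Random mask of Gaussian apertures, in the spirit of the Bubbles method."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width))
    for cy, cx in zip(rng.uniform(0, height, n_bubbles),
                      rng.uniform(0, width, n_bubbles)):
        mask = np.maximum(mask, np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2)
                                       / (2 * sigma ** 2)))
    return mask  # multiply with a face image to reveal only the sampled regions
```

In the Bubbles logic, correct and incorrect identification responses are accumulated over many such random masks to estimate which image regions are diagnostic for the observer, here the trained network.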
Meeting abstract presented at VSS 2017