Abstract
Historically, studies aimed at understanding neural codes have probed either individual neurons or patterns of neural activation. Here, we integrated these two levels of encoding by investigating individual simulated neurons (i.e., units) and high-level coding patterns in a deep convolutional neural network (DCNN) trained for face identification. Such networks simultaneously encode identity, gender, and viewpoint (Parde et al., 2017) and allow for an investigation of representations at multiple scales. First, we measured the capacity of individual units to distinguish identities, genders, and viewpoints (“attributes”). Second, we re-expressed face representations as directions in the high-dimensional space, quantified using principal component analysis (PCA), and measured the capacity of the principal components (PCs) to distinguish face attributes. Coding capacity in individual units was measured by effect sizes in one-way ANOVAs for distinguishing identity (mean R^2 = 0.71, SD = 0.016), gender (mean R^2 = 0.004, SD = 0.007), and viewpoint (mean R^2 = 0.002, SD = 0.002). Although the effects for gender and viewpoint were small, they were of consistent magnitude across units, and predictions from the ensemble of units were accurate (gender-classification accuracy 92.3%, viewpoint estimation within 7.8 degrees). All units provided significant identity information, 71% provided gender information, and 50% provided viewpoint information (all p < 0.05, Bonferroni corrected). To investigate the organization of the three attributes in the PCA space, we computed the cosine similarity between each PC and directions diagnostic of identity, gender, and viewpoint separation. This analysis showed that the attributes are separated into subspaces such that identity information is encoded along axes that explain the most variance, followed by gender, and then viewpoint.
Combined, these results indicate that the ensemble code that emerges from the DCNN organizes attributes semantically, though the individual units entangle this information. Therefore, these units cannot be interpreted as simple visual feature detectors in a traditional sense.
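
As an illustration of the unit-level measure described above, the effect size of a one-way ANOVA for a single unit might be computed as in the following minimal sketch. It assumes that R^2 is the ratio of the between-group to the total sum of squares; the function name `anova_r2` and the toy responses and labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def anova_r2(responses, labels):
    """One-way ANOVA effect size for one unit: SS_between / SS_total."""
    # Group the unit's responses by attribute label (e.g., identity or gender).
    groups = defaultdict(list)
    for value, label in zip(responses, labels):
        groups[label].append(value)

    grand_mean = sum(responses) / len(responses)
    ss_total = sum((v - grand_mean) ** 2 for v in responses)
    if ss_total == 0:
        return 0.0  # a constant unit carries no information about any attribute

    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Hypothetical responses of one unit to eight images (two per identity).
unit_responses = [1.0, 1.2, 3.0, 3.2, 1.1, 1.3, 3.1, 3.3]
identity = ["A", "A", "B", "B", "C", "C", "D", "D"]
gender = ["f", "f", "f", "f", "m", "m", "m", "m"]

r2_identity = anova_r2(unit_responses, identity)
r2_gender = anova_r2(unit_responses, gender)
print(f"identity R^2 = {r2_identity:.3f}, gender R^2 = {r2_gender:.3f}")
```

In this toy case the unit separates identities far better than genders, which mirrors the qualitative pattern reported above; the numbers themselves carry no empirical meaning.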