Connor J Parde, Y. Ivette Colon, Matthew Q Hill, Rajeev Ranjan, Carlos Castillo, Alice J O’Toole; Density of Top-Layer Codes in Deep Convolutional Neural Networks Trained for Face Identification. Journal of Vision 2019;19(10):261d. doi: https://doi.org/10.1167/19.10.261d.
Deep convolutional neural networks (DCNNs) are the state-of-the-art in automatic face recognition and currently outperform humans on some tasks. These networks perform tens of millions of neuron-like computations to produce a “top-layer” face representation that is highly compact. This representation supports robust face identification, while retaining information about face gender and viewpoint. We tested the density of information encoded in the top-layer face representation by sub-sampling feature units and measuring the cost to identification accuracy as the number of units decreased. Using a 101-layer ResNet-based DCNN trained for face recognition (Ranjan et al., 2017), subsets of features (N = 512, 256, 128, 64, 32, 16, 8, 4, 2) were selected randomly without replacement from the full 512-dimensional top-layer feature set. Identification accuracy was tested using the IJB-C dataset (141,332 unconstrained images, 3,532 identities) and measured by the area under the ROC curve (AUC). Face identification performance declined at a remarkably slow rate as the subset size decreased (AUCs: 512, 256, 128, 64 features=0.98, 32 features=0.97, 16 features=0.95, 8 features=0.91, 4 features=0.85, 2 features=0.75). Next, we used the subsamples to predict the viewpoint and gender of the input faces. Viewpoint prediction error (in degrees) and gender prediction accuracy (percent correct) were measured as a function of decreasing subset size. Performance declined more rapidly for viewpoint and gender than for identity. Viewpoint prediction error increased from 7.87°, with the full feature set, to chance (~16°), with subsamples of 64 features or fewer. Gender prediction accuracy decreased from 95.0%, with the full feature set, to near chance (~70%) with subsamples of 32 features or fewer. 
This indicates that DCNN face representations require input from only a small number of features for successful identification, but require coordinated responses from the full complement of top-layer features to predict image information.
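The sub-sampling protocol can be sketched roughly as follows. The data here are synthetic stand-ins (the study used 512-dimensional ResNet descriptors of real IJB-C images), and the cosine-similarity matcher and rank-based AUC are illustrative assumptions, not the authors' exact evaluation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for top-layer DCNN descriptors: 100 identities,
# 2 images each, 512 dimensions. (The study ran real IJB-C images through
# a 101-layer ResNet; this synthetic data only mimics the shape.)
n_ids, dim, noise = 100, 512, 0.6
centers = rng.normal(size=(n_ids, dim))
feats = np.concatenate([centers + noise * rng.normal(size=(n_ids, dim)),
                        centers + noise * rng.normal(size=(n_ids, dim))])
labels = np.tile(np.arange(n_ids), 2)

def roc_auc(scores, is_same):
    """AUC via the rank-sum (Mann-Whitney U) statistic; assumes no ties."""
    ranks = scores.argsort().argsort() + 1          # 1-based ranks
    n_pos = is_same.sum()
    n_neg = len(scores) - n_pos
    u = ranks[is_same].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def identification_auc(f, labels):
    """Same- vs. different-identity verification AUC, cosine similarity."""
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sims = f @ f.T
    i, j = np.triu_indices(len(f), k=1)             # all image pairs
    return roc_auc(sims[i, j], labels[i] == labels[j])

# Sub-sample feature units randomly without replacement, as in the study.
results = {}
for k in (512, 256, 128, 64, 32, 16, 8, 4, 2):
    subset = rng.choice(dim, size=k, replace=False)
    results[k] = identification_auc(feats[:, subset], labels)
    print(f"{k:3d} features: AUC = {results[k]:.3f}")
```

On this toy data one typically sees the same qualitative pattern the abstract reports: near-ceiling AUC with the full feature set and a gradual decline as the subset shrinks, though the exact values depend on the assumed noise level.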