Abstract
Deep convolutional neural networks (DCNNs) are the state of the art in automatic face recognition and currently outperform humans on some tasks. These networks perform tens of millions of neuron-like computations to produce a “top-layer” face representation that is highly compact. This representation supports robust face identification while retaining information about face gender and viewpoint. We tested the density of information encoded in the top-layer face representation by sub-sampling feature units and measuring the cost to identification accuracy as the number of units decreased. Using a 101-layer ResNet-based DCNN trained for face recognition (Ranjan et al., 2017), we selected subsets of features (N = 512, 256, 128, 64, 32, 16, 8, 4, 2) randomly without replacement from the full 512-dimensional top-layer feature set. Identification accuracy was tested using the IJB-C dataset (141,332 unconstrained images, 3,532 identities) and measured by the area under the ROC curve (AUC). Face identification performance declined at a remarkably slow rate as the subset size decreased (AUC = 0.98 for 512, 256, 128, and 64 features; 0.97 for 32 features; 0.95 for 16 features; 0.91 for 8 features; 0.85 for 4 features; 0.75 for 2 features). Next, we used the subsamples to predict the viewpoint and gender of the input faces. Viewpoint prediction error (in degrees) and gender prediction accuracy (percent correct) were measured as a function of decreasing subset size. Performance declined more rapidly for viewpoint and gender than for identity. Viewpoint prediction error increased from 7.87° with the full feature set to chance (~16°) with subsamples of 64 features or fewer. Gender prediction accuracy decreased from 95.0% with the full feature set to near chance (~70%) with subsamples of 32 features or fewer.
These results indicate that DCNN face representations require input from only a small number of features for successful identification, but require coordinated responses from the full complement of top-layer features to predict image information.
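The sub-sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the descriptors here are synthetic (Gaussian identity prototypes plus noise) rather than DCNN outputs on IJB-C, pairs are scored with cosine similarity as an assumed comparison metric, and all function names (`subsample_indices`, `verification_auc`, and so on) are hypothetical. It shows only the mechanics: draw k of 512 feature indices without replacement, score same-identity and different-identity pairs on the reduced descriptors, and summarize with the AUC.

```python
import math
import random


def subsample_indices(n_features, k, rng):
    """Pick k distinct feature indices (random selection without replacement)."""
    return rng.sample(range(n_features), k)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_synthetic_descriptors(n_ids, imgs_per_id, n_features, noise, rng):
    """Synthetic stand-in for top-layer descriptors: identity prototype plus noise."""
    feats, ids = [], []
    for identity in range(n_ids):
        proto = [rng.gauss(0, 1) for _ in range(n_features)]
        for _ in range(imgs_per_id):
            feats.append([p + rng.gauss(0, noise) for p in proto])
            ids.append(identity)
    return feats, ids


def verification_auc(feats, ids, idx):
    """AUC for same- vs. different-identity pairs using only the features in idx."""
    sub = [[f[i] for i in idx] for f in feats]
    scores, labels = [], []
    for a in range(len(sub)):
        for b in range(a + 1, len(sub)):
            scores.append(cosine(sub[a], sub[b]))
            labels.append(1 if ids[a] == ids[b] else 0)
    return auc(scores, labels)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_FEATURES = 512
feats, ids = make_synthetic_descriptors(20, 4, N_FEATURES, 1.0, rng)

results = {}
for k in (512, 256, 128, 64, 32, 16, 8, 4, 2):
    idx = subsample_indices(N_FEATURES, k, rng)
    results[k] = verification_auc(feats, ids, idx)
    print(f"{k:>3} features: AUC = {results[k]:.3f}")
```

Because the data are synthetic, the printed values say nothing about the reported results; repeating the draw over many random subsets per size and averaging would give a more stable estimate than the single draw used here.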
Acknowledgement: This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. We thank NVIDIA for donating the K40 GPU used in this work.