Abstract
Convolutional neural networks have demonstrated human-like ability in face recognition, with recent networks achieving accuracy as high as 97% (Taigman, 2014). Non-linear operations (e.g., maximum pooling) are thought to be key for developing position and size invariance (Riesenhuber & Poggio, 1999). However, it is unknown how training contributes to invariant face recognition. Here, we tested how training affects invariant face recognition across position, size, and resolution. We used a convolutional neural network architecture implemented in TensorFlow (tensorflow.org). We trained the network to recognize 101 faces that varied in age, gender, and ethnicity across views (15 views/face, spanning 0 to ±105°). The network was trained on a randomly selected 80% of views and tested on the remaining 20%. During training, faces were presented centrally at a single size and resolution. We then tested face recognition across views at new positions, sizes, and resolutions not shown during training. Face recognition performance declined progressively for faces shown at positions (Figure 1A) or sizes (Figure 1B) that differed from those used during training. In contrast, performance generalized across resolutions (Figure 1C). Further experiments using a constant number of training examples but different training regimes revealed that training with random positions (Figure 1D) or random sizes (Figure 1E) yielded more robust performance than training with faces in 5 positions (Figure 1D) or 5 sizes (Figure 1E). Additionally, the network performed better on faces shown at new sizes than on faces shown at new positions. Overall, our results indicate that the architecture of the neural network is (1) sufficient for invariant face recognition across resolutions but (2) insufficient for invariant face recognition across size and position unless trained with many faces varying in size and position.
By understanding the limits of convolutional neural networks, we can gain insight into the factors that enable successful face recognition.
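The train/test protocol described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_views` and `place_on_canvas`, the per-face 80/20 split, the blank square canvas, and the nearest-neighbour rescaling are all assumptions made for illustration, and NumPy stands in for the actual TensorFlow pipeline.

```python
import numpy as np

def split_views(n_faces=101, n_views=15, train_frac=0.8, seed=0):
    """Randomly assign views to train/test sets.

    Assumption: the 80/20 split is applied separately to each face,
    giving 12 training views and 3 test views per identity.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_views))
    train, test = [], []
    for face in range(n_faces):
        order = rng.permutation(n_views)
        train += [(face, int(v)) for v in order[:n_train]]
        test += [(face, int(v)) for v in order[n_train:]]
    return train, test

def place_on_canvas(face_img, canvas_size, scale=1.0, offset=(0, 0)):
    """Rescale a 2-D face image and paste it onto a blank canvas.

    `offset` is (dy, dx) in pixels relative to the centred position.
    Nearest-neighbour resampling is a hypothetical choice made only
    to keep the sketch dependency-free.
    """
    h, w = face_img.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output pixel back to its nearest source pixel.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = face_img[rows[:, None], cols[None, :]]

    canvas = np.zeros((canvas_size, canvas_size), dtype=face_img.dtype)
    top = (canvas_size - new_h) // 2 + offset[0]
    left = (canvas_size - new_w) // 2 + offset[1]
    if top < 0 or left < 0 or top + new_h > canvas_size or left + new_w > canvas_size:
        raise ValueError("face falls outside the canvas")
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

Under these assumptions, training images would use `scale=1.0` and `offset=(0, 0)`, while test images would sweep over other offsets or scales so that accuracy can be plotted as a function of distance from the trained position or size.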
Meeting abstract presented at VSS 2017