September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2017
Training a deep convolutional neural network with multiple face sizes and positions, but not resolutions, is necessary for generating invariant face recognition across these transformations
Author Affiliations
  • Megha Srivastava
    Computer Science Department, Stanford University
    Psychology Department, Stanford University
  • Kalanit Grill-Spector
    Psychology Department, Stanford University
    Stanford Neurosciences Institute, Stanford University
Journal of Vision August 2017, Vol.17, 247. doi:10.1167/17.10.247

      Megha Srivastava, Kalanit Grill-Spector; Training a deep convolutional neural network with multiple face sizes and positions, but not resolutions, is necessary for generating invariant face recognition across these transformations. Journal of Vision 2017;17(10):247. doi: 10.1167/17.10.247.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Convolutional neural networks have demonstrated human-like ability in face recognition, with recent networks achieving as high as 97% accuracy (Taigman, 2014). Non-linear operations (e.g., maximum-pooling) are thought to be key for developing position and size invariance (Riesenhuber & Poggio, 1999). However, it is unknown how training contributes to invariant face recognition. Here, we tested how training affects invariant face recognition across position, size, and resolution.

We used a convolutional neural network architecture implemented in TensorFlow (tensorflow.org). We trained the network to recognize 101 faces varying in age, gender, and ethnicity across views (15 views/face, spanning 0° to ±105°). The network was trained on a randomly selected 80% of views and tested on the remaining 20%. During training, faces were shown centrally at a single size and resolution. We then tested face recognition across views at new positions, sizes, and resolutions not shown during training.

Recognition performance progressively declined for faces shown at positions (Figure 1A) or sizes (Figure 1B) different from those used during training. However, performance generalized across resolutions (Figure 1C). Further experiments using a constant number of training examples but different training regimes revealed that training with random positions (Figure 1D) or random sizes (Figure 1E) produced more robust performance than training with faces at 5 fixed positions (Figure 1D) or 5 fixed sizes (Figure 1E). Additionally, the network performed better on faces shown at new sizes than at new positions.

Overall, our results indicate that the architecture of the neural network is (1) sufficient for invariant face recognition across resolutions, but (2) insufficient for invariant face recognition across size and position unless trained with many faces varying in size and position. By understanding the limits of convolutional neural networks, we can gain insight into the factors that enable successful face recognition.
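The training regimes contrasted above (a single centered face size during training vs. random positions or sizes) can be sketched as image-preprocessing steps. The following is a minimal numpy illustration, not the authors' code; the function names (`place_face`, `random_position_batch`), the 64-pixel canvas, and the nearest-neighbour resizing are all hypothetical choices for the sketch.

```python
import numpy as np

def place_face(face, canvas_size=64, position=None, scale=1.0):
    """Embed a square face patch in a larger blank canvas.

    position: (row, col) of the patch's top-left corner; centered if None.
    scale: resize factor applied with nearest-neighbour sampling
           (a stand-in for proper image interpolation).
    """
    side = max(1, int(round(face.shape[0] * scale)))
    idx = np.arange(side) * face.shape[0] // side  # nearest-neighbour indices
    resized = face[np.ix_(idx, idx)]
    canvas = np.zeros((canvas_size, canvas_size), dtype=face.dtype)
    if position is None:
        r = c = (canvas_size - side) // 2  # centered, as in the base training regime
    else:
        r, c = position
    canvas[r:r + side, c:c + side] = resized
    return canvas

def random_position_batch(face, n, canvas_size=64, rng=None):
    """Training regime with faces at random positions (cf. Figure 1D)."""
    rng = rng or np.random.default_rng(0)
    lim = canvas_size - face.shape[0]
    return np.stack([
        place_face(face, canvas_size,
                   position=tuple(rng.integers(0, lim + 1, size=2)))
        for _ in range(n)
    ])
```

Under this sketch, the base regime would call `place_face(face)` for every training image, while the random-position regime would draw each image from `random_position_batch`; an analogous helper varying `scale` would give the random-size regime (cf. Figure 1E).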

Meeting abstract presented at VSS 2017
