September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Using task-optimized neural networks to understand how experience might shape human face perception
Author Affiliations & Notes
  • Katharina Dobs
    Justus-Liebig University Giessen
    Massachusetts Institute of Technology
  • Joanne Yuan
    Massachusetts Institute of Technology
  • Julio Martinez
    Massachusetts Institute of Technology
  • Nancy Kanwisher
    Massachusetts Institute of Technology
  • Footnotes
    Acknowledgements  This work was supported by a Feodor-Lynen postdoctoral fellowship of the Humboldt Foundation to K.D., NIH grant DP1HD091947 to N.K., and the National Science Foundation Science and Technology Center for Brains, Minds, and Machines.
Journal of Vision September 2021, Vol.21, 2292. doi:
© ARVO (1962-2015); The Authors (2016-present)

We perceive and recognize faces quickly and seemingly effortlessly. How does this remarkable ability develop, and what is the role of experience? Here we addressed these long-standing questions by leveraging recent successes of deep convolutional neural networks (CNNs) as models of human visual recognition. Specifically, we asked whether training on generic object recognition is sufficient for CNNs to capture human face behavior, or whether face-specific training is required. To measure human face perception, subjects (n=14) performed a similarity arrangement task on 80 face images (five images of each of 16 identities). Using representational similarity analysis, we compared the behavioral representational dissimilarity matrices (RDMs) to RDMs obtained from different layers of CNNs (VGG16 architecture) trained on either object categorization (Object CNN) or face identity categorization (Face CNN). Importantly, the face identities used as stimuli were not included in the training set and were thus “unfamiliar” to both the Face CNN and the Object CNN. We found that the human face behavior RDM was more similar to layer-specific RDMs of the Face CNN (max. Spearman’s r=.42, reaching the noise ceiling) than to those of the Object CNN (max. Spearman’s r=.21). Moreover, late layers of the Face CNN matched human face behavior better than early layers did. These results show that face-trained CNNs capture human face behavior better than object-trained CNNs. Further, they suggest that humanlike face perception abilities do not automatically arise from generic visual experience with objects. Instead, face-specific experience during development may shape and fine-tune human face perception.
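The RDM comparison described above can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions the abstract does not specify: the dissimilarity measure for layer activations (here 1 minus Pearson correlation), the feature dimensionality, and all function names. The data are synthetic and only demonstrate the mechanics; they do not reproduce any reported result. With 80 stimuli, each RDM has 80 × 79 / 2 = 3,160 unique off-diagonal pairs, and only those pairs enter the Spearman correlation.

```python
import numpy as np


def rdm_from_features(features):
    """Build an RDM from an (n_stimuli, n_features) activation matrix.

    Assumption: dissimilarity = 1 - Pearson correlation between stimuli.
    """
    return 1.0 - np.corrcoef(features)


def upper_triangle(rdm):
    """Return the unique off-diagonal entries of a symmetric RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    return spearman(upper_triangle(rdm_a), upper_triangle(rdm_b))


# Synthetic demonstration only (hypothetical data, not the study's).
rng = np.random.default_rng(0)
n_stimuli = 80  # matches the number of face images in the task
layer_features = rng.normal(size=(n_stimuli, 512))  # stand-in for one CNN layer
model_rdm = rdm_from_features(layer_features)

# Stand-in "behavioral" RDM: the model RDM plus symmetric noise.
noise = rng.normal(scale=0.05, size=model_rdm.shape)
behavior_rdm = model_rdm + (noise + noise.T) / 2.0
np.fill_diagonal(behavior_rdm, 0.0)

rho = compare_rdms(behavior_rdm, model_rdm)
print(f"pairs compared: {upper_triangle(model_rdm).size}, Spearman r = {rho:.2f}")
```

In an actual analysis, `layer_features` would be replaced by the activations of a given network layer to the 80 stimuli, and `behavior_rdm` by the dissimilarities derived from the arrangement task; repeating the comparison per layer gives the layer-wise profile the abstract refers to.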

