October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract
Comparing representations that support object, scene, and face recognition using representational trajectory analysis.
Author Affiliations
  • Aylin Kallmayer
    Goethe University Frankfurt
  • Jacob Prince
    Harvard University
  • Talia Konkle
    Harvard University
Journal of Vision October 2020, Vol.20, 861. doi:https://doi.org/10.1167/jov.20.11.861
      © ARVO (1962-2015); The Authors (2016-present)


Deep convolutional neural networks have become an increasingly powerful tool to probe the nature of visual representation in the primate visual system. With extensive training, these networks learn to transform pictorial inputs through several intermediate stages into a format that is optimized for a given task. To what degree do networks trained specifically on faces, places, or objects have similar or different representational stages at each layer? Here we introduce a tool to visualize the “representational trajectories” that models take from input stages to output stages. Specifically, we trained models with a common base architecture (AlexNet) on either object categorization, scene categorization, or face identification. Next, we measured the responses in all units of each layer to a mixed probe set consisting of objects, scenes, and faces. Then, we computed the representational dissimilarity between all layers of all models. Finally, we used multidimensional scaling to plot layers with similar geometries close to each other, which provides a simple visualization of where along the processing stages models have similar and divergent representational formats. This trajectory analysis revealed that all three networks learned similar geometries in early layers, despite having different visual experience. The models diverged in mid and late layers, with object- and scene-trained networks learning more similar geometries than face-trained networks. A randomly weighted, untrained AlexNet showed no similarity to the other three network trajectories, indicating that the early representational similarity is not solely induced through a hierarchical architecture itself. Taken together, these computational results indicate that face, place, and object stimulus domains naturally share early and intermediate-level image features, before diverging towards more specialized feature spaces. Further, this work introduces representational trajectory analysis as a comparative approach for understanding what is learned by deep neural network models across variations in architecture, training sets, and tasks.
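
The pipeline described above (layer responses to a shared probe set, a representational geometry per layer, dissimilarity between all layers of all models, then multidimensional scaling) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not name the distance measures or the MDS variant, so the sketch assumes correlation distance (1 - Pearson r) at both levels and classical (Torgerson) MDS. The function names `rdm`, `layer_dissimilarity`, and `classical_mds` are hypothetical, and the "networks" are small random stand-ins rather than trained AlexNets.

```python
import numpy as np


def rdm(responses):
    """Condensed representational dissimilarity matrix for one layer.

    responses: array of shape (n_stimuli, n_units).
    Returns the upper triangle of 1 - Pearson r between stimulus patterns
    (correlation distance is an assumption, not stated in the abstract).
    """
    r = np.corrcoef(responses)
    upper = np.triu_indices(r.shape[0], k=1)
    return 1.0 - r[upper]


def layer_dissimilarity(rdms):
    """Second-order dissimilarity: 1 - Pearson r between every pair of layer RDMs."""
    d = 1.0 - np.corrcoef(np.stack(rdms))
    np.fill_diagonal(d, 0.0)
    return (d + d.T) / 2.0


def classical_mds(d, n_components=2):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean input.
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale


# Toy stand-in data: 30 "probe stimuli" passed through three random
# four-layer ReLU networks. Real use would substitute recorded activations.
rng = np.random.default_rng(0)
probe = rng.normal(size=(30, 50))

names, rdms = [], []
for model in ("object", "scene", "face"):
    activations = probe
    for layer in range(4):
        weights = rng.normal(size=(activations.shape[1], 40))
        activations = np.maximum(activations @ weights, 0.0)
        names.append(f"{model}-L{layer + 1}")
        rdms.append(rdm(activations))

d = layer_dissimilarity(rdms)
coords = classical_mds(d)

# Each model's trajectory is its sequence of layer points in the 2-D embedding.
for name, (x0, y0) in zip(names, coords):
    print(f"{name}: ({x0:.3f}, {y0:.3f})")
```

Connecting each model's layer points in order would draw its trajectory. Because the toy networks here are random, the resulting layout carries no scientific meaning and only shows the mechanics of the analysis.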

