September 2015
Volume 15, Issue 12
Vision Sciences Society Annual Meeting Abstract
Mapping human visual representations in space and time by neural networks
Author Affiliations
  • Radoslaw Cichy
    Computer Science and Artificial Intelligence Laboratory, MIT
  • Aditya Khosla
    Computer Science and Artificial Intelligence Laboratory, MIT
  • Dimitrios Pantazis
    Department of Brain and Cognitive Sciences, MIT
  • Antonio Torralba
    Computer Science and Artificial Intelligence Laboratory, MIT
  • Aude Oliva
    Computer Science and Artificial Intelligence Laboratory, MIT
Journal of Vision September 2015, Vol.15, 376. doi:

      Radoslaw Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva; Mapping human visual representations in space and time by neural networks. Journal of Vision 2015;15(12):376.

      © ARVO (1962-2015); The Authors (2016-present)

The neural machinery underlying visual object recognition comprises a hierarchy of cortical regions in the ventral visual stream. The spatiotemporal dynamics of information flow in this hierarchy are largely unknown. Here we tested the hypothesis that the spatiotemporal neural processes in the human brain correspond to the layer hierarchy of a deep convolutional neural network (CNN). We presented 118 images of real-world objects to human participants (N=15) while measuring their brain activity with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We trained an 8-layer CNN (5 convolutional layers, 3 fully connected layers) to predict 683 object categories using 900K training images from the ImageNet dataset, and obtained layer-specific CNN responses to the same 118 images. To compare brain-imaging data with the CNN in a common framework, we used representational similarity analysis. The key idea is that if two conditions evoke similar patterns in brain imaging data, they should also evoke similar patterns in the computer model. We thus determined ‘where’ (fMRI) and ‘when’ (MEG) the CNN predicted brain activity. We found a correspondence in hierarchy between cortical regions, processing time, and CNN layers. Low CNN layers predicted MEG activity early in time, and high layers predicted it relatively later; low CNN layers predicted fMRI activity in early visual regions, and high layers in late visual regions. Surprisingly, the correspondence between CNN layer hierarchy and cortical regions held for both the ventral and the dorsal visual streams. Results depended on the amount of training and the type of training material. Our results show that CNNs are a promising formal model of human visual object recognition. Combined with fMRI and MEG, they provide an integrated spatiotemporal and algorithmically explicit view of the first few hundred milliseconds of object recognition.
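The comparison step named in the abstract, representational similarity analysis, can be illustrated with a minimal sketch. The abstract does not state which dissimilarity measure or comparison statistic was used, so the choices here are assumptions: dissimilarity is taken as 1 minus the Pearson correlation between condition patterns, and two representational dissimilarity matrices (RDMs) are compared by Spearman rank correlation over their upper triangles. All function names and the synthetic data are hypothetical, and plain Python is used only to keep the sketch self-contained.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rdm(patterns):
    """Build an RDM: entry (i, j) is 1 - Pearson r between the response
    patterns for conditions i and j (assumed dissimilarity measure)."""
    n = len(patterns)
    return [[0.0 if i == j else 1.0 - pearson(patterns[i], patterns[j])
             for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rdm_similarity(rdm_a, rdm_b):
    """Compare two RDMs over their upper triangles (assumed statistic)."""
    return spearman(upper_triangle(rdm_a), upper_triangle(rdm_b))


if __name__ == "__main__":
    # Synthetic stand-ins: 8 conditions (images), 50 measurement channels.
    rng = random.Random(0)
    n_conditions, n_features = 8, 50
    latent = [[rng.gauss(0, 1) for _ in range(n_features)]
              for _ in range(n_conditions)]
    # "Brain" and "model layer" patterns share the latent structure plus noise.
    brain = [[v + rng.gauss(0, 0.1) for v in row] for row in latent]
    layer = [[v + rng.gauss(0, 0.1) for v in row] for row in latent]
    # An unrelated representation with no shared structure.
    unrelated = [[rng.gauss(0, 1) for _ in range(n_features)]
                 for _ in range(n_conditions)]
    print("brain vs. layer:    ", rdm_similarity(rdm(brain), rdm(layer)))
    print("brain vs. unrelated:", rdm_similarity(rdm(brain), rdm(unrelated)))
```

Because both representations are reduced to condition-by-condition dissimilarities, measurement spaces of different dimensionality (voxels, MEG sensors, CNN units) become directly comparable. Under this sketch, repeating the comparison per CNN layer against each fMRI region or each MEG time point would yield the 'where' and 'when' mappings the abstract describes.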

Meeting abstract presented at VSS 2015

