September 2018
Volume 18, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2018
Spatiotemporal dynamics of categorical representations in the human brain and deep convolutional neural networks
Author Affiliations
  • Yalda Mohsenzadeh
    Computer Science and Artificial Intelligence Laboratory, MIT; McGovern Institute for Brain Research, MIT
  • Caitlin Mullin
    Computer Science and Artificial Intelligence Laboratory, MIT
  • Bolei Zhou
    Computer Science and Artificial Intelligence Laboratory, MIT
  • Dimitrios Pantazis
    McGovern Institute for Brain Research, MIT
  • Aude Oliva
    Computer Science and Artificial Intelligence Laboratory, MIT
Journal of Vision September 2018, Vol.18, 400. doi:10.1167/18.10.400
© ARVO (1962-2015); The Authors (2016-present)

Abstract

Not all information in our visual environment is processed equally. Some stimuli are more behaviorally relevant to our growth and survival and thus necessitate faster and more efficient processing. Here we combine the high spatial resolution of fMRI with the high temporal resolution of MEG to trace the perceptual dynamics of several behaviorally relevant visual categories (faces, objects, bodies, animates, scenes). MEG-fMRI fusion revealed that these image categories follow unique paths through the ventral visual cortex. For instance, while the neural signals for the object and scene categories both reach early visual cortex by 75 ms, from there they travel laterally and medially, respectively, at distinct speeds. Results from dynamic multidimensional scaling representations reveal that faces separate from the rest of the categories as early as ~50 ms. This is especially remarkable because this separation precedes that of animates vs. non-animates, thought to be one of the earliest (most rapid) high-level visual discriminations. Given these category-specific dynamic neural activation maps, we then examined the underlying neural computations that may be driving them. We compared features extracted from the layers of state-of-the-art deep networks with our MEG and fMRI data, revealing the spatiotemporal correspondence of these networks with the human brain. The early layers of the networks corresponded with the analysis of low-level features (peak at 85 ms) in early visual cortex, while the later layers corresponded with peak times after 160 ms and with brain regions associated with high-level semantic processing (i.e., PHC, fusiform, lateral occipital). The integration of neuroimaging techniques with state-of-the-art neural networks can inform our understanding of the spatiotemporal dynamics of human visual recognition and the underlying neural computations.
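The abstract does not spell out how the MEG-fMRI fusion was computed. A common approach in the literature is similarity-based fusion: build a representational dissimilarity matrix (RDM) from the MEG patterns at each time point, build another from the fMRI patterns in a region, and correlate the two. The sketch below illustrates that idea on synthetic data, under the assumption that this general scheme applies here. The function names (`rdm_upper`, `fusion_time_course`), the toy dimensions, and the choice of 1 - Pearson distance with Spearman comparison are illustrative assumptions, not details taken from the abstract.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rdm_upper(patterns):
    """Upper triangle of an RDM (1 - Pearson r) over condition patterns."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def fusion_time_course(meg, fmri_roi):
    """Correlate the MEG RDM at each time point with one fMRI region's RDM.

    meg[t][c] is the sensor pattern for condition c at time t;
    fmri_roi[c] is the voxel pattern for condition c (hypothetical layout).
    """
    roi_rdm = rdm_upper(fmri_roi)
    return [spearman(rdm_upper(meg_t), roi_rdm) for meg_t in meg]


# Synthetic data (an assumption for illustration, not real recordings).
random.seed(0)
n_conditions, n_features, n_times, peak_time = 6, 20, 5, 2
fmri_roi = [[random.gauss(0, 1) for _ in range(n_features)]
            for _ in range(n_conditions)]
meg = []
for t in range(n_times):
    if t == peak_time:
        # At one time point the MEG patterns share the region's geometry.
        meg.append([[2.0 * v + 0.5 for v in p] for p in fmri_roi])
    else:
        meg.append([[random.gauss(0, 1) for _ in range(n_features)]
                    for _ in range(n_conditions)])

time_course = fusion_time_course(meg, fmri_roi)
peak_index = max(range(n_times), key=time_course.__getitem__)
print(peak_index, [round(r, 2) for r in time_course])
```

Repeating this for many regions (or voxel neighborhoods) would give a correlation time course per location, which is one way to picture a category-specific activation map unfolding over time. Under the same assumptions, substituting the activations of a network layer for `fmri_roi` would give a rough analogue of the layer-to-brain comparison the abstract describes.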

Meeting abstract presented at VSS 2018
