Yalda Mohsenzadeh, Caitlin Mullin, Bolei Zhou, Dimitrios Pantazis, Aude Oliva; Spatiotemporal dynamics of categorical representations in the human brain and deep convolutional neural networks. Journal of Vision 2018;18(10):400. doi: https://doi.org/10.1167/18.10.400.
© ARVO (1962-2015); The Authors (2016-present)
Not all information in our visual environment is processed equally. Some stimuli are more behaviorally relevant to our growth and survival and thus necessitate faster and more efficient processing. Here we combine the high spatial resolution of fMRI data with the high temporal resolution of MEG data to trace the perceptual dynamics of different behaviorally relevant visual categories (faces, objects, bodies, animates, scenes). MEG-fMRI fusion revealed that these image categories follow unique paths throughout the ventral visual cortex. For instance, while the neural signals for object and scene categories both reach early visual cortex by 75 ms, from there they travel laterally and medially, respectively, at distinct speeds. Results from dynamic multidimensional scaling representations reveal that faces separate themselves from the rest of the categories as early as ~50 ms. This is especially remarkable because the separation precedes that of animates vs. non-animates, which is thought to be one of the earliest (most rapid) high-level visual discriminations. Given these category-specific dynamic neural activation maps, we then examined the underlying neural computations that may be driving them. We compared features extracted from layers of state-of-the-art deep networks with our MEG and fMRI data, revealing the spatiotemporal correspondence of these networks with the human brain. Results revealed that the early layers of the network corresponded with the analysis of low-level features (peak at 85 ms) in early visual cortex, while the later layers corresponded with peak times after 160 ms and with brain regions associated with high-level semantic processing (i.e., PHC, Fusiform, Lateral Occipital). The integration of neuroimaging techniques with state-of-the-art neural networks can shed light on the spatiotemporal dynamics of human visual recognition and the underlying neural computations.
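The abstract does not spell out how the fusion was computed. One common approach, assumed here for illustration only, is representational similarity analysis: build a dissimilarity matrix over stimulus conditions from the MEG sensor patterns at each time point, build another from the fMRI voxel patterns of a region, and correlate the two to obtain a time course for that region. The sketch below runs on synthetic data; the function names, array shapes, and the injected "shared structure" at one time point are all hypothetical and do not come from the study.

```python
import numpy as np

def rdm(patterns):
    """Condensed dissimilarity matrix (1 - Pearson r) between conditions.

    patterns: array of shape (n_conditions, n_features).
    Returns the upper triangle (excluding the diagonal) as a 1-D vector.
    """
    corr = np.corrcoef(patterns)
    upper = np.triu_indices(patterns.shape[0], k=1)
    return 1.0 - corr[upper]

def spearman(a, b):
    """Spearman correlation via ranks (assumes no ties, as with continuous data)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def fusion_timecourse(meg, region_patterns):
    """Correlate the MEG dissimilarity matrix at each time point with a region's.

    meg: (n_times, n_conditions, n_sensors)
    region_patterns: (n_conditions, n_voxels), or DNN layer activations
    """
    region_rdm = rdm(region_patterns)
    return np.array([spearman(rdm(meg[t]), region_rdm)
                     for t in range(meg.shape[0])])

# Synthetic demonstration (assumed data, not the study's).
rng = np.random.default_rng(0)
n_times, n_conditions, n_sensors, n_voxels = 10, 20, 30, 50

# A hidden condition structure shared by the region and by MEG at one time point.
latent = rng.normal(size=(n_conditions, 5))
region = latent @ rng.normal(size=(5, n_voxels))
region += 0.1 * rng.normal(size=(n_conditions, n_voxels))

meg = rng.normal(size=(n_times, n_conditions, n_sensors))
shared_time = 6  # hypothetical time index at which MEG carries the shared structure
meg[shared_time] += 3.0 * (latent @ rng.normal(size=(5, n_sensors)))

timecourse = fusion_timecourse(meg, region)
peak_time = int(np.argmax(timecourse))
print("peak time index:", peak_time)
```

Under the same assumption, passing the activations of a network layer (conditions by units) in place of `region_patterns` would give a time course for the correspondence between that layer and the MEG signal.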
Meeting abstract presented at VSS 2018