Leyla Tarhan, Talia Konkle; Low and high level features explain neural response tuning during action observation. Journal of Vision 2017;17(10):989. doi: 10.1167/17.10.989.
© ARVO (1962-2015); The Authors (2016-present)
Among humans' cognitive faculties, the ability to process others' actions is essential. We can recognize the meaning behind running, eating, and finer movements like tool use. How does the visual system process and transform information about actions? To explore this question, we collected 120 action videos spanning a range of everyday activities sampled from the American Time Use Survey. Next, we used behavioral ratings and computational approaches to measure how these videos vary within three distinct feature spaces: visual shape features ("gist"), kinematic features (e.g., body parts involved), and intentional features (e.g., used to communicate). Finally, using fMRI, we obtained neural responses to each of these 2.5 s action clips in 9 participants. To analyze the structure in these neural responses, we used an encoding-model approach (Mitchell et al., 2008) to fit tuning models for each voxel along each feature space and to assess how well each model predicts responses to individual actions. We found that a large proportion of cortex along the intraparietal sulcus and occipitotemporal surface was moderately well fit by all three models (median r = 0.23-0.31). In a leave-two-out validation procedure, all three models could accurately discriminate between two action videos in ventral and dorsal stream sectors (65-80%, SEM = 1.1%-2.6%). In addition, we observed a significant shift in classification accuracy between early visual cortex (EVC) and higher-level visual cortex: the gist model performed best in EVC, whereas the high-level models outperformed gist in occipitotemporal and parietal regions. These results demonstrate that action representations can be successfully predicted using an encoding-model approach. More broadly, the pattern of fits for different feature models reveals that visual information is transformed from low- to high-level representational spaces in the course of action processing. These findings begin to formalize the progression of the kinds of action information being processed along the visual stream.
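The encoding-model and leave-two-out logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not specify the regression method or the matching criterion, so ridge regression and a correlation-based match (in the spirit of Mitchell et al., 2008) are assumptions here, as are all function names, the `alpha` value, and the synthetic data, which stands in for real feature ratings and voxel responses.

```python
import itertools
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge fit (assumed regression method).
    X: videos x features, Y: videos x voxels.
    Returns a features x voxels weight matrix: one tuning model per voxel."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def pattern_corr(a, b):
    """Pearson correlation between two voxel response patterns."""
    return np.corrcoef(a, b)[0, 1]

def leave_two_out_accuracy(X, Y, alpha=1.0):
    """Hold out every pair of videos, fit on the rest, and check whether the
    predicted patterns match the correct held-out responses better than the
    swapped assignment. Chance level is 0.5."""
    n = X.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    correct = 0
    for i, j in pairs:
        train = np.setdiff1d(np.arange(n), [i, j])
        W = fit_ridge(X[train], Y[train], alpha)
        pred_i, pred_j = X[i] @ W, X[j] @ W
        matched = pattern_corr(pred_i, Y[i]) + pattern_corr(pred_j, Y[j])
        swapped = pattern_corr(pred_i, Y[j]) + pattern_corr(pred_j, Y[i])
        correct += int(matched > swapped)
    return correct / len(pairs)

# Synthetic stand-in data (hypothetical sizes, not the study's 120 videos).
rng = np.random.default_rng(0)
n_videos, n_features, n_voxels = 20, 5, 30
X = rng.standard_normal((n_videos, n_features))       # feature ratings per video
W_true = rng.standard_normal((n_features, n_voxels))  # simulated voxel tuning
Y = X @ W_true + 0.1 * rng.standard_normal((n_videos, n_voxels))

acc = leave_two_out_accuracy(X, Y)
print(f"leave-two-out accuracy: {acc:.2f}")
```

Running the same procedure once per feature space (gist, kinematic, intentional) and per cortical region would give the kind of region-by-model accuracy comparison the abstract reports.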
Meeting abstract presented at VSS 2017