Abstract
Every day, we encounter people performing many different types of actions, which we understand quickly and effortlessly. Although this task seems trivial, our knowledge of how the human brain represents observed actions and assigns meaning to them is still limited. To address these questions, we examined the cognitive and neural organization of 100 actions depicted as static images, taking inspiration from the rich literature on object representations. First, to examine the (1) category- and (2) feature-based organization, we performed several behavioral studies using inverse multidimensional scaling and feature ratings. To compare the resulting category- and feature-based models against a semantic model, we constructed a third model using word embeddings (Bidirectional Encoder Representations from Transformers; BERT). Next, we conducted an fMRI study and performed a representational similarity analysis (RSA) using the three models. An ROI-based RSA in occipitotemporal, parietal, and frontal nodes of the action observation network revealed a significant correlation between the neural data and the feature-based model in the left lateral occipitotemporal cortex (LOTC) and the left inferior parietal lobule. The category model significantly correlated with the neural data in the right LOTC, and the semantic model significantly correlated with the neural data in both the left and right LOTC. An additional whole-brain GLM-based searchlight RSA revealed peaks for the feature model in the left lingual gyrus, for the category model in the right middle temporal gyrus (MTG), and for the semantic model in the right LOTC. Overall, our results highlight the importance of occipitotemporal regions for processing feature-, category-, and semantics-related information about observed actions and suggest a division of labor between the left and right LOTC.
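To make the model-comparison logic of the RSA concrete, the following is a minimal sketch, not the authors' actual pipeline: it assumes hypothetical inputs (random placeholder ROI voxel patterns and feature ratings for 100 actions), builds a representational dissimilarity matrix (RDM) for each, and compares them with a rank correlation.

```python
# Minimal RSA sketch; all variable names, shapes, and distance choices are
# illustrative assumptions, not the study's actual analysis settings.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

n_actions = 100  # 100 depicted actions, as in the study

# Hypothetical inputs (random placeholders standing in for real data):
#   roi_patterns    - voxel patterns for one ROI, shape (n_actions, n_voxels)
#   feature_ratings - behavioral feature ratings, shape (n_actions, n_features)
rng = np.random.default_rng(0)
roi_patterns = rng.normal(size=(n_actions, 250))
feature_ratings = rng.normal(size=(n_actions, 20))

# Condensed RDMs (upper triangles): correlation distance for the neural
# patterns, Euclidean distance for the feature-based model.
neural_rdm = pdist(roi_patterns, metric="correlation")
model_rdm = pdist(feature_ratings, metric="euclidean")

# Compare the two representational geometries with a Spearman rank
# correlation, a common choice for RSA model comparisons.
rho, p = spearmanr(neural_rdm, model_rdm)
print(f"model-neural RDM correlation: rho = {rho:.3f}, p = {p:.3g}")
```

The same comparison can be repeated per ROI (or per searchlight sphere) and per model RDM (category, feature, or semantic), which is the structure of analysis the abstract describes.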