Diana C. Dima, Tyler Tomita, Christopher Honey, Leyla Isik; The representational space of action perception. Journal of Vision 2020;20(11):1161. doi: https://doi.org/10.1167/jov.20.11.1161.
© ARVO (1962-2015); The Authors (2016-present)
Humans can easily extract meaning from complex actions, but the computations enabling this are not well understood. What are the organizing dimensions of human action representations? Their visual appearance? Their social content? It is likely that this rich behavioral space cannot be characterized with small-scale controlled experiments alone, but naturalistic stimulus sets pose additional challenges in disentangling the many potential contributing features.
Here, we addressed this by first curating a large-scale set of ~200 three-second videos of everyday actions from the Moments in Time dataset (Monfort et al., 2019). We annotated the videos with relevant features, including action type and presence of a social interaction. The videos belonged to 17 action categories and were balanced in terms of environment, number of agents, and agent gender. Second, we used representational similarity analysis to evaluate how visual and social stimulus features predicted behavioral similarity measured via a video arrangement task.
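A model RDM for a categorical annotation such as action type can be built by assigning zero dissimilarity to video pairs sharing a label and one otherwise. The sketch below is illustrative only (the labels are placeholders, not the study's actual annotations):

```python
import numpy as np

# Hypothetical action-category labels for five videos (placeholders,
# not the annotations used in the study).
labels = ["cooking", "cooking", "swimming", "hugging", "swimming"]

n = len(labels)
# Entry (i, j) is 0 if videos i and j share an action category, 1 otherwise;
# the result is a symmetric model RDM with a zero diagonal.
model_rdm = np.array([[0.0 if labels[i] == labels[j] else 1.0
                       for j in range(n)] for i in range(n)])
```

The same construction extends to other categorical features (e.g., presence of a social interaction), while continuous features such as valence would instead use pairwise absolute differences.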
In a pilot online experiment using the Meadows platform, participants arranged a subset of 50 videos according to their similarity, broadly defined so as to emphasize natural behavior. Inverse multi-dimensional scaling (Kriegeskorte & Mur, 2012) was performed to calculate behavioral representational dissimilarity matrices (RDMs) reflecting Euclidean distances between stimuli, which were then correlated with model RDMs of predicted dissimilarity. These models were based on visual properties (such as convolutional neural network activations and spatial envelope) and social properties (such as action category, valence, number of agents, and presence of a social interaction). Social models (particularly action category and number of agents), but not visual models, explained a significant portion of the variance in behavior.
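The comparison between behavioral and model RDMs in representational similarity analysis typically correlates their vectorized upper triangles (excluding the diagonal). A minimal sketch, using random matrices as stand-ins for the measured and predicted dissimilarities:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 50  # number of videos in the pilot subset

def random_rdm(rng, n):
    """Make a symmetric dissimilarity matrix with a zero diagonal
    (a placeholder standing in for a real RDM)."""
    m = rng.random((n, n))
    m = (m + m.T) / 2.0
    np.fill_diagonal(m, 0.0)
    return m

behavioral_rdm = random_rdm(rng, n)  # stand-in for arrangement-task distances
model_rdm = random_rdm(rng, n)       # stand-in for a feature-based model RDM

# Correlate the upper-triangle entries (pairwise dissimilarities only once,
# diagonal excluded), using a rank correlation as is common in RSA.
iu = np.triu_indices(n, k=1)
rho, p = spearmanr(behavioral_rdm[iu], model_rdm[iu])
```

Significance of each model's correlation is then commonly assessed with permutation tests over stimulus labels, which the sketch above omits.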
Our results suggest that the behavioral space of action representation reflects socially relevant dimensions. This opens exciting avenues for disentangling the visual and social components of action recognition within a unified framework, using ecologically valid stimuli and computational models.