October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract  |   October 2020
The representational space of action perception
Author Affiliations & Notes
  • Diana C. Dima
    Johns Hopkins University
  • Tyler Tomita
    Johns Hopkins University
  • Christopher Honey
    Johns Hopkins University
  • Leyla Isik
    Johns Hopkins University
  • Footnotes
    Acknowledgements: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
Journal of Vision October 2020, Vol.20, 1161. doi:https://doi.org/10.1167/jov.20.11.1161
      © ARVO (1962-2015); The Authors (2016-present)


Humans can easily extract meaning from complex actions, but the computations enabling this are not well understood. What are the organizing dimensions of human action representations? Their visual appearance? Their social content? This rich behavioral space likely cannot be characterized with small-scale controlled experiments alone, yet naturalistic stimulus sets make it harder to disentangle the many features that could contribute. Here, we addressed this in two steps. First, we curated a large-scale set of ~200 three-second videos of everyday actions from the Moments in Time dataset (Monfort et al., 2019). We annotated the videos with relevant features, including action type and presence of a social interaction. The videos belonged to 17 action categories and were balanced in terms of environment, number of agents, and agent gender. Second, we used representational similarity analysis to evaluate how visual and social stimulus features predicted behavioral similarity measured via a video arrangement task. In a pilot online experiment using the Meadows platform, participants arranged a subset of 50 videos according to their similarity, broadly defined so as to emphasize natural behavior. Inverse multi-dimensional scaling (Kriegeskorte & Mur, 2012) was performed to calculate behavioral representational dissimilarity matrices (RDMs) reflecting Euclidean distances between stimuli, which were correlated with model RDMs of predicted dissimilarity. These models were based on visual properties (such as convolutional neural network activations and spatial envelope) and social properties (such as action category, valence, number of agents, and presence of a social interaction). Social models (particularly action category and number of agents), but not visual models, explained a significant portion of the variance in behavior. Our results suggest that the behavioral space of action representation reflects socially relevant dimensions. This opens exciting avenues for disentangling the visual and social components of action recognition within a unified framework, using ecologically valid stimuli and computational models.
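The comparison between a behavioral RDM and a model RDM can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the behavioral RDM here is built from a single toy 2D arrangement rather than by inverse multi-dimensional scaling over multiple arrangements, and the use of Spearman rank correlation is an assumption, since the abstract does not name the correlation measure. All function names, positions, and labels are hypothetical.

```python
import math
from itertools import combinations


def euclidean_rdm(points):
    """Pairwise Euclidean distances between 2D arrangement positions."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def categorical_rdm(labels):
    """Model RDM: 0 if two stimuli share a label, 1 otherwise."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def upper_triangle(rdm):
    """Off-diagonal upper-triangle entries (each stimulus pair once)."""
    return [rdm[i][j] for i, j in combinations(range(len(rdm)), 2)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_correlation(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    return pearson(average_ranks(upper_triangle(rdm_a)),
                   average_ranks(upper_triangle(rdm_b)))


# Toy data (hypothetical): four videos placed on a 2D canvas by one participant.
positions = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 1.2)]
action_labels = ["cooking", "cooking", "running", "running"]

behavioral_rdm = euclidean_rdm(positions)
action_model_rdm = categorical_rdm(action_labels)
rho = rdm_correlation(behavioral_rdm, action_model_rdm)
print(f"Spearman correlation with action-category model: {rho:.3f}")
```

In this toy arrangement, videos with the same action label sit close together, so the action-category model correlates positively with the behavioral RDM. A full analysis would also need a significance test (for example, permuting stimulus labels), which this sketch omits.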

