August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Semantic representations of human actions across vision and language
Author Affiliations & Notes
  • Diana C Dima
    Western University
  • Jody Culham
    Western University
  • Yalda Mohsenzadeh
    Western University
  • Footnotes
    Acknowledgements  This work was supported by BrainsCAN at Western University through the Canada First Research Excellence Fund (CFREF). Diana Dima is generously supported by a Western Postdoctoral Fellowship.
Journal of Vision August 2023, Vol.23, 5511. doi:

      Diana C Dima, Jody Culham, Yalda Mohsenzadeh; Semantic representations of human actions across vision and language. Journal of Vision 2023;23(9):5511.


      © ARVO (1962-2015); The Authors (2016-present)


Humans can visually recognize many actions performed by others, as well as communicate about them. How are action concepts organized in the mind? Recent work has uncovered shared neural representations of actions across vision and language, yet the semantic structure of these representations is not well understood. To address this, we curated a multimodal, naturalistic action set containing 95 videos of everyday actions from the Moments in Time dataset (Monfort et al., 2019) and 95 naturalistic sentences describing the same actions. We labeled each action with four semantic features: a specific action verb (e.g., chopping); an everyday activity (representing a set of actions; e.g., preparing food); the target of the action (e.g., an object); and a broad action class (e.g., manipulation; Orban et al., 2021). We also annotated the actions with other relevant social and action-related features. We used these features to predict behavioral similarity measured in two multiple arrangement experiments. Participants arranged the videos (N = 39) or sentences (N = 32) according to the actions’ similarity in meaning. In both experiments, the action target explained more unique variance in behavior than any other feature. In a cross-modal analysis, we mapped the semantic, action, and social features to the video similarity judgments to predict the sentence similarity judgments, and vice versa. Our feature set explained approximately 80% of the shared variance across modalities. Of all features, action target and action class were the best cross-modal predictors. Together, our results demonstrate the shared semantic organization of human actions across vision and language. This organization reflects broad semantic features, including action target and action class. Our results challenge commonly used definitions of action categories, and open exciting avenues for understanding how action concepts are represented in the mind and brain.
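The abstract does not say which estimator was used to compute each feature's unique variance. As an illustration only, one common approach is to fit a regression of the pairwise behavioral dissimilarities on all feature dissimilarities, then refit without one feature and take the drop in R². The sketch below does this on synthetic data. The function names, the category-based dissimilarity coding, and the simulated labels and weights are all assumptions made for the example, not the authors' pipeline or data.

```python
import numpy as np


def category_rdm(labels):
    """Vectorized upper triangle of a dissimilarity matrix:
    0 if two items share a label, 1 otherwise."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)
    return (labels[i] != labels[j]).astype(float)


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def unique_variance(features, y):
    """For each feature, R^2 of the full model minus R^2 of the model
    that omits that feature. `features` maps name -> 1D predictor."""
    names = list(features)
    full = r_squared(np.column_stack([features[n] for n in names]), y)
    out = {}
    for n in names:
        rest = [features[m] for m in names if m != n]
        reduced = r_squared(np.column_stack(rest), y) if rest else 0.0
        out[n] = full - reduced
    return out


# Synthetic stand-in data (hypothetical): 95 items with random labels.
rng = np.random.default_rng(0)
n_items = 95
features = {
    "target": category_rdm(rng.integers(0, 4, n_items)),
    "class": category_rdm(rng.integers(0, 5, n_items)),
}
# Simulated behavior driven mostly by the "target" feature, plus noise.
noise = rng.normal(0.0, 0.1, features["target"].shape)
behavior = 0.8 * features["target"] + 0.2 * features["class"] + noise

uv = unique_variance(features, behavior)
for name, value in uv.items():
    print(f"{name}: unique R^2 = {value:.3f}")
```

Because the simulated behavior weights the target feature most heavily, that feature shows the largest unique contribution here; with real arrangement data the predictors would be the annotated feature dissimilarities and the outcome would be the behavioral dissimilarities averaged or modeled across participants.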

