August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Both Purely Visual and Simulation-based Models Uniquely Explain Human Social Interaction Judgements
Author Affiliations
  • Manasi Malik
    Johns Hopkins University
  • Leyla Isik
    Johns Hopkins University
Journal of Vision August 2023, Vol.23, 5111. doi:
      Manasi Malik, Leyla Isik; Both Purely Visual and Simulation-based Models Uniquely Explain Human Social Interaction Judgements. Journal of Vision 2023;23(9):5111.


      © ARVO (1962-2015); The Authors (2016-present)


Humans are adept at detecting and recognizing social interactions. However, the computations that enable us to extract social information from visual scenes remain largely unknown. One theory proposes that humans recognize social relationships by simulating the inferred goals of others; this theory has been instantiated using generative inverse planning models. In contrast, recent behavioral and neural evidence suggests that social interaction perception is a bottom-up, visual process separate from complex mental simulation. Relatedly, recent work has found that a purely visual model with relational inductive biases can successfully model human social interaction judgments, lending computational support to this bottom-up theory. To directly compare these two alternatives, we compared our purely visual model (SocialGNN) and a generative inverse planning model (SIMPLE) against human ratings of animated shape videos resembling real-life social interactions. Using representational similarity analysis, we found that both SocialGNN and SIMPLE are significantly correlated with human judgments (r = .45 and r = .49, respectively). Interestingly, each model uniquely explains a significant amount of variance in human judgments (sr = .30 and sr = .37, respectively), suggesting that humans engage both bottom-up and simulation-based processes to recognize social interactions, with each process possibly representing a different aspect of the stimulus. This work provides insight into the extent to which humans rely on visual processing versus mental simulation to interpret different social scenes.
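The analysis described above can be illustrated with a minimal sketch, under several assumptions that the abstract does not confirm: that "sr" denotes a semipartial correlation, that dissimilarity matrices are built from correlation distance between per-video feature vectors, and that Pearson correlation is used to compare them. All data below are synthetic placeholders, not the study's data, and the function names (`rdm_upper`, `semipartial`) are hypothetical.

```python
import numpy as np


def rdm_upper(features):
    """Upper triangle of a correlation-distance dissimilarity matrix.

    Rows of `features` are stimuli (videos); columns are feature dimensions.
    """
    rdm = 1.0 - np.corrcoef(features)
    iu = np.triu_indices(rdm.shape[0], k=1)
    return rdm[iu]


def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def semipartial(y, x, control):
    """Correlate y with the part of x that `control` cannot linearly predict."""
    design = np.column_stack([np.ones_like(control), control])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    residual = x - design @ beta
    return pearson(y, residual)


# Synthetic stand-ins (assumption): random features for two models, and
# "human" data constructed to depend on both, plus noise.
rng = np.random.default_rng(0)
n_videos = 40
visual = rng.normal(size=(n_videos, 8))
simulation = rng.normal(size=(n_videos, 8))
human = np.hstack([visual, simulation]) + 0.5 * rng.normal(size=(n_videos, 16))

human_rdm = rdm_upper(human)
visual_rdm = rdm_upper(visual)
simulation_rdm = rdm_upper(simulation)

# Zero-order correlation of each model with the human dissimilarities.
r_visual = pearson(human_rdm, visual_rdm)
r_simulation = pearson(human_rdm, simulation_rdm)

# Unique contribution of each model after removing what the other explains.
sr_visual = semipartial(human_rdm, visual_rdm, simulation_rdm)
sr_simulation = semipartial(human_rdm, simulation_rdm, visual_rdm)

print(f"r:  visual={r_visual:.2f}, simulation={r_simulation:.2f}")
print(f"sr: visual={sr_visual:.2f}, simulation={sr_simulation:.2f}")
```

In this sketch, a model whose semipartial correlation stays well above zero after the other model is regressed out accounts for variance in the human dissimilarities that the other model does not, which is the sense in which the abstract reports that each model "uniquely" explains human judgments.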

