September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract | September 2021
A large-scale, naturalistic dataset of two-person social actions
Author Affiliations & Notes
  • Emalie McMahon
    Johns Hopkins University
  • Michael Bonner
    Johns Hopkins University
  • Leyla Isik
    Johns Hopkins University
  • Footnotes
    Acknowledgements: This work was funded in part by NSF award DGE-1746891.
Journal of Vision September 2021, Vol.21, 1962. doi:
      Emalie McMahon, Michael Bonner, Leyla Isik; A large-scale, naturalistic dataset of two-person social actions. Journal of Vision 2021;21(9):1962. doi:

      © ARVO (1962-2015); The Authors (2016-present)


During everyday tasks like navigating a crowded store or deciding where to sit on a bus, we perceive rich details about the social interactions of others. The ability to perceive others’ social interactions is a core component of human cognition, but its underlying neural computations are only beginning to be understood. While recent work has made progress in understanding the neural mechanisms of social interaction perception with tightly controlled stimuli, continued progress requires a dataset of social and nonsocial actions that is representative of everyday life and captures variance along a range of social dimensions, e.g., valence, cooperativity, and interpersonal relationships. However, using naturalistic stimuli to investigate social interaction perception presents significant challenges. First, the number of people in a given video is highly predictive of the presence of a social interaction: a video of one person likely shows a nonsocial action, while a video of three or more people likely shows a social interaction. Second, existing datasets of social actions are not representative of everyday actions; for instance, they dramatically oversample sporting activities. We addressed these challenges by curating a dataset of two-person social and nonsocial actions from the Moments in Time dataset. Our videos were selected from common action categories based on the American Time Use Survey. We also balanced our dataset on visual features such as whether the action occurred indoors or outdoors. Finally, we selected our final set of videos so that sociality was not predictable from early layers of a deep convolutional neural network. This dataset will be an important tool in benchmarking human social perception against computer vision models and can pave the way for the same progress that has been achieved in object and scene perception.
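The two curation checks described above (balancing on a visual feature, and confirming that sociality is not predictable from low-level features) can be illustrated with a minimal sketch. The abstract does not specify the authors' actual procedure, so everything here is an assumption made for illustration: the record fields `social` and `indoors`, the function names, and the use of a leave-one-out nearest-centroid classifier as a stand-in for whatever decoding analysis was run on the network's early-layer features.

```python
from collections import Counter
import math


def balance_table(videos):
    """Count videos in every (social, indoors) cell.

    `videos` is a list of dicts with hypothetical boolean keys
    "social" and "indoors"; roughly equal counts per cell suggest
    the set is balanced on the indoor/outdoor feature.
    """
    return Counter((v["social"], v["indoors"]) for v in videos)


def _centroid(rows):
    """Mean feature vector of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def loo_nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    `features` stands in for per-video feature vectors (for example,
    pooled early-layer activations, assumed here); `labels` marks each
    video as social (1) or nonsocial (0). Accuracy well above chance
    would mean sociality is predictable from these features.
    Each class needs at least two examples so that no class is empty
    when one video is held out.
    """
    correct = 0
    for i, held_out in enumerate(features):
        # Build class centroids from every video except the held-out one.
        by_label = {}
        for j, (vector, label) in enumerate(zip(features, labels)):
            if j != i:
                by_label.setdefault(label, []).append(vector)
        centroids = {label: _centroid(rows) for label, rows in by_label.items()}
        # Predict the label whose centroid is closest in Euclidean distance.
        predicted = min(centroids, key=lambda c: math.dist(held_out, centroids[c]))
        correct += predicted == labels[i]
    return correct / len(features)


if __name__ == "__main__":
    # Toy data only; not drawn from the dataset described in the abstract.
    toy_videos = [
        {"social": True, "indoors": True},
        {"social": True, "indoors": False},
        {"social": False, "indoors": True},
        {"social": False, "indoors": False},
    ]
    print(balance_table(toy_videos))

    toy_features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                    [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(loo_nearest_centroid_accuracy(toy_features, toy_labels))
```

Under these assumptions, a candidate video set would be kept only if the balance table is roughly even and the cross-validated accuracy stays near chance. A stronger classifier or a permutation test could be substituted without changing the overall logic.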

