December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract  |   December 2022
The dynamics of scene understanding
Author Affiliations & Notes
  • Daniel Harari
    Weizmann AI Center, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science
  • Alex Mars
    Weizmann AI Center, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science
  • Hanna Benoni
    Department of Psychology, The College of Management Academic Studies
  • Shimon Ullman
    Weizmann AI Center, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science
  • Footnotes
    Acknowledgements: Robin Chemers Neustein Artificial Intelligence Fellows Program
Journal of Vision December 2022, Vol.22, 3555. doi:https://doi.org/10.1167/jov.22.14.3555

      Daniel Harari, Alex Mars, Hanna Benoni, Shimon Ullman; The dynamics of scene understanding. Journal of Vision 2022;22(14):3555. https://doi.org/10.1167/jov.22.14.3555.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Visual scene understanding involves processing and integrating information across different levels of visual tasks, including the recognition of objects, actions and interactions. Here we study the dynamics of scene understanding over time. In particular, we study the time trajectory of scene interpretation by controlling the exposure time with perceptual masking. A total of 140 MTurk participants were instructed to provide a detailed free-recall description of 14 stimulus images portraying various interactions between animate agents (humans and pets) and other agents and objects. They were instructed to report the types of objects and agents in each image, together with their properties and inter-relations. For each image, subjects were assigned to one of seven exposure conditions (50, 75, 100, 125, 200, 500 or 2000 ms), with the image followed by a mask. A fixation cross at the center of the image frame appeared before the image was displayed. Participants had 15 minutes to complete the task. Subjects' responses were evaluated by 4 scorers, who followed a detailed analysis protocol that minimized subjective judgements. Preliminary results indicate consistent trends in the time evolution of scene perception: (i) human agents are reported earlier than objects and global scene descriptions, even when objects appear at the center of fixation (e.g. ‘two men’ before ‘a park bench’); (ii) actions are reported earlier than the objects acted upon (e.g. ‘drinking’ before ‘cup’); (iii) for human agents, the number of agents is reported early, followed by age, while gender is, on average, reported later (e.g. ‘two people’, then ‘two kids’, and then ‘two boys’). These findings are interesting from a modeling perspective because they do not fit the common scene understanding paradigm in computer vision, in which objects are first detected and only then are their inter-relations processed. We will consider scene perception schemes that are more consistent with the human dynamics of scene perception than current approaches are.
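The abstract states that each subject was assigned, per image, to one of seven exposure conditions, but it does not say how the assignment was balanced. The sketch below shows one possible scheme, a cyclic (Latin-square style) rotation, under which 140 participants and 14 images give exactly 20 participants per image per condition and each participant sees every condition twice. This rotation, and the names `assign_exposure` and `build_schedule`, are hypothetical illustrations, not the authors' documented procedure.

```python
from collections import Counter

# Exposure durations (ms) listed in the abstract; each image is followed by a mask.
EXPOSURES_MS = [50, 75, 100, 125, 200, 500, 2000]
N_PARTICIPANTS = 140
N_IMAGES = 14


def assign_exposure(participant: int, image: int) -> int:
    """Hypothetical cyclic assignment (an assumption, not the authors' method):
    shift the exposure condition by one step per participant and per image."""
    return EXPOSURES_MS[(participant + image) % len(EXPOSURES_MS)]


def build_schedule(n_participants: int = N_PARTICIPANTS,
                   n_images: int = N_IMAGES) -> dict[tuple[int, int], int]:
    """Map each (participant, image) pair to an exposure duration in ms."""
    return {
        (p, i): assign_exposure(p, i)
        for p in range(n_participants)
        for i in range(n_images)
    }


if __name__ == "__main__":
    schedule = build_schedule()
    # How many participants see image 0 at each exposure duration.
    print(Counter(ms for (p, i), ms in schedule.items() if i == 0))
    # How many images participant 0 sees at each exposure duration.
    print(Counter(ms for (p, i), ms in schedule.items() if p == 0))
```

A cyclic rotation is used here only because it balances both dimensions at once when the number of participants and images are multiples of the number of conditions; a randomized assignment would also satisfy the description in the abstract.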
