Vision Sciences Society Annual Meeting Abstract  |   July 2013
Specifying the relationships between objects, gaze, and descriptions for scene understanding
Author Affiliations
  • Kiwon Yun
    Department of Computer Science, Stony Brook University
  • Yifan Peng
    Department of Computer Science, Stony Brook University
  • Hossein Adeli
    Department of Psychology, Stony Brook University
  • Tamara Berg
    Department of Computer Science, Stony Brook University
  • Dimitris Samaras
    Department of Computer Science, Stony Brook University
  • Gregory Zelinsky
    Department of Computer Science, Stony Brook University
    Department of Psychology, Stony Brook University
Journal of Vision July 2013, Vol.13, 1309. doi:10.1167/13.9.1309
      Kiwon Yun, Yifan Peng, Hossein Adeli, Tamara Berg, Dimitris Samaras, Gregory Zelinsky; Specifying the relationships between objects, gaze, and descriptions for scene understanding. Journal of Vision 2013;13(9):1309. doi: 10.1167/13.9.1309.
Abstract

The objects that people choose to look at while viewing a scene provide an abundance of information about how a scene is ultimately understood. In Experiment 1, participants viewed a scene for 5 seconds, then described the scene’s content, with this description being our estimate of their scene understanding. There were 104 scenes (selected from SUN09), spanning 8 scene types, and analyses were limited to 22 categories of common objects for which bounding box information was available. In Experiment 2, participants viewed 1000 scenes (from PASCAL VOC), each for 3 seconds, in anticipation of a memory test. Analyses were limited to 20 object categories and descriptions were obtained using Mechanical Turk. For both experiments, we found that fixated objects tended also to be described (95.2% for PASCAL, 72.5% for SUN09) and described objects tended also to be fixated (86.6% for PASCAL, 73.7% for SUN09). Differences between experiments were likely due to the PASCAL images being less cluttered than the SUN09 images, thereby increasing the probability of fixations on selected objects. People also tended to look more often at animate objects (people, animals) or objects that conveyed animacy (televisions, computer monitors) than inanimate objects (e.g., tables, rugs, cabinets). Furthermore, by analyzing where fixations typically fell within the bounding boxes for different categories of objects (using object-based fixation density maps), we were able to discern distinct category-specific patterns of fixation behavior. For example, fixations on tables and chairs tended to be distributed in the extreme upper halves of bounding boxes, reflecting the fact that things usually sit on these objects, whereas fixations on cats and cows were distributed along the horizontal midline, reflecting a center-of-mass looking bias. Collectively, these findings suggest that embedded in viewing behavior is information about the content of a scene and how a scene is being understood.
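The two analyses described above — the overlap between fixated and described objects, and the object-based fixation density maps built by normalizing fixation locations within category bounding boxes — can be sketched in code. This is a minimal illustration, not the authors' implementation; the function names, the grid resolution, and the data layout (fixations paired with `(x0, y0, x1, y1)` bounding boxes) are assumptions for the sake of the example.

```python
from collections import defaultdict

def overlap_rate(source_objects, target_objects):
    """Fraction of objects in source_objects that also appear in target_objects,
    e.g. the share of fixated objects that were also described (or vice versa).
    Note: this is an assumed formulation; the abstract does not give the formula."""
    source, target = set(source_objects), set(target_objects)
    if not source:
        return 0.0
    return len(source & target) / len(source)

def normalize_fixation(fix_x, fix_y, box):
    """Map a fixation in image pixel coordinates into the unit square
    defined by its object's bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (fix_x - x0) / (x1 - x0), (fix_y - y0) / (y1 - y0)

def fixation_density_map(fixations_by_category, grid=8):
    """Accumulate box-normalized fixations into a grid x grid count map per
    object category. The grid size is an arbitrary choice for this sketch."""
    maps = defaultdict(lambda: [[0] * grid for _ in range(grid)])
    for category, entries in fixations_by_category.items():
        for (fx, fy), box in entries:
            u, v = normalize_fixation(fx, fy, box)
            # Keep only fixations that land inside the bounding box.
            if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
                maps[category][int(v * grid)][int(u * grid)] += 1
    return maps
```

Comparing the row sums of such a map across categories would surface the patterns the abstract reports, e.g. counts concentrated in the top rows for tables and chairs versus along the middle rows for cats and cows.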

Meeting abstract presented at VSS 2013
