October 2003
Volume 3, Issue 9
Vision Sciences Society Annual Meeting Abstract  |   October 2003
Top-down control of visual attention in real world scenes
Author Affiliations
  • Aude Oliva
    Department of Psychology, Cognitive Science Program, Michigan State University, USA
  • Antonio Torralba
    Artificial Intelligence Laboratory, MIT, USA
  • Monica S Castelhano
    Department of Psychology, Cognitive Science Program, Michigan State University, USA
  • John M Henderson
    Department of Psychology, Cognitive Science Program, Michigan State University, USA
Journal of Vision October 2003, Vol.3, 3. doi:https://doi.org/10.1167/3.9.3
© ARVO (1962-2015); The Authors (2016-present)
Abstract

During the first glance at a complex scene, the observer's attention is drawn towards a particular region of the scene and the first saccade is programmed. Studies of scene recognition have acknowledged that substantial structural and spatial layout information is extracted within a single glance to form a semantic “gist” of the scene. The information in the gist therefore forms a visual context that is likely to modulate, in a top-down manner, where attention lands in a complex scene image. In this presentation, we extend a computational model of the gist, which encodes the coarse spectral layout of a scene image, to incorporate attentional guidance mechanisms and generate eye movements. The model uses the statistical correlations that exist between global scene structure (e.g., a street scene is in perspective) and object properties (e.g., the location of pedestrians) to define a region of interest in the image that is relevant for solving a task (e.g., looking for people). The eye movements of 8 human observers were monitored while they searched for a specific object (people) in 36 real-world scenes. The region of interest scrutinized by observers and the region determined by the gist guidance schema overlapped in more than 85% of the cases. Multiple fixation points (i.e., successive saccade targets) within the region of interest were generated by integrating a bottom-up saliency model with the top-down attentional guidance mechanism. Using a set of similarity metrics, we show that the locations of the multiple fixations of attention generated by the integrative model and those made by human observers were very similar. The results validate the proposition that top-down information from visual context modulates the saliency of image regions early during the task of object detection.
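
The integration of bottom-up saliency with top-down contextual guidance can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify how the two sources are combined, so the Gaussian prior over image rows, the pointwise multiplication, the winner-take-all selection with inhibition of return, and all function names and toy values below are assumptions chosen only for illustration.

```python
import math


def context_prior(height, width, expected_row, row_sigma):
    """Hypothetical top-down prior: a horizontal band centred on the
    image row where scene context suggests the target is likely to be."""
    return [
        [math.exp(-0.5 * ((r - expected_row) / row_sigma) ** 2)] * width
        for r in range(height)
    ]


def modulate(saliency, prior):
    """Assumed combination rule: pointwise product of the two maps."""
    return [
        [s * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(saliency, prior)
    ]


def simulate_fixations(priority, n_fixations, inhibition_radius):
    """Pick successive maxima of the map, suppressing a disc around
    each selected location (inhibition of return)."""
    grid = [row[:] for row in priority]  # work on a copy
    fixations = []
    for _ in range(n_fixations):
        value, r, c = max(
            (v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row)
        )
        if value <= 0:
            break  # nothing left to attend to
        fixations.append((r, c))
        for rr in range(len(grid)):
            for cc in range(len(grid[0])):
                if (rr - r) ** 2 + (cc - c) ** 2 <= inhibition_radius ** 2:
                    grid[rr][cc] = 0.0
    return fixations


# Toy 6x8 saliency map: uniform background, a strong distractor near the
# top of the image, and a weaker item at the row where people are expected.
saliency = [[0.1] * 8 for _ in range(6)]
saliency[0][1] = 1.0  # highly salient but contextually implausible
saliency[4][5] = 0.6  # less salient but at a plausible location

prior = context_prior(height=6, width=8, expected_row=4, row_sigma=1.0)
priority = modulate(saliency, prior)

bottom_up_only = simulate_fixations(saliency, n_fixations=1, inhibition_radius=1)
fixations = simulate_fixations(priority, n_fixations=3, inhibition_radius=1)
```

In this toy example the saliency map alone selects the distractor at the top of the image first, whereas the context-modulated map directs the first fixation to the item in the expected band and keeps later fixations near it.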

Oliva, A., Torralba, A., Castelhano, M. S., Henderson, J. M. (2003). Top-down control of visual attention in real world scenes [Abstract]. Journal of Vision, 3(9): 3, 3a, http://journalofvision.org/3/9/3/, doi:10.1167/3.9.3.