August 2009
Volume 9, Issue 8
Vision Sciences Society Annual Meeting Abstract  |   August 2009
Modeling visual search in a thousand scenes: The roles of saliency, target features, and scene context
Author Affiliations
  • Krista Ehinger
    Department of Brain & Cognitive Sciences, MIT
  • Barbara Hidalgo-Sotelo
    Department of Brain & Cognitive Sciences, MIT
  • Antonio Torralba
    Computer Science & Artificial Intelligence Laboratory, MIT
  • Aude Oliva
    Department of Brain & Cognitive Sciences, MIT
Journal of Vision August 2009, Vol.9, 1199. doi:10.1167/9.8.1199
© ARVO (1962-2015); The Authors (2016-present)

Three sources of guidance have been proposed to explain the deployment of attention during visual search tasks. (1) Saliency reflects the capture of attention by regions of an image that differ from their surroundings in low-level features (e.g., Itti & Koch, 2000). (2) Attention may also be guided towards image regions that look like the search target (Wolfe, 2007); for example, attention may be directed towards red objects when searching for a red target. (3) The context of a scene is also likely to guide attention: in the real world, objects are constrained to appear in particular locations (for example, cars appear on streets), so attention may be guided to these locations during search (Torralba et al., 2007).

We attempted to predict human search fixations using computational models of the three sources of guidance (saliency, target features, and scene context) in a large database of human fixation data (14 observers searching for pedestrians in 912 outdoor scenes). When tested individually, each model performed above chance, but scene context provided the best prediction of human fixation locations. A combined model incorporating all three sources of guidance outperformed each of the single-source models, with performance driven predominantly by the context model. The combined model performed at 94% of the level of human agreement in the search task, as measured by the area under the ROC curve.
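As an illustration of the evaluation described above, the following is a minimal sketch of how guidance maps might be combined and scored against fixations with the area under the ROC curve. It is not the authors' implementation: the abstract does not state the combination rule or the exact ROC procedure, so the weighted geometric combination in `combine_maps`, the pixel-level scoring in `fixation_auc`, and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def combine_maps(maps, weights):
    """Combine guidance maps with a weighted geometric mean.

    Assumption: the abstract does not specify the combination rule;
    a weighted product in log space is used here only as an example.
    """
    eps = 1e-12  # avoids log(0) for empty map regions
    log_sum = sum(w * np.log(m + eps) for m, w in zip(maps, weights))
    combined = np.exp(log_sum)
    return combined / combined.sum()  # normalize to sum to 1

def fixation_auc(guidance_map, fixations):
    """Area under the ROC curve for separating fixated from
    non-fixated pixels by their guidance-map value.

    `fixations` is a list of (row, col) pixel coordinates.
    Ties between a fixated and a non-fixated pixel count as 0.5.
    """
    fix_mask = np.zeros(guidance_map.shape, dtype=bool)
    rows, cols = zip(*fixations)
    fix_mask[list(rows), list(cols)] = True

    pos = guidance_map[fix_mask]            # values at fixated pixels
    neg = np.sort(guidance_map[~fix_mask])  # values everywhere else

    # For each fixated value, count non-fixated values below it and tied with it.
    below = np.searchsorted(neg, pos, side="left")
    ties = np.searchsorted(neg, pos, side="right") - below
    return float((below + 0.5 * ties).sum() / (pos.size * neg.size))
```

Under these assumptions, a figure such as "94% of the level of human agreement" would correspond to the ratio between the model's AUC and the AUC obtained when one group of observers' fixations is used to predict another's.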

We compared the performance of the three-source model of search guidance to an empirically derived model of scene context. For this comparison, a “context oracle” was created by asking human observers to specify the scene region where a target was most likely to appear. This context oracle predicted human fixations as well as the three-source computational model. We discuss the implications of these results for future models of visual search.
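One way to picture the context oracle is as a map built by pooling the regions that observers marked. The sketch below assumes each observer marks a single rectangular region and that the oracle map is the fraction of observers whose region covers each pixel. The abstract does not describe how regions were collected or pooled, so the rectangle format and the hypothetical `oracle_map` function are illustrative assumptions only.

```python
import numpy as np

def oracle_map(shape, regions):
    """Fraction of observers whose marked region covers each pixel.

    `shape` is (height, width) of the scene in pixels.
    `regions` holds one (top, left, bottom, right) box per observer,
    with bottom and right exclusive. Both the box format and the
    averaging rule are assumptions for this example.
    """
    votes = np.zeros(shape, dtype=float)
    for top, left, bottom, right in regions:
        votes[top:bottom, left:right] += 1.0
    return votes / len(regions)
```

A map built this way can be scored against search fixations with the same ROC-based measure used for the computational models, which is what makes the comparison between the oracle and the three-source model possible.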

Ehinger, K., Hidalgo-Sotelo, B., Torralba, A., & Oliva, A. (2009). Modeling visual search in a thousand scenes: The roles of saliency, target features, and scene context [Abstract]. Journal of Vision, 9(8):1199, 1199a, doi:10.1167/9.8.1199.
 Funded by NSF CAREER awards (0546262) to A.O. and (0747120) to A.T. B.H.S. is funded by an NSF Graduate Fellowship.