Vision Sciences Society Annual Meeting Abstract  |   August 2009
Feature-based and contextual guidance mechanisms in complex natural visual search
Author Affiliations
  • Cheston Tan
    McGovern Institute for Brain Research, and Department of Brain and Cognitive Sciences, MIT
  • Thomas Serre
    McGovern Institute for Brain Research, and Department of Brain and Cognitive Sciences, MIT
  • Sharat Chikkerur
    McGovern Institute for Brain Research, and Department of Electrical Engineering and Computer Science, MIT
  • Tomaso Poggio
    McGovern Institute for Brain Research, and Department of Brain and Cognitive Sciences, MIT
Journal of Vision August 2009, Vol.9, 1189. doi:10.1167/9.8.1189
Abstract

Predicting human eye movements during search in complex natural images is an important part of understanding visual processing. Several mechanisms have previously been shown to influence the deployment of attention and eye movements: bottom-up cues (Itti & Koch 2001) as well as top-down feature-based (Wolfe 1994; Navalpakkam & Itti 2005) and contextual cues (Torralba 2003). How the visual system combines these cues, and what neural circuits underlie this combination, remains largely unknown.

Here, we consider a Bayesian framework to study the contributions of bottom-up cues and top-down (feature-based and contextual) cues during visual search. We implemented a computational model of visual attention that combines feature-based and contextual priors. The model relies on a realistic population of shape-based units in intermediate areas of the ventral stream (Serre et al 2005), which is modulated by prefrontal and parietal cortex. The resulting posterior probability over locations is computed within parietal areas to generate a task-specific attentional map.
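The cue combination described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the array shapes, the toy inputs, and the assumption that the priors combine multiplicatively (i.e., are conditionally independent given the target) are all assumptions made for the example.

```python
import numpy as np

def attention_map(bottom_up, feature_prior, context_prior):
    """Combine a bottom-up saliency map with top-down feature-based and
    contextual priors into a task-specific attentional map.

    Each input is a 2-D array over image locations. Assuming the cues are
    conditionally independent given the target, the posterior over
    locations is proportional to their pointwise product.
    """
    posterior = bottom_up * feature_prior * context_prior
    return posterior / posterior.sum()  # normalize to a probability map

# Toy example on a 4x4 grid of image locations.
rng = np.random.default_rng(0)
bu = rng.random((4, 4))      # bottom-up saliency
feat = rng.random((4, 4))    # feature-based prior (e.g., "car-like" responses)
ctx = np.ones((4, 4))
ctx[2:, :] = 5.0             # contextual prior favoring lower rows (e.g., road level)

amap = attention_map(bu, feat, ctx)  # normalized attentional map, sums to 1
```

In this sketch the contextual prior simply reweights entire rows; in a scene-based model such a prior would come from global scene statistics rather than being hand-set.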

We compared the model's predictions with human psychophysics. Using a database of street-scene images, we instructed subjects to count either the number of cars or the number of pedestrians while we tracked their eye movements. We found a surprisingly high level of agreement among subjects' initial fixations, despite the presence of multiple targets. We also found that the pattern of fixations was highly task-dependent, suggesting that top-down task-dependent cues played a larger role than bottom-up task-independent cues.

We found that our proposed model, which combines bottom-up and top-down cues, can predict fixations more accurately than alternative models, including purely bottom-up approaches. Also, neither feature-based nor contextual mechanisms alone could account for human fixations. Our results thus suggest that human subjects efficiently use all available prior information when searching for objects in complex visual scenes.
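A standard way to quantify how well an attentional map predicts fixations is an ROC analysis that treats fixated locations as positives and all other locations as negatives; the abstract does not specify the authors' metric, so this is an assumed, generic evaluation sketch (the function name and the rank-sum AUC formula are illustrative choices).

```python
import numpy as np

def fixation_auc(attn_map, fixations):
    """ROC area under the curve for fixation prediction.

    Compares attention values at fixated pixels (positives) against all
    remaining pixels (negatives) using the Mann-Whitney U (rank-sum)
    formulation of the AUC. `fixations` is a list of (row, col) tuples.
    """
    attn = attn_map.ravel()
    pos_mask = np.zeros(attn.size, dtype=bool)
    idx = np.ravel_multi_index(np.transpose(fixations), attn_map.shape)
    pos_mask[idx] = True
    pos, neg = attn[pos_mask], attn[~pos_mask]
    # Ranks (1-based) of all values; U statistic normalized to [0, 1].
    ranks = np.argsort(np.argsort(np.concatenate([pos, neg]))) + 1
    u = ranks[: pos.size].sum() - pos.size * (pos.size + 1) / 2
    return u / (pos.size * neg.size)

# A map that peaks exactly at the fixated locations scores a perfect 1.0.
amap = np.zeros((10, 10))
amap[5, 5] = amap[2, 7] = 1.0
print(fixation_auc(amap, [(5, 5), (2, 7)]))  # 1.0
```

An AUC of 0.5 corresponds to chance; a combined bottom-up plus top-down map outperforming a purely bottom-up one, as reported above, would show up as a higher AUC on held-out fixations.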

Tan, C., Serre, T., Chikkerur, S., & Poggio, T. (2009). Feature-based and contextual guidance mechanisms in complex natural visual search [Abstract]. Journal of Vision, 9(8):1189, 1189a, http://journalofvision.org/9/8/1189/, doi:10.1167/9.8.1189.