Bottom‐up theories, on the other hand, propose that image properties drive the allocation of attention. They can be viewed as a low-level visual search problem in which features are analysed in parallel across the visual field and highly contrasting regions that pop out are allocated attention. These models are biologically plausible (de Brecht & Saiki, 2006; Itti & Koch, 2000; Koch & Ullman, 1985; Treue, 2003) and can be implemented for a variety of features. Typically, each feature has its own feature map, and these feature maps are combined to build a saliency map. The model proposed by Itti and Koch (2000) predicted the order of viewer fixations by inhibiting the return of attention to previously fixated areas, so that the saliency map is traversed from its most salient to its least salient regions. These saliency maps match well with early saccadic movements whether participants are encoding pictures in preparation for a memory test (Foulsham & Underwood,
2007; Underwood & Foulsham, 2006; Underwood et al., 2006) or free viewing (Parkhurst, Law, & Niebur, 2002). Parkhurst et al. (2002) have shown that saliency at fixated locations is higher than would be expected from a chance distribution of fixations. A flickering light or an abrupt onset can involuntarily capture attention (Posner,
1980), a property widely exploited by police cars, ambulances, and railroad crossings. This capture has been found to occur independently of goals and the task at hand (Christ & Abrams, 2006; Mulckhuyse, Van Zoest, & Theeuwes, 2008; Neo & Chua, 2006; Schreij, Owens, & Theeuwes, 2008). Although spatial and temporal feature contrasts could explain this finding, a probabilistic formulation has also been proposed in which salient regions are defined by the amount of bottom‐up Bayesian surprise (Itti & Baldi,
2006). In contrast, other studies have found that this attention capture effect can be modulated by top‐down factors (e.g., Lien, Ruthruff, Goodin, & Remington, 2008). Hollingworth and Henderson (1998) suggest that the processing of object information is functionally independent of the scene context. This finding not only conflicts with evidence of a consistent or inconsistent scene advantage but also supports the idea that image feature contrasts objectively draw attention. Recent studies have also found a relationship between image salience and high-level cognitive functions, both for tasks involving working memory (Fine & Minnery,
2009) and for scene labeling (Elazary & Itti, 2008). To obtain a more accurate model of scene perception, both bottom‐up and top‐down factors should presumably be incorporated, and advances have been made toward this goal (Navalpakkam & Itti, 2005).
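
The saliency-map account described in this section (one map per feature, a combined saliency map, and inhibition of return producing an ordered sequence of fixations) can be illustrated with a minimal sketch. This is not the Itti and Koch (2000) implementation: the function names are hypothetical, the feature maps are toy arrays rather than multi-scale centre-surround responses, and plain averaging of rescaled maps is an assumed stand-in for the model's own combination step.

```python
import numpy as np

def normalize(feature_map):
    """Rescale a feature map to [0, 1]; a flat map becomes all zeros."""
    fmap = np.asarray(feature_map, dtype=float)
    span = fmap.max() - fmap.min()
    if span == 0:
        return np.zeros_like(fmap)
    return (fmap - fmap.min()) / span

def saliency_map(feature_maps):
    """Combine feature maps by averaging their rescaled versions (an assumption)."""
    return np.mean([normalize(f) for f in feature_maps], axis=0)

def predict_scanpath(saliency, n_fixations=3, ior_radius=1):
    """Pick the most salient location repeatedly, suppressing each one after it is attended."""
    s = np.array(saliency, dtype=float)
    rows, cols = np.indices(s.shape)
    fixations = []
    for _ in range(n_fixations):
        r, c = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(r), int(c)))
        # Inhibition of return: suppress a disc around the attended location
        # so that attention moves on to the next most salient region.
        s[(rows - r) ** 2 + (cols - c) ** 2 <= ior_radius ** 2] = -np.inf
    return fixations

# Toy 5x5 feature maps (hypothetical values, not real image features).
intensity = np.zeros((5, 5))
intensity[1, 1] = 1.0
intensity[3, 3] = 0.5

color = np.zeros((5, 5))
color[1, 1] = 1.0
color[4, 0] = 0.6

scanpath = predict_scanpath(saliency_map([intensity, color]))
print(scanpath)
```

In this toy case the location that is strong in both feature maps is selected first, and the remaining locations follow in order of decreasing combined saliency, which mirrors the high-to-low traversal described above.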