When looking at natural scene images, our gaze is attracted to particular regions called salient regions. Much research has attempted to understand what statistical properties make some regions salient, using behavioral experiments with eye movement recording and/or computational models of the human visual system (Baddeley & Tatler, 2006; Buswell, 1935; Henderson & Hollingworth, 1999; Itti, Koch, & Niebur, 1998; Le Meur, Le Callet, Barba, & Thoreau, 2006; Mannan, Ruddock, & Wooding, 1997; Marat et al., 2009; Parkhurst, Law, & Niebur, 2002; Privitera & Stark, 2000; Reinagel & Zador, 1999; Torralba, Oliva, Castelhano, & Henderson, 2006; Yarbus, 1967).
Visual saliency depends mainly on two factors, one task independent and the other task dependent. The former corresponds to bottom-up processes and is mainly driven by the visual features of the stimulus (Koch & Ullman, 1985; Treisman & Gelade, 1980); the latter corresponds to top-down processes and is mainly driven by the task (Castelhano, Mack, & Henderson, 2009; Henderson & Hollingworth, 1999; Yarbus, 1967). Most saliency models, also called visual attention models, simulate bottom-up processes to look for salient regions in visual stimuli; these regions are assumed to attract attention and, hence, observers' gazes. Most computational models of visual attention are inspired by the Feature Integration Theory (FIT) of Treisman and Gelade (1980) and model bottom-up processes. According to this theory, visual stimuli are first broken down into several feature maps, such as intensity, color, and orientation; these features have been shown to be encoded in the primary visual cortex and to evoke responses from different cortical cells (Hubel, Wiesel, & Stryker, 1977). A region is salient if its features differ from those of its surround. The features are represented in separate feature maps, which are then combined to create a master saliency map that emphasizes salient regions. Besides the intensity, color, and orientation features mentioned in the FIT, several other visual features are salient, such as edges, spatial frequencies, and motion (Baddeley & Tatler,
2006; Wolfe & Horowitz, 2004). Usually, the following set of features is taken into account in visual attention models to predict eye movements during the exploration of static scenes: intensity, orientation, color, and spatial frequency (Itti et al., 1998; Le Meur et al., 2006; Torralba et al., 2006). In this framework, it is accepted that color information contributes to the choice of fixation locations. In line with these models, several studies have shown the important role of color in visual attention (Frey, Honey, & König,
2008; Jost, Ouerhani, Wartburg, Müri, & Hügli, 2005; Peters & Itti, 2008). In free viewing, gaze has been shown to be attracted by color depending on the semantic category of the visual scene (Frey et al., 2008; Parkhurst et al., 2002). For example, Frey et al. (2008) used seven categories of stimuli (face, flower and animal, forest, fractal, landscape, man-made, and rainforest) and found a difference in observers' fixation locations between color and grayscale scenes for the rainforest category. The role of color has also been demonstrated for different types of cognitive tasks. In recognition tasks, a saliency map that takes color features into account correlates better with human fixations than one computed from grayscale information alone (Jost et al., 2005). Similarly, a broad range of visual features (orientation, intensity, color, flicker, motion, and their combinations) was tested for the prediction of eye movements in video games (Peters & Itti, 2008). The results of that study emphasize the role of color, whether used alone or combined with other visual features. Color also plays an important role in object recognition, which may involve higher-level vision; it makes object recognition faster, for example (see Tanaka, Weiskopf, & Williams, 2001, for a review).
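The FIT-style pipeline described above (decompose the image into feature maps, compare each location with its surround, and combine the results into a master saliency map) can be sketched as follows. This is a minimal illustration, not the implementation used by any of the cited models: the blur scales, the crude color-opponency features, and the min-max normalization are simplifying assumptions, and real models such as Itti et al. (1998) use multi-scale pyramids, oriented filters, and more elaborate normalization operators.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with numpy only."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Reflect-pad, then convolve rows and columns; output size equals input size.
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def center_surround(feature, sigma_center=2.0, sigma_surround=8.0):
    """A location is salient if its feature value differs from its surround:
    absolute difference between a fine (center) and a coarse (surround) blur."""
    return np.abs(gaussian_blur(feature, sigma_center)
                  - gaussian_blur(feature, sigma_surround))

def saliency_map(rgb):
    """Combine per-feature conspicuity maps into one master saliency map.
    Feature choices here (intensity plus two crude opponency channels) are
    illustrative assumptions, not the channels of any published model."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    maps = [center_surround(f) for f in (intensity, red_green, blue_yellow)]
    # Rescale each map to [0, 1] before summing so that no feature dominates.
    maps = [(m - m.min()) / (m.max() - m.min() + 1e-9) for m in maps]
    master = sum(maps)
    return master / master.max()
```

With a uniform gray image containing a small red patch, the master map peaks on the patch, since both the intensity and the red-green channels differ there from the surround while the background yields near-zero center-surround contrast.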