The surrounding world contains a tremendous amount of visual information, which the visual system cannot fully process (Tsotsos, 1990). The visual system thus faces the problem of how to allocate its processing resources to focus on important aspects of a scene. Despite the limited amount of visual information the system can handle, sampled through discontinuous fixations and covert shifts of attention, we experience a seamless, continuous world. Humans and many other animals thrive on this heavily downsampled visual information. Visual attention, as overtly reflected in eye movements, partially reveals the sampling strategy of the visual system and is of great research interest as an essential component of visual cognition. Psychologists have investigated visual attention for many decades using psychophysical experiments, such as visual search tasks, with carefully controlled stimuli. Sophisticated mathematical models have been built to account for the wide variety of human performance data (e.g., Bundesen, 1990; Treisman & Gelade, 1980; Wolfe, Cave, & Franzel, 1989).

With the development of affordable and easy-to-use modern eye-tracking systems, the locations that people fixate when they perform certain tasks can be explicitly recorded, providing insight into how people allocate their attention when viewing complex natural scenes. The proliferation of eye-tracking data over the last two decades has led to a number of computational models that attempt to account for these data and address the question of what attracts attention. Most models have focused on bottom-up attention, in which subjects free-view a scene and salient objects attract attention. Many of these saliency models use findings from psychology and neurobiology to construct plausible mechanisms for guiding attention allocation (Itti, Koch, & Niebur, 1998; Koch & Ullman, 1985; Wolfe et al., 1989). More recently, a number of models have attempted to explain attention in terms of more mathematically motivated principles that address the goal of the computation (Bruce & Tsotsos, 2006; Chauvin, Herault, Marendaz, & Peyrin, 2002; Gao & Vasconcelos, 2004, 2007; Harel, Koch, & Perona, 2007; Kadir & Brady, 2001; Oliva, Torralba, Castelhano, & Henderson, 2003; Renninger, Coughlan, Verghese, & Malik, 2004; Torralba, Oliva, Castelhano, & Henderson, 2006; Zhang, Tong, & Cottrell, 2007). Both types of models tend to rely solely on the statistics of the current test image when computing the saliency of a point in the image. We argue here that natural statistics (the statistics of visual features in natural scenes, which an organism learns through experience) must also play an important role in this process.