Our rapid understanding of complex visual scenes (Potter & Faulconer,
1975; Schyns & Oliva,
1994; Thorpe, Fize, & Marlot,
1996) suggests that a surprisingly large amount of information can be captured within a glance. Such rapidly extracted information leads to an image
gist (Friedman,
1979; Oliva,
2005; Potter,
1976) that provides information not only about the spatial layout of the scene but also about a few objects and their surface characteristics (Rensink,
2000). This information is sufficient to assign a semantic label to the scene, such as its category or function (Biederman,
1981; Biederman, Rabinowitz, Glass, & Stacy,
1974; Fei-Fei, Iyer, Koch, & Perona,
2007; Joubert, Rousselet, Fize, & Fabre-Thorpe,
2007; Rousselet, Joubert, & Fabre-Thorpe,
2005). Experimental evidence has shown that extracting categorical information about scenes or objects can rely on visual processes that do not require focused attention and are largely parallel (Li, VanRullen, Koch, & Perona,
2002; Rousselet, Fabre-Thorpe, & Thorpe,
2002; Rousselet, Thorpe, & Fabre-Thorpe,
2004; VanRullen, Reddy, & Koch,
2004; but see Evans & Treisman,
2005; Walker, Stafford, & Davis,
2008). However, beyond the question of their possible feed-forward nature (Serre, Oliva, & Poggio,
2007; Thorpe, Delorme, & Van Rullen,
2001; Thorpe et al.,
1996; VanRullen,
2007; Vogel, Schwaninger, Wallraven, & Bülthoff,
2006), the kinds of information extracted by these fast visual processes remain unclear. To account for the speed of complex scene categorization, Oliva and Torralba proposed that diagnostic information is present at different image scales, including the global statistical properties described by the Fourier spectral signature of the scene (Oliva,
2005; Oliva & Torralba,
2001).
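
To make the notion of a Fourier spectral signature concrete, the following is a minimal sketch, assuming a 2-D grayscale image stored as a NumPy array. It is offered as an illustration only and is not the procedure used by Oliva and Torralba. The function names `amplitude_spectrum` and `orientation_energy`, and the choice of four orientation sectors, are hypothetical and introduced here for the example. The sketch computes the amplitude spectrum of the image and then sums spectral energy within orientation sectors, which gives one coarse global summary of how energy is distributed across orientations.

```python
import numpy as np


def amplitude_spectrum(image):
    """Return the centered Fourier amplitude spectrum of a 2-D grayscale image."""
    image = np.asarray(image, dtype=float)
    # Subtract the mean so the zero-frequency (DC) term does not dominate.
    centered = image - image.mean()
    # fftshift moves the zero-frequency component to the middle of the array.
    return np.abs(np.fft.fftshift(np.fft.fft2(centered)))


def orientation_energy(amplitude, n_bins=4):
    """Sum spectral energy in n_bins orientation sectors covering 0 to 180 degrees.

    The number of sectors is an illustrative assumption, not a value
    taken from the literature.
    """
    h, w = amplitude.shape
    # Frequency coordinates matching the fftshift-ed layout of the spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    # Orientation of each frequency component, folded into [0, pi).
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    # Assign each component to a sector; clamp guards against rounding at pi.
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    energy = amplitude ** 2
    return np.array([energy[bins == b].sum() for b in range(n_bins)])


if __name__ == "__main__":
    # Synthetic test image: a grating whose luminance varies along x only.
    x = np.arange(64)
    grating = np.tile(np.sin(2 * np.pi * 8 * x / 64), (64, 1))
    print(orientation_energy(amplitude_spectrum(grating)))
```

In this sketch, a grating that varies only along the horizontal axis concentrates its energy in the first sector, and its transpose concentrates energy in the sector that starts at 90 degrees. Natural scene categories would produce less extreme but still distinct distributions under the same summary, which is the sense in which a spectral signature could be diagnostic.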