The behavioral measures supporting this view invariably include not only the time needed for visual processing but also that required for response execution, making it difficult to assess the relative contribution of each to the observed reaction times (DiCarlo & Maunsell,
2005; Johnson & Olshausen,
2003; VanRullen & Thorpe,
2001). However, recent results suggest that processing times can be even shorter than previously thought: in a forced-choice task in which two images are simultaneously flashed to the left and right of fixation, reliable saccadic eye movements toward the side containing the animal can be initiated as early as 130 ms after stimulus onset (Kirchner & Thorpe,
2006). Given that this time includes saccade preparation, the result implies that the underlying visual processing can be completed in 100 ms or less. This is extremely short compared with the latencies usually proposed for higher-level responses in humans. For example, face-selective ERPs such as the N170 typically start substantially later (Itier & Taylor,
2004; Liu, Harris, & Kanwisher,
2002; Rousselet, Mace, & Fabre-Thorpe,
2004). This raises the possibility that this sort of task does not truly make use of the highest levels of the visual system and could instead rely on simpler heuristics: not the detection of animals as such, but low-level image attributes that happen to be associated with images containing animals. If this sort of rapid vision really depends on low-level processing only, then very short processing times would be expected. As shown previously, natural scene categorization could indeed be at least partially explained by an analysis of low-level features of the kind that can be performed in early visual areas. Specifically, Torralba and Oliva have proposed that the distribution of orientations, computed at spatially defined locations of the image, can be diagnostic of particular categories of images (Torralba & Oliva,
2003). More specifically, these authors have shown that a linear classifier analyzing the distribution of energy across orientation- and spatial-frequency-tuned channels in a 4 × 4 grid can reach accuracy levels of 80% or more on tasks such as judging whether a natural image contains the target category. Similar findings have also been reported by a different team (Mermillod, Guyader, & Chauvin,
2005), reinforcing the low-level approach to natural scene categorization.
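
To make this kind of low-level analysis concrete, the following is a minimal sketch of a classifier operating on channel energies pooled over a 4 × 4 grid. It is an illustration under stated assumptions, not the implementation used in the cited studies: the filter shapes (Gaussian bands in the Fourier domain), the number of orientations and scales, the ridge-regularized least-squares classifier, and all function names (`channel_energy_features`, `fit_linear_classifier`, `predict`, `toy_demo`) are hypothetical choices made for this example, and the toy stimuli are noisy gratings rather than natural scenes.

```python
import numpy as np


def channel_energy_features(image, n_orientations=4, n_scales=3, grid=4):
    """Mean energy per (scale, orientation, grid cell), flattened to a vector.

    Assumed filter bank: Gaussian radial bands at octave-spaced centre
    frequencies, multiplied by one-sided Gaussian orientation tuning.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency, cycles/pixel
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    spectrum = np.fft.fft2(image - image.mean())
    hh, ww = h - h % grid, w - w % grid  # crop so the grid divides evenly
    sigma_theta = np.pi / (2 * n_orientations)

    features = []
    for s in range(n_scales):
        f0 = 0.25 / 2 ** s  # centre frequency of this scale (assumed)
        radial = np.exp(-((radius - f0) ** 2) / (2 * (0.5 * f0) ** 2))
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            # Angular distance to the preferred orientation, wrapped to (-pi, pi].
            d = np.angle(np.exp(1j * (angle - theta)))
            angular = np.exp(-(d ** 2) / (2 * sigma_theta ** 2))
            # One-sided filter gives a complex response; its squared
            # magnitude is the local energy of the channel.
            response = np.fft.ifft2(spectrum * radial * angular)
            energy = np.abs(response) ** 2
            # Average the energy inside each cell of the grid x grid layout.
            cells = energy[:hh, :ww].reshape(grid, hh // grid, grid, ww // grid)
            features.append(cells.mean(axis=(1, 3)).ravel())
    return np.concatenate(features)


def fit_linear_classifier(X, y, ridge=1e-3):
    """Ridge-regularized least-squares fit to +/-1 targets (labels y in {0, 1})."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ (2 * np.asarray(y) - 1))


def predict(weights, X):
    """Return 0/1 labels from the sign of the linear decision function."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ weights > 0).astype(int)


def toy_demo(n_per_class=20, size=32, seed=0):
    """Train and test on noisy gratings; returns held-out accuracy."""
    rng = np.random.default_rng(seed)
    ramp = np.sin(2 * np.pi * 0.125 * np.arange(size))

    def make(label):
        # Label 1 varies along x, label 0 varies along y (toy categories).
        grating = ramp[None, :] if label == 1 else ramp[:, None]
        return grating + 0.5 * rng.standard_normal((size, size))

    def dataset():
        y = np.repeat([0, 1], n_per_class)
        X = np.stack([channel_energy_features(make(label)) for label in y])
        return X, y

    X_train, y_train = dataset()
    X_test, y_test = dataset()
    weights = fit_linear_classifier(X_train, y_train)
    return float(np.mean(predict(weights, X_test) == y_test))
```

With the default settings each image is summarized by 3 scales × 4 orientations × 16 grid cells = 192 numbers, and the decision is a single weighted sum of those numbers. Calling `toy_demo()` returns the proportion of held-out toy images classified correctly; that figure says nothing about performance on natural scenes, which would require a labeled image set in place of the gratings.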