Abstract
Complex natural scenes are categorized very rapidly, in under 150 ms, suggesting simple and efficient processing. Recent models of visual recognition have proposed that perceptual analysis may start with a parallel extraction of different spatial frequencies (SF), following a preferential coarse-to-fine (CtF) sequence of SF processing. A rapid extraction of low spatial frequencies (LSF) may thus provide an initial and crude parsing of the scene, subsequently refined by slower but more detailed high spatial frequencies (HSF). However, a fine-to-coarse (FtC) sequence is sometimes preferred to a CtF sequence, depending on task demands. The present experiment investigates whether CtF processing allows faster scene categorization than the reverse FtC processing. To constrain SF processing according to these sequences, we presented brief movies of successive SF-filtered scenes with opposite SF orderings (either from LSF to HSF, or the reverse), allowing us to experimentally “decompose” the visual input into either a CtF or an FtC sequence. Movies lasted 150 ms and were composed of six SF-filtered images of the same scene, filtered at 1, 2, 3, 4, 5, and 6 cycles/degree of visual angle for CtF movies, or in the reverse order for FtC movies. Thirty-five participants performed a categorization task (indoors vs. outdoors) on these movies. Results showed that they categorized CtF movies significantly faster than FtC movies. Obtained for the first time with dynamic stimuli, these results provide critical support for recent models of vision. The current stimuli therefore seem well suited to investigating the neural bases of CtF categorization.
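As an illustration of the stimulus construction described above, the following sketch low-pass filters a scene image at successive cutoffs in the Fourier domain and assembles the frames into a CtF or FtC sequence. It is a minimal sketch, not the authors' actual pipeline: the Gaussian filter shape, the `degrees_per_image` viewing parameter, and the function names are assumptions introduced here for illustration.

```python
import numpy as np

def lowpass_filter(image, cutoff_cpd, degrees_per_image):
    """Low-pass filter a grayscale image at cutoff_cpd cycles/degree.

    Uses a Gaussian envelope in the Fourier domain (an assumption; the
    study's exact filter profile may differ). degrees_per_image is the
    hypothetical visual angle subtended by the image width.
    """
    h, w = image.shape
    # Convert the cutoff from cycles/degree to cycles/image.
    cutoff_cycles = cutoff_cpd * degrees_per_image
    fy = np.fft.fftfreq(h) * h  # vertical frequency, cycles/image
    fx = np.fft.fftfreq(w) * w  # horizontal frequency, cycles/image
    radius = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    envelope = np.exp(-(radius ** 2) / (2 * cutoff_cycles ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * envelope))

def make_sf_movie(image, cutoffs_cpd=(1, 2, 3, 4, 5, 6),
                  degrees_per_image=24, coarse_to_fine=True):
    """Build the six-frame sequence: LSF->HSF for CtF, HSF->LSF for FtC."""
    order = cutoffs_cpd if coarse_to_fine else tuple(reversed(cutoffs_cpd))
    return [lowpass_filter(image, c, degrees_per_image) for c in order]
```

Each frame would then be displayed for 25 ms to produce a 150 ms movie; with `coarse_to_fine=False` the same call yields the FtC control sequence.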