Whether face detection or recognition builds on a limited spatial frequency band of the amplitude spectrum is as yet an unsettled issue. Studies on human face recognition in the last 25 years, using varying methods of spatial frequency filtering, have yielded contradictory evidence as to what band of spatial frequencies is sufficient and optimal for face recognition (Costen, Parker, & Craw, 1994, 1996; Fiorentini, Maffei, & Sandini, 1983; Gold, Bennett, & Sekuler, 1999; Näsänen, 1999).
One hypothesis, postulating coarse-to-fine usage of spatial frequency in face processing, relies first on physiological evidence for faster processing along the magno-cellular pathway (tuned to lower spatial frequencies; Nowak & Bullier, 1998), and second on the observation of stronger occipital ERP activations to LSF face content (as compared to HSF content) in face detection experiments (Holmes, Winston, & Eimer, 2005; Pourtois, Dan, Grandjean, Sander, & Vuilleumier, 2005).
In contrast, a recent study on face detection (Halit, de Haan, Schyns, & Johnson, 2006) showed that both the face-sensitive N170 ERP component and detection performance suffered when high spatial frequencies (>24 cycles/image) were removed from faces. These results confirm those of Fiorentini et al. (1983) and support the hypothesis that the visual system flexibly exploits those spatial frequencies that are relevant to the task at hand (Flevaris, Robertson, & Bentin, 2008; Morrison & Schyns, 2001; Schyns & Oliva, 1997, 1999).
In summary, these findings indicate that optimal face detection might require a wide range of spatial frequencies.
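To make the unit "cycles/image" concrete, the following is a minimal sketch of how spatial frequencies above a cutoff such as 24 cycles/image could be removed from an image. It assumes a hard-edged (ideal) low-pass filter in the Fourier domain and a grayscale image stored as a 2-D NumPy array; the function name `lowpass_cycles_per_image` is hypothetical, and the filter actually used in the cited studies is not specified here and may well have differed.

```python
import numpy as np

def lowpass_cycles_per_image(image, cutoff=24.0):
    """Zero all Fourier components above `cutoff` cycles/image.

    Assumption: an ideal (hard-edged) filter; real studies often use
    smoother filters to limit ringing.
    """
    h, w = image.shape
    # fftfreq returns cycles/pixel; scaling by the size gives cycles/image.
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.hypot(fy[:, None], fx[None, :])
    mask = radius <= cutoff
    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the result is real up to rounding.
    return np.real(np.fft.ifft2(spectrum * mask))
```

Applied to an image containing a 5 cycles/image grating and a 40 cycles/image grating, this filter with the default cutoff would keep the former and remove the latter.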
In addition, it must be emphasized that the critical information for producing fast face biases is not just contained in the 1-D amplitude spectrum (collapsed over orientations); it is instead a property of the full 2-D amplitude spectrum across the different orientations. Indeed, numerous previous studies have revealed that the 1-D amplitude spectrum, and in particular its power coefficient $\alpha$ (satisfying $A(f) \propto 1/f^{\alpha}$, where $A$ denotes the amplitude and $f$ the spatial frequency), can differ systematically across image categories (Párraga, Troscianko, & Tolhurst, 2000; Tolhurst, Tadmor, & Chao, 1992). In our case, the face images in our database generally had a higher $\alpha$ coefficient than the transport images (Figures 5A and 5B). However, this was true at all levels of the phase and wavelet scrambling conditions. In fact, when we simulated an “ideal” observer that systematically saccaded to the image with the higher $\alpha$ coefficient, we found that this observer would have chosen the face image on about 80% of trials in all conditions (Figure 5C). While this could possibly account for the choices of our human observers when no scrambling was applied to the images (about 75% of face choices in this case, assuming equal contrast between the face and transport images), it would fail to explain the behavior of these subjects when scrambling was increased (Figure 5C), as well as the interaction between scrambling method (phase/wavelet) and reaction time (fast/slow) observed in Figure 4.
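The $\alpha$-based observer described above can be illustrated with a short sketch. It assumes square grayscale images stored as 2-D NumPy arrays, estimates $\alpha$ as the negative slope of a least-squares line fitted to the orientation-averaged amplitude spectrum in log-log coordinates, and picks the image with the higher estimate. The function names and the fitting range (1 cycle/image up to just below the Nyquist frequency) are assumptions made for illustration, not the exact procedure used in our analysis.

```python
import numpy as np

def radial_amplitude_spectrum(image):
    """Return (frequencies, mean amplitude) collapsed over orientations."""
    h, w = image.shape
    # Subtract the mean so the DC term does not dominate.
    amp = np.abs(np.fft.fft2(image - image.mean()))
    fy = np.fft.fftfreq(h) * h  # cycles/image
    fx = np.fft.fftfreq(w) * w
    radius = np.hypot(fy[:, None], fx[None, :])
    # Group Fourier components into rings of integer radius.
    bins = np.rint(radius).astype(int).ravel()
    sums = np.bincount(bins, weights=amp.ravel())
    counts = np.bincount(bins)
    max_f = min(h, w) // 2  # stay below the Nyquist frequency
    freqs = np.arange(1, max_f)
    return freqs, sums[1:max_f] / counts[1:max_f]

def estimate_alpha(image):
    """Estimate alpha in A(f) ~ 1/f**alpha from a log-log line fit."""
    freqs, amp = radial_amplitude_spectrum(image)
    slope, _ = np.polyfit(np.log(freqs), np.log(amp), 1)
    return -slope

def alpha_observer_choice(image_a, image_b):
    """Return 0 if image_a has the higher alpha estimate, else 1."""
    return 0 if estimate_alpha(image_a) > estimate_alpha(image_b) else 1
```

Because this observer only sees the orientation-averaged spectrum, its choices cannot change with any manipulation that leaves that spectrum intact, which is why it predicts the same proportion of face choices at every scrambling level.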
In conclusion, the fast face bias depends on the 2-D distribution of energy across spatial frequencies and orientations rather than on the 1-D distribution of energy across spatial frequencies (commonly termed the “amplitude spectrum”). In addition, the comparison between phase and wavelet scrambling methods reveals that the fast face bias does not depend much on the absolute or relative spatial distribution of energy in the images, i.e., on phase information.
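The separation between amplitude and phase information can be illustrated with a minimal sketch of full phase scrambling, which replaces the Fourier phases of an image while leaving its 2-D amplitude spectrum untouched. The sketch assumes a grayscale image stored as a 2-D NumPy array and takes the replacement phases from a white-noise image so that the output remains real-valued; the function name `phase_scramble` is hypothetical. The graded scrambling levels and the wavelet scrambling method used in the experiments are not reproduced here.

```python
import numpy as np

def phase_scramble(image, rng=None):
    """Fully randomize Fourier phase while preserving the 2-D amplitude spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    # Phases of a real white-noise image are antisymmetric in frequency,
    # which keeps the scrambled spectrum Hermitian (i.e., the image real).
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    # Keep the original DC phase so that mean luminance is unchanged.
    random_phase[0, 0] = np.angle(spectrum[0, 0])
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return np.real(scrambled)
```

An image and its scrambled version produced this way share the same energy at every spatial frequency and orientation but differ in where that energy falls in the image, which is the sense in which "phase information" is used above.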