Research Article  |   January 2009
Rapid visual categorization of natural scene contexts with equalized amplitude spectrum and increasing phase noise
Journal of Vision January 2009, Vol.9, 2. doi:10.1167/9.1.2

      Olivier R. Joubert, Guillaume A. Rousselet, Michèle Fabre-Thorpe, Denis Fize; Rapid visual categorization of natural scene contexts with equalized amplitude spectrum and increasing phase noise. Journal of Vision 2009;9(1):2. doi: 10.1167/9.1.2.

Abstract

This study aimed to determine the extent to which rapid visual context categorization relies on global scene statistics, such as diagnostic amplitude spectrum information. We measured performance in a Natural vs. Man-made context categorization task using a set of achromatic photographs of natural scenes equalized in average luminance, global contrast, and spectral energy. Results suggest that the visual system might use amplitude spectrum characteristics of the scenes to speed up context categorization processes. In a second experiment, we measured performance impairments with a parametric degradation of phase information applied to power spectrum averaged scenes. Results showed that performance accuracy was virtually unaffected up to 50% of phase blurring, but then rapidly fell to chance level following a sharp sigmoid curve. Response time analysis showed that subjects tended to make their fastest responses based on the presence of diagnostic man-made information; if no man-made characteristics allowed a decision threshold to be reached rapidly, because a natural scene was displayed or the noise level was high, the alternative decision for a natural response became increasingly favored. This two-phase strategy could maximize categorization performance if the diagnostic features of man-made environments tolerate higher levels of noise than natural features, as proposed recently.

Introduction
Our rapid understanding of complex visual scenes (Potter & Faulconer, 1975; Schyns & Oliva, 1994; Thorpe, Fize, & Marlot, 1996) suggests that a surprisingly large amount of information can be captured within a glance. Such rapidly extracted information leads to an image gist (Friedman, 1979; Oliva, 2005; Potter, 1976) that not only provides information about the spatial layout of the scene, but also information about a few objects and their surface characteristics (Rensink, 2000). This information is sufficient to assign a semantic label to the scene, its category or function (Biederman, 1981; Biederman, Rabinowitz, Glass, & Stacy, 1974; Fei-Fei, Iyer, Koch, & Perona, 2007; Joubert, Rousselet, Fize, & Fabre-Thorpe, 2007; Rousselet, Joubert, & Fabre-Thorpe, 2005). Experimental evidence has shown that categorical information about scenes or objects can involve visual processes that do not require focused attention and are largely parallel (Li, VanRullen, Koch, & Perona, 2002; Rousselet, Fabre-Thorpe, & Thorpe, 2002; Rousselet, Thorpe, & Fabre-Thorpe, 2004; VanRullen, Reddy, & Koch, 2004; but see Evans & Treisman, 2005; Walker, Stafford, & Davis, 2008). However, beyond the question of their possible feed-forward nature (Serre, Oliva, & Poggio, 2007; Thorpe, Delorme, & Van Rullen, 2001; Thorpe et al., 1996; VanRullen, 2007; Vogel, Schwaninger, Wallraven, & Bulthoff, 2006), the kinds of information extracted by these fast visual processes are still unclear. To account for the speed of complex scene categorization, Oliva and Torralba proposed that diagnostic information was present at different image scales, including the global statistical properties described by the Fourier spectral signature of the scene (Oliva, 2005; Oliva & Torralba, 2001). 
Following Marr (1982), the visual system could behave as a Fourier analyzer. The first steps of visual processing have often been described as filtering operations (Campbell & Robson, 1968; Field, 1999; Westheimer, 2001). In particular, some theories of sensory coding have suggested that one aim of the early visual system is to reduce redundancy in the coding of natural scenes, thus providing a more compact description of natural stimuli (Barlow, 1961). For illustration, given that the amplitude spectrum of natural scenes falls with increasing spatial frequency according to a 1/f rule, the compression of the visual information can be optimized by suppressing its high frequency content, as can be seen in the visual system outside the fovea (Field, 1999). Fourier analysis captures a large amount of image redundancy since the power spectrum of a picture (the square of its amplitude spectrum) provides a direct measure of its autocorrelation. The power spectrum of natural scenes can be modified, if the frequency fall-off is preserved, without seriously impairing our understanding of these scenes, probably because these modifications mainly affect redundant information. Therefore, averaging the power spectrum of a set of visual stimuli (while keeping the original phase information of each stimulus) is potentially a very good way to discard low-level biases among experimental conditions without necessarily critically deteriorating the informative content. This method strictly equalizes global luminance and, rather importantly, averages the spatial frequency contents of all stimuli at each scale and orientation. 
On the other hand, scene redundancies could have typical characteristics that could be of importance for the visual system, for example by supporting fast scene categorization processes. Oliva and Torralba proposed that the visual system could construct a meaningful representation of scene gist directly from such global Fourier based low-level features (Oliva & Torralba, 2001; Torralba & Oliva, 2003). Their results showed that complex scenes exhibit different Fourier spectral signatures that can be used by computational models to infer scene categories. For example, natural scenes and man-made environments were accurately distinguished using the distribution of orientation within the amplitude spectrum: the frequency fall-off is specific at each orientation, and the relative distribution of energy at each orientation and spatial frequency is similar within broad scenes categories (Baddeley, 1997; Oliva & Torralba, 2001). 
Recent experiments have provided evidence that the human visual system exploits these low-level image statistics for performing categorization. In particular, Kaping, Tzvetanov, and Treue (2007) showed that the visual system adaptation to the amplitude spectrum of Natural or Man-made images biases rapid scene categorization toward the opposite category of the adapter. Similar results were obtained by Guyader, Chauvin, Peyrin, Hérault, and Marendaz (2004) for basic level categories. However, other studies that manipulated phase information have concluded that higher order statistics are needed to perceive scenes as natural because performance decreased following a sigmoid curve that characterizes categorical processes (Einhäuser et al., 2006). Recently, Loschky and Larson (2008) obtained similar results using a Natural vs. Man-made scene categorization task in which stimuli were equalized for average luminance and global contrast. But this sigmoidal performance was not reported in an Animal vs. Non-animal categorization task in which phase information was also systematically randomized (Wichmann, Braun, & Gegenfurtner, 2006). However, none of these studies provided a reaction time analysis, thus leaving open the issue of the early contribution of low-level spectral signatures to the speed of scene categorization. 
In the present study, we aimed to determine to what extent rapid scene categorization depends on Fourier spectrum signatures, and how much performance speed would be affected when this information is no longer diagnostic. Using a Natural vs. Man-made task already used to investigate context categorization (Joubert et al., 2007), we measured categorization speed in several conditions. First, we used an original set of achromatic photographs of natural scenes to establish a benchmark level of performance. Second, the same photographs were equalized in luminance and contrast. Third, luminance and contrast equalized images were also equalized in spectral energy by applying the same averaged power spectrum to each image. Finally, we measured performance impairments when the images were further altered by mixing phase noise with their original phase spectra. 
Experiment 1: Amplitude spectrum equalization
The first experiment was designed to evaluate the impact of low-level image statistics equalization on context categorization performance. Setting the power spectrum of each image to the average across the entire image set eliminates amplitude spectrum diagnostic information that distinguishes man-made from natural scenes (Oliva & Torralba, 2001). We also characterized performance in an intermediate condition in which photographs were just equalized for global luminance and contrast. 
Methods
Subjects and tasks
Twelve volunteers (9 men, mean age 25, range 21–30, 4 of them left handed) performed 2 rapid visual go/no-go context categorization tasks both using man-made and natural scenes stimuli: one task used natural scenes as the target category, the other task used man-made scenes as target category. Subjects were asked to lift their finger as quickly and as accurately as possible (go responses) in response to a target picture, and to withhold their response (no-go responses) otherwise. Subjects gave their written informed consent; all of them had normal or corrected to normal vision. 
The experiment consisted of two blocks, one for each of the natural and man-made categorization tasks. Each block was preceded by a training series of 96 trials. Each block was composed of 6 series of 96 trials containing an equal proportion of the three different stimulus conditions: original gray-level photographs (O), stimuli with luminance and contrast equalized (EL) and amplitude spectrum equalized stimuli (ELA). In each series, target and distractor trials were equally likely. The order in which tasks were performed (man-made and natural), the successive conditions tested (O, EL, ELA), and the stimuli used, were all counterbalanced among subjects. A given stimulus was seen by 4 subjects (twice as a target and twice as a distractor), but was never seen more than once (either as target or distractor) by a given subject. 
Stimuli
The set of 1152 original scenes (Figure 1, O) was selected from a large commercial CD-ROM library (Corel Stock Photo Libraries) and included 576 man-made scenes and 576 natural scenes. Natural scenes were composed of sea scenes, mountain scenes, desert, iceberg, forest, and field scenes. Man-made scenes contained images of street scenes (with or without pedestrians), indoor scenes like kitchens, museums, churches, and aerial views of cities. None of the man-made scenes contained mountains or sea views, and none of the natural scenes contained buildings. Within each category, images were as diverse as possible. Images were gray-level coded, jpeg 8-bit format, with a size of 512 × 512 pixels (about 5 to 6° of visual angle on a 19-inch screen). 
Figure 1
 
Examples of stimuli used in Experiments 1 and 2. All stimuli were computed from gray-level pictures of man-made and natural scenes. In both experiments, subjects performed a go/no-go task in which man-made and natural scenes were alternately targets or distractors. (A) In Experiment 1, stimuli were divided into different subsets: original scenes (O), global contrast and luminance equalized scenes (EL), amplitude spectrum equalized scenes (ELA). In Experiment 2, stimuli were ELA scenes that included a variable percentage of phase noise, from 0 to 100% with increments of 11%. The two conditions at which subjects' performance dropped to chance level are shown in red. (B) Average power spectral signatures of man-made EL stimuli (blue), ELA stimuli (black) and natural EL stimuli (green). The contour plots represent 3 levels of energy of each spectral signature: the contour is selected so that the sum of the components inside the section represents 60, 80, and 90% of the total energy (energy is obtained by adding the square of the Fourier components; units are in cycles per pixel).
Each of the 1152 stimuli of the EL subset (Figure 1, EL) was set to the same global luminance and the same global (RMS) contrast (corresponding to a luminance distribution with a mean of 128 and a standard deviation of 10.5), computed by taking the average luminance and RMS contrast of the 1152 original scenes (O). 
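The luminance and contrast equalization described above can be sketched as a simple rescaling of pixel values. This is a minimal illustration, not the authors' code: the function name `equalize_luminance_contrast` is hypothetical, NumPy is assumed to be available, and the sketch does not clip or quantize the result back to 8 bits.

```python
import numpy as np

def equalize_luminance_contrast(image, target_mean=128.0, target_std=10.5):
    """Rescale a gray-level image to a fixed mean luminance and RMS contrast.

    The target values (mean 128, standard deviation 10.5) are the ones
    reported in the text; everything else here is an illustrative assumption.
    """
    img = np.asarray(image, dtype=np.float64)
    std = img.std()
    if std == 0:
        # A uniform image has no contrast to rescale; return a flat image.
        return np.full_like(img, target_mean)
    # Standardize to zero mean and unit variance, then map to the targets.
    return (img - img.mean()) / std * target_std + target_mean

# Demo on a synthetic image (random pixels stand in for a photograph).
rng = np.random.default_rng(0)
demo_image = rng.uniform(0, 255, size=(64, 64))
demo_el = equalize_luminance_contrast(demo_image)
```

After this step every image shares the same first- and second-order luminance statistics, while the spatial structure of each scene is left untouched.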
Each of the 1152 stimuli of the ELA subset (Figure 1, ELA) had the same amplitude spectrum, computed by averaging the power spectra across the 1152 EL stimuli. The ELA stimuli were constructed by inverse Fourier transforms using the square root of the averaged power spectrum and the original phase of each EL stimulus. Figure 1B illustrates the resulting spectral signatures of EL and ELA stimuli. 
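The construction of the ELA stimuli can be sketched in a few lines of NumPy. This is a hedged illustration of the procedure as described (average the power spectra, take the square root, recombine with each image's own phase); the function name `equalize_amplitude_spectrum` is hypothetical and the demo uses random arrays in place of photographs.

```python
import numpy as np

def equalize_amplitude_spectrum(images):
    """Give every image in a set the same amplitude spectrum.

    images: array-like of shape (n_images, height, width), gray-level.
    Returns an array of the same shape with real-valued pixels.
    """
    stack = np.asarray(images, dtype=np.float64)
    spectra = np.fft.fft2(stack, axes=(-2, -1))
    # Average the power spectra (squared amplitudes) across the image set.
    mean_power = np.mean(np.abs(spectra) ** 2, axis=0)
    # The shared amplitude spectrum is the square root of the averaged power.
    shared_amplitude = np.sqrt(mean_power)
    # Keep the original phase spectrum of each image.
    phases = np.angle(spectra)
    rebuilt = np.fft.ifft2(shared_amplitude * np.exp(1j * phases), axes=(-2, -1))
    # For real input images the imaginary part is only rounding error.
    return rebuilt.real

# Demo on synthetic images.
rng = np.random.default_rng(1)
demo_set = rng.uniform(0, 255, size=(4, 16, 16))
demo_ela = equalize_amplitude_spectrum(demo_set)
demo_amplitudes = np.abs(np.fft.fft2(demo_ela, axes=(-2, -1)))
```

Because the averaged power spectrum of real images is symmetric about the origin and each phase spectrum is antisymmetric, the recombined spectrum stays Hermitian and the inverse transform is real up to numerical error.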
Procedure
Subjects sat in a dimly lit room, 1 m away from a computer screen (resolution 1024 × 768, vertical refresh 75 Hz) connected to a PC computer. Stimulus display and behavioral response measurements were carried out using Presentation software (NeuroBehavioral Systems, http://nbs.neuro-bs.com/). 
Each trial started with a fixation cross (1° of visual angle) displayed at the center of a black screen for a random duration of 300–900 ms, immediately followed by the stimulus for two frames (26 ms), also in the middle of the screen. These brief presentations prevented exploratory eye movements and constrained the time available for information uptake. To start stimulus display, subjects had to place their fingers on a response pad equipped with infrared diodes that allowed timing with microsecond precision. For each target image, subjects had to respond (finger lift) in less than 1000 ms; longer reaction times (RT) were considered as no-go responses. RT were computed between stimulus onset and finger lift. Following this 1-s period, a black screen was displayed for 300 ms before the next trial started. A trial lasted between 1600 and 2200 ms. 
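The timing figures in this paragraph follow from simple arithmetic, sketched below. The constant names are illustrative, and the sketch assumes that the 1-s response window starts at stimulus onset, as the RT definition suggests.

```python
# Display timing: two frames at a 75 Hz vertical refresh rate.
REFRESH_HZ = 75
STIMULUS_FRAMES = 2
frame_ms = 1000 / REFRESH_HZ
stimulus_ms = STIMULUS_FRAMES * frame_ms  # about 26.7 ms, reported as 26 ms

# Trial duration: fixation (300-900 ms) + response window + blank screen.
FIXATION_MS = (300, 900)
RESPONSE_WINDOW_MS = 1000  # counted from stimulus onset (assumption)
BLANK_MS = 300
trial_ms = tuple(f + RESPONSE_WINDOW_MS + BLANK_MS for f in FIXATION_MS)
```

The shortest and longest fixation periods give the 1600 ms and 2200 ms bounds stated in the text.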
Evaluation of performance
Performance was recorded in terms of accuracy and response speed. For each subject, task and condition, global accuracy, correct-go and false alarm rates, and median RT were computed. In order to evaluate statistical differences across subjects among the three conditions and the two tasks, we used the non-parametric two-way repeated measure Friedman test. When the Friedman test showed statistical differences, a paired Wilcoxon test, Bonferroni corrected, was used to assess pairwise comparisons (in the text, only Wilcoxon p-values are indicated). Correct-go and false alarm responses were pooled for all subjects and expressed over time using 20 ms time bins in order to plot RT distributions. Minimal RT was determined as the first 10 ms bin for which correct responses significantly exceeded errors using chi-square tests (target and distractor stimuli were equally likely). We used d′ = z(hits) − z(FA), where z is the inverse of the normal cumulative distribution function (Macmillan & Creelman, 2005), as a sensitivity measure to compare conditions and tasks. In order to plot d′ as a function of time (d′ curve), we computed the cumulative proportions of hits and false alarms using 20 ms bins. The d′ curves describe performance time courses and provide estimates of the processing dynamics for the entire subject population (Rousselet et al., 2002, 2005). 
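The sensitivity measure and the cumulative d′ curve can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the function names are hypothetical, and the clipping of extreme rates to 1/(2N) and 1 − 1/(2N) is a common convention added here so that z stays finite; the text does not say how rates of 0 or 1 were handled.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Sensitivity index: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

def cumulative_d_prime(hit_rts, fa_rts, n_targets, n_distractors,
                       bin_ms=20, max_ms=1000):
    """d' computed from cumulative hit and false alarm rates over time.

    hit_rts, fa_rts: reaction times (ms) of correct-go responses and
    false alarms, pooled across subjects.
    Returns a list of (time_ms, d_prime) pairs, one per bin edge.
    """
    def clip(rate, n):
        # Assumption: keep rates away from 0 and 1 so z() is defined.
        return min(max(rate, 0.5 / n), 1.0 - 0.5 / n)

    curve = []
    for edge in range(bin_ms, max_ms + 1, bin_ms):
        hit_rate = sum(rt <= edge for rt in hit_rts) / n_targets
        fa_rate = sum(rt <= edge for rt in fa_rts) / n_distractors
        curve.append((edge, d_prime(clip(hit_rate, n_targets),
                                    clip(fa_rate, n_distractors))))
    return curve

# Toy data: 5 targets (4 hits) and 5 distractors (1 false alarm).
demo_curve = cumulative_d_prime([300, 350, 400, 420], [450], 5, 5)
```

Reading the time at which such a curve crosses a fixed d′ value gives the kind of latency shift between conditions that the d′ analyses rely on.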
Results
Results for man-made and natural categorization tasks are summarized in Figure 2 for the three conditions (O, EL, ELA). Categorization time-courses are illustrated by RT distributions and d′ curves in Figure 3.
Figure 2
 
Average performance in Experiment 1. For the three conditions O, EL, ELA described in Figure 1, green bars indicate average subject performance in the natural categorization task (targets: natural scenes, distractors: man-made scenes), and blue bars indicate average performance in the man-made categorization task (targets: man-made scenes, distractors: natural scenes). Error bars indicate the standard errors of the mean across subjects. (A) Average global accuracy for the 12 subjects. (B) Average median RT for correct-go responses (ms). (C) Average accuracy for correct-go responses (in %). (D) Average false alarm rate. Asterisks indicate statistically significant differences between conditions. Note that the ELA condition was associated with a systematic accuracy drop and longer reaction times in both categorization tasks. Moreover, in the ELA condition, subjects tended to bias their responses toward the natural category.
Figure 3
 
Categorization time-course in Experiment 1. Left column: RT distributions for correct go-responses (thick curves) and false alarms (thin curves) expressed over time using 20 ms bins and RT pooled across all subjects. Minimal RTs were determined as the first 10 ms bin for which correct responses significantly exceeded errors (targets and distractors were equally likely). Right column: d′ expressed over time using 20 ms cumulative bins. Dashed lines indicate the d′ reached at the minimal RT computed for the O condition (280 ms), which is used as a reference. (A) Comparison of natural and man-made tasks using original scenes (O condition). (B) Comparison of O, EL and ELA conditions in the natural task. (C) Comparison of O, EL and ELA conditions in the man-made task. (D) Comparison of natural and man-made tasks using amplitude spectrum equalized scenes (ELA condition). With amplitude spectrum equalized stimuli, subjects remain able to perform the man-made and natural context categorization tasks, but a processing shift toward longer latencies can be observed in both tasks. Stimulus detectability at 280 ms is clearly lower for the ELA condition than for the O or EL conditions.
Accuracy
Subjects were very efficient at categorizing scene contexts, but equalizing amplitude spectra significantly impaired performance accuracy. With original scenes, subjects reached an averaged global accuracy of 96% and 96.3% (Wilcoxon test, n.s.) respectively for natural and man-made categorization tasks (Figure 2A). Equalizing global luminance and contrast did not significantly modify global accuracy (natural task: 96.2%, man-made task: 95.5%, both Wilcoxon tests between O and EL: n.s.). For the ELA condition, results showed a significant accuracy decrease in both tasks (5.9 and 4%, respectively; Wilcoxon test between EL and ELA for natural categorization task: p = .003, man-made task: p = .004). No significant difference was found between the two ELA conditions for global accuracy. The ELA manipulation affected hits and false alarms differently. In the ELA condition, hits (Figure 2C) decreased by 1.9% and 2.6% relative to the O and EL conditions in the natural categorization task, and by 6.8% and 5.5% respectively in the man-made task (for all comparisons, Wilcoxon tests: p < .016). False alarms (Figure 2D) increased by 9.4% and 9.1% relative to the O and EL conditions in the natural categorization task, whereas the increase was smaller, 2.9% and 2.5%, in the man-made categorization task. This higher error rate on distractors was only significant in the natural categorization task (ELA Nat vs. EL Nat, Wilcoxon test: p = .002). 
The effect of amplitude spectrum equalization was not identical in the two tasks. Subjects tended to make more correct go responses to natural targets (95.9%) than with man-made targets (91.1%, Wilcoxon test: p = .007). They also incorrectly categorized man-made distractors as natural scenes in the natural task more often than they categorized natural scenes as man-made in the man-made task (15.3% vs. 8.1% of false alarms, Wilcoxon test: p = .008). Thus, when using stimuli of equal amplitude spectrum, there was a clear bias toward categorizing scenes as “natural.” 
Categorization processing time-course
Context categorization was very fast with similar median reaction times regardless of the target category: about 411 ms in the natural categorization task and 400 ms in the man-made task ( Figure 2B, Wilcoxon test, n.s.). Categorization time-courses were also very similar in both tasks as illustrated by the virtually superimposed RT distributions and d′ curves ( Figure 3A). Minimal RT in the O condition was 280 ms in both tasks. A Wilcoxon test revealed no significant difference between O and EL median RT in both tasks (EL, natural task: 424 ms, man-made task: 403 ms). EL and O correct-go RT distributions were almost superimposed in both tasks ( Figures 3B and 3C), but the minimal RT was slightly delayed (natural task: 310 ms, man-made task: 290 ms). This delay between O and EL conditions was thus probably due to a larger number of fast false alarms in the EL condition. A similar shift toward longer latencies was observed for the d′ curves; the shift between conditions was defined as the time delay necessary to reach a threshold value. We defined that threshold as the d′ level reached at the minimal RT of 280 ms in the O condition ( Figures 3B and 3C horizontal and vertical dashed lines). This delay was about 20–30 ms in the natural task and 10–20 ms in the man-made task. In summary, equalizing global luminance and contrast introduced a 10–30 ms processing speed delay in discriminating targets from distractors in both categorization tasks, without further effects on median RT. 
In contrast, equalizing amplitude spectra produced a more pronounced effect on response latencies and the delay was visible on median RT (natural task: 448 ms, man-made task: 420 ms, Figure 2B). This 24 ms and 17 ms delay in median RT between EL and ELA conditions reached significance (Wilcoxon tests, natural task: p = .003, man-made task: p = .012), and was significantly longer for the natural categorization task (Wilcoxon test: p = .028). Compared to O and EL conditions, the ELA d′ curves were clearly delayed in both tasks (about 40 ms at the threshold d′ value defined as above, Figures 3B and 3C horizontal and vertical dashed lines). This latency shift was also observed on the RT distributions for correct-go responses at early latencies. The effect was clearly visible for natural categorization task and less visible but nevertheless present in the man-made categorization task. To summarize, equalizing the amplitude spectrum of man-made and natural scene stimuli introduced a clear median RT increase of about 20 ms for both categorization tasks. This delay could reach 40 ms when assessed using the d′ sensitivity measure for early response latencies. 
As with accuracy, the effect of amplitude spectrum equalization on reaction times was not the same in the two tasks. We have already mentioned that the latency shift between EL and ELA correct-go RT distributions was less pronounced for man-made than natural targets at early latencies and the same effect was seen for median RT. Figure 3D directly compares the two tasks in the ELA condition. Whereas the two distributions were almost superimposed with O stimuli, a complete shift of the natural correct ELA go response distribution toward longer latencies is clearly visible when compared to the man-made condition correct ELA go responses (Figure 3D). We also observed a more subtle difference in false alarm RT distributions: in the man-made categorization task, most of the false alarm responses tended to be produced before 450 ms, while false alarms in the natural categorization task were more equally distributed. Thus, there was a tendency to produce faster responses on man-made targets, with an earlier distribution of false alarms in the man-made task. Such time-course characteristics will be further discussed in terms of processing strategy, since they suggest that man-made information could accumulate faster than natural information, as observed in the second experiment. 
Experiment 2: Phase alteration
Experiment 2 was designed to evaluate the additional effect of phase spectrum alteration on context categorization when other features such as global luminance, global contrast and amplitude spectrum were equalized across all stimuli. The higher order cues present in the phase spectrum were progressively blurred by adding phase noise. Behavioral performance was evaluated on both man-made and natural scene categorization tasks. 
Method
Subjects and task
Ten volunteers (6 men, mean age 27, range 24–31, 2 of them left handed, normal or corrected to normal vision) performed the same rapid visual go/no-go context categorization tasks described in Experiment 1. This experiment was designed in two task blocks (natural categorization task, man-made categorization task), each composed of 7 series of 100 trials, 50 targets and 50 distractors in a random order. In each series, 10 conditions of phase alteration ( Figure 1) were equally likely among target and distractor stimuli. Each task block was preceded by a training series of 90 trials. Each subject categorized each image in only one noise condition; across subjects, every image was seen in all noise conditions. Task order (natural, man-made) was counterbalanced among subjects. 
Stimuli
In order to counterbalance all noise conditions efficiently across scenes and subjects, we completed the image set from Experiment 1 with 248 extra pictures selected to contain a high diversity of scenes as described earlier. A total of 1400 pictures from the Corel Stock Photo Libraries were used; they included 700 man-made scenes and 700 natural scenes. We Fourier transformed each gray-level picture and computed an average power spectrum. Then, we constructed every stimulus by inverse Fourier transform using the average power spectrum, and using a phase spectrum that contained a variable percentage of the phase of the original image and an inversely proportional percentage of the phase of a white noise image. To this end, we applied the weighted mean phase (WMP) algorithm as proposed by Dakin, Hess, Ledgeway, and Achtman (2002) for phase blending, using Gaussian noise in the range [−π, π]. From each original image, 10 stimuli were thus created that contained 0%, 11%, 22%, 33%, 44%, 55%, 66%, 77%, 88% or 100% of phase noise. As a result, the complete set of 14,000 stimuli had the same global luminance and the same amplitude spectrum. Images were encoded in jpeg 8-bit format, with a size of 512 × 512 pixels (approximately 5 to 6° of visual angle). 
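The phase blending step can be sketched as follows. This is a hedged reading of the weighted mean phase idea, not the authors' implementation: each phase is treated as a unit vector, the image and noise vectors are averaged with weights (1 − noise fraction) and (noise fraction), and the angle of the result is taken. The function names are hypothetical, and taking the noise phase from the Fourier transform of a Gaussian white noise image is an assumption about how the noise was generated.

```python
import numpy as np

def blend_phase(image_phase, noise_phase, noise_fraction):
    """Weighted mean of two phase spectra, computed on the unit circle."""
    w_image = 1.0 - noise_fraction
    sin_mix = w_image * np.sin(image_phase) + noise_fraction * np.sin(noise_phase)
    cos_mix = w_image * np.cos(image_phase) + noise_fraction * np.cos(noise_phase)
    # arctan2 returns the angle of the averaged vector, in [-pi, pi].
    return np.arctan2(sin_mix, cos_mix)

def phase_noise_stimulus(image, shared_amplitude, noise_fraction, rng):
    """Build one stimulus from a shared amplitude spectrum and blended phase."""
    img = np.asarray(image, dtype=np.float64)
    image_phase = np.angle(np.fft.fft2(img))
    # Assumption: noise phase comes from a Gaussian white noise image,
    # which keeps the blended spectrum consistent with a real-valued image.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    blended = blend_phase(image_phase, noise_phase, noise_fraction)
    return np.fft.ifft2(shared_amplitude * np.exp(1j * blended)).real

# Demo: the ten noise levels listed in the text, applied to a synthetic image.
rng = np.random.default_rng(2)
demo_image = rng.uniform(0, 255, size=(16, 16))
demo_amplitude = np.abs(np.fft.fft2(demo_image))  # stand-in for the averaged spectrum
noise_levels = [0.0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, 1.0]
demo_stimuli = [phase_noise_stimulus(demo_image, demo_amplitude, p, rng)
                for p in noise_levels]
```

Averaging on the unit circle avoids the wrap-around problem of averaging raw angles, where blending phases near −π and π would wrongly give a value near 0.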
Control experiment
Subjects performing go/no-go tasks that include difficult test conditions could bias their behavior toward no-go responses even when instructed to respond on about 50% of the trials. In order to control for such a possible bias, we replicated the experiment using a 2-alternative forced-choice paradigm and 10 other subjects (6 men, mean age 27, range 23–32, 2 of them left handed, normal or corrected to normal vision). Stimuli, display and procedure were identical to the above description except for the response mode. At the beginning of the session, subjects associated left or right hand with natural or man-made category; to start a series, subjects had to place their middle fingers simultaneously on the two “control” keyboard buttons. They further responded to each stimulus by lifting up the finger of the correct hand from the key; no feedback was provided. For half of the subjects, the left and right control keys corresponded respectively to the natural and man-made categories, and the opposite applied to the other half. 
Results
Results on man-made and natural categorization tasks using phase spectrum alteration are summarized in Figure 4. Categorization time-courses are illustrated by RT distributions and d′ curves in Figure 5.
Figure 4
 
Average performance in Experiment 2. Subjects' average performance as a function of percentage of phase noise. The area around each data point represents the 95% confidence interval computed using a percentile bootstrap with 2000 resamples. When satisfactory, a fit with a psychometric function a + b/(1 + exp(−(x − c)/d)) is superimposed. (A) Average accuracy. (B) Average median RT for correct-go responses (ms). (C) Average accuracy for correct-go responses. (D) Average false alarm rate. Green (blue) data points indicate performance in the natural (man-made) go/no-go categorization task. Subplots in (A, B, C) illustrate average performance recorded in the forced-choice control task using the same stimuli as the go/no-go tasks; for the forced-choice task, green and blue lines thus stand for performance on natural and man-made stimuli, respectively. Asterisks indicate statistically significant differences between go/no-go tasks. Global accuracies on man-made and natural categorization tasks are highly similar across phase noise levels. However, subjects' responses were strongly biased toward the natural category when more than 55% of phase information was altered, while the opposite response bias was observed for stimuli with less than 44% of phase noise. Reaction times were 60 ms longer in the natural categorization task.
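The psychometric function named in the Figure 4 caption can be written out directly. The sketch below only evaluates the function; the parameter values are hypothetical and chosen to show a curve that falls from high accuracy to chance, and no fitting procedure is implied.

```python
import math

def psychometric(x, a, b, c, d):
    """Sigmoid a + b / (1 + exp(-(x - c) / d)).

    a: lower asymptote, b: range, c: midpoint on the x axis,
    d: slope parameter (a negative d gives a decreasing curve).
    """
    return a + b / (1.0 + math.exp(-(x - c) / d))

# Hypothetical parameters: accuracy falling from about 95% to 50% (chance),
# with the midpoint at 60% phase noise.
demo_params = dict(a=50.0, b=45.0, c=60.0, d=-5.0)
demo_curve = [psychometric(x, **demo_params) for x in range(0, 101, 10)]
```

At x = c the function always returns a + b/2, which makes c a convenient summary of where performance is halfway between its two plateaus.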
Figure 5
 
Categorization time-course in Experiment 2. Green (blue) lines indicate performance in the natural (man-made) go/no-go categorization task, as a function of time using RT pooled across subjects. (A) RT distributions in the natural and man-made tasks for all low phase noise level stimuli pooled together (11–44%, thick line) compared to original phase stimuli (0%, filled histogram), using 20 ms bins. (B) RT distributions, in the natural and man-made tasks, of correct-go responses (thick curves) and false alarms (thin curves), for low phase noise level stimuli (11–44%), using 20 ms bins. (C) Comparison between correct-go response RT distributions (20 ms bins) for stimuli with low phase noise level (11–44%, same as the thick curves in B), shown with filled histograms, and stimuli with high phase noise level (55–88%, thick curves). (D) d′ curves computed using 20 ms bins for, from the thickest to the thinnest lines, stimuli with phase noise levels 11–22%, 33–44%, 55–66% and 77–88%.
Accuracy
Subjects' accuracy was measured for each of the 11% steps in phase noise levels (Figure 4A). For both tasks, accuracy followed a sigmoid curve that started with a high plateau above 80% accuracy for the lowest percentages of phase noise, presented an inflexion point around 50% of phase noise, and quickly reached chance level for the highest noise levels. This accuracy pattern was highly similar for man-made and natural categorization tasks; it was also similar to the accuracy pattern obtained in the control experiment that used forced-choice responses. An ANOVA comparing the accuracy in both go/no-go tasks showed no significant task effect. There were no significant differences between the 11%, 22%, 33%, and 44% noise conditions, nor between the 55%, 66%, 77%, and 88% noise conditions. To simplify subsequent analyses, we merged conditions from 11% to 44% of phase noise (low noise stimuli), and those from 55% to 88% (high noise stimuli), leaving aside the condition with original phases preserved and the condition with completely random phases. Using these merged conditions, paired t-tests did not show any difference between man-made and natural categorization tasks (low noise: p = .59, high noise: p = .38). 
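The sigmoid shape described above corresponds to the psychometric function a + b/(1 + exp(−(x − c)/d)) given in the Figure 4 caption. The following minimal Python sketch evaluates that function over the tested noise levels. The parameter values are hypothetical and chosen only to mimic the reported pattern (plateau above 80%, inflexion near 50% noise, floor at chance); they are not the fitted values from the study.

```python
import math

def psychometric(x, a, b, c, d):
    """Sigmoid a + b / (1 + exp(-(x - c) / d)) from the Figure 4 caption.

    a: lower asymptote, b: range, c: inflexion point, d: slope scale.
    A negative d yields a curve that falls as x increases.
    """
    return a + b / (1.0 + math.exp(-(x - c) / d))

# Hypothetical parameters (assumption, not fitted to the published data):
# chance floor of 0.5, plateau near 0.85, inflexion at 50% phase noise.
params = dict(a=0.5, b=0.35, c=50.0, d=-6.0)

# Phase noise levels in percent (11% steps, plus fully random phase).
noise_levels = [0, 11, 22, 33, 44, 55, 66, 77, 88, 100]
curve = [psychometric(x, **params) for x in noise_levels]

for x, acc in zip(noise_levels, curve):
    print(f"{x:3d}% noise -> predicted accuracy {acc:.3f}")
```

With these illustrative values, predicted accuracy stays close to the plateau up to 44% noise and approaches chance above 55%, which is the qualitative pattern that motivated merging the conditions into low-noise and high-noise groups.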
On the other hand, the separate analysis of correct-go responses and false alarms (Figures 4C and 4D) revealed a different effect of phase degradation depending on the target category. Correct-go responses were significantly more accurate for man-made targets than for natural targets at low phase noise levels (p < .002), and significantly less accurate at high noise levels (p < .0001). More false alarms were recorded during the man-made categorization task than during the natural task at low phase noise levels (p = .006), and conversely fewer false alarms were observed during the man-made categorization task at high phase noise levels (p = .031). Thus, when most of the original phase was preserved, subjects were biased toward man-made responses, but when little of the original phase information remained, subjects were biased toward natural responses. This pattern was also observed in the forced-choice task. 
Categorization time-course
Median RT was almost constant across the phase noise conditions in both tasks (Figure 4B). Median RT during the man-made categorization task (464 ms) was about 60 ms shorter than during the natural categorization task (527 ms), irrespective of the phase noise level (paired t-test, low noise condition: p = .0012, high noise condition: p = .045). In the forced-choice task, a longer median RT was observed (540 ms); but again, response speed was almost constant across phase noise levels. 
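The confidence intervals around averages such as these are described in the Figure 4 caption as percentile bootstrap intervals with 2000 resamples. A minimal sketch of that general procedure is given below. The per-subject median RTs are hypothetical values invented for illustration, and the function name, seed, and index convention are assumptions rather than the authors' implementation.

```python
import random
import statistics

def percentile_bootstrap_ci(values, stat=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort.
    boot_stats = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap distribution.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Hypothetical per-subject median RTs in ms (illustration only).
subject_median_rts = [452, 470, 441, 489, 463, 478, 455, 466, 472, 454]
lo, hi = percentile_bootstrap_ci(subject_median_rts)
print(f"mean = {statistics.mean(subject_median_rts):.1f} ms, "
      f"95% CI = [{lo:.1f}, {hi:.1f}] ms")
```

Resampling subjects rather than individual trials keeps the interval at the level of the across-subject average, which is the quantity plotted in Figure 4.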
Categorization time-courses were also found to be remarkably similar across phase noise levels, as assessed by the RT distributions of correct-go responses. Figure 5A compares correct-go RT distributions for low phase noise stimuli to the one for the original phase noise condition (0% phase noise). For each categorization task, man-made and natural, the correct-go RT distributions with low phase noise and original phase noise conditions were very similar. Figure 5C shows the RT distributions for the high phase noise conditions. The lower hit accuracy in the man-made categorization task compared to the natural task is clearly observed, but with high or low phase noise level, within each task RT varied in the same range, i.e. if normalized, RT distributions would appear very similar. 
However, we observed differences in timing between the man-made and the natural categorization tasks. Man-made correct-go RT distributions always deviated from zero earlier than the RT distributions for natural targets, and a similar delay was observed for false alarm RT distributions (Figures 5A, 5B, and 5C). A second observation can be made concerning the false alarm RT distributions (Figure 5B): as in Experiment 1, the percentage of false alarms in the man-made categorization task was higher for short latency responses than for late ones, while false alarms in the natural categorization task were more evenly distributed over the whole response time range. Using d′ curves, we further explored the different timing characteristics of the man-made and natural categorization tasks (Figure 5D). In addition to the decrease in sensitivity associated with an increasing amount of phase noise, d′ curves deviated from zero earlier in the natural categorization task than in the man-made categorization task, unlike correct-go RT distributions. However, after a short plateau followed by a rebound from 350 to 360 ms for the natural categorization task, d′ curves reached similar values in both tasks. To better understand how this d′ pattern could be related to man-made and natural categories, we further explored categorization time courses using hit and false alarm data from both tasks by performing new analyses using the D–T value described below. 
Categorization time-course using D–T value
By analogy with the d′ sensitivity measure, we designed a new measure (D–T value) intended to describe how a particular stimulus set was detected with respect to its task status in alternated crossed tasks. D-prime is a sensitivity measure based on the signal detection theory framework, which provides a measure of performance that is independent of decision biases such as strategy choice or speed-accuracy trade-off. By using hit and false alarm data, d′ measures how sensitive subjects are at distinguishing between two different stimulus sets. D-prime is often interpreted as a ‘physical’ distance between stimuli: in the analyses described above, d′ between man-made scene targets and natural scene distractors could be related, for instance, to the number of visual cues that were characteristic of man-made and natural scenes and that were preserved from noise blurring. 
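For readers less familiar with signal detection theory, the standard d′ computation can be sketched in a few lines of Python. This is the textbook definition, d′ = z(hit rate) − z(false alarm rate), and not code from the study; the rates in the usage example are hypothetical.

```python
from statistics import NormalDist

def d_prime(p_hit, p_fa):
    """Standard d': z(p_hit) - z(p_fa), with z the inverse normal CDF.

    Both proportions must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(p_hit) - z(p_fa)

# Hypothetical rates: two observers with different response biases.
liberal = d_prime(p_hit=0.90, p_fa=0.30)       # says "go" often
conservative = d_prime(p_hit=0.70, p_fa=0.10)  # says "go" rarely
print(f"liberal d' = {liberal:.3f}, conservative d' = {conservative:.3f}")
```

In this illustration the two observers obtain the same d′ despite different hit and false alarm rates, because z(0.90) − z(0.30) equals z(0.70) − z(0.10) by symmetry of the normal distribution. This is the sense in which d′ is independent of decision bias.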
Here we aimed to consider the current data at a higher level of representation, for example at a decision or motor stage. The D–T value is intended to measure a ‘distance’ between two internal representations related to two different behavioral outcomes and independently of their physical distance. The validity of the approach is supported by electrophysiological findings in monkey prefrontal cortex (Freedman, Riesenhuber, Poggio, & Miller, 2001) that showed differential neuronal responses based on behavioral outcomes rather than perceptual inputs. 
We took advantage of the current experimental paradigm in which the two stimulus sets are processed alternately as targets and distractors in two different tasks. For a given stimulus set (the man-made stimuli for example), we considered together the correct go-responses produced in task 1, when man-made scenes were targets, and the incorrect go-responses produced in task 2, when they were distractors. By using hits and false alarms to the same stimuli, we removed the ‘physical’ distance in order to enhance a ‘decision/motor’ factor. To consider together the data obtained from two different tasks, we made the reasonable assumption that the visuo-motor processes involved in both cases are similar. 
We thus designed a sensitivity measure by computing DT = z(pHits task 1) − z(pFA task 2), where z is the inverse of the normal cumulative distribution function, pHits task 1 is the proportion of hits in task 1, and pFA task 2 is the proportion of false alarms in task 2; tasks 1 and 2 are the two crossed tasks that use the same two stimulus sets but alternate their status as target and distractor (Figure 6A). In order to plot the D–T value as a function of time (D–T curve), the cumulative proportions of hits and false alarms were computed using 10 ms bins. D–T curves were plotted separately for man-made and natural stimuli (Figure 6B). Finally, by using the response time-courses of natural and man-made scenes separately, these D–T curves could capture the time-course characteristics of the decision processes involved in this context categorization task. 
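The D–T curve computation can be sketched as follows, using the definition DT = z(pHits task 1) − z(pFA task 2) with cumulative proportions in 10 ms bins. The function name, the RT values, and the trial counts are hypothetical. The clipping of proportions away from 0 and 1 is an assumption made here so that z stays finite; the text does not state how the authors handled empty bins.

```python
from statistics import NormalDist

def dt_curve(hit_rts, fa_rts, n_target_trials, n_distractor_trials,
             t_max=1000, bin_ms=10):
    """Cumulative D-T value over time for ONE stimulus set.

    hit_rts: RTs (ms) of go responses when the set was the target (task 1).
    fa_rts:  RTs (ms) of go responses when the same set was the
             distractor (task 2).
    """
    z = NormalDist().inv_cdf
    # Assumption: clip proportions to [eps, 1 - eps] to keep z finite.
    eps = 0.5 / max(n_target_trials, n_distractor_trials)
    clip = lambda p: min(max(p, eps), 1.0 - eps)
    curve = []
    for t in range(bin_ms, t_max + bin_ms, bin_ms):
        # Cumulative proportion of responses produced up to time t.
        p_hit = sum(rt <= t for rt in hit_rts) / n_target_trials
        p_fa = sum(rt <= t for rt in fa_rts) / n_distractor_trials
        curve.append((t, z(clip(p_hit)) - z(clip(p_fa))))
    return curve

# Hypothetical data: 10 trials per task for the man-made stimulus set.
curve = dt_curve(hit_rts=[300, 320, 350, 400, 420], fa_rts=[450],
                 n_target_trials=10, n_distractor_trials=10)
print(curve[0], curve[-1])
```

Because hits and false alarms come from the same stimuli, a D–T curve that stays at zero means that go responses to that stimulus set were equally likely whether it was the target or the distractor.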
Figure 6
 
Illustration of the D–T value computed on man-made and natural stimuli for several noise levels. We introduced the D–T value (see Results) in order to provide a sensitivity measure for the same stimuli considered alternately as target and distractor in crossed go/no-go tasks. (A) Illustration of the crossed go/no-go tasks in which the same stimuli are processed as target or distractor depending on the task. Hits and false alarms recorded in both tasks were used to compute the D–T value of man-made and natural stimuli. Hits and false alarms were taken from the same scene category; only their task status (target or distractor) was different. (B) D–T curves for natural stimuli (green) and man-made stimuli (blue), for noise levels 11–22%, 33–44%, 55–66% and 77–88%, from the thickest to the thinnest lines. Sensitivity for natural scenes, as measured by the D–T value, takes about 150 ms longer to reach a level similar to that for man-made scenes. For high phase noise levels, sensitivity decreases sharply for man-made scenes. It might be that, in certain scenes, some remaining original phase information that is diagnostic for man-made scenes is used immediately by subjects to decide for a man-made response; when the remaining phase information is neither sufficient nor diagnostic for man-made scenes, no man-made response is produced after 400 or 600 ms. (C) Direct comparison of D–T curves for natural stimuli (green) and man-made stimuli (blue), using average performance on 55–66% and 77–88% phase noise stimuli.
Several observations can be made from these data. First, deviations from zero were observed much earlier for man-made stimuli than for natural ones (around 250 ms for man-made stimuli, around 380 ms for natural stimuli). Second, man-made scenes reached a higher D–T plateau than natural scenes when the phase noise was below 50%. Finally, while D–T values increased with response time for all noise conditions with natural category stimuli, the D–T values computed for the highest noise conditions with man-made stimuli quickly decreased to the zero baseline starting at about 360 ms. This return to baseline coincided with the rebound observed for man-made scenes and the increase in D–T value observed for natural scenes (Figure 6C). 
Thus, the earliest correct responses were primarily observed when man-made stimuli were shown. When no rapid decision was made about man-made category, subjects increasingly biased their choice toward “natural” responses from around 360 ms. 
Discussion
Context categorization performance was analyzed using both scenes equalized in average luminance and global contrast, and scenes in which the Fourier amplitude spectrum had also been equalized. Results from Experiment 1 suggest that the visual system uses amplitude spectrum characteristics of man-made and natural scenes to speed up context categorization processes. Equalizing the stimuli's Fourier amplitude resulted in an accuracy drop of about 6% and a median RT increase of about 20 ms. Furthermore, analyses of d′ showed a substantial 40 ms delay at the level of the earliest responses, indicating that fast processing of context was delayed by amplitude spectrum equalization. This reduced efficiency was due to the amplitude averaging and/or the absence of the relevant amplitude spectrum information for each stimulus. 
When the specific amplitude spectral signatures of natural and man-made environments are no longer available as diagnostic cues, context categorization processes have to rely on other details captured by higher order statistics. Experiment 2 showed that performance accuracy was insensitive to phase alteration up to a threshold of 50% of phase noise, but then rapidly fell to chance level following a sigmoid curve. Despite these large accuracy differences, response time-courses showed highly similar timing characteristics for all noise conditions. 
In the two experiments, the time course of both correct-go responses and false alarms in the man-made and natural tasks suggested that processing of context categorization may be sensitive earlier to man-made than to natural information. We tested this hypothesis by introducing the D–T value that took advantage of the crossed tasks design to provide a decision sensitivity measure for each context category. Analyses using this D–T value showed that when the information necessary for categorizing a scene as man-made was available, this information was extracted very early. When no decision toward a man-made response could be rapidly made, subjects appeared to opt for natural response as a default late response. 
Accessing context category from low-level features
In their modeling work, Oliva and Torralba (2001) found that the amplitude spectrum of pictures can be used to classify them as natural scenes and man-made environments. They used the fact that pictures of man-made environments tend to have an amplitude spectrum that is elongated on the horizontal and vertical frequencies, while pictures of natural scenes have a more uniform frequency content (Baddeley, 1997; Torralba & Oliva, 2003). This kind of global description of a scene that captures an intrinsic property of the visual world could thus provide information to help categorization and recognition processes. 
Averaging the Fourier power spectrum equates a large amount of global information among stimuli. In addition to equating average luminance and global RMS contrast, we thus also equated, for all stimuli, the energy in all directions and frequency bands; for example, global properties such as a symmetry in luminance distribution would be equal in each stimulus, although the specific localization of these luminance patterns could change depending on the particular scene considered. Under these conditions in Experiment 1, any characteristics of the amplitude spectrum that could systematically help differentiate between man-made and natural environments were removed. On the other hand, averaging the amplitude spectrum independently of the phase spectrum does not imply that these two sources of information do not interact. By combining the mean amplitude spectrum of all the scenes with the phase spectrum of each scene, we intended to evaluate the impact of low-level image statistics equalization on natural and man-made context categorization performance. It should be noted that the performance decrease observed between EL and ELA conditions may not be entirely caused by the equalization of categorical information encoded in the mean amplitude spectrum, but could also arise from the perturbation of the relevant amplitude spectrum information specific to each stimulus. 
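The operation described here, combining the mean amplitude spectrum of all scenes with the phase spectrum of each individual scene, can be illustrated with a short NumPy sketch. The function name is hypothetical, and the sketch averages amplitude spectra directly; whether the authors averaged amplitude or power, and any preprocessing they applied, are not specified in this section and are assumptions here.

```python
import numpy as np

def equalize_amplitude(images):
    """Give every image the set's mean amplitude spectrum, keeping its own phase.

    images: float array of shape (n_images, height, width), grayscale.
    Returns an array of the same shape.
    """
    spectra = np.fft.fft2(images, axes=(-2, -1))
    mean_amplitude = np.abs(spectra).mean(axis=0)   # shared by all images
    phases = np.angle(spectra)                      # specific to each image
    hybrid = mean_amplitude * np.exp(1j * phases)
    # The hybrid spectrum is conjugate-symmetric up to rounding error,
    # so the imaginary part of the inverse transform is negligible.
    return np.real(np.fft.ifft2(hybrid, axes=(-2, -1)))

# Random arrays stand in for achromatic photographs (illustration only).
rng = np.random.default_rng(0)
scenes = rng.random((4, 16, 16))
equalized = equalize_amplitude(scenes)

amplitudes = np.abs(np.fft.fft2(equalized, axes=(-2, -1)))
print(np.allclose(amplitudes, amplitudes[0]))  # all spectra now identical
```

Since the zero-frequency amplitude is shared, the equalized images also have the same mean luminance, and equal total spectral energy implies equal RMS contrast. The amplitude spectrum then carries no category information, while the image-specific structure remains in the phase.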
The results from Experiment 1 suggest that the information contained in the amplitude spectrum of scene pictures affected early discrimination between natural and man-made environments. Furthermore, they suggest that the use of this sort of visual information can be particularly fast, since with amplitude spectrum equalized stimuli, the whole reaction time distribution was shifted toward longer latencies, starting with the fastest responses at about 300 ms. These visual processes could take place from the initial part of the sensory encoding, which relies on parallel processing common across a large range of tasks (Bacon-Macé, Kirchner, Fabre-Thorpe, & Thorpe, 2007). 
Indeed, context categorization can be performed as fast as other natural scene categorization tasks. The median RTs observed in the current experiments were in the same range as those observed in previous context categorization tasks with original color photographs (Joubert et al., 2007; Rousselet et al., 2005), or in object categorization tasks (animals, humans, faces, means of transport, or food objects). Such experiments imply that the processing of an object and of its context of presentation can progress in parallel and result in similar time-courses and performance levels. They also imply that substantial interactions between object and context processing are perfectly possible (Davenport & Potter, 2004; Joubert, Fize, Rousselet, & Fabre-Thorpe, 2008; Joubert et al., 2007). 
One could thus suggest that object and context processes could share similar spectral information to determine object and scene categories. Such quickly available information could account for the fast interference between context and object processing in a rapid coarse analysis of a scene, possibly through the dorsal magnocellular pathway as proposed by Bar and collaborators (Bar, 2004; Bar et al., 2006), or using the magnocellular information processed within the ventral pathway, in accordance with previous results from our group (Macé, Thorpe, & Fabre-Thorpe, 2005). The scene gist provided by a coarse blurred representation of the contextual frame might be sufficient to guide more detailed processing of both context and object. The good resistance of context categorization processes to phase noise demonstrated in Experiment 2 reinforces this proposal. 
On the other hand, the fact that amplitude spectrum equalization significantly impairs fast context categorization supports computational theories that claim that a large amount of information can be rapidly extracted from a visual scene without the need to access elaborated high-level representations. Indeed, the present results show that this rapidly extracted information includes image features that can be described using global image statistics. The results from Experiment 2 further suggest that part of the phase information, which encodes edges and contours, is also rapidly used to enrich an early contextual representation used for categorization. These results are in keeping with the spatial envelope theory (Oliva & Torralba, 2001) that proposes a computational approach “that constructs a meaningful representation of scene gist directly from the low-level features pool, without binding contours to form surfaces, or surfaces to form objects” (Oliva, 2005, p. 256). 
What other cues than low-order statistics are used in context categorization?
Using images with equalized amplitude spectra, we showed that the visual system encoded the global information of the amplitude spectrum as an early cue that can be used to speed up the processes underlying categorization of scene context. However, despite these unusual visual stimuli, subjects were still able to perform well; the information encoded by the amplitude spectrum cannot be the only source of information for performing the task. Other scene properties that take into account edges and contours, encoded in the phase spectrum, were also involved in rapid context categorization. 
Experiment 2 was designed to estimate how context categorization performance relied on phase when amplitude spectra were equalized. Under these conditions, performance accuracy was remarkably insensitive to phase alteration of less than 50%: it did not track the linear decrease in the amount of original phase, but rather followed a sharp sigmoid curve that characterizes categorical processes. This latter observation is consistent with the results from a recent study on scene perception, in which subjects had to decide whether a scene was natural in the presence of phase noise (Einhäuser et al., 2006). 
A similar sigmoid function that depicts categorization processes was also observed in a recent experiment that used context categorization of natural and man-made scenes blurred with phase noise (Loschky & Larson, 2008). In their experiment, Loschky and Larson reported a greater sensitivity of performance accuracy to phase noise than seen in the present experiments. Their results showed substantial performance impairments with 40% of phase noise, much more than the mild decrease observed in the present results for the same noise level. This discrepancy cannot be explained by the forced-choice task modality they used, since in the present study the control forced-choice task gave results similar to those of the go/no-go one. Rather, the difference could be related to the stimuli used. In their study, the man-made and natural stimulus sets were not amplitude spectrum equalized, and this might well enhance the effect of phase degradation: when original phase was predominant, reconstruction using the original amplitude spectrum would be expected to lead to high accuracy, which was indeed the case (95% correct vs. our 82%). On the other hand, the relevance of the original amplitude would decrease as phase noise increased, resulting in an impairment that could be observed at a low level of phase noise. Despite this probable overestimate of the role of phase information in context categorization tasks, their results are qualitatively compatible with the accuracy levels observed in the present study. 
In contrast, Wichmann et al. (2006) reported a gradual decrease of performance with phase noise in an Animal/Non-animal rapid categorization task. The performance accuracy they reported could be described more precisely as a sigmoid curve with an inflexion point around 130° of phase randomization, which corresponds to 75% of the maximum noise level (their Figure 5). Their results thus show a much stronger resistance to phase degradation in the case of object categorization than in the present context categorization task; a possible explanation could be that the diagnostic information for object categorization, compared to that for context, is encoded in a wider range of contour orientations or spatial frequencies. It would be interesting to compare directly, using the same phase alteration technique, the resistance of both context and object categorization to the addition of phase noise. 
Finally, in an interesting attempt to characterize what information could be diagnostic in the phase spectrum to categorize basic-level scenes, McCotter, Gosselin, Sowden, and Schyns (2005) used a technique of selective alteration of stimulus phase, which isolated the regions of the phase spectrum that maximized categorization accuracy in their observers. Their results showed that the diagnostic bandwidths of the scene spectra for all categories occurred at relatively low spatial frequency, and coincided with the characteristic amplitude spectral signatures reported by Oliva and Torralba (2001). They showed that different regions of the phase spectra are diagnostic for different scene categories, with a low level of overlap, suggesting that the local structures and edges described by the diagnostic phase spectra of a scene category are not common to other scene categories. They also mentioned that some visual features, contours or edges that were poorly redundant across the entire image (a coastline separating two different textures for example), could be particularly sensitive to disruption of amplitude and as a consequence could not tolerate the effect of phase noise to the same extent as others. These results stress the fact that scene redundancies or spectral signatures provide diagnostic information on scene category, and are encoded by both amplitude and phase spectra. The amplitude spectrum encodes first and second order statistics, and represents luminance variations at each orientation and scale. The phase spectrum, which encodes higher order scene statistics and represents outlines, boundaries and edges, can also be divided into separate regions of spatial frequencies that are diagnostic for a limited number of scene categories. Despite these significant advances, more computational and experimental work is needed to quantify the visual features that are effectively used by scene categorization processes. 
The time-course of context categorization
The results of Experiment 2 provide some insight into the time-course of context categorization. We introduced the D–T value to provide a decision sensitivity measure for natural and man-made scene categories, taking advantage of the task design that switched the target-distractor status of all stimuli between tasks. D–T value was computed by using hit and false alarm responses to the same stimuli in the two tasks, thus removing any effects of stimulus differences on trials with hits and false alarms. The D–T curve analysis showed that the earliest correct responses were primarily observed following the presentation of man-made stimuli. When no rapid decision was made, subjects increasingly biased their choice toward a “natural” response after about 350 ms. While the interpretation and the significance of the D–T value will need further investigation, this measure succeeded in underlining the stimulus specificity of subject responses. 
The results suggest that when the information necessary for categorizing a scene as man-made was present, the decision threshold was reached quickly. On the other hand, if no diagnostic man-made features could be detected either because a natural stimulus was displayed, or because of a high noise level, the resulting slower information accumulation process favors a natural response decision. This two-phase categorization time-course could be specific to the particular context categorization task or the stimulus set used. However, it nevertheless suggests the existence of a categorization processing strategy based on stimulus information uptake, that could be driven by preparatory top-down signals. 
Subjects might bias their fast responses toward the category for which diagnostic elements are more readily available across noise levels. This bias could account for the different performance observed in Experiments 1 and 2 with the same ELA stimuli. These ELA stimuli represented the hardest experimental condition in Experiment 1, and subjects tended to respond to them with long response times and a bias toward “natural” responses. Conversely, in Experiment 2, when these stimuli represented the easiest condition (phase alteration), subjects tended to react rapidly to them with a bias toward “man-made” responses. Biases toward natural responses have been reported in a few other studies on image perception that manipulated various experimental conditions (Fei-Fei et al., 2007; Loschky & Larson, 2008; Rousselet et al., 2005). Stimulus difficulty could be accounted for by the results of McCotter et al. (2005) reported earlier, who observed that some visual features were more sensitive to phase noise than others. In particular, they mentioned that artificial scenes often contain well-defined and redundant edges that are more resistant to high levels of phase noise, compared to the more distributed diagnostic features of some natural scenes, such as textures and smoother edges. 
Acknowledgments
This work was supported by the CNRS (Centre National de la Recherche Scientifique), by the French Government (Ministère de la Recherche et de l'Enseignement Supérieur) and by the Fondation pour la Recherche Médicale. 
Commercial relationships: none. 
Corresponding author: Denis Fize. 
Email: denis.fize@cerco.ups-tlse.fr. 
Address: Centre de Recherche Cerveau et Cognition—CerCo (UMR 5549, CNRS-UPS), Faculté de Médecine de Rangueil, 133 route de Narbonne, F-31062 Toulouse Cedex 9, France. 
References
Bacon-Macé, N. Kirchner, H. Fabre-Thorpe, M. Thorpe, S. J. (2007). Effects of task requirements on rapid natural scene processing: From common sensory encoding to distinct decisional mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 33, 1013–1026.
Baddeley, R. J. (1997). The correlational structure of natural images and the calibration of spatial representations. Cognitive Science, 21, 351–372.
Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5, 617–629.
Bar, M. Kassam, K. S. Ghuman, A. S. Boshyan, J. Schmid, A. M. Dale, A. M. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences of the United States of America, 103, 449–454.
Barlow, H. B. (1961). Possible principles underlying the transformations of sensory messages. In W. A. Rosenblith (Ed.), Sensory communication (pp. 217–236). New York: John Wiley & Sons.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 213–254). Hillsdale: Lawrence Erlbaum.
Biederman, I. Rabinowitz, J. C. Glass, A. L. Stacy, Jr., E. W. (1974). On the information extracted from a glance at a scene. Journal of Experimental Psychology, 103, 597–600.
Campbell, F. W. Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. The Journal of Physiology, 197, 551–566.
Dakin, S. C. Hess, R. F. Ledgeway, T. Achtman, R. L. (2002). What causes non-monotonic tuning of fMRI response to noisy images? Current Biology, 12, R476–R477.
Davenport, J. L. Potter, M. C. (2004). Scene consistency in object and background perception. Psychological Science, 15, 559–564.
Einhäuser, W. Rutishauser, U. Frady, E. P. Nadler, S. König, P. Koch, C. (2006). The relation of phase noise and luminance contrast to overt attention in complex visual stimuli. Journal of Vision, 6, (11):1, 1148–1158, http://journalofvision.org/6/11/1/, doi:10.1167/6.11.1.
Evans, K. K. Treisman, A. (2005). Perception of objects in natural scenes: Is it really attention free? Journal of Experimental Psychology: Human Perception and Performance, 31, 1476–1492.
Fei-Fei, L. Iyer, A. Koch, C. Perona, P. (2007). What do we perceive in a glance of a real-world scene? Journal of Vision, 7, (1):10, 1–29, http://journalofvision.org/7/1/10/, doi:10.1167/7.1.10.
Field, D. J. (1999). Wavelets, vision and the statistics of natural scenes. Philosophical Transactions of the Royal Society of London A, 357, 2527–2542.
Freedman, D. J. Riesenhuber, M. Poggio, T. Miller, E. K. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108, 316–355.
Guyader, N. Chauvin, A. Peyrin, C. Hérault, J. Marendaz, C. (2004). Image phase or amplitude? Rapid scene categorization is an amplitude-based process. Comptes Rendus Biologies, 327, 313–318.
Joubert, O. R. Fize, D. Rousselet, G. A. Fabre-Thorpe, M. (2008). Early interference of context congruence on object processing in rapid visual categorization of natural scenes. Journal of Vision, 8, (13):11, 1–18, http://journalofvision.org/8/13/11, doi:10.1167/8.13.11.
Joubert, O. R. Rousselet, G. A. Fize, D. Fabre-Thorpe, M. (2007). Processing scene context: Fast categorization and object interference. Vision Research, 47, 3286–3297.
Kaping, D. Tzvetanov, T. Treue, S. (2007). Adaptation to statistical properties of visual scenes biases rapid categorization. Visual Cognition, 15, 12–19.
Li, F. F. VanRullen, R. Koch, C. Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences of the United States of America, 99, 9596–9601.
Loschky, L. C. Larson, A. M. (2008). Localized information is necessary for scene categorization, including the Natural/Man-made distinction. Journal of Vision, 8, (1):4, 1–9, http://journalofvision.org/8/1/4/, doi:10.1167/8.1.4.
Macé, M. J. Thorpe, S. J. Fabre-Thorpe, M. (2005). Rapid categorization of achromatic natural scenes: How robust at very low contrasts? European Journal of Neuroscience, 21, 2007–2018.
Macmillan, N. A. Creelman, C. D. (2005). Detection theory: A user's guide. London: Routledge.
Marr, D. (1982). Vision. New York: W H Freeman.
McCotter, M. Gosselin, F. Sowden, P. Schyns, P. G. (2005). The use of visual information in natural scenes. Visual Cognition, 12, 938–953.
Oliva, A. (2005). Gist of the scene. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Neurobiology of attention (pp. 251–256). San Diego, CA: Elsevier.
Oliva, A. Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509–522.
Potter, M. C. Faulconer, B. A. (1975). Time to understand pictures and words. Nature, 253, 437–438.
Rensink, R. A. (2000). Seeing, sensing, and scrutinizing. Vision Research, 40, 1469–1487.
Rousselet, G. A. Fabre-Thorpe, M. Thorpe, S. J. (2002). Parallel processing in high-level categorization of natural images. Nature Neuroscience, 5, 629–630.
Rousselet, G. A. Joubert, O. R. Fabre-Thorpe, M. (2005). How long to get to the “gist” of real-world natural scenes? Visual Cognition, 12, 852–877.
Rousselet, G. A. Thorpe, S. J. Fabre-Thorpe, M. (2004). Processing of one, two or four natural scenes in humans: The limits of parallelism. Vision Research, 44, 877–894.
Schyns, P. G. Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5, 195–200.
Serre, T. Oliva, A. Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences of the United States of America, 104, 6424–6429.
Thorpe, S. Delorme, A. Van Rullen, R. (2001). Spike-based strategies for rapid processing. Neural Networks, 14, 715–725.
Thorpe, S. Fize, D. Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Torralba, A. Oliva, A. (2003). Statistics of natural image categories. Network, 14, 391–412.
VanRullen, R. (2007). The power of the feed-forward sweep. Advances in Cognitive Psychology, 3, 167–176.
VanRullen, R. Reddy, L. Koch, C. (2004). Visual search and dual tasks reveal two distinct attentional resources. Journal of Cognitive Neuroscience, 16, 4–14.
Vogel, J. Schwaninger, A. Wallraven, C. Bulthoff, W. H. (2006). Categorization of natural scenes: Local vs. global information.
Walker, S. Stafford, P. Davis, G. (2008). Ultra-rapid categorization requires visual attention: Scenes with multiple foreground objects. Journal of Vision, 8, (4):21, 1–12, http://journalofvision.org/8/4/21/, doi:10.1167/8.4.21.
Westheimer, G. (2001). The Fourier theory of vision. Perception, 30, 531–541.
Wichmann, F. A. Braun, D. I. Gegenfurtner, K. R. (2006). Phase noise and the classification of natural images. Vision Research, 46, 1520–1529.
Figure 1
 
Examples of stimuli used in Experiments 1 and 2. All stimuli were computed from gray-level pictures of man-made and natural scenes. In both experiments, subjects performed a go/no-go task in which man-made and natural scenes were alternately targets or distractors. (A) In Experiment 1, stimuli were divided into three subsets: original scenes (O), global contrast and luminance equalized scenes (EL), and amplitude spectrum equalized scenes (ELA). In Experiment 2, stimuli were ELA scenes that included a variable percentage of phase noise, from 0 to 100% in increments of 11%. The two conditions at which subjects' performance dropped to chance level are shown in red. (B) Average power spectral signatures of man-made EL stimuli (blue), ELA stimuli (black), and natural EL stimuli (green). The contour plots represent 3 levels of energy of each spectral signature: each contour is selected so that the sum of the components inside the section represents 60, 80, or 90% of the total energy (energy is obtained by adding the squares of the Fourier components; units are in cycles per pixel).
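The contour selection described in panel B can be illustrated with a minimal sketch. It assumes, as one possible reading of the caption, that each contour is a level set of the spectral energy: components are ranked by squared amplitude, and the contour level is the energy of the last component needed to reach the requested fraction of the total. The function name `energy_contour_level` and the toy 2 × 2 spectrum are hypothetical and are not taken from the article.

```python
def energy_contour_level(amplitudes, fraction):
    """Return the energy level whose level set encloses `fraction` of total energy.

    `amplitudes` is a 2-D list of Fourier amplitudes (hypothetical input).
    Energy is the squared amplitude, as stated in the caption.
    Assumption: the contour is a level set, so the strongest components are
    accumulated first until the requested share of the total is reached.
    """
    energies = sorted((a * a for row in amplitudes for a in row), reverse=True)
    total = sum(energies)
    running = 0.0
    for energy in energies:
        running += energy
        if running >= fraction * total:
            return energy
    return energies[-1]


# Toy spectrum (illustrative values only).
toy_spectrum = [[5, 3], [2, 1]]
levels = {f: energy_contour_level(toy_spectrum, f) for f in (0.6, 0.8, 0.9)}
```

Drawing a contour at each returned level would then enclose roughly 60, 80, and 90% of the total energy of the toy spectrum.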
Figure 2
 
Average performance in Experiment 1. For the three conditions O, EL, ELA described in Figure 1, green bars indicate average subject performance in the natural categorization task (targets: natural scenes, distractors: man-made scenes), and blue bars indicate average performance in the man-made categorization task (targets: man-made scenes, distractors: natural scenes). Error bars indicate the standard errors of the mean across subjects. (A) Average global accuracy for the 12 subjects. (B) Average median RT for correct-go responses (ms). (C) Average accuracy for correct-go responses (in %). (D) Average false alarm rate. Asterisks indicate statistically significant differences between conditions. Note that the ELA condition was associated with a systematic accuracy drop and longer reaction times in both categorization tasks. Moreover, in the ELA condition, subjects tended to bias their responses toward the natural category.
Figure 3
 
Categorization time-course in Experiment 1. Left column: RT distributions for correct go-responses (thick curves) and false alarms (thin curves) expressed over time using 20 ms bins and RT pooled across all subjects. Minimal RTs were determined as the first 10 ms bin for which correct responses significantly exceeded errors (targets and distractors were equally likely). Right column: d′ expressed over time using 20 ms cumulative bins. Dashed lines indicate the d′ reached at the minimal RT computed for the O condition (280 ms), which is used as a reference. (A) Comparison of natural and man-made tasks using original scenes (O condition). (B) Comparison of O, EL, and ELA conditions in the natural task. (C) Comparison of O, EL, and ELA conditions in the man-made task. (D) Comparison of natural and man-made tasks using amplitude spectrum equalized scenes (ELA condition). With amplitude spectrum equalized stimuli, subjects remained able to perform the man-made and natural context categorization tasks, but a processing shift toward longer latencies can be observed in both tasks. Stimulus detectability at 280 ms is clearly lower for the ELA condition than for the O or EL conditions.
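The cumulative d′ curves in the right column can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: it uses the standard signal detection definition d′ = z(hit rate) − z(false alarm rate), counts all responses with RT up to the end of each 20 ms bin, and clips rates of 0 or 1 to 1/(2N) and 1 − 1/(2N), a common correction that is assumed here. Function names and the toy RT values are hypothetical.

```python
from statistics import NormalDist


def clip_rate(rate, n):
    """Keep a rate away from 0 and 1 (assumed 1/(2N) correction) so z is finite."""
    return min(max(rate, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))


def dprime(hit_rate, fa_rate):
    """Standard sensitivity index: z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def cumulative_dprime(hit_rts, fa_rts, n_targets, n_distractors,
                      bin_ms=20, t_max=1000):
    """d' as a function of time, using all responses with RT <= t.

    hit_rts: RTs (ms) of correct go-responses; fa_rts: RTs of false alarms.
    Returns a list of (t, d') pairs, one per cumulative bin.
    """
    curve = []
    for t in range(bin_ms, t_max + 1, bin_ms):
        hits = sum(rt <= t for rt in hit_rts)
        false_alarms = sum(rt <= t for rt in fa_rts)
        h = clip_rate(hits / n_targets, n_targets)
        f = clip_rate(false_alarms / n_distractors, n_distractors)
        curve.append((t, dprime(h, f)))
    return curve


# Toy data (illustrative only): 10 targets, 10 distractors.
curve = cumulative_dprime([300, 320, 350, 400, 420], [450], 10, 10)
```

With these toy values the curve starts at d′ = 0 (no responses yet) and rises as correct go-responses accumulate faster than false alarms.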
Figure 4
 
Average performance in Experiment 2. Subjects' average performance as a function of percentage of phase noise. The area around each data point represents the 95% confidence interval computed using a percentile bootstrap with 2000 resamples. When satisfactory, a fit with the psychometric function a + b/(1 + exp(−(x − c)/d)) is superimposed. (A) Average accuracy. (B) Average median RT for correct-go responses (ms). (C) Average accuracy for correct-go responses. (D) Average false alarm rate. Green (blue) data points indicate performance in the natural (man-made) go/no-go categorization task. Subplots in (A, B, C) illustrate average performance recorded in the forced-choice control task using the same stimuli as the go/no-go tasks; for the forced-choice task, green and blue lines thus stand for performance on natural and man-made stimuli, respectively. Asterisks indicate statistically significant differences between go/no-go tasks. Global accuracies on man-made and natural categorization tasks are highly similar across phase noise levels. However, subjects' responses were strongly biased toward the natural category when more than 55% of phase information was altered, while the opposite response bias was observed for stimuli with less than 44% phase noise. Reaction times were 60 ms longer in the natural categorization task.
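The two computations named in this caption, the sigmoid psychometric function and the percentile bootstrap confidence interval, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the statistic is taken to be the mean across subjects, and the parameter values and per-subject accuracies below are invented for the example, not taken from the article.

```python
import math
import random
import statistics


def psychometric(x, a, b, c, d):
    """Sigmoid from the caption: a + b / (1 + exp(-(x - c) / d))."""
    return a + b / (1.0 + math.exp(-(x - c) / d))


def percentile_bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean (assumed statistic).

    Resamples `values` with replacement, computes the mean of each resample,
    and returns the alpha/2 and 1 - alpha/2 percentiles of those means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical parameters: accuracy falls from about 95% to chance (50%)
# around c = 50% phase noise; a negative d makes the curve decreasing.
mid_accuracy = psychometric(50, a=50, b=45, c=50, d=-5)

# Hypothetical per-subject accuracies at one noise level.
accuracies = [0.91, 0.88, 0.95, 0.90, 0.86, 0.93,
              0.89, 0.92, 0.94, 0.87, 0.90, 0.91]
ci_low, ci_high = percentile_bootstrap_ci(accuracies)
```

In this parameterization a is the asymptote reached at high x (for negative d), a + b the opposite asymptote, c the midpoint, and d the slope scale, so the value at x = c is always a + b/2.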
Figure 5
 
Categorization time-course in Experiment 2. Green (blue) lines indicate performance in the natural (man-made) go/no-go categorization task, as a function of time using RT pooled across subjects. (A) RT distributions in the natural and man-made tasks for all low phase noise level stimuli pooled together (11–44%, thick line) compared to original phase stimuli (0%, filled histogram), using 20 ms bins. (B) RT distributions, in the natural and man-made tasks, of correct-go responses (thick curves) and false alarms (thin curves), for low phase noise level stimuli (11–44%), using 20 ms bins. (C) Comparison between correct-go response RT distributions (20 ms bins) for stimuli with low phase noise level (11–44%, same as the thick curves in B), shown with filled histograms, and stimuli with high phase noise level (55–88%, thick curves). (D) d′ curves computed using 20 ms bins for, from the thickest to the thinnest lines, stimuli with phase noise levels 11–22%, 33–44%, 55–66%, and 77–88%.
Figure 6
 
Illustration of the D–T value computed on man-made and natural stimuli for several noise levels. We introduced the D–T value (see Results) in order to provide a sensitivity measure for the same stimuli considered alternately as target and distractor in crossed go/no-go tasks. (A) Illustration of the crossed go/no-go tasks in which the same stimuli are processed as target or distractor depending on the task. Hits and false alarms recorded in both tasks were used to compute the D–T value of man-made and natural stimuli. Hits and false alarms were taken from the same scene category; only their task status (target or distractor) differed. (B) D–T curves for natural stimuli (green) and man-made stimuli (blue), for noise levels 11–22%, 33–44%, 55–66%, and 77–88%, from the thickest to the thinnest lines. Sensitivity to natural scenes, as measured by the D–T value, takes about 150 ms longer to reach a level similar to that for man-made scenes. For high phase noise levels, sensitivity decreases sharply for man-made scenes. It might be that, in certain scenes, some remaining original phase information that is diagnostic for man-made scenes is used immediately by subjects to decide for a man-made response; when the remaining phase information is neither sufficient nor diagnostic for man-made scenes, no man-made response is produced after 400 or 600 ms. (C) Direct comparison of D–T curves for natural stimuli (green) and man-made stimuli (blue), using average performance on 55–66% and 77–88% phase noise stimuli.