Research Article  |   June 2006
Discrimination of amplitude spectrum slope in the fovea and parafovea and the local amplitude distributions of natural scene imagery
Journal of Vision June 2006, Vol.6, 3. doi:10.1167/6.7.3

      Bruce C. Hansen, Robert F. Hess; Discrimination of amplitude spectrum slope in the fovea and parafovea and the local amplitude distributions of natural scene imagery. Journal of Vision 2006;6(7):3. doi: 10.1167/6.7.3.



      © 2015 Association for Research in Vision and Ophthalmology.

Abstract

A number of studies have investigated whether human visual performance can be related to the general form of the amplitude spectra (i.e., 1/f^α) of natural scenes. Here, it is argued that there are some discrepancies in the data between some of those studies and that one possible explanation for the discrepancies may be related to differences in methodology (e.g., stimuli presented to the fovea as opposed to the parafovea). We sought to resolve some of the discrepancies with two psychophysical paradigms involving α discrimination with visual noise and natural scene image patches presented to the fovea or parafovea. Fovea–parafovea threshold differences were apparent for stimuli possessing α values <1.0, with the parafovea typically showing the highest thresholds for reference α values in the 0.74–0.85 range. Both fovea and parafovea thresholds were lowest in the 1.2–1.4 range. In addition, we conducted a local amplitude distribution analysis (i.e., assessed local α) with a large set of high-resolution natural scene imagery and found that the results of that analysis provided a better account of the α discrimination thresholds for stimuli presented to the fovea than for those presented to the parafovea.

General introduction
It is well accepted that the amplitude of real-world scenes falls off with spatial frequency approximately as 1/f^α (i.e., amplitude is proportional to f^−α). One of the more popular methods of measuring this relationship is the global 2D discrete Fourier transform (DFT), with which the amplitude (or power) spectra of grayscale natural scene images have been measured in a number of studies (e.g., Billock, 2000; Field, 1987; Field & Brady, 1997; Hansen & Essock, 2005; Kretzmer, 1952; Oliva & Torralba, 2001; Ruderman & Bialek, 1994; Simoncelli & Olshausen, 2001; Tolhurst, Tadmor, & Chao, 1992; Torralba & Oliva, 2003; van der Schaaf & van Hateren, 1996). The exact exponent, α (the slope of the amplitude spectrum falloff on log–log axes), that characterizes natural scenes has been the subject of much debate, but the general consensus is that observed values range from f^−0.6 to f^−1.6 (Field & Brady, 1997), with the peak of the α distribution typically falling between 0.9 and 1.2 (Burton & Moorhead, 1987; Dong & Atick, 1995; Field, 1993; Hansen & Essock, 2005; Ruderman & Bialek, 1994; Thomson & Foster, 1997; Tolhurst et al., 1992; van der Schaaf & van Hateren, 1996) and a typical average of 1.08 (Billock, 2000), suggesting a reasonable degree of scale invariance across natural scenes. Determining the exact exponent is far from trivial, because different exponents of the amplitude falloff have been shown to (1) shift the peak of the human contrast sensitivity function (CSF; Webster & Miyahara, 1997), (2) determine the ability of humans to discriminate between different α values (Knill, Field, & Kersten, 1990; Párraga & Tolhurst, 2000; Tadmor & Tolhurst, 1994; Tolhurst & Tadmor, 1997a, 1997b), and (3) modulate the ability of humans to discriminate between changes in the content (or objects) of natural scenes (Párraga, Troscianko, & Tolhurst, 2000, 2005; Tolhurst & Tadmor, 2000). 
It has been posited that the neural processes of the visual system are optimized to exploit the second-order statistics of images, that is, the 1/f^α amplitude falloff (we use the term “second-order statistics” as specified by Klein & Tyler, 1986; it should not be confused with the sense used by Julesz, 1981, 1986, or with filter–rectify–filter output statistics). For example, consider Figure 1: on the left is a set of natural image patches whose α values range (from top to bottom) from relatively shallow (<1.0) to relatively steep (>1.0), and on the right is a similar set constructed from a broadband visual noise image. In both sets there is a clearly perceptible transition in texture that is likely associated with the gradual change in α. One popular psychophysical method for investigating the extent to which visual processing exploits these second-order regularities has been to vary the distribution of amplitude across spatial frequency (characterized by α) relative to that of a reference image, in order to determine which α values are best suited for the effective processing of visual information. Specifically, this type of visual performance involves discriminating between images whose amplitude spectra possess different α values (e.g., Knill et al., 1990; Tadmor & Tolhurst, 1994; Thomson & Foster, 1997; Tolhurst & Tadmor, 1997a, 1997b). Although the initial reports (e.g., Knill et al., 1990; Tadmor & Tolhurst, 1994) agreed that α discrimination is better for reference α values >1.0, they disagree regarding α discrimination for reference values <1.0. One possible explanation for this discrepancy is that the studies employed different experimental paradigms, which may have led to the testing of different visual mechanisms. Before turning to that possibility, however, a brief review of these studies is in order. 
Figure 1
 
Left: a set of natural image patches (selected from the set of stimuli used in Experiment 2) ranging in α (from top to bottom) from relatively shallow (<1.0) to relatively steep (>1.0). All patches have been equated with respect to rms contrast. Right: a similar set constructed from a broadband visual noise image.
Human visual discrimination of changes in amplitude spectrum slope
Knill et al. (1990) were the first to examine the ability of humans to discriminate between different grayscale visual noise stimuli at several reference α values (the reference α is the α assigned to the spectrum of a reference image, against which changes in the α of a target image are to be discriminated). The psychophysical paradigm was a temporal two-alternative forced-choice (2AFC) task in which observers indicated which of two sequentially presented noise patches (subtending either ∼1 or ∼0.75 deg visual angle, presented to the fovea) had the lower α value. For stimuli subtending 1 deg, the results showed that observers were best at detecting changes in α when the reference noise patterns had α values in the range of 1.4 to 1.8, with much higher and relatively constant thresholds across the 0.4 to 1.2 α range. It was concluded that the human visual system is “tuned” to better discriminate complex imagery whose global amplitude falloff is characterized by α values in the 1.4 to 1.8 range. Tadmor and Tolhurst (1994) also examined α discrimination, using a spatial three-alternative forced-choice (3AFC) task with either natural scene or visual noise patches (subtending ∼1 deg visual angle and presented to the parafovea). Their results showed the highest thresholds for reference α values near 0.8 for both visual noise and natural scene image patches, with lower thresholds above and below 0.8. However, the reference α values used by Tadmor and Tolhurst were not the true α values of the original images, leaving open the question of whether the same results would be obtained if observers discriminated changes in α relative to the true α values of the natural scene stimuli. Thomson and Foster (1997) addressed this question indirectly while examining the role of the phase spectra of natural scenes in detecting changes in α. Their task was a temporal 2AFC task with large stimuli presented to the fovea (512 × 512 pixel natural scenes or visual noise patterns, subtending ∼10 deg visual angle). 
For the natural image stimuli, each reference α category tested (0.7, 1.0, and 1.3) yielded thresholds that were highest when the reference natural image's α was closest to its original value, which argues against the idea of a single ideal α value to which our visual systems are optimally tuned. 
One possible explanation for the discrepancies between the studies described above is the way in which the stimulus patterns were presented to the observers. Knill et al. (1990) presented 1 deg image patches to their observers' foveas, whereas Tadmor and Tolhurst (1994) presented 1 deg image patches to their observers' parafoveas. Thus, the different patterns of results may simply reflect different α discrimination sensitivities in the fovea and parafovea. Thomson and Foster (1997), however, presented very large visual noise stimuli to their observers' foveas and obtained a pattern of results different from those of both Knill et al. and Tadmor and Tolhurst; because their stimuli were so much larger, their results cannot be compared directly with those of the other two studies. Although all of these potential sources of discrepancy are of interest, we found the fovea–parafovea case the most intriguing and therefore chose to focus on resolving it first. 
Amplitude slope discrimination and the local amplitude slopes of natural scenes
Despite their differences in α discrimination, all of the studies described above sought to relate their findings to the statistical redundancies reported in the natural scene statistics literature. One problem with that comparison is that most studies reporting descriptive statistics of the α values of grayscale natural scene imagery have used relatively large images, whereas most psychophysical α discrimination experiments to date have used very small image patches (i.e., they measured central vision performance). The use of small patches makes sense given that human vision is dominated by the fovea; however, relating central vision results to image statistics computed from sets of rather large images may be misleading. It would be more fitting to relate central vision psychophysical data to statistical analyses of multiple samples of small image regions (i.e., local regions). Thus, it would be useful to know whether α discrimination is related to the distribution of local image α values. 
The goal of the current investigation was twofold.
  1.  
    Carry out two sets of psychophysical α discrimination experiments with 1 deg image patch stimuli presented either foveally or parafoveally: the first set used visual noise image patches (Experiment 1) as stimuli (as did both Knill et al., 1990, and Tadmor & Tolhurst, 1994), and the second set used natural scene image patches (Experiment 2) as stimuli (as did Tadmor & Tolhurst, 1994). If the discrepancy in the pattern of results described in the preceding section is due to different α discrimination sensitivities of the fovea and parafovea, then we would expect performance patterns similar to those reported by Knill et al. (1990) when the stimuli are presented foveally and performance patterns similar to those reported by Tadmor and Tolhurst (1994) when the stimuli are presented parafoveally.
  2.  
    Gather a large number of natural scene images and conduct a local amplitude distribution analysis, that is, examine the α values of local regions of varying size sampled from large images. The pattern of results obtained from the current psychophysical experiments will then be examined in conjunction with the results of this analysis to determine whether the patterns of α discrimination can be accounted for by the local amplitude statistics of natural scenes.
Experiment 1
The goal of the current experiment was to examine the ability of human observers to discriminate between a stimulus image with a given reference α value (the “reference image”) and another stimulus image with either a steeper or a shallower α (the “test image”). In the current experiment, this ability was examined with visual noise patterns that were broadband in both spatial frequency and orientation. Two different psychophysical paradigms were employed: one to assess this ability for stimuli presented to the fovea and the other to assess it for stimuli presented to the parafovea. 
Method
Apparatus
Stimuli were presented with an Intel Pentium 4 (3.21 GHz) computer equipped with 1 GB of RAM. Stimuli were displayed using a linearized lookup table (generated by calibrating with a UDT S370 Research Optometer) on a 22-in. Mitsubishi Diamond Pro 2070 SB CRT driven by an ASUS Extreme AX300 graphics card with 8-bit grayscale resolution. Maximum luminance was 100 cd/m², the frame rate was 120 Hz, and the resolution was 1600 × 1200 pixels. Single pixels subtended 0.013 deg visual angle (i.e., 0.82 arcmin) as viewed from 1.0 m. 
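The display geometry above fixes how many pixels a stimulus of a given visual angle occupies. The following minimal sketch converts the sizes used in the experiments to pixels, assuming 0.82 arcmin per pixel at 1.0 m as reported; the physical pixel pitch it derives is only implied by those two numbers and is not stated in the text. The function names are hypothetical.

```python
import math

# Values reported in the Apparatus paragraph.
ARCMIN_PER_PIXEL = 0.82
VIEWING_DISTANCE_MM = 1000.0


def deg_to_pixels(deg, arcmin_per_px=ARCMIN_PER_PIXEL):
    """Convert a visual angle in degrees to a (fractional) number of pixels."""
    return deg * 60.0 / arcmin_per_px


def implied_pixel_pitch_mm(arcmin_per_px=ARCMIN_PER_PIXEL,
                           distance_mm=VIEWING_DISTANCE_MM):
    """Physical pixel size implied by the reported angle and viewing distance.

    Not reported in the text; derived here from size = 2 * d * tan(theta / 2).
    """
    theta = math.radians(arcmin_per_px / 60.0)
    return 2.0 * distance_mm * math.tan(theta / 2.0)


if __name__ == "__main__":
    print(f"1 deg patch    ~ {deg_to_pixels(1.0):.1f} px")
    print(f"0.35 deg cross ~ {deg_to_pixels(0.35):.1f} px")
    print(f"implied pitch  ~ {implied_pixel_pitch_mm():.3f} mm")
```

Under these assumptions a 1 deg patch spans roughly 73 pixels and the 0.35 deg fixation cross roughly 26 pixels.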
Participants
Four experienced psychophysical observers (three of whom were naive to the purpose of the study) participated in all of the experiments carried out in the current study. All participants had normal or corrected-to-normal vision, and their ages ranged from 29 to 36 years. Informed consent was obtained from all participants using procedures approved by the Research Ethics Board. 
Visual noise stimulus generation
All visual noise stimuli were constructed in the Fourier domain using MATLAB (version 6.5) and the corresponding Image Processing (version 4.0) and Signal Processing (version 6.1) toolboxes. The visual noise stimuli were created by constructing a 512 × 512 polar matrix and assigning all coordinates the same arbitrary amplitude coefficient (except at the location of the DC component, which was assigned zero). The result is a flat, isotropic, broadband spectrum (i.e., α = 0.0), referred to as the “template” amplitude spectrum. In this form, the α of the template spectrum could be set by multiplying each spatial frequency's amplitude coefficient by f^−α. The phase spectra were constructed by assigning random values between −π and π to the coordinates of a 512 × 512 polar matrix such that the phase spectra were odd symmetric (as required for the inverse transform to yield a real-valued image). The noise patterns were rendered in the spatial domain by taking the inverse Fourier transform of an α-adjusted template amplitude spectrum combined with a given random phase spectrum. All noise patterns were scaled in the spatial domain to have the same grayscale mean (0.50 on a 0–1 scale) and rms contrast (0.14) to ensure that total power was equivalent across all of the noise images, regardless of their assigned α value. From the center of each 512 × 512 noise image, a 1 deg image patch was cropped and windowed with an edge-“blurred” circle, which linearly ramped the 5% of image pixels nearest the circle's edge to mean luminance (i.e., 50 cd/m²). This procedure was repeated for all visual noise stimuli in the current experiment, with ten different noise patches generated for each reference α. In line with Tadmor and Tolhurst (1994), stimuli were generated for seven reference α values (0.4, 0.6, 0.8, 1.0, 1.2, 1.4, and 1.6). 
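The noise construction described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB code, and several details are assumptions: the odd-symmetric phase is obtained by taking the phase of the FFT of real white noise, rms contrast is taken to mean SD / mean, and the 5% edge ramp is read as the outer 5% of the window radius. All function names are hypothetical.

```python
import numpy as np


def radial_frequency(n):
    """Radial spatial frequency (cycles per image) on an n x n FFT grid."""
    f = np.fft.fftfreq(n) * n
    return np.hypot(f[None, :], f[:, None])


def random_phase(n, rng):
    """Random phases in [-pi, pi] with odd symmetry, phi(-f) = -phi(f).

    Taking the phase of the FFT of real white noise is one convenient way to
    guarantee the symmetry; it is an assumption, not necessarily the authors' method.
    """
    return np.angle(np.fft.fft2(rng.standard_normal((n, n))))


def make_noise(alpha, phase, mean=0.5, rms_contrast=0.14):
    """Noise image with amplitude spectrum f**-alpha and the given phase spectrum."""
    f = radial_frequency(phase.shape[0])
    amp = np.zeros_like(f)              # DC coefficient stays at zero
    amp[f > 0] = f[f > 0] ** -alpha     # flat template multiplied by f**-alpha
    img = np.fft.ifft2(amp * np.exp(1j * phase)).real
    img = (img - img.mean()) / img.std()
    # Assumes rms contrast = SD / mean, so SD = 0.07 on a 0-1 scale.
    return mean * (1.0 + rms_contrast * img)


def windowed_patch(img, diameter_px, ramp_frac=0.05, mean=0.5):
    """Crop a centered square and fade a circular edge linearly to the mean.

    ramp_frac is read here as the outer 5% of the radius (an assumption).
    """
    c, h = img.shape[0] // 2, diameter_px // 2
    patch = img[c - h:c + h, c - h:c + h]
    yy, xx = np.indices(patch.shape) + 0.5 - h   # pixel-centre coordinates
    r = np.hypot(xx, yy) / h                     # 1.0 at the circle's edge
    w = np.clip((1.0 - r) / ramp_frac, 0.0, 1.0)
    return mean + w * (patch - mean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phase = random_phase(512, rng)
    ref = windowed_patch(make_noise(1.0, phase), 73)
    test = windowed_patch(make_noise(1.1, phase), 73)  # same phase, different alpha
    print(ref.shape, test.shape)
```

Reusing one phase spectrum for the reference and test images, as in the usage lines, mirrors the within-trial constraint that only α differs between the patches.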
Psychophysical procedure
For the foveal assessment, a modified temporal 2AFC procedure was implemented. It consisted of three sequential 250-ms stimulus intervals (separated by 500-ms masking pattern intervals to eliminate any potential retinal afterimage cues), the second of which contained a reference image with one of the fixed reference α values. Each trial was followed by an empty display (set to mean luminance), during which the observers responded via key press (response time was unlimited). All participants were instructed to use the image viewed in the second interval as the reference for their discrimination judgments. The first or third interval contained the test image, whose amplitude spectrum had an α value either steeper or shallower than that of the reference image; the remaining interval contained a copy of the reference image. The observers were asked to indicate which interval (first or third) contained the test image. Within any given trial, all stimuli had identical phase spectra (i.e., the only difference between the reference patch and the test patch was the α value of their amplitude spectra). The order of the stimuli in the first and third intervals was random, and all stimuli were presented binocularly. No auditory feedback was provided, and the beginning of each trial was indicated by a 1-s fixation cross (subtending 0.35 deg visual angle). Experimental sessions were blocked by reference α and controlled by a modified up–down staircase procedure. All observers were allowed to participate in practice sessions to familiarize themselves with the task. Note that this modified 2AFC procedure differs slightly from the procedure employed by Knill et al. (1990); the modification was made because of the rather specific nature of the discrimination the observers were required to perform. 
Knill et al.'s paradigm required observers to have a detailed internal representation of how different amplitude spectrum α values should look. Because most of the participants in the current study were not familiar with the different image qualities that arise from different α values, the inclusion of a “reminder” or reference interval was necessary. 
For the parafoveal assessment, a spatial 3AFC procedure was used (identical to that used by Tadmor & Tolhurst, 1994). In this paradigm, observers were instructed to fixate a cross pattern (subtending 0.35 deg visual angle) that was displayed at the center of the screen and remained visible until the trial was over. At the start of each trial, the fixation cross appeared alone for 1 s; it was then flanked by three 1 deg stimulus patches for 250 ms (one on each side of the fixation cross and one above it), followed by the same masking patterns described above, presented for 500 ms at the same locations as the noise patch stimuli. Each trial was followed by an empty display (set to mean luminance), during which the observers responded via key press (unlimited response time). On any given trial, two of the noise image stimuli were identical, sharing the same reference α value and phase spectrum. The remaining stimulus had the same phase spectrum as the other two patches but a steeper or shallower amplitude spectrum α. The observers' task was to indicate which of the three patterns appeared different from the other two. The distance between the center of the fixation cross and the center of each stimulus patch was 1 deg. As in the fovea condition, all experimental sessions were blocked by reference α and controlled by a modified up–down staircase procedure, and observers were allowed to participate in practice sessions to familiarize themselves with the task. 
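The two trial layouts differ in how the odd stimulus is placed, which also sets the chance level of each task. The following minimal sketch generates the stimulus assignment for one trial of each paradigm; the function names and the returned structures are hypothetical illustrations of the procedures described above, not the authors' experiment code, and timing and image rendering are omitted.

```python
import random

# Guessing rates: the observer chooses between intervals 1 and 3 in the
# modified 2AFC, and among three locations in the spatial 3AFC.
CHANCE = {"modified_2afc": 1 / 2, "spatial_3afc": 1 / 3}


def modified_2afc_trial(rng):
    """Three intervals; interval 2 always holds the reference (the reminder).

    The test goes in interval 1 or 3 at random and the other holds a copy of
    the reference. All three share one phase spectrum. Returns the interval
    contents and the correct answer (1 or 3).
    """
    test_interval = rng.choice((1, 3))
    labels = ["test" if i == test_interval else "reference" for i in (1, 2, 3)]
    return labels, test_interval


def spatial_3afc_trial(rng):
    """Three simultaneous patches around fixation; the odd one is the test."""
    locations = ("left", "right", "above")
    odd = rng.choice(locations)
    layout = {loc: ("test" if loc == odd else "reference") for loc in locations}
    return layout, odd


if __name__ == "__main__":
    rng = random.Random(0)
    print(modified_2afc_trial(rng))
    print(spatial_3afc_trial(rng))
```

Although the foveal task shows three intervals, only two are candidate answers, which is why it is treated as a 2AFC task with a 50% guessing rate.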
For both the fovea and parafovea conditions, trial-to-trial changes in the test image's α were controlled by a modified up–down staircase procedure (Leek, 2001; Macmillan & Creelman, 2005). Each staircase approached the reference α either from above (i.e., initial trials had much steeper α values than the reference) or from below (i.e., initial trials had much shallower α values than the reference). The α of the test stimulus was moved closer to the reference α after two consecutive correct responses and moved further from the reference α after one incorrect response. This two-down/one-up rule converges on the stimulus level at which the probability of two consecutive correct responses is 50%, that is, the 70.7% correct point (√0.5 ≈ 0.707) on a psychometric function whose lower bound is chance performance (50% correct for the 2AFC experiment and 33.3% correct for the 3AFC experiment). The targeted performance level was therefore the 70.7% correct mark for the current experiments. Within a given reference α block, the test image α was changed (incremented or decremented) in linear steps ranging from 0.10 down to 0.002 until the 12th reversal, at which point the block was terminated. Note that although the phase spectra of the stimuli within a given trial were identical, the phase spectrum used on each trial was randomly selected from trial to trial (thus, different images were used within one staircase); 10 different phase spectra were used for each of the 7 reference α values tested. For any given reference α block, the average of the last 5 reversals was taken as the threshold estimate, and every block was repeated up to four times (each repetition on a different day). Foveal and parafoveal blocks were carried out on separate days, with the type of block (fovea or parafovea) on any given day determined randomly. 
For each reference α, the absolute difference between the reference α and the estimated threshold α was calculated separately for the staircase that approached the reference α from above and for the one that approached it from below. These two difference values were then averaged to obtain a single threshold estimate (Δα) for each reference α, and those estimates were then averaged across days. 
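The staircase logic and threshold calculation above can be sketched as follows. This is a minimal sketch under stated assumptions: the staircase tracks Δ = |test α − reference α| (the same Δ serves a staircase from above, test α = reference + Δ, or from below, test α = reference − Δ), and the step-size schedule (halving at each reversal with a floor of 0.002) is assumed, because the text gives only the 0.10–0.002 range. The class, the helper names, and the simulated observer are hypothetical.

```python
import math
import random

TARGET_P_CORRECT = math.sqrt(0.5)  # 2-down/1-up converges where p**2 = 0.5


class TwoDownOneUp:
    """2-down/1-up staircase on delta = |test alpha - reference alpha|.

    The step schedule (halve at each reversal, floor 0.002) is an assumption.
    """

    def __init__(self, start_delta=0.5, step=0.10, min_step=0.002,
                 max_reversals=12, n_average=5):
        self.delta = start_delta
        self.step = step
        self.min_step = min_step
        self.max_reversals = max_reversals
        self.n_average = n_average
        self.run_of_correct = 0
        self.last_direction = 0  # -1 = toward reference (harder), +1 = away
        self.reversals = []

    @property
    def done(self):
        return len(self.reversals) >= self.max_reversals

    def update(self, correct):
        if correct:
            self.run_of_correct += 1
            if self.run_of_correct < 2:
                return  # need two consecutive correct responses to step down
            direction = -1
        else:
            direction = +1
        self.run_of_correct = 0
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.delta)
            self.step = max(self.step / 2.0, self.min_step)
        self.last_direction = direction
        self.delta = max(self.delta + direction * self.step, 0.0)

    def threshold(self):
        """Mean of the last n_average reversal values."""
        last = self.reversals[-self.n_average:]
        return sum(last) / len(last)


def delta_alpha_threshold(reference, alpha_from_above, alpha_from_below):
    """Average |threshold alpha - reference alpha| over the two staircases."""
    return (abs(alpha_from_above - reference)
            + abs(alpha_from_below - reference)) / 2.0


def run_staircase(p_correct, rng, max_trials=10_000):
    """Drive one staircase with a simulated observer until the 12th reversal."""
    s = TwoDownOneUp()
    for _ in range(max_trials):
        if s.done:
            break
        s.update(rng.random() < p_correct(s.delta))
    return s


if __name__ == "__main__":
    def weibull_2afc(delta, alpha_t=0.1, beta=2.0):
        # Hypothetical observer with a 50% guessing floor.
        return 0.5 + 0.5 * (1.0 - math.exp(-((delta / alpha_t) ** beta)))

    s = run_staircase(weibull_2afc, random.Random(0))
    print(len(s.reversals), round(s.threshold(), 3))
```

Averaging the staircases from above and below, as `delta_alpha_threshold` does, cancels any systematic tendency of one direction of approach to stop short of, or overshoot, the reference.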
Results and discussion
The data were analyzed with a 2 (fovea vs. parafovea) × 7 (reference α) two-way repeated measures analysis of variance (ANOVA), using a reasonably conservative correction (the Huynh–Feldt epsilon) to adjust the degrees of freedom (Cohen, 2001). The main effect of stimulus presentation (fovea vs. parafovea) was not significant, F(1,3) = 1.74, p > .05, indicating that, when averaged across reference α values, α discrimination thresholds were similar in the fovea and parafovea. There was a significant main effect of reference α, F(5,17) = 24.42, p < .001, indicating that α discrimination thresholds depended on reference α. There was also a significant interaction between presentation location (fovea vs. parafovea) and reference α, F(6,18) = 2.72, p < .05, indicating that the pattern of thresholds as a function of reference α differed between the fovea and parafovea. 
The results from the current experiment are shown in Figure 2. Note that the pattern of α discrimination thresholds as a function of reference α is broadly similar for the fovea and parafovea conditions. Specifically, there is an optimal range of reference α values over which the observers are better at making α discriminations: reference α values ≥1.0, with the lowest thresholds occurring between 1.2 and 1.4. However, although both conditions suggest tuning to the same range of α values, the parafovea condition produced discrimination thresholds that peaked at a reference α of 0.8. For the fovea condition, α discrimination thresholds are elevated for all reference α values <1.0, fall considerably for reference α values between 1.2 and 1.4, and then rise again at 1.6. This foveal pattern is most consistent with the data reported by Knill et al. (1990). For the parafovea condition, the peak in α discrimination thresholds at 0.8 is consistent with the data reported by Tadmor and Tolhurst (1994). Thus, the discrepancy between the results reported by Knill et al. (1990) and by Tadmor and Tolhurst for visual noise stimulus patches can likely be accounted for by the fact that the stimuli in those studies were presented to portions of the retina with different α discrimination sensitivities. Note, however, that the data reported by Tadmor and Tolhurst (Figures 2 and 4 of their study) show relatively low and equivalent α discrimination thresholds on either side of the peak, whereas the current data, in addition to the peak at 0.8, show elevated thresholds only for reference α values <1.0. 
Figure 2
 
Data from Experiment 1. The seven different reference α values about which the observers made α discriminations are shown on the abscissa. The averaged differences (Δα) between the observed α discrimination thresholds and their respective reference α values are shown on the ordinate; error bars are ±1 SEM (within subject, averaged across observers).
Experiment 2
The goal of the current experiment was to examine the ability of human observers to discriminate between a stimulus image with a given reference α value (the reference image) and another stimulus image with either a steeper or a shallower α (the test image), this time using natural scene image patches. As in Experiment 1, two different psychophysical paradigms were employed: one to assess this ability for stimuli presented to the fovea and the other to assess it for stimuli presented to the parafovea. 
Method
Apparatus
The same set of apparatus used in Experiment 1 was used here. 
Participants
The same four observers who participated in Experiment 1 also participated in this experiment. 
Natural scene stimulus generation
The natural scene stimuli used in the current experiment did not contain any carpentered (i.e., man-made) content. The stimuli were selected from two different natural scene image databases. The first image set consisted of more than 1,000 RGB 1600 × 1200 pixel images and was the same image database used by Hansen and Essock (2004, 2005) and Hansen, Essock, Zheng, and DeFord (2003). The second image set consisted of more than 100 RGB 2560 × 1920 pixel images (from which only images that were fully in focus were selected) from the color-calibrated database maintained by the McGill Vision Research Unit (http://tabby.vision.mcgill.ca). All images from both databases were converted to grayscale using the standard National Television System Committee (NTSC) formula:
luminosity(x) = 0.299 × R(x) + 0.587 × G(x) + 0.114 × B(x).
Before selecting the image patches for use in the current study, we gave careful consideration to the issues raised by Tadmor and Tolhurst (1993). Specifically, the appearance of an image whose amplitude spectrum has been altered can vary depending on the overall distribution of amplitude biases at different orientations. For example, consider an image whose phase spectrum has a large amount of phase alignment at one or more orientations: the amplitude spectrum of that image will be biased toward the corresponding orientations, and often the local α value associated with those biased orientations will be much steeper than that of the nonbiased orientations. If such an image is assigned an isotropic amplitude spectrum, the result will be an image with a large amount of noise due to the relative increase in amplitude distributed across the nonbiased orientations (see Figure 3 for further details). To deter observers from using changes in this added noise (instead of the actual content of the image) as a cue to changes in α, we selected for the current experiment only image patches that possessed approximately equal amounts of content at all orientations. 
Figure 3
 
Schematic illustration of how the appearance of two different images that have had their amplitude spectra forced to be isotropic can vary depending on the overall distribution of “natural” amplitude biases at different orientations. Top row: from left to right, (1) a natural scene image with a predominance of vertical content in the spatial domain; (2) the amplitude spectrum of that image, parsed into four equal-area “bow-tie” wedges (45 deg orientation bandwidth) centered on vertical, 45 deg oblique, horizontal, and 135 deg oblique (note the bias in amplitude within the horizontal bow-tie wedge, indicating a bias of vertical content in the spatial domain; below that spectrum is the isotropic 1/f spectrum used to replace the original spectrum); (3) the linear regression fits to the orientation-averaged amplitude as a function of spatial frequency for each of the four orientation bow ties, with log amplitude on the ordinate and log spatial frequency in cycles per picture on the abscissa (note that the fit to vertical amplitude indicates higher overall amplitude as well as a steeper α relative to the other three orientations); (4) the resultant spatial image after having been assigned the isotropic amplitude spectrum. Bottom row: identical to the top row, but with an image that originally possessed approximately equal amounts of amplitude across orientation (naturally isotropic). Note that the filtered version of the vertically biased image appears much more degraded (the trees in the background are essentially masked) than the naturally isotropic filtered image.
Both image sets were pooled into one database, and each image was Fourier transformed; to reduce edge effects incurred by the DFT, each image was multiplied by a spatial Gaussian window before being transformed. The amplitude spectrum of each image was then analyzed for the relative amount of amplitude bias at different orientations and spatial frequencies. On the basis of this analysis, images that possessed approximately equal amounts of amplitude across orientation were selected (152 images) as potential sources of natural image patch stimuli (see Hansen et al., 2003, for further details regarding this procedure). 
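The full orientation bias procedure is given in Hansen et al. (2003) rather than here, so the following is only a simplified sketch of the idea, based on the four 45 deg “bow-tie” wedges described for Figure 3: it compares the mean Fourier amplitude in each wedge, and a ratio near 1 suggests a roughly isotropic image. The bias index (max/min wedge amplitude), the function names, and the restriction to frequencies up to n/2 are assumptions, not the published criterion; Gaussian windowing before the transform is left to the caller.

```python
import numpy as np


def wedge_amplitudes(img, n_wedges=4):
    """Mean Fourier amplitude in equal-area bow-tie wedges.

    Wedge k is centred on spectral orientation k * 180 / n_wedges degrees
    (0 = horizontal axis of the spectrum, which carries vertical image
    structure). y is the row index, so obliques follow image, not
    Cartesian, orientation. Only frequencies 0 < r <= n/2 are used.
    """
    n = img.shape[0]
    amp = np.fft.fftshift(np.abs(np.fft.fft2(img - img.mean())))
    y, x = np.indices(amp.shape) - n // 2
    r = np.hypot(x, y)
    theta = np.mod(np.degrees(np.arctan2(y, x)), 180.0)  # fold opposite halves
    valid = (r > 0) & (r <= n // 2)
    half_width = 90.0 / n_wedges
    means = []
    for k in range(n_wedges):
        centre = k * 180.0 / n_wedges
        # Circular angular distance to the wedge centre, modulo 180 degrees.
        dist = np.abs(np.mod(theta - centre + 90.0, 180.0) - 90.0)
        means.append(amp[valid & (dist < half_width)].mean())
    return np.array(means)


def orientation_bias(img):
    """Hypothetical bias index: max / min wedge amplitude (1.0 = isotropic)."""
    a = wedge_amplitudes(img)
    return a.max() / a.min()
```

For an image dominated by vertical stripes, the amplitude concentrates in the wedge centred on 0 deg, matching the horizontal bow-tie bias described for Figure 3.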
Next, image patches (140 × 140 pixels) were sampled from random locations within each of the “naturally isotropic” images described above. Each sampled patch was multiplied by a spatial Gaussian window and subjected to the same orientation bias analysis to ensure that it possessed the same isotropic properties as the larger image from which it was sampled. This yielded a total of 74 image patches. To obtain the α value for each patch, we first made its spectrum isotropic: after the patch was Fourier transformed, the amplitude coefficients at each spatial frequency were averaged across orientation, and that average replaced the coefficients at that spatial frequency for all orientations. We then selected a vector from each patch's altered spectrum, took the logarithm of each amplitude coefficient, and fit that vector with a line (using standard linear regression in log-log coordinates); the slope of that line served as the estimate of α for that spectrum. The 74 image patches were then rank ordered by α and placed into six nonoverlapping bins of 10 patches each. Fourteen patches were excluded because they did not fall “neatly” into a given α bin, leaving 60. The average α values of the six bins were 0.57, 0.74, 0.85, 0.95, 1.19, and 1.41 (see Figure 4 for examples). The descriptive statistics of the α values within each bin are summarized in Figure 5.
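A minimal NumPy sketch of the α estimate described above, assuming integer-radius rings for the orientation average, a Gaussian taper of our own choosing, and a radial vector along the fy = 0 axis (any radial vector of an isotropic spectrum gives the same values). The function names and `synth_noise` test helper are hypothetical.

```python
import numpy as np

def gaussian_window(n, sigma_frac=0.25):
    """2D Gaussian taper; sigma as a fraction of patch width is an assumed value."""
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    return np.exp(-(x**2 + y**2) / (2.0 * (sigma_frac * n) ** 2))

def frequency_rings(n):
    """Integer radial frequency (cycles per patch) of each coefficient, unshifted order."""
    k = np.fft.fftfreq(n) * n
    fy, fx = np.meshgrid(k, k, indexing="ij")
    return np.rint(np.hypot(fx, fy)).astype(int)

def make_isotropic(amp):
    """Replace every amplitude coefficient with the mean of its frequency ring."""
    rings = frequency_rings(amp.shape[0])
    sums = np.bincount(rings.ravel(), weights=amp.ravel())
    counts = np.bincount(rings.ravel())
    return (sums / np.maximum(counts, 1))[rings]

def estimate_alpha(patch, window=True):
    """Fit log amplitude vs. log frequency along one radial vector; return -slope."""
    x = np.asarray(patch, dtype=float)
    x = x - x.mean()
    if window:
        x = x * gaussian_window(x.shape[0])
    iso = make_isotropic(np.abs(np.fft.fft2(x)))
    n = x.shape[0]
    freqs = np.arange(1, n // 2 + 1)      # skip DC; stop at the Nyquist frequency
    vector = iso[0, freqs]                # fy = 0 row: one radial slice
    slope, _ = np.polyfit(np.log(freqs), np.log(vector), 1)
    return -slope

def synth_noise(n, alpha, seed=0):
    """Hypothetical test helper: white noise filtered to a 1/f**alpha amplitude spectrum."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n) * n
    fy, fx = np.meshgrid(k, k, indexing="ij")
    f = np.hypot(fx, fy)
    filt = np.zeros_like(f)
    filt[f > 0] = f[f > 0] ** -alpha
    return np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * filt))
```

For example, `estimate_alpha(synth_noise(128, 1.0), window=False)` should return a value close to 1.0. With windowing enabled, the Gaussian taper slightly smooths the spectrum, so estimates can shift a little.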
Figure 4
 
Examples of some of the image patches used in Experiment 2. Each patch has been assigned an isotropic amplitude spectrum with an α corresponding to the bin from which the patch was assigned.
Figure 5
 
A plot of the distribution of α values (ordinate) of the selected natural image patches within each of the six reference α bins (abscissa). Note that the image patches within each bin are tightly clustered around the average α of that bin.
Next, the selected natural scene image patches were Fourier transformed and forced to be isotropic by assigning each image a template amplitude spectrum generated by the same procedure used for the visual noise stimuli described above. Forcing the natural image patch spectra to be isotropic served two purposes. (1) It eliminated any small residual biases in amplitude at some orientations; because the image patches were already naturally isotropic, this procedure introduced only minimal changes (see Figures 3 and 4). (2) Using the template spectrum ensured that the visual noise stimuli and the natural scene stimuli differed only in their phase spectra. As in Experiment 1, the template amplitude spectrum could be made to possess any given α by multiplying each spatial frequency's amplitude coefficient by $f^{-\alpha}$. The natural scene stimuli were rendered in the spatial domain by taking the inverse Fourier transform of the amplitude spectrum (assigned a given α value) combined with the phase spectrum obtained from the initial Fourier transform of each image. All filtered natural image patches were scaled to have the same rms contrast (0.14) and grayscale mean (0.5) and were then windowed with an edge-blurred circle. This method of stimulus generation differs from that used by Tadmor and Tolhurst in that the current procedure allows α discrimination to be measured with stimuli whose reference α is virtually identical to the images' original α values. The rationale for this deviation is the impact their method had on the relative “naturalistic” quality of the images.
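The phase/amplitude recombination above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the DC amplitude is zeroed and the mean restored afterwards, rms contrast is taken as SD divided by mean, and the aperture radius and raised-cosine edge width in `blurred_circle` are hypothetical values. The stimulus fades to mean gray outside the aperture.

```python
import numpy as np

def freq_radius(n):
    """Radial spatial frequency (cycles per patch) of each coefficient, unshifted order."""
    k = np.fft.fftfreq(n) * n
    fy, fx = np.meshgrid(k, k, indexing="ij")
    return np.hypot(fx, fy)

def assign_alpha(patch, alpha):
    """Keep the patch's own phase spectrum; impose a template f**-alpha amplitude."""
    patch = np.asarray(patch, dtype=float)
    f = freq_radius(patch.shape[0])
    template = np.zeros_like(f)
    template[f > 0] = f[f > 0] ** -alpha      # DC left at 0; the mean is restored later
    phase = np.angle(np.fft.fft2(patch))
    # A real patch has odd-symmetric phase, so the product stays Hermitian and the
    # imaginary part of the inverse transform is only rounding error.
    return np.real(np.fft.ifft2(template * np.exp(1j * phase)))

def scale_to(img, rms=0.14, mean=0.5):
    """Set grayscale mean and rms contrast (assumed here to mean SD / mean)."""
    z = (img - img.mean()) / img.std()
    return mean + rms * mean * z

def blurred_circle(n, radius_frac=0.45, ramp_frac=0.05):
    """Circular aperture with a raised-cosine edge; both sizes are hypothetical."""
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    r = np.hypot(x, y) / n
    ramp = np.clip((radius_frac - r) / ramp_frac, 0.0, 1.0)
    return 0.5 - 0.5 * np.cos(np.pi * ramp)   # 1 inside, 0 outside, smooth in between

def make_stimulus(patch, alpha, rms=0.14, mean=0.5):
    img = scale_to(assign_alpha(patch, alpha), rms, mean)
    return mean + blurred_circle(img.shape[0]) * (img - mean)
```

Because only the amplitude spectrum is replaced, two stimuli built from the same patch with different α values share exactly the same phase spectrum, which is the property the experiment relies on.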
Psychophysical procedure
The psychophysical procedures of Experiment 1 were also used in the current experiment. As in Experiment 1, the stimuli within a given trial shared an identical phase spectrum, but the phase spectrum was randomly selected from trial to trial (thus, different images were used within one staircase). Ten different natural image patches (and, hence, 10 different phase spectra) were used for each of the six reference α values mentioned above. The same set of 60 natural image patches was used in the fovea and parafovea conditions.
Results and discussion
The data were analyzed with a 2 (fovea vs. parafovea) × 6 (reference α) two-way repeated measures ANOVA, using the same degrees-of-freedom correction as in Experiment 1. The main effect of retinal location (i.e., fovea vs. parafovea) was not significant, F(1,3) = 8.1, p > .05, indicating that, when averaged across reference α values, α discrimination thresholds were similar in the fovea and parafovea. The main effect of reference α was significant, F(3,10) = 5.27, p < .05, indicating that α discrimination thresholds for the natural image patches depended on reference α. Finally, the interaction between retinal location and reference α was significant, F(3,10) = 4.86, p < .05, indicating that thresholds varied with reference α differently in the fovea and parafovea.
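The retinal-location effect is not significant despite F = 8.1 because the error term has only 3 degrees of freedom. As a quick check (a sketch, not part of the original analysis), the identity F(1, ν) = t(ν)² and the closed-form CDF of Student's t with 3 df give the p value in plain Python:

```python
import math

def t3_cdf(t):
    """CDF of Student's t distribution with 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

def p_value_f_1_3(f_stat):
    """Upper-tail p for F(1, 3), using F(1, v) = t(v)**2 (two-tailed t)."""
    return 2.0 * (1.0 - t3_cdf(math.sqrt(f_stat)))

p_location = p_value_f_1_3(8.1)   # main effect of retinal location in Experiment 2
```

This gives p of roughly .065, and the critical value of F(1,3) at the .05 level is about 10.13, consistent with the reported p > .05.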
The results from the current experiment are shown in Figure 6. Note that the fovea and parafovea threshold functions (α discrimination threshold plotted against reference α) are less similar to each other than they were in Experiment 1. In both conditions, the optimal range was reference α ≥0.95. However, although both conditions yielded results suggesting tuning to α values ≥0.95, the parafovea condition produced discrimination thresholds that peaked at reference α values between 0.74 and 0.85, with lower and comparable thresholds on either side of that range. For the fovea condition, α discrimination thresholds were generally elevated for all reference α values ≤0.85 and became smaller for reference α values between 0.95 and 1.41, similar to the fovea condition in Experiment 1. The fovea data are similar to those reported by Knill et al. (1990), but because they used fractal noise images, a direct comparison with the current natural scene fovea data cannot be made. The parafovea data are quite similar to the Tadmor and Tolhurst (1994) data, in that there is a peak in α discrimination threshold in the 0.74 to 0.85 reference α range, with lower thresholds to either side of that range. Overall, α discrimination thresholds depended on reference α, and this dependency differed between stimuli presented to the fovea and to the parafovea.
Figure 6
 
Data from Experiment 2. The six different reference α values about which the observers made α discriminations are shown on the abscissa. The averaged differences (Δα) between the observed α discrimination thresholds and their respective reference α values are shown on the ordinate; error bars are ±1 SEM (within subject, averaged across observers).
Natural scene analysis: Local amplitude distribution analysis
In the current analysis, we set out to investigate the relationship between the local distribution of amplitude across spatial frequency within an image patch (i.e., local α) and the global distribution of amplitude across spatial frequency for the entire image (i.e., global α).
Method
For the current analysis, fully focused (i.e., no motion blur) natural scene images containing only natural content were randomly selected from three different image databases; details of the imaging devices can be found in the references for each database. We selected (1) 160 RGB 1600 × 1200 pixel images from the image database used by Hansen and Essock (2004, 2005) and Hansen et al. (2003), (2) 75 RGB 2560 × 1920 pixel images from the color-calibrated database maintained by the McGill Vision Research Unit (http://tabby.vision.mcgill.ca), and (3) 93 grayscale 1536 × 1024 pixel images from the “linear” image category of the van Hateren image database (van Hateren & van der Schaaf, 1998; http://hlab.phys.rug.nl/imlib/index.html). In total, 328 images were included in the analysis. All RGB images were converted to grayscale using the same formula described earlier, and only the central 512 × 512 pixel region of each image was analyzed (which also ensured that the analyzed regions would not suffer from lens distortions). Each image was linearized by applying the appropriate correction exponent to the image values (i.e., the inverse of the camera's exponent, 2.5), except for the images from the van Hateren database, which were already gamma corrected.
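The cropping and linearization steps can be sketched as follows. The grayscale conversion formula referenced in the text is not reproduced here, so the sketch assumes an already-grayscale array; it also assumes 8-bit values (`max_value=255`) for the camera images, and the function names are hypothetical.

```python
import numpy as np

def center_crop(img, size=512):
    """Return the central size x size region of a 2D (grayscale) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def linearize(gray, exponent=2.5, max_value=255.0):
    """Undo the camera's compressive response: normalize, then raise to `exponent`."""
    return (np.asarray(gray, dtype=float) / max_value) ** exponent

def prepare_for_analysis(gray, already_linear=False):
    """Crop, then linearize unless the source (e.g., van Hateren) is already linear."""
    crop = center_crop(np.asarray(gray))
    return crop.astype(float) if already_linear else linearize(crop)
```

Scaling the values by a constant does not change α, because it only shifts the log amplitude spectrum vertically; the exponent, by contrast, changes the spectral slope, which is why linearization matters.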
The analysis algorithm involved passing a sliding window filter (i.e., a “neighborhood operator”) of specified size (in pixels) over each image in turn. For any given image, the algorithm involved the following steps: (1) taking the Fourier transform of the pixels falling inside the sliding window (after those pixels were Gaussian windowed to reduce Fourier-transform-induced edge effects); (2) averaging the spectrum across orientation for each spatial frequency and storing each value in a 1D array; (3) fitting a line to the log amplitudes within that array using linear regression, with the local DC component excluded from the local α calculation; (4) assigning the absolute value of the slope of that line to the location in a 2D data matrix corresponding to the position of the central pixel of the sliding window in the image undergoing analysis; and (5) after an entire image had been filtered, storing the corresponding 2D data matrix for further analysis (see Figure 7 for an illustration of the algorithm steps). Before analysis, the edge regions of each image were reflected outside the original image dimensions, with the size of the reflection matching the dimensions of the sliding window (a standard image processing technique for reducing edge effects induced by spatial-domain filtering). Only the local α values within the original dimensions of each image were retained for analysis. This process was repeated for each image with four different sliding window sizes: 8 × 8, 16 × 16, 32 × 32, and 64 × 64 pixels. Essentially, this procedure amounts to a pixel-by-pixel region analysis in which the local α value for each pixel was assessed from the luminance values of its neighboring pixels, with the size of that neighborhood determined by the size of the sliding window. For examples of the 2D local α data matrices, see Figure 8.
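Steps 1 to 5 can be sketched as a brute-force loop in NumPy. Several details are assumptions not given in the text: the local mean is removed before the Gaussian taper (to limit DC leakage into low frequencies), the taper uses sigma = window/6, orientation averaging uses integer-radius rings, and `np.pad(..., mode="symmetric")` implements the edge reflection. The `step` parameter is a hypothetical speed-up; `step=1` gives the full pixel-by-pixel map, which is slow for 512 × 512 images.

```python
import numpy as np

def _ring_index(w):
    """Integer radial frequency (cycles per window) of each coefficient, unshifted order."""
    k = np.fft.fftfreq(w) * w
    fy, fx = np.meshgrid(k, k, indexing="ij")
    return np.rint(np.hypot(fx, fy)).astype(int)

def local_alpha_map(img, w, step=1, sigma_frac=1 / 6):
    """Sketch of the sliding-window local-alpha analysis (Steps 1-5)."""
    img = np.asarray(img, dtype=float)
    h, wd = img.shape
    padded = np.pad(img, w, mode="symmetric")    # reflect edges by one window width
    yy, xx = np.mgrid[:w, :w] - (w - 1) / 2.0
    gauss = np.exp(-(xx**2 + yy**2) / (2.0 * (sigma_frac * w) ** 2))
    rings = _ring_index(w).ravel()
    counts = np.bincount(rings)
    freqs = np.arange(1, w // 2 + 1)             # local DC (ring 0) is excluded
    log_f = np.log(freqs)
    rows, cols = range(0, h, step), range(0, wd, step)
    out = np.empty((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            # Window whose central pixel (index w // 2) lies on image pixel (i, j).
            top, left = i + w - w // 2, j + w - w // 2
            patch = padded[top:top + w, left:left + w]
            patch = (patch - patch.mean()) * gauss               # Step 1 (taper)
            amp = np.abs(np.fft.fft2(patch)).ravel()
            profile = np.bincount(rings, weights=amp)[freqs] / counts[freqs]  # Step 2
            slope = np.polyfit(log_f, np.log(profile), 1)[0]      # Step 3
            out[a, b] = abs(slope)                                # Step 4
    return out                                                    # Step 5

def synth_noise(n, alpha, seed=0):
    """Hypothetical test helper: white noise filtered to a 1/f**alpha amplitude spectrum."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n) * n
    fy, fx = np.meshgrid(k, k, indexing="ij")
    f = np.hypot(fx, fy)
    filt = np.zeros_like(f)
    filt[f > 0] = f[f > 0] ** -alpha
    return np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * filt))
```

Because the padded margin equals the window size, every window stays inside the array, and only positions within the original image dimensions are written to the output map, matching the text.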
Figure 7
 
Schematic illustration of the local amplitude distribution analysis algorithm. From left to right: (1) an arbitrary grid representing the pixels of a given natural scene image, with each cell representing one pixel. The black outlines represent the sliding window of the neighborhood operator used in the analysis, with the initial position of the sliding window shown at top right. As the window is passed across the image from top right to bottom left, the pixels falling in the window are subjected to the procedures described in Steps 2–5. The red dot indicates the central pixel of the sliding window as it is passed across the image; (2) the pixels falling within the boundaries of the sliding window are multiplied by a 2D spatial Gaussian to reduce Fourier-transform-induced “edge effects”; (3) the resultant amplitude spectrum of the Gaussian windowed pixels; (4) the orientation-averaged amplitude spectrum; (5) linear regression is used to fit a line to the log values of a vector from the orientation-averaged spectrum; the absolute value of the slope of that line is assigned to a 2D data matrix at the position corresponding to the central pixel of the current sliding window position.
Figure 8
 
On the left is an example of one of the images included in the local amplitude distribution analysis along with four color-coded 2D data matrices (on the right) obtained from the four different sliding window sizes mentioned in the text. The local α data matrices have been coded on a relative scale (relative to the other local α values within each sliding window data matrix) from shallower α values (blue) to steeper α values (red). These plots can be thought of as a spatial map of the slopes of the amplitude spectra for different local regions in the example image, assessed at four different scales.
Results and discussion
The results were analyzed separately for each of the four sliding window sizes, using the following procedure. First, the absolute value of the global α for each analysis image was obtained with the same procedure described in the natural scene image selection section of the current study (the σ of the Gaussian window applied to the images matched that of the window applied during the neighborhood operation). All 328 analysis images were then rank ordered by the absolute value of their global α, from shallowest to steepest, and split into 20 nonoverlapping bins in α steps of 0.05 (the number of images per bin varied). Most of the global α values fell within the 0.9 to 1.29 range, in close agreement with the range described in the natural image statistics literature. Next, the local α values for each sliding window size were averaged within each global α bin and plotted against the corresponding global α values. Figure 9A shows the averaged local α values for each of the four sliding window sizes plotted against the global α bins. The figure shows that local α values tend to be steeper than the global α values of the scenes from which they were sampled.
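The binning step above can be sketched as follows, assuming each image is summarized by the mean of its local α map. Since every analysis image was cropped to 512 × 512 and only the original dimensions were retained, each map has the same number of cells, so averaging per-image means is equivalent to pooling all cells in a bin. The function name and the epsilon guard are our own choices.

```python
import math
import numpy as np

def bin_by_global_alpha(global_alpha, mean_local_alpha, width=0.05):
    """Group images into fixed-width global-alpha bins and average their local alphas.

    Returns {bin lower edge: (mean local alpha, SEM, number of images)}.
    """
    g = np.asarray(global_alpha, dtype=float)
    loc = np.asarray(mean_local_alpha, dtype=float)
    # A tiny epsilon keeps values such as 0.95 from dropping into the bin below,
    # since floating-point division (0.95 / 0.05) can land just under 19.
    labels = np.floor(g / width + 1e-9).astype(int)
    table = {}
    for lab in np.unique(labels):
        vals = loc[labels == lab]
        sem = vals.std(ddof=1) / math.sqrt(vals.size) if vals.size > 1 else float("nan")
        table[float(round(lab * width, 2))] = (float(vals.mean()), float(sem), int(vals.size))
    return table
```

The SEM within each bin corresponds to the error bars described for Figure 9A; bins holding a single image have no defined SEM.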
One possible explanation for the elevated local α values, however, is that the elevation was caused by a windowing artifact and thus would have nothing to do with natural scenes. Langer (2000; p. 29, Equation 1) has proposed a metric for estimating biases in the Fourier amplitude spectra induced by sampling smaller image “windows” from larger natural scene images. However, that bias estimator is valid only under the assumption of translation invariance. That is, the larger global image (or, for that matter, image ensemble) must possess similar content at all orientations and spatial frequencies at each location in the image; otherwise, any observed bias will likely be confounded by actual local differences in structure or content (e.g., variations in the slope of the local or global amplitude spectrum, or variations in the amplitudes at different orientations). We therefore repeated the analysis on noise images (constructed with the same methods described in Experiment 1) possessing one of four different global α values within the range shown in Figure 9A. That analysis also yielded local α values that were elevated relative to their respective global α values. However, the magnitude of the difference between global and local α values, as a function of global α, differed between the natural scene images and the control noise images. To investigate that difference more thoroughly, we conducted a more extensive analysis. The rank-ordered natural images were split into 10 nonoverlapping bins (32 images per bin, except for the last bin, which contained 40 images). The average global α values of the bins were 0.76, 0.84, 0.89, 0.94, 1.02, 1.09, 1.18, 1.25, 1.33, and 1.52.
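The 10-bin split above follows from integer division: 328 images give 328 // 10 = 32 per bin, and the remainder lands in the last bin (9 × 32 + 40 = 328). A small plain-Python sketch, with hypothetical function names:

```python
def split_rank_ordered(alphas, n_bins=10):
    """Rank-order global alphas and cut them into n_bins consecutive bins.

    Every bin gets len(alphas) // n_bins values except the last, which absorbs
    the remainder (328 images -> nine bins of 32 and a final bin of 40).
    """
    ordered = sorted(alphas)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])
    return bins

def bin_means(bins):
    """Average global alpha of each bin (the values listed in the text)."""
    return [sum(b) / len(b) for b in bins]
```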
Next, 240 visual noise images with global α values identical to the 10 natural scene α bins (6 noise images per α bin for each of the 4 sliding window sizes) were constructed and subjected to the local α analysis (see Figure 9B for a plot of the averaged local α values obtained from the noise images). For each of the four sliding window sizes, the average of the differences between a given image's global α and each of its local α values (i.e., each cell in the 2D local α data matrix) was calculated to obtain that image's local-to-global α difference estimate. The local-to-global difference estimates for the natural scene analysis images and the control noise images were then averaged within each of the 10 global α bins. The results are plotted in Figure 10A. Note that, for the natural scene images, there is a strong trend for greater local-to-global α differences for images with steeper global α values than for images with shallower global α values, at all window sizes. For the noise images, this trend is much more prominent for the smaller window sizes, suggesting that windowing artifacts are strongest at those scales.
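The bookkeeping for the control set and the difference estimate can be sketched as below. The control count follows directly from the text (10 bins × 6 images × 4 window sizes = 240); the sign convention (local minus global, so positive values mean steeper local α) and the function names are our assumptions.

```python
import numpy as np

BIN_ALPHAS = (0.76, 0.84, 0.89, 0.94, 1.02, 1.09, 1.18, 1.25, 1.33, 1.52)
WINDOW_SIZES = (8, 16, 32, 64)

def control_set_specs(per_bin=6):
    """One (global alpha, window size, replicate) entry per noise control image."""
    return [(a, w, r) for w in WINDOW_SIZES for a in BIN_ALPHAS for r in range(per_bin)]

def local_to_global_difference(local_map, global_alpha):
    """Mean of (local alpha - global alpha) over every cell of a 2D local-alpha map."""
    return float(np.mean(np.asarray(local_map, dtype=float) - global_alpha))

def mean_difference_by_bin(differences, bin_labels):
    """Average per-image difference estimates within each global-alpha bin."""
    d = np.asarray(differences, dtype=float)
    labels = np.asarray(bin_labels)
    return {lab.item(): float(d[labels == lab].mean()) for lab in np.unique(labels)}
```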
Figure 9a, 9b
 
(a) Graph depicting the relationship between the global α values and the local α values. The global α values are shown on the abscissa, binned in steps of 0.05. The averaged local α values (calculated within each global α bin) from the four different sliding window sizes are shown on the ordinate; error bars are ±1 SEM within each bin. (b) Similar to Panel a, except that the local α values were estimated from the control noise image set (error bars for the noise image set did not exceed the size of the symbols).
Figure 10a
 
(a) Top: graphs showing the averaged local-to-global α differences for the natural scene analysis image set and the broadband noise control image set. The 10 different global α bins are shown on the abscissa, and the averaged differences between local α values and the global α values of the images from which they were sampled are shown on the ordinate; error bars are ±1 SEM within each bin (error for the noise image set did not exceed the size of the symbols). Both natural and noise images produced a trend for larger local-to-global α differences for images with steeper global α values at the smaller window sizes. However, only the natural images showed a significant trend at the larger window sizes, the most convincing example being the 64 × 64 pixel window results. (b) Bottom: plot of the ratios between the natural scene local-to-global α differences and the noise control local-to-global α differences (ordinate) as a function of the global α bins (abscissa). Note that the larger window functions are not flat and are mostly >1.0 for the 64 × 64 pixel window condition.
To test the significance of these trends, we averaged the local-to-global α differences within each global α bin and split the bins into two groups: α bins ≤1.02 formed one group, and α bins >1.02 formed the other. For the natural scene set, the local-to-global differences between the shallower and steeper groups were significant (two-tailed) at every window size, according to separate independent t tests assuming unequal variances: 8 × 8 window, t(8) = 6.88, p < .001; 16 × 16 window, t(8) = 7.42, p < .001; 32 × 32 window, t(8) = 6.88, p < .001; 64 × 64 window, t(8) = 6.95, p < .001. For the control noise image set, however, the local-to-global differences were significant only for the two smaller window sizes: 8 × 8 window, t(8) = 3.5, p < .05; 16 × 16 window, t(8) = 3.14, p < .05; 32 × 32 window, t(8) = 2.34, p = .078; 64 × 64 window, t(8) = 1.11, p = .32. Thus, whereas the natural scene image set yielded significant local-to-global differences at all window sizes, the noise control set showed this difference only at the smaller window sizes. This implies that the trends observed for smaller window sizes were likely confounded by windowing artifacts, whereas the trends observed for the larger window sizes (especially the 64 × 64 pixel window) more likely reflect the local properties of natural scenes. Additionally, the ratio of the local-to-global α differences (natural scenes to noise images) for each global α bin (see Figure 10B) indicates that the 64 × 64 pixel window analysis was the least likely to suffer from windowing artifacts, because most of those ratios are >1.0. The trend therefore implies that reasonably sized local regions of natural scenes with global α values larger than 1.0 tend to have steeper local α values.
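The group split, unequal-variance t statistic, and natural-to-noise ratio can be sketched in plain Python with the `statistics` module. This is an illustrative sketch, not the authors' analysis code. Note that for two groups of five bins the Welch–Satterthwaite df lies between 4 and 8, with 8 being the pooled-variance value; no p values are computed here.

```python
import math
from statistics import mean, variance

BIN_ALPHAS = (0.76, 0.84, 0.89, 0.94, 1.02, 1.09, 1.18, 1.25, 1.33, 1.52)

def split_at(bin_alphas, bin_values, cut=1.02):
    """Group per-bin values into shallow (alpha <= cut) and steep (alpha > cut) sets."""
    shallow = [v for a, v in zip(bin_alphas, bin_values) if a <= cut]
    steep = [v for a, v in zip(bin_alphas, bin_values) if a > cut]
    return shallow, steep

def welch_t(a, b):
    """Unequal-variance t statistic (mean of b minus mean of a) and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def natural_to_noise_ratio(natural_diffs, noise_diffs):
    """Per-bin ratio of natural-scene to noise-control local-to-global differences."""
    return [n / c for n, c in zip(natural_diffs, noise_diffs)]
```

A ratio above 1.0 in a given bin means the natural scenes show a larger local-to-global difference than noise with the same global α, which is the pattern the text reports for the 64 × 64 pixel window.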
After factoring out the results from the noise image controls, larger local region α values (i.e., 64 × 64 pixel regions) tend to be ∼0.13 steeper than their respective global image α values. 
Figure 10B
 
Plot of the ratios between the natural scene local-to-global α differences and the noise control local-to-global α differences (ordinate) as a function of the global α bins (abscissa). Note that the larger window functions are not flat and are mostly >1.0 for the 64 × 64 pixel window condition.
General discussion
Psychophysical experiments
The results from Experiments 1 and 2 show that, regardless of the retinal location of the stimulus presentation, the lowest α discrimination thresholds were obtained for reference α values ≥0.95, with the lowest thresholds for the visual noise stimuli falling in the 1.2 to 1.4 range. It is also worth noting that the fovea data from the two experiments are quite similar (despite the somewhat narrower range of the natural image reference α values). The stimuli employed in the noise image patch and natural scene image patch experiments were identical in that the amplitude spectra of all stimuli were broadband and isotropic; the two types of stimuli differed only in their phase spectra. The similarity between the fovea results from both experiments therefore suggests very little influence of the phase spectra (random or natural) of the patch stimuli. Although this lack of phase effects contradicts the findings reported by Thomson and Foster (1997), the stimuli used in their study were much larger than those used here, which argues for future studies investigating α discrimination as a function of both reference α and the spatial extent of the stimuli. The noise and natural scene parafovea data show a similar trend, in that both show the highest α discrimination thresholds around 0.8. However, for the natural scene stimuli, the thresholds are comparable on either side of that elevated range, whereas for the noise stimuli this was not the case. Tadmor and Tolhurst (1994) measured α discrimination thresholds for noise and natural image stimuli presented to the parafovea, and their noise and natural image data are much more similar to each other than those reported here.
Regardless of the retinal location of the stimuli, α discrimination thresholds depended significantly on reference α. In addition, for reference α values <1.0, the pattern of α discrimination thresholds differed significantly between stimuli presented to the fovea and to the parafovea, for both noise and natural scene patches. Specifically, although both fovea and parafovea α discrimination thresholds were relatively higher for all reference α values ≤0.85, the parafovea results show a threshold peak at 0.8 for noise stimuli and within the 0.74 to 0.85 range for natural scene stimuli. Because this fovea–parafovea difference was observed in both psychophysical experiments, the difference in how α discrimination sensitivity depends on reference α may be related to differences between foveal and parafoveal processing. It is not immediately apparent why parafovea α discrimination would be relatively worse for reference α values within the 0.74 to 0.85 range; nonetheless, the effect was present in both experiments.
Several models based on band-limited contrast within one channel (Tadmor & Tolhurst, 1994; Tolhurst & Tadmor, 1997a, 1997b) or multiple channels (Párraga & Tolhurst, 2000; Párraga et al., 2005) have been proposed and offer some insight into which stimulus information human observers use to discriminate between different α values. However, these models say little about why parafovea α discrimination is poor in the 0.74 to 0.85 range, why fovea α discrimination is poor for reference α values ≤0.85, or why both fovea and parafovea α discrimination is best for reference α values ≥0.95. The issue becomes even more problematic when one considers the nature of the α discrimination task itself and the different strategies one could use to make α discriminations successfully. In the past, depending on the type of stimulus imagery used, α discrimination has been likened to more traditional psychophysical discrimination tasks. Specifically, α discrimination for visual noise patterns can be considered a close variant of texture discrimination, especially for reference α values <1.2. For reference α values >1.2, however, the task could be considered blur discrimination. Because there are no sharp edges in a noise pattern, the latter interpretation could be argued either way (i.e., as involving texture perception instead). In fact, the analogy to blur discrimination was posited by Tadmor and Tolhurst (1994), based on the perceived quality of various images forced to have different α values and the observation that steeper α values made the images appear blurred. However, that is only true on a relative basis (e.g., an image can be sharply focused and still have a steep α; Field & Brady, 1997).
The natural scene patches used in the current study were grouped by their natural α values, meaning they were not forced to have artificially steep α values that might cause the corresponding reference image to appear blurred. Even so, because α discrimination was measured for test α values that approached the reference α values from above (and below), the task might still be considered a blur detection task. If that were the case, however, one would expect the thresholds reported here to be flat across reference α, because each image within the different reference α sets always appeared focused.
Local image region amplitude spectrum analysis
The results of the current local region amplitude spectrum analysis revealed a strong trend for greater local-to-global α differences for images with steeper global α values (i.e., steeper than 1.0) than for images with shallower global α values (i.e., shallower than 1.0) at all window sizes. However, because that trend was likely confounded by windowing artifacts for the smaller sliding windows, only the largest window (i.e., the 64 × 64 pixel region analysis) should be considered seriously. After factoring out the elevation in local α present in the noise image analysis, natural scene local region α values (64 × 64 pixel regions) sampled from images with global α values steeper than 1.0 tend to be ∼0.13 steeper than their respective global image α values. This finding can be put in perspective by considering it together with the psychophysical results of Experiments 1 and 2. The experiments reported here, as well as those reported by Knill et al. (1990), support α discrimination thresholds being lowest for reference α values between 1.2 and 1.4. Although this optimal range is close to the typical global α reported for natural scenes (∼0.9 to ∼1.2), it is shifted toward steeper values. If α discrimination is related to general natural image statistics, this misalignment is rather puzzling. It is reasonable to assume that the processing strategies employed by our visual systems have been shaped by our experience in the natural environment (on either an evolutionary or a developmental timescale); therefore, one could expect a task involving α discrimination (or any other task involving human perception of real-world image statistics) to be related to the general statistical regularities contained in natural scenes. That is not to say that such a relationship must be present, but it does seem likely that some relationship could exist.
Tadmor and Tolhurst (1994) argued that such a relationship is best reflected not by the range of reference α values yielding the lowest thresholds but by the range yielding the highest α discrimination thresholds. They suggested that high discrimination thresholds for reference images with α values around 0.8 indicate a high degree of “tolerance” for changes in α that might occur within that range. However, that range of elevated thresholds is also misaligned with the typical global α values reported in the literature. It is quite perplexing that one key feature of the parafovea data (the high threshold peak near 0.8) lies to the left of (i.e., is shallower than) the typical range of natural α values, whereas the other key feature, present in both the fovea and parafovea data (the low threshold trough >1.0), lies to the right of (i.e., is steeper than) that range. Part of this puzzle may arise because a relationship between α discrimination performance and the second-order statistics of natural images has been sought by comparing α discrimination thresholds obtained with rather small stimuli to the image statistics of rather large scenes. Although the bulk of the distribution of global α values obtained from the images used in the current image analysis (∼0.9 to ∼1.29) agreed with the peak range typically reported in the natural image literature, the local α values measured with the larger windows in images from that peak range were found to be ∼0.13 steeper. Consequently, most of these local α values fall between ∼1.03 and ∼1.42, which is very close to the range in which α discrimination thresholds were lowest. Additionally, for the set of images used in the current image analysis, local α values were seldom shallower than 0.8. 
The range of elevated α discrimination thresholds observed in Experiments 1 and 2 typically fell at and below that value, which may be related to the scarcity of such shallow α values at the local scale. Although the above arguments fit the fovea data well, they do not provide a reasonable account of portions of the parafovea data. Specifically, the current local region image analysis gave no clear indication that could account for the high thresholds in the 0.74 to 0.85 reference α range. Furthermore, for the natural scene image patches, α discrimination thresholds were comparable for reference α values on either side of that peak. Therefore, although the local region image analysis provided a reasonable account of α discrimination thresholds in the fovea, it offered little insight into the α discrimination threshold peak in the parafovea, leaving those data open to conjecture. Given the reasonable agreement between the local region image analysis and the pattern of α discrimination thresholds for stimuli presented to the fovea, the final issue to address is how the visual system might benefit from using local amplitude distributions. One possibility relates to texture discrimination and texture segmentation. It is fair to assume that a large number of real-world textures can be characterized by their amplitude spectra; although natural textures can also be characterized by higher order image statistics, the second-order statistics are undeniably available. As Figure 11 illustrates, a natural scene image can be segmented into regions based purely on texture as defined by its local α statistics. 
Additionally, because most of the large images used in the current image analysis possessed local α values in the range corresponding to the lowest α discrimination thresholds reported here (for the fovea data), texture segmentation based on differences in local amplitude distributions is quite plausible. 
Figure 11
 
Left: an example of one of the images included in the local amplitude distribution analysis. Right: an example of the same image with a random phase spectrum but with its local α relationships preserved. Notice that although the phase spectrum has been randomized, the image remains segmentable when only the local α relationships of the original image are preserved.
Acknowledgments
This work was supported by a grant (MT 108-18) from the Canadian Institutes of Health Research to R.F.H. We would also like to thank the Essock Laboratory (University of Louisville, Louisville, KY 40292, USA) for granting us access to their natural image database and Aaron P. Johnson and Keith A. May for their comments on earlier versions of the manuscript. Additionally, we would like to thank one of the anonymous reviewers for suggesting that the local α analysis be carried out with 1/f noise images. 
Commercial relationships: none. 
Corresponding author: Bruce C. Hansen. 
Email: bruce.hansen@mcgill.ca. 
Address: 687 Pine Ave West Rm. H4-14, Montréal, Québec, Canada, H3A 1A1. 
References
Billock, V. A. (2000). Neural acclimation to 1/f spatial frequency spectra in natural images transduced by the human visual system. Physica D, 137, 379–391.
Burton, G. J., & Moorhead, I. R. (1987). Color and spatial structure in natural scenes. Applied Optics, 26, 157–170.
Cohen, B. H. (2001). Explaining psychological statistics. New York: Wiley.
Dong, D. W., & Atick, J. J. (1995). Statistics of time-varying images. Network: Computation in Neural Systems, 6, 345–358.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, Optics and Image Science, 4, 2379–2394.
Field, D. J. (1993). Scale-invariance and self-similar ‘wavelet’ transforms: An analysis of natural scenes and mammalian visual systems. In M. Farge, J. C. R. Hunt, & J. C. Vassilicos (Eds.), Wavelets, Fractals and Fourier Transforms: New Developments and New Applications.
Field, D. J., & Brady, N. (1997). Visual sensitivity, blur and the sources of variability in the amplitude spectra of natural scenes. Vision Research, 37, 3367–3383.
Hansen, B. C., & Essock, E. A. (2004). A horizontal bias in human visual processing of orientation and its correspondence to the structural components of natural scenes. Journal of Vision, 4(12), 1044–1060, http://journalofvision.org/4/12/5/, doi:10.1167/4.12.5.
Hansen, B. C., & Essock, E. A. (2005). Influence of scale and orientation on the visual perception of natural scenes. Visual Cognition, 12, 1199–1234.
Hansen, B. C., Essock, E. A., Zheng, Y., & DeFord, J. K. (2003). Perceptual anisotropies in visual processing and their relation to natural image statistics. Network, 14, 501–526.
Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290, 91–97.
Julesz, B. (1986). Texton gradients: The texton theory revisited. Biological Cybernetics, 54, 245–251.
Klein, S. A., & Tyler, C. W. (1986). Phase discrimination of compound gratings: Generalized autocorrelation analysis. Journal of the Optical Society of America A, Optics and Image Science, 3, 868–879.
Knill, D. C., Field, D., & Kersten, D. (1990). Human discrimination of fractal images. Journal of the Optical Society of America A, Optics and Image Science, 7, 1113–1123.
Kretzmer, E. R. (1952). The statistics of television signals. Bell System Technical Journal, 31, 751–763.
Langer, M. S. (2000). Large-scale failures of f^−α scaling in natural image spectra. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 17, 28–33.
Leek, M. R. (2001). Adaptive procedures in psychophysical research. Perception and Psychophysics, 63, 1279–1292.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Párraga, C. A., & Tolhurst, D. J. (2000). The effect of contrast randomisation on the discrimination of changes in the slopes of the amplitude spectra of natural scenes. Perception, 29, 1101–1116.
Párraga, C. A., Troscianko, T., & Tolhurst, D. J. (2000). The human visual system is optimized for processing the spatial information in natural visual images. Current Biology, 10, 35–38.
Párraga, C. A., Troscianko, T., & Tolhurst, D. J. (2005). The effects of amplitude–spectrum statistics on foveal and peripheral discrimination of changes in natural images and a multi-resolution model. Vision Research, 45, 3145–3168.
Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Physical Review Letters, 73, 814–817.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Tadmor, Y., & Tolhurst, D. J. (1993). Both the phase and amplitude spectrum may determine the appearance of natural images. Vision Research, 33, 141–145.
Tadmor, Y., & Tolhurst, D. J. (1994). Discrimination of changes in the second-order statistics of natural and synthetic images. Vision Research, 34, 541–554.
Thomson, M. G. A., & Foster, D. H. (1997). Role of second- and third-order statistics in the discriminability of natural images. Journal of the Optical Society of America A, Optics and Image Science, 14, 2081–2090.
Tolhurst, D. J., & Tadmor, Y. (1997a). Band-limited contrast in natural images explains the detectability of changes in the amplitude spectra. Vision Research, 37, 3203–3215.
Tolhurst, D. J., & Tadmor, Y. (1997b). Discrimination of changes in the slopes of the amplitude spectra of natural images: Band-limited contrast and psychometric functions. Perception, 26, 1011–1025.
Tolhurst, D. J., & Tadmor, Y. (2000). Discrimination of spectrally blended natural images: Optimisation of the human visual system for encoding natural images. Perception, 29, 1087–1100.
Tolhurst, D. J., Tadmor, Y., & Chao, T. (1992). Amplitude spectra of natural images. Ophthalmic and Physiological Optics, 12, 229–232.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network, 14, 391–412.
van der Schaaf, A., & van Hateren, J. H. (1996). Modelling the power spectra of natural images: Statistics and information. Vision Research, 36, 2759–2770.
van Hateren, J. H., & van der Schaaf, A. (1998). Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings: Biological Sciences/The Royal Society, 265, 359–366.
Webster, M. A., & Miyahara, E. (1997). Contrast adaptation and the spatial structure of natural images. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 14, 2355–2366.
Figure 1
 
Left: a set of natural image patches (selected from the set of stimuli used in Experiment 2) ranging in α (from top to bottom) from relatively shallow (<1.0) to relatively steep (>1.0). All patches have been equated with respect to rms contrast. Right: a similar set constructed from a broadband visual noise image.
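The rms contrast definition is not restated in this section. The sketch below assumes one common definition (standard deviation of luminance divided by mean luminance) and equates patches by rescaling their deviations around the mean; `rms_contrast` and `set_rms_contrast` are hypothetical names, the target of 0.2 is arbitrary, and clipping to a display range is not handled.

```python
import numpy as np

def rms_contrast(img):
    """RMS contrast, assumed here to be std(luminance) / mean(luminance)."""
    img = np.asarray(img, dtype=float)
    return img.std() / img.mean()

def set_rms_contrast(img, target):
    """Rescale deviations around the mean so the patch has RMS contrast `target`.

    Mean luminance is preserved; values are not clipped to a display range.
    """
    img = np.asarray(img, dtype=float)
    mu = img.mean()
    dev = img - mu
    sd = dev.std()
    if sd == 0:
        raise ValueError("a uniform patch has no contrast to rescale")
    return mu + dev * (target * mu / sd)

# Usage: equate three patches of different contrast to the same RMS contrast
rng = np.random.default_rng(0)
patches = [0.5 + s * rng.standard_normal((64, 64)) for s in (0.05, 0.1, 0.2)]
equated = [set_rms_contrast(p, 0.2) for p in patches]
print([round(rms_contrast(p), 3) for p in equated])  # [0.2, 0.2, 0.2]
```

Rescaling only the deviations keeps mean luminance fixed, so patches differing in α differ in spectral slope rather than in overall contrast.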
Figure 2
 
Data from Experiment 1. The seven different reference α values about which the observers made α discriminations are shown on the abscissa. The averaged differences (Δα) between the observed α discrimination thresholds and their respective reference α values are shown on the ordinate; error bars are ±1 SEM (within subject, averaged across observers).
Figure 3
 
Schematic illustration of how the appearance of two different images that have had their amplitude spectra forced to be isotropic can vary depending on the overall distribution of “natural” amplitude biases at different orientations. Top row: from left to right, (1) a natural scene image with a predominance of vertical content in the spatial domain; (2) the amplitude spectrum of that image that has been parsed into four equal area “bow-tie” wedges (45 deg orientation bandwidth) centered on vertical, 45 deg oblique, horizontal, and 135 deg oblique—note the bias in amplitude within the horizontal bow-tie wedge indicating a bias of vertical content in the spatial domain (below that spectrum is the isotropic 1/f spectrum used to replace the original spectrum); (3) the linear regression fits to the orientation-averaged amplitude as a function of spatial frequency for each of the four orientation bow ties. The log amplitude is shown on the ordinate, whereas log spatial frequency in cycles per picture is shown on the abscissa. Note that the fit to vertical amplitude indicates higher overall amplitude as well as a steeper α relative to the other three orientations; (4) the resultant spatial image after having been assigned the isotropic amplitude spectrum. Bottom row: identical to the top row, only with an image that originally possessed approximately equal amounts of amplitude across orientation (naturally isotropic). Note that, for the vertically biased image, its filtered version appears much more degraded (notice that the trees in the background are essentially masked) when compared with the naturally isotropic filtered image.
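The per-orientation fits in panel (3) can be sketched as follows. This is a minimal illustration, assuming NumPy, a square grayscale image, integer-radius frequency bins in cycles per picture, and a fit over 1 to N/2 cycles per picture; these choices and the name `wedge_alphas` are assumptions, not the exact procedure used here. Because array row indices increase downward, which oblique wedge is labeled 45 versus 135 deg depends on the axis convention.

```python
import numpy as np

def wedge_alphas(img, centers_deg=(0, 45, 90, 135), bandwidth_deg=45.0):
    """Estimate alpha separately within 'bow-tie' orientation wedges.

    Angles are measured in the (fx, fy) plane of the centred spectrum, so the
    wedge centred on 0 deg lies along the horizontal frequency axis and holds
    the energy of vertically oriented image structure. Returns {centre: alpha}.
    """
    img = np.asarray(img, dtype=float)
    n = img.shape[0]  # assumes a square image
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))
    fy, fx = np.indices(amp.shape) - n // 2
    radius = np.rint(np.hypot(fx, fy)).astype(int)
    # Fold angles into [0, 180): a bow tie covers theta and theta + 180 deg
    theta = np.degrees(np.arctan2(fy, fx)) % 180.0
    valid = (radius >= 1) & (radius <= n // 2)  # drop DC and beyond Nyquist
    alphas = {}
    for c in centers_deg:
        # Angular distance to the wedge centre, wrapped to [0, 90]
        dist = np.abs((theta - c + 90.0) % 180.0 - 90.0)
        mask = valid & (dist < bandwidth_deg / 2.0)
        r = radius[mask]
        # Mean amplitude over the orientations inside the wedge, per radius
        sums = np.bincount(r, weights=amp[mask], minlength=n // 2 + 1)
        counts = np.bincount(r, minlength=n // 2 + 1)
        f = np.nonzero(counts)[0]
        # Log-log linear regression; alpha is the absolute slope
        slope, _ = np.polyfit(np.log10(f), np.log10(sums[f] / counts[f]), 1)
        alphas[c] = abs(slope)
    return alphas
```

For an isotropic input such as white noise, all four wedges should give similar α values (near 0); an image with a strong orientation bias should give a visibly different α in the corresponding wedge.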
Figure 4
 
Examples of some of the image patches used in Experiment 2. Each patch has been assigned an isotropic amplitude spectrum with an α corresponding to the bin to which the patch was assigned.
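Assigning an isotropic 1/f^α amplitude spectrum while keeping a patch's Fourier phase can be sketched as below. This assumes NumPy and frequencies in cycles per picture; the hypothetical `assign_isotropic_spectrum` leaves the DC term unchanged and sets the overall amplitude scale arbitrarily, so the result would still need to be equated for rms contrast, as described for the patches in Figure 1. Applied to white noise, the same function yields broadband 1/f^α noise.

```python
import numpy as np

def assign_isotropic_spectrum(patch, alpha):
    """Give `patch` an isotropic 1/f**alpha amplitude spectrum, keeping its phase.

    Frequencies are in cycles per picture. The DC term (mean luminance) is
    kept; the overall amplitude scale of the result is arbitrary.
    """
    patch = np.asarray(patch, dtype=float)
    rows, cols = patch.shape
    spec = np.fft.fft2(patch)
    fy = np.fft.fftfreq(rows) * rows
    fx = np.fft.fftfreq(cols) * cols
    f = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    f[0, 0] = 1.0                  # placeholder; the DC term is restored below
    new_spec = f ** -alpha * np.exp(1j * np.angle(spec))
    new_spec[0, 0] = spec[0, 0]
    # f is symmetric under (fy, fx) -> (-fy, -fx) and the phase of a real patch
    # is odd, so new_spec is Hermitian and the inverse FFT is real up to rounding.
    return np.real(np.fft.ifft2(new_spec))

# Usage: turn white noise into broadband noise with alpha = 1.2
rng = np.random.default_rng(0)
noise = assign_isotropic_spectrum(rng.standard_normal((128, 128)), 1.2)
```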
Figure 5
 
A plot of the distribution of α values (located along the ordinate) of the selected natural image patches within each of the six reference α bins (located along the abscissa). Note that the image patches within each bin are very closely distributed around the average α of each bin.
Figure 6
 
Data from Experiment 2. The seven different reference α values about which the observers made α discriminations are shown on the abscissa. The averaged differences (Δα) between the observed α discrimination thresholds and their respective reference α values are shown on the ordinate; error bars are ±1 SEM (within subject, averaged across observers).
Figure 7
 
Schematic illustration of the local amplitude distribution analysis algorithm. From left to right: (1) an arbitrary grid representing the pixels of a given natural scene image. Each cell represents one pixel location in that image. The black outlines represent the sliding window of the neighborhood operator used in the analysis, with the initial position of the sliding window shown at top right. As the window is passed across the image from top right to bottom left, the pixels falling in the window are subjected to the procedures described in Steps 2–5. The red dot indicates the central pixel of the sliding window as it is passed across the image; (2) the pixels falling within the boundaries of the sliding window are multiplied by a 2D spatial Gaussian window to reduce Fourier-transform-induced “edge effects”; (3) the resultant amplitude spectrum of the Gaussian-windowed pixels; (4) the orientation-averaged amplitude spectrum; (5) linear regression is used to fit a line to log amplitude as a function of log spatial frequency for the orientation-averaged spectrum; the absolute value of the slope of that line is assigned to a 2D data matrix at the position corresponding to the central pixel of the current sliding window position.
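Steps 2–5 can be sketched as follows, assuming NumPy and a square sliding window. The Gaussian width (`sigma_frac`), the removal of the window-weighted mean before tapering, the 1 to N/2 fitting range, and the `step` parameter (which evaluates only every `step`-th window position to save time, rather than every pixel as in the figure) are all assumptions, and `local_alpha_map` is a hypothetical name. Positions where a full window does not fit are left as NaN.

```python
import numpy as np

def _patch_alpha(patch, window):
    """|slope| of log amplitude vs log frequency for one windowed patch."""
    n = patch.shape[0]
    # Step 2: remove the window-weighted mean, then taper with the Gaussian
    tapered = (patch - np.sum(patch * window) / np.sum(window)) * window
    # Step 3: amplitude spectrum, with zero frequency moved to the centre
    amp = np.abs(np.fft.fftshift(np.fft.fft2(tapered)))
    # Step 4: average over orientation at each integer radius (cycles/picture)
    fy, fx = np.indices(amp.shape) - n // 2
    r = np.rint(np.hypot(fx, fy)).astype(int).ravel()
    sums = np.bincount(r, weights=amp.ravel())
    counts = np.bincount(r)
    f = np.arange(1, n // 2 + 1)
    # Step 5: log-log linear regression; alpha is the absolute slope
    slope, _ = np.polyfit(np.log10(f), np.log10(sums[f] / counts[f]), 1)
    return abs(slope)

def local_alpha_map(img, win=64, step=8, sigma_frac=0.25):
    """Local alpha at the centre pixel of each win x win window position.

    Only every `step`-th position is evaluated; all other entries, and the
    border where a full window does not fit, are NaN.
    """
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[0:win, 0:win] - (win - 1) / 2.0
    window = np.exp(-(x ** 2 + y ** 2) / (2.0 * (sigma_frac * win) ** 2))
    out = np.full(img.shape, np.nan)
    for top in range(0, img.shape[0] - win + 1, step):
        for left in range(0, img.shape[1] - win + 1, step):
            patch = img[top:top + win, left:left + win]
            out[top + win // 2, left + win // 2] = _patch_alpha(patch, window)
    return out
```

Applying the same `_patch_alpha` routine to a whole image (with a window of ones) would give a global α for comparison; as the Discussion notes, the Gaussian taper itself can shift local α for small windows, which is why a noise control set is useful.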
Figure 8
 
On the left is an example of one of the images included in the local amplitude distribution analysis along with four color-coded 2D data matrices (on the right) obtained from the four different sliding window sizes mentioned in the text. The local α data matrices have been coded on a relative scale (relative to the other local α values within each sliding window data matrix) from shallower α values (blue) to steeper α values (red). These plots can be thought of as a spatial map of the slopes of the amplitude spectra for different local regions in the example image, assessed at four different scales.
Figure 9a, 9b
 
(a) Graph depicting the relationship between the global α values and the local α values. The global α values are shown on the abscissa, binned in steps of 0.05. The averaged local α values (calculated within each global α bin) from the four different sliding window sizes are shown on the ordinate; error bars are ±1 SEM within each bin. (b) Similar to Panel a, except that the local α values were estimated from the control noise image set (error bars for the noise image set did not exceed the size of the symbols).
Figure 10a
 
(a) Top: graphs showing the averaged local-to-global α differences for the natural scene analysis image set and the broadband noise control image set. The 10 different global α bins are shown on the abscissa, and the averaged differences between local α values and the respective global α values of the images from which they were sampled are shown on the ordinate; error bars are ±1 SEM within each bin (error bars for the noise image set did not exceed the size of the symbols). Both natural and noise images produced a trend for larger local-to-global α differences for images with steeper global α values for the smaller window sizes. However, only the natural images showed this trend significantly for the larger window sizes, with the most convincing example being the 64 × 64 pixel window results. (b) Bottom: plot of the ratios between the natural scene local-to-global α differences and the noise control local-to-global α differences (ordinate) as a function of the global α bins (abscissa). Note that the functions for the larger windows are not flat and are mostly >1.0 for the 64 × 64 pixel window condition.
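The binning and the two comparisons with the noise control (the ratio plotted in panel b, and the subtraction used in the Discussion to factor out the noise elevation) can be sketched as below. The function names and the input layout (one global α and one mean local α per image) are hypothetical; the caller supplies bin edges, for example in steps of 0.05 as in Figure 9a.

```python
import numpy as np

def binned_local_minus_global(global_alpha, local_alpha, edges):
    """Mean (local - global) alpha difference within each global-alpha bin.

    global_alpha[i] is the global alpha of image i and local_alpha[i] the mean
    local alpha measured in that image. Bin b covers [edges[b], edges[b + 1]);
    bins containing no images are NaN.
    """
    g = np.asarray(global_alpha, dtype=float)
    diff = np.asarray(local_alpha, dtype=float) - g
    idx = np.digitize(g, edges) - 1
    out = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            out[b] = diff[in_bin].mean()
    return out

def compare_with_noise(natural_diff, noise_diff):
    """Ratio natural/noise (as in panel b) and noise-corrected difference."""
    natural_diff = np.asarray(natural_diff, dtype=float)
    noise_diff = np.asarray(noise_diff, dtype=float)
    return natural_diff / noise_diff, natural_diff - noise_diff
```

A ratio that stays near 1.0 across bins would mean the natural images show no more local-to-global elevation than the noise control; the subtraction gives the residual elevation in α units.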
Figure 10B
 
Plot of the ratios between the natural scene local-to-global α differences and the noise control local-to-global α differences (ordinate) as a function of the global α bins (abscissa). Note that the functions for the larger windows are not flat and are mostly >1.0 for the 64 × 64 pixel window condition.
© 2006 ARVO