Letters are broadband visual stimuli with information useful for discrimination over a wide range of spatial frequencies. Yet, recent evidence suggests that observers use only a single, fixed spatial-frequency channel to identify letters and that the scale of that channel, in units of letter size, is determined by the size of the letter (scale dependence). We report two letter-identification experiments using critical-band masking. With sufficiently high-amplitude, low- or high-pass masking noise, observers switched to a different range of spatial frequencies for the task. Thus, letter channels are not fixed for a given letter size. When an additional white-noise masker was added to the stimulus to flatten the contrast-sensitivity function, the letter channel used by the observer still depended on letter size, further supporting the hypothesis that letter identification is scale dependent.

$f$, they will typically use a channel centered on $f$ to perform the task because that is the channel that is most sensitive to the target stimulus. However, when confronted with, e.g., a low-pass noise masker, they will shift to a channel with a higher peak spatial frequency (Figure 3A illustrates this for the task of letter discrimination rather than grating detection). This *off-frequency looking* implies that the subject uses a channel less sensitive to the test grating but also less responsive to the masking noise, resulting in a higher signal-to-noise ratio (Pelli, 1981; Perkins & Landy, 1991). With a low-pass masker, observers use a higher frequency channel, and with high-pass noise they use a lower frequency channel.
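The signal-to-noise argument for off-frequency looking can be illustrated with a toy calculation. The sketch below is not the model used in this paper: the log-Gaussian channel shape, the half-octave bandwidth, the flat internal noise, and all names (`channel_gain`, `noise_passed`, `best_channel`) are assumptions chosen only to show the direction of the shift.

```python
import math

def channel_gain(f, center, bw_oct=0.5):
    """Power gain at frequency f of a log-Gaussian channel (assumed shape)."""
    u = math.log2(f / center)
    return math.exp(-u * u / (2.0 * bw_oct ** 2))

def noise_passed(center, noise_psd, lo=0.25, hi=64.0, steps=800):
    """External noise power passed by the channel, integrated over log2 frequency."""
    a, b = math.log2(lo), math.log2(hi)
    du = (b - a) / steps
    total = 0.0
    for k in range(steps):
        f = 2.0 ** (a + (k + 0.5) * du)  # midpoint rule
        total += noise_psd(f) * channel_gain(f, center) * du
    return total

def best_channel(f_target, noise_psd, internal_noise=1.0):
    """Channel center, on a log-spaced grid, that maximizes signal-to-noise ratio."""
    best_center, best_snr = None, -1.0
    for k in range(-40, 41):
        center = f_target * 2.0 ** (k * 0.05)
        signal = channel_gain(f_target, center)
        snr = signal / (internal_noise + noise_passed(center, noise_psd))
        if snr > best_snr:
            best_center, best_snr = center, snr
    return best_center

F_TARGET = 4.0  # hypothetical target frequency

def no_noise(f):
    return 0.0

def low_pass(f):
    return 10.0 if f < F_TARGET else 0.0  # hypothetical masker level

def high_pass(f):
    return 10.0 if f >= F_TARGET else 0.0

print(best_channel(F_TARGET, no_noise))   # channel centered on the target
print(best_channel(F_TARGET, low_pass))   # shifted above the target
print(best_channel(F_TARGET, high_pass))  # shifted below the target
```

Under these assumptions the best channel sits on the target frequency without a masker, moves to a higher center with low-pass noise, and moves to a lower center with high-pass noise.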

$E_{0}$ of a target with no masker, along with threshold energies $E_{\text{low}}$ and $E_{\text{high}}$ in the presence of low- and high-pass noise maskers sharing the same cutoff frequency $f$, and $E_{\text{all}}$ in the presence of all-pass (white) noise. If the observer used a fixed channel, and threshold depended only on the signal-to-noise ratio passed by that channel, then threshold elevations (e.g., $E_{\text{low}}^{+} = E_{\text{low}} - E_{0}$) should sum, i.e., $E_{\text{all}}^{+} = E_{\text{low}}^{+} + E_{\text{high}}^{+}$. On the other hand, with off-frequency looking, observers should be able to reduce threshold for the low- and high-pass maskers, so that $E_{\text{all}}^{+} > E_{\text{low}}^{+} + E_{\text{high}}^{+}$. Majaj et al. (2002) reviewed the data of their own study as well as that of Pelli (1981) and concluded that the excess masking (at most 0.2 log units on average, or about 58% higher contrast energy required in white noise than predicted) was too small, for both letters and sine-wave gratings, to have any practical importance. On this view, observers were unable to switch channels to any useful degree. Note that this argument did not take into account the data of Perkins and Landy (1991), who found excess masking an order of magnitude larger than this for the detection of a sine-wave grating in the presence of two band-pass noise maskers with frequency ranges above and below that of the target grating. In the experiments described below, we attempt to resolve the apparent contradiction between these results, and we find that excess masking increases with masking-noise level.

*equivalent input noise* (hereinafter referred to as “equivalent noise” and symbolically as $N_{\text{eq}}$). That is, insensitivity to a given spatial frequency can be treated as if the observer were, in fact, an ideal discriminator hampered by both the noise in the visual display and the equivalent noise. This equivalent noise is high for those spatial frequencies at which the observer's sensitivity is low, and low where the sensitivity is high. The intensity of the equivalent noise at each spatial frequency is set so that an ideal observer, when presented with sine-wave gratings plus the equivalent noise, would reproduce the observer's CSF. As it is usually measured, the CSF is band pass. The high-frequency cutoff is due to low-pass filtering by the eye's optics, while the low-frequency attenuation is the result of neural processing (e.g., Banks, Geisler, & Bennett, 1987). Thus, equivalent noise is highest at low and high spatial frequencies and has a minimum where sensitivity is highest (approximately 4 cycles/deg).

*would* be beneficial to switch channels to higher spatial frequencies to improve signal-to-noise ratio.

^{2}. The standard viewing distance $d$ was 91.4 cm. In Experiment 2, some subjects also ran at distances of $d/3$, $d/2$, and $3d$ (30.5, 45.7, and 274.2 cm, respectively). All stimuli were 512 × 512 pixels in size (16.0 × 16.0 cm) and were generated using Matlab 7.0 and Adobe Illustrator 10.

*SD*) of the entire stimulus. Stimuli were bright letters on a gray background masked by low- or high-pass-filtered noise.

$f_{c}$ = 0.5, 1, 2, 5, 10, 20, and 40 cycles/letter. A low-pass noise with an 80 cycles/letter cutoff frequency was used in the nominally white-noise condition. Cycles/letter is defined as cycles/deg × letter width in deg. The low-pass noise masks were produced by filtering white noise using a Butterworth filter with frequency response $B(f)$, where $f$ denotes spatial frequency and $f_{c}$ is the cutoff frequency. The filter order $n = 5$ was used to provide a relatively steep filter cutoff. The Butterworth filter is a standard engineering filter design that enables one to control the rate of attenuation (by assigning an appropriate value to $n$) and minimize ringing in the filtered image. We produced the high-pass noise masks by filtering nominally white-noise masks (i.e., low-pass noise with 80 cycles/letter cutoff) with inverted versions of these filters (i.e., $1 - B(f)$). The root-mean-squared (RMS) contrast of the white noise prior to filtering had one of four possible values: 0.2, 0.4, 0.6, and 0.8. As a result, the noise maskers had a two-sided power spectral density of $1.53 \times 10^{-5}$, $6.1 \times 10^{-5}$, $1.37 \times 10^{-4}$, and $2.44 \times 10^{-4}$ deg$^{2}$, respectively, as viewed from 91.4 cm.
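The low- and high-pass filter pair can be sketched as follows. The exact expression for $B(f)$ is not reproduced in this text, so the form below, $B(f) = 1/(1 + (f/f_c)^{2n})$, is an assumption (one common Butterworth form in image processing), and the function names are hypothetical.

```python
def butterworth_lowpass(f, f_c, n=5):
    """Assumed low-pass gain B(f) = 1 / (1 + (f / f_c)**(2 * n))."""
    return 1.0 / (1.0 + (f / f_c) ** (2 * n))

def butterworth_highpass(f, f_c, n=5):
    """Inverted filter 1 - B(f), as described in the text."""
    return 1.0 - butterworth_lowpass(f, f_c, n)

# Cutoff frequencies from the text, in cycles/letter.
CUTOFFS = [0.5, 1, 2, 5, 10, 20, 40]

for f_c in CUTOFFS:
    # Gain one octave below, at, and one octave above the cutoff.
    gains = [butterworth_lowpass(f, f_c) for f in (f_c / 2, f_c, 2 * f_c)]
    print(f_c, [round(g, 4) for g in gains])
```

With $n = 5$ the assumed gain falls from nearly 1 to nearly 0 within about an octave on either side of the cutoff, which is what a relatively steep cutoff means in practice.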

$f_{c}$, using a least-squares criterion. The derivative of the best fitting curve with respect to cutoff frequency, divided by $f$, provided an estimate of the power gain of the channel used to identify the letters, assuming a fixed channel was used across those conditions (Solomon & Pelli, 1994). A second, independent fit of a reversed cumulative Gaussian was made to the threshold elevation data for the high-pass noise conditions, and this fit was differentiated and divided by $f$ to provide a second channel estimate.
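The differentiate-and-divide step can be sketched for the low-pass case. This is a minimal illustration, assuming the cumulative Gaussian is defined on log cutoff frequency; the axis, the parameter names (`amplitude`, `mu`, `sigma`), and the function names are assumptions, and the fitting itself is omitted.

```python
import math

def elevation_lowpass(f_c, amplitude, mu, sigma):
    """Threshold elevation as a cumulative Gaussian in ln(f_c) (assumed axis)."""
    z = (math.log(f_c) - mu) / sigma
    return amplitude * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def channel_power_gain(f, amplitude, mu, sigma):
    """Derivative of the fitted curve with respect to cutoff frequency, divided by f."""
    z = (math.log(f) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    d_elev_d_fc = amplitude * pdf / (sigma * f)  # chain rule: d ln(f) / df = 1 / f
    return d_elev_d_fc / f

def peak_frequency(mu, sigma):
    """Frequency at which the estimated gain is largest under these assumptions."""
    return math.exp(mu - 2.0 * sigma ** 2)

# Hypothetical fitted parameters, for illustration only.
params = dict(amplitude=1.0, mu=math.log(3.0), sigma=0.5)
print(peak_frequency(params["mu"], params["sigma"]))
print(channel_power_gain(2.0, **params))
```

For the high-pass sweep the fitted curve decreases with cutoff frequency, so the same computation would use the negative of the derivative.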

$f$ provides an estimate of the power gain of the underlying channel. Figure 5B shows estimates of the power gains of channels estimated using the data from the low- (solid curve) and high-pass (dotted curve) noise sweeps shown in Figure 5A.

$\Delta_{\text{channel}} = f_{\text{LP}} - f_{\text{HP}}$ (Figure 5B).

$\Delta_{\text{channel}}$ is plotted as a function of noise level in Figure 6 for five subjects (solid lines) as well as the group average. For each observer, 2–4 estimates of $\Delta_{\text{channel}}$ were obtained by separately fitting each set of high- and low-pass thresholds. The average and standard error of these estimates are shown in Figure 6. We also simulated the CSF-ideal observer of Chung et al. (2002) as described in detail in 2 (Figure 6, dashed lines). On each trial, the CSF-ideal observer was presented with a letter with added external masking and equivalent input noise (based on the observer's individual CSF, i.e., there were no free parameters for this prediction; 1). The CSF-ideal observer chose the letter that was most likely given the noisy input stimulus. The CSF-ideal observer did not possess explicit “channels” but, nevertheless, produced simulated data similar to those in Figure 5. The dashed lines in Figure 6 show the values of $\Delta_{\text{channel}}$ estimated from the simulated CSF-ideal data. As expected (Figure 3), the CSF-ideal observer demonstrated an increasing magnitude of channel switching (i.e., gave greater weight to a different set of spatial frequencies) as the masking noise power was increased.

$\Delta_{\text{channel}}$ with increasing noise level. Majaj et al. (2002) concluded that excess masking in their experiments (at most 0.2 log units on average) was negligible. However, when described in terms of the estimated shift of the peak frequency of the spatial frequency channel used to perform the task, we find combined shifts (leftward for high-pass noise plus rightward for low-pass noise) of over 90% of the average channel frequency in high-noise conditions (Figure 7).

*noise-additivity ratio* $(E_{\text{low}}^{+} + E_{\text{high}}^{+})/E_{\text{all}}^{+}$. If observers used a fixed channel, then this ratio should equal 1. If observers switch channels to escape the effects of masking noise, this ratio should fall below 1, especially near the preferred channel's peak frequency. In Figure 8, we plot the noise-additivity ratio as a function of masking-noise power. For each noise power, we plot the noise-additivity ratio for the noise cutoff frequency that results in the smallest noise-additivity ratio, i.e., near the intersection of the low- and high-pass threshold elevation curves.
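The noise-additivity ratio, and the equivalent excess masking in log units, can be computed directly from the four threshold energies. This is a minimal sketch; the function names and the example energies are hypothetical, not data from the experiments.

```python
import math

def noise_additivity_ratio(e_0, e_low, e_high, e_all):
    """(E_low+ + E_high+) / E_all+, where each E+ is a threshold minus E_0."""
    elev_low = e_low - e_0
    elev_high = e_high - e_0
    elev_all = e_all - e_0
    return (elev_low + elev_high) / elev_all

def excess_masking_log_units(e_0, e_low, e_high, e_all):
    """log10 of E_all+ over the additive prediction; 0 for a fixed channel."""
    return -math.log10(noise_additivity_ratio(e_0, e_low, e_high, e_all))

# Hypothetical threshold energies in arbitrary units.
print(noise_additivity_ratio(1.0, 3.0, 4.0, 6.0))    # additive case
print(noise_additivity_ratio(1.0, 2.0, 2.0, 5.0))    # channel-switching case
print(excess_masking_log_units(1.0, 2.0, 2.0, 5.0))
```

A ratio of 1 corresponds to zero excess masking, and a ratio below 1 corresponds to positive excess masking, so the two measures describe the same departure from additivity.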

$\Delta_{\text{channel}}$ estimates.

$\Delta_{\text{channel}}$ was nearly zero).

^{−5} deg^{2} at the standard viewing distance of 91.4 cm. A new white-noise pedestal was generated for each trial. The contrast of the white pedestal noise needed to be high enough to produce significant flattening of the CSF. Therefore, we chose the highest prefiltering contrast (0.5) allowed by the dynamic range of our display (the stimulus, the masking noise, and the pedestal white noise were summed), resulting in a noise power spectral density of $9.54 \times 10^{-5}$ deg$^{2}$ at the standard viewing distance of 91.4 cm. Binary noise was used to minimize clipping. Again, our nominal white-noise condition was low-pass filtered with an 80 cycles/letter cutoff. Note that an identical noise pedestal was used for the CSF measurements described in 1.

*d*), noise spectral density was reduced by a factor of 9. Thus, the flattening of the CSF required for this experiment (and demonstrated in 1) may no longer hold for the longest viewing distance. Therefore, we also included a condition at the standard viewing distance with reduced letter size on the monitor (and noise spectral density equal to other conditions run at that standard viewing distance). This is not an issue for the closer viewing distances, at which noise spectral density is stronger than at the standard viewing distance.
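The noise densities quoted in the text, and their dependence on viewing distance, can be checked with a short calculation. The sketch assumes that the spectral density of pixelwise-independent noise is its contrast variance times the area of one pixel in deg$^{2}$, that the display subtends 51.2 pixels/deg at the standard distance (twice the 25.6 cycles/deg Nyquist frequency stated for these viewing conditions), and that pixels per degree scale linearly with distance. The function name is hypothetical.

```python
STANDARD_DISTANCE_CM = 91.4
PIXELS_PER_DEG_STANDARD = 51.2  # assumption: 2 x the stated 25.6 cycles/deg Nyquist

def noise_psd_deg2(rms_contrast, distance_cm=STANDARD_DISTANCE_CM):
    """Spectral density (deg^2) of white noise with the given RMS contrast."""
    # Small-angle assumption: pixels per degree grow in proportion to distance.
    pixels_per_deg = PIXELS_PER_DEG_STANDARD * distance_cm / STANDARD_DISTANCE_CM
    pixel_area_deg2 = 1.0 / pixels_per_deg ** 2
    return rms_contrast ** 2 * pixel_area_deg2

for contrast in (0.2, 0.4, 0.5, 0.6, 0.8):
    print(contrast, noise_psd_deg2(contrast))

# Tripling the viewing distance shrinks each pixel's area by a factor of 9.
print(noise_psd_deg2(0.5) / noise_psd_deg2(0.5, 3 * STANDARD_DISTANCE_CM))
```

Under these assumptions the masker contrasts of 0.2 to 0.8 and the pedestal contrast of 0.5 reproduce the quoted densities to the precision given, and the density at $3d$ is one ninth of that at $d$.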

*except* for the scale dependence caused by the equivalent noise. However, our measurements in the presence of a white-noise pedestal argue strongly against this. The pedestal effectively flattened the CSF but did *not* eliminate scale dependence.

$d'^{2} N / E_{\text{ped}}^{+}$, where $d'$ is 1.3, corresponding to our 82% correct criterion and 2IFC task, $N$ is the noise power spectral density, and $E_{\text{ped}}^{+}$ is the increase in threshold signal energy due to the white-noise pedestal (Pelli & Farell, 1999). This results in estimated calculation efficiencies on the order of 10% here, relatively independent of spatial frequency, a value consistent with the literature for this task.
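The two quantities in this expression can be sketched numerically. The conversion from proportion correct to $d'$ uses the standard 2IFC relation $d' = \sqrt{2}\, z(P_c)$; the function names and the example threshold elevation are hypothetical.

```python
import math
from statistics import NormalDist

def dprime_2ifc(proportion_correct):
    """d' for a two-interval forced-choice task: sqrt(2) * z(proportion correct)."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(proportion_correct)

def calculation_efficiency(d_prime, noise_psd, pedestal_elevation):
    """d'^2 * N / E_ped+, with N and E_ped+ in matching units."""
    return d_prime ** 2 * noise_psd / pedestal_elevation

d_prime = dprime_2ifc(0.82)
print(round(d_prime, 2))  # approximately 1.3, the value quoted in the text

# Hypothetical threshold elevation chosen only to illustrate the arithmetic.
print(calculation_efficiency(d_prime, 9.54e-5, 1.6e-3))
```

The first line confirms that an 82% correct criterion in a 2IFC task corresponds to a $d'$ of about 1.3.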

$f$ for which threshold signal energy was measured, we determined the equivalent noise power spectrum from the following quantities: $c^{2}(f)$ is the squared contrast threshold for spatial frequency $f$, $C_{f}(x, y) = (L_{f}(x, y) - L_{0})/L_{0}$ is the local contrast of the Gabor patch, $L_{f}(x, y)$, when it is at full contrast, $d'$ is the value of $d'$ corresponding to the 82% threshold criterion for the 2IFC contrast detection task in 1, and $L_{0}$ is the mean luminance of the Gabor patch.

$N_{\text{eq}}(f)$ was computed by first fitting a smooth curve in which $f$ is spatial frequency and $p_{1}, \ldots, p_{5}$ are free parameters constrained to be nonnegative. We extrapolated noise power from 0 to 0.5 cycle/deg and from 16 to 25.6 cycles/deg (the Nyquist frequency for our viewing conditions) based on the best fitting curve (for comparison with an alternative method, see Nandy & Tjan, 2008). For simplicity, we assumed sensitivities to be independent of orientation and retinal eccentricity, so we defined $N_{\text{eq}}(f_{x}, f_{y}) = N_{\text{eq}}\left(\sqrt{f_{x}^{2} + f_{y}^{2}}\right)$, with $f_{x}$ and $f_{y}$ in units of cycle/image, and $N_{\text{eq}}$ in units of noise spectral density (noise contrast variance per unit bandwidth, in units of pixels$^{2}$).

$N_{\text{ext}}$ and $N_{\text{eq}}$. The calculations are most easily described in the Fourier domain. The stimulus is $S(f_{x}, f_{y})$. At the current contrast level, there were five possible letter templates, $T_{1}(f_{x}, f_{y}), \ldots, T_{5}(f_{x}, f_{y})$. In the Fourier domain, the noise consisted of independent Gaussian-distributed Fourier components, with variance of the real and imaginary components at each spatial frequency equal to $N(f_{x}, f_{y}) = N_{\text{ext}}(f_{x}, f_{y}) + N_{\text{eq}}(f_{x}, f_{y})$.

$T_{i}$ such that the likelihood $P(S \mid T_{i})$ was maximized. Since the letters occurred with equal frequency, this was equivalent to a Bayesian, maximum a posteriori decision rule. We assumed that the CSF-ideal observer knew the noise statistics as well as the current letter contrast. The maximum likelihood decision is easiest to describe by prewhitening the stimulus (Barrett & Myers, 2004, p. 839; Chao, Tai, Dymek, & Yu, 1980; Cook & Bernfeld, 1967). That is, we compute the whitened stimulus $S'(f_{x}, f_{y}) = S(f_{x}, f_{y})/\sqrt{N(f_{x}, f_{y})}$ and the whitened templates $T_{i}'(f_{x}, f_{y}) = T_{i}(f_{x}, f_{y})/\sqrt{N(f_{x}, f_{y})}$, and choose the letter $i$ that minimized the squared distance $d(T_{i}', S')^{2} = \sum_{f_{x}, f_{y}} \left| T_{i}'(f_{x}, f_{y}) - S'(f_{x}, f_{y}) \right|^{2}$.
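The prewhitening decision rule can be sketched with short lists of Fourier components. This is a minimal illustration that assumes whitening divides each component by the square root of the noise variance at that frequency; the function names and the two-component example are hypothetical and far smaller than a real stimulus spectrum.

```python
import math

def whiten(spectrum, noise_var):
    """Divide each Fourier component by the noise standard deviation at that frequency."""
    return [c / math.sqrt(n) for c, n in zip(spectrum, noise_var)]

def classify(stimulus, templates, noise_var):
    """Index of the template closest to the stimulus after prewhitening."""
    s_w = whiten(stimulus, noise_var)
    best_index, best_d2 = None, math.inf
    for i, template in enumerate(templates):
        t_w = whiten(template, noise_var)
        d2 = sum(abs(t - s) ** 2 for t, s in zip(t_w, s_w))
        if d2 < best_d2:
            best_index, best_d2 = i, d2
    return best_index

# Two hypothetical templates, each with energy at a single frequency.
templates = [[1 + 0j, 0j], [0j, 1 + 0j]]
stimulus = [0.9 + 0j, 0.6 + 0j]

print(classify(stimulus, templates, [1.0, 1.0]))     # flat noise
print(classify(stimulus, templates, [100.0, 0.01]))  # noise concentrated at the first frequency
```

With flat noise the stimulus is assigned to the first template, but when the first frequency is heavily masked the rule discounts it and the decision follows the second frequency. In this sense the observer gives greater weight to a different set of spatial frequencies without possessing explicit channels.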