Research Article  |   October 2008
Efficient integration across spatial frequencies for letter identification in foveal and peripheral vision
Journal of Vision October 2008, Vol.8, 3. doi:10.1167/8.13.3

      Anirvan S. Nandy, Bosco S. Tjan; Efficient integration across spatial frequencies for letter identification in foveal and peripheral vision. Journal of Vision 2008;8(13):3. doi: 10.1167/8.13.3.

      © 2015 Association for Research in Vision and Ophthalmology.

Abstract

Objects in natural scenes are spatially broadband; in contrast, feature detectors in the early stages of visual processing are narrowly tuned in spatial frequency. Earlier studies of feature integration using gratings suggested that integration across spatial frequencies is suboptimal. Here we re-examined this conclusion using a letter identification task at the fovea and at 10 deg in the lower visual field. We found that integration across narrow-band (1-octave) spatial frequency components of letter stimuli is optimal in the fovea. Surprisingly, this optimality is preserved in the periphery, even though studies of other form-vision tasks, such as crowding, have shown feature integration to be deficient in the periphery. The human data can be accounted for by a model that is a white-noise ideal observer in all respects except two: its spatial resolution is limited by the human contrast sensitivity function, and its internal templates are slightly wider in bandwidth than the stimuli. Our findings suggest that the deficiency in feature integration found in peripheral vision does not lie in integration across spatial frequencies.

Introduction
Reading and object recognition are usually associated with the clear and sharp central vision that is afforded by the fovea. Compared to the fovea, the periphery is far less capable of these types of form vision, even after its poor spatial resolution has been compensated for by magnification and contrast enhancement. For example, reading in the periphery is laboriously slow and objects often cannot be identified in a cluttered scene. At present, we do not sufficiently understand why peripheral form vision is qualitatively inferior to that of central vision. This is not only a crucial issue for basic research but is also vital for developing effective rehabilitation regimens and adaptive technologies for patients with central vision loss. 
Past studies, particularly those on “crowding” (the inability to identify objects against a cluttered background), suggest that peripheral deficits in form vision result from inadequate selection and inappropriate integration of simple features into complex features at a relatively early stage of visual processing (Blake, Tadin, Sobel, Raissian, & Chong, 2006; He, Cavanagh, & Intriligator, 1996; Levi, Hariharan, & Klein, 2002; Nandy & Tjan, 2007; Pelli, Palomares, & Majaj, 2004). However, the specific nature of these deficits in feature selection and integration is unknown. 
A conventional starting point for investigating feature processing is to measure an observer's spatial tuning properties when performing a form-vision task. Chung, Legge, and Tjan (2002) found that for both foveal and peripheral vision, the peak spatial tuning frequency and tuning bandwidth for identifying an isolated letter can be adequately modeled by an ideal observer with a limited spatial resolution as described by the subject's contrast sensitivity function (CSF) at the test eccentricity. Chung and Tjan (2007) extended this result to letter identification under crowding. Because the ideal-observer models in these studies made use of all stimulus-level information within the imposed spatial resolution limits, their results imply that feature selection along the dimension of spatial frequency (for the purpose of letter identification) is optimal in both the fovea and the periphery. 
In the current study, we ask a complementary question: is feature integration across spatial frequencies efficient in the fovea and in the periphery for the identification of an isolated letter? “Feature” refers to any aspect of a stimulus that carries information relevant to a given task. A feature is therefore task dependent: a feature that is useful for detecting a target may not be useful for discriminating it from a distracter. Letters, like most visual forms, comprise a broad range of spatial-frequency components that carry shape information. Efficient integration across spatial frequencies is thus a prerequisite for efficient form vision. 
Classical findings on grating detection and discrimination suggest that integration across features is inefficient when the features differ greatly in spatial frequency (e.g., Graham, Robson, & Nachmias, 1978; Quick, Mullins, & Reichert, 1978). For example, the probability of detecting a compound grating composed of two simple gratings whose spatial frequencies are more than a factor of two (one octave) apart is approximately equal to the sum of the probabilities of detecting each of the component gratings presented alone. This “probability summation” is consistent with the idea that each component grating is detected by a different narrowly tuned spatial frequency channel (Blakemore & Campbell, 1969; Campbell & Robson, 1968), which is “blind” to the other grating of a very different spatial frequency. 
Probability summation is essentially the absence of summation (integration). In comparison, the probability of detection for an ideal observer (a statistically optimal decision rule) is determined by the sum of the contrast energies of the component gratings. Energy summation, which is optimal in this case, predicts a much larger summation effect than probability summation when the detectabilities of the components are comparable. Moreover, when the contrast energy of each component is sub-threshold but their sum is supra-threshold, probability summation predicts the compound grating to be sub-threshold, whereas energy summation predicts it to be supra-threshold. 
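The difference between the two rules can be made concrete with a small sketch. It assumes the standard Minkowski (Quick) pooling approximation, in which probability summation corresponds to a pooling exponent equal to the psychometric-function slope (3.5 here, an assumed value) and energy summation to an exponent of 2. Each component's contrast is expressed as a fraction of its own threshold. The function names are illustrative and are not taken from the source.

```python
def pooled_response(fractions_of_threshold, exponent):
    """Minkowski (Quick) pooling of component responses, each expressed as a
    fraction of that component's own threshold contrast. The compound is
    predicted to be at threshold when the pooled value equals 1."""
    return sum(f ** exponent for f in fractions_of_threshold) ** (1.0 / exponent)


def compound_threshold_fraction(n_components, exponent):
    """Compound threshold, as a fraction of each component's own threshold,
    for n equally detectable components."""
    return n_components ** (-1.0 / exponent)


ENERGY_EXPONENT = 2.0   # energy summation (ideal observer, d' linear in contrast)
PROB_EXPONENT = 3.5     # assumed Weibull slope; approximates probability summation

# Two equally detectable components, each at 75% of its own threshold contrast.
components = [0.75, 0.75]
energy = pooled_response(components, ENERGY_EXPONENT)   # about 1.06: supra-threshold
prob = pooled_response(components, PROB_EXPONENT)       # about 0.91: sub-threshold

# Threshold reduction for the compound: about 0.71 (energy) vs. 0.82 (probability).
thr_energy = compound_threshold_fraction(2, ENERGY_EXPONENT)
thr_prob = compound_threshold_fraction(2, PROB_EXPONENT)
```

With both components at 75% of their own thresholds, energy summation puts the compound above threshold, while probability summation leaves it below threshold.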
Let us now return to letter identification, a more complex form-vision task. Our goal was to measure and compare how efficiently the foveal and peripheral visual systems integrate features across spatial frequencies, in order to determine whether the form-vision deficits in the periphery are attributable to suboptimal feature integration along the spatial-frequency dimension. We adapted a typical summation paradigm. In the critical experiment, we measured contrast thresholds for identifying band-pass filtered letters at two frequencies that are two octaves apart (f/2 and 2f). We also measured the threshold for identifying letters composed by adding these two frequency components together. 
To preview, we found the periphery to be as efficient as the fovea at integrating features across spatial frequencies. More surprisingly, both the fovea and the periphery exhibit optimal summation in the letter identification task, in sharp contrast to the suboptimal probability summation found for a simple grating detection task. The implications of these results for identifying form-vision deficits in the periphery, and for understanding form-vision processes in general, are addressed in the General discussion section. 
Analysis procedure
In this section, we describe the steps needed to quantify the extent of integration across spatial frequencies. A broadband stimulus like the letter “a” shown in Figure 1A can be decomposed into narrow-band components by a set of band-pass filters whose center frequencies form a progression in frequency space (Figure 1B). 
Figure 1
 
A broadband stimulus like the letter “a” (A) can be decomposed into narrow-band components (B) by filtering the stimulus with a set of band-pass filters centered at different spatial frequencies. (C) Two narrow-band components that are two octaves apart in frequency space, and their composite sum, are used to measure the index of integration Φ (Equation 2) in Experiment 2. (D) Three possible outcomes of the integration index: optimal summation (left panel), suboptimal summation (middle panel), and nonorthogonal summation (right panel).
We can then choose two narrow-band components that are far enough apart in frequency space to have negligible overlap in their frequency content. Moreover, if the separation exceeds the typical bandwidth of a spatial-frequency channel (between 1 and 2 octaves), we can ensure that the components will be processed by different spatial-frequency channels in the visual front end. Let the center frequencies of these components be f1 and f2, respectively. We can form a composite stimulus by simply adding the two narrow-band components (Figure 1C). 
Let c_f be the contrast threshold for identifying a narrow-band letter at center frequency f, and CS_f = 1/c_f the corresponding contrast sensitivity. If there is no overlap in the frequency domain between the components with center frequencies f1 and f2 (i.e., the components are orthogonal, with a zero dot product), then for an ideal observer limited by additive Gaussian equivalent input noise, the contrast sensitivity for the composite can be predicted from those for the components (Appendix C):  
$$CS_{f_1+f_2}^2 = CS_{f_1}^2 + CS_{f_2}^2 \qquad (1)$$
To quantify a visual system's ability to integrate information across spatial frequencies, we define an index of integration Φ as follows:  
$$\Phi = \frac{CS_{f_1+f_2}^2}{CS_{f_1}^2 + CS_{f_2}^2} \qquad (2)$$
 
This index has a simple interpretation if we assume that the visual system behaves like a Gaussian-noise-limited maximum a posteriori ideal observer, except that it makes use of only a constant fraction of the signal-to-noise ratio of the stimulus (Legge, Kersten, & Burgess, 1987; Peli, 1990; Tjan, Braje, Legge, & Kersten, 1995; see also Appendix C). If the component stimuli are handled independently by the visual system and the information across frequency channels is optimally combined, then Φ will equal 1. If, on the other hand, the information across the frequency channels is not optimally combined, Φ will be less than 1. 
There is a third possibility, in which Φ is greater than 1. In this case, even though the component stimuli are far apart in frequency space, the underlying spatial mechanisms used for the task are not independent. Example scenarios include inappropriately broad channels for the orthogonal components, and performance-dominating “late” noise situated after the outputs of the spatial mechanisms have been combined. In the first scenario, thresholds for identifying the orthogonal components are elevated by the use of inappropriately broad channels, which admit more noise from positions or spatial frequencies where there is no signal. The integration index is then greater than 1.0 not because the composite is more efficiently identified, but because the components are poorly identified. In the second scenario, the threshold for the composite is lower than expected because, when the components are combined, performance is limited by one dominant noise source rather than two. 
The three categories of the integration index are summarized as follows:  
$$\Phi \begin{cases} < 1 & \text{suboptimal integration} \\ = 1 & \text{optimal integration} \\ > 1 & \text{nonorthogonal integration} \end{cases} \qquad (3)$$
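The following is a minimal sketch of Equations 2 and 3 in Python. The function names are hypothetical, and applying Equation 3 to a confidence interval rather than to a point estimate is our assumption about how the categories might be decided in practice. The last lines also show why components of similar sensitivity matter. A rule that uses only the more sensitive component gives Φ = 0.5 when the two components are equally sensitive. The same rule gives Φ close to 1 when one component is ten times more sensitive than the other.

```python
import math


def integration_index(c_low, c_high, c_composite):
    """Equation 2, computed from (nominal) contrast thresholds, with CS = 1/c."""
    cs_low, cs_high, cs_comp = 1 / c_low, 1 / c_high, 1 / c_composite
    return cs_comp ** 2 / (cs_low ** 2 + cs_high ** 2)


def classify(phi_ci):
    """Equation 3 applied to a (lower, upper) confidence interval for phi."""
    lower, upper = phi_ci
    if upper < 1.0:
        return "suboptimal integration"
    if lower > 1.0:
        return "nonorthogonal integration"
    return "consistent with optimal integration"


# Equal component thresholds: optimal (energy) summation lowers the
# composite threshold by a factor of sqrt(2), giving phi = 1 (up to rounding).
phi_optimal = integration_index(0.10, 0.10, 0.10 / math.sqrt(2))
# A rule that relies only on the more sensitive component.
phi_max_equal = integration_index(0.10, 0.10, 0.10)      # 0.5
phi_max_unequal = integration_index(0.01, 0.10, 0.01)    # about 0.990
```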
 
We note that this interpretation of the integration index is contingent on the validity of Equation 1, which in turn relies on the observer being a maximum a posteriori observer with additive Gaussian equivalent input noise. The derivation of the base case for Equation 1 relies on the linear proportionality between signal contrast and the Euclidean distance between pairs of alternatives in the internal decision space. For a 2-way discrimination task, this means that we assume the psychometric function of d′ vs. contrast to be linear, which has been shown to be the case for a contrast discrimination task when the target contrast is above detection threshold (“supra-threshold”) (Legge et al., 1987). The details of this derivation are provided in Appendix C, which also shows that this base case can be generalized to other types of observer models for which the psychometric function is not linear. In particular, we show that Equation 1 either holds analytically or is a good approximation for observers with contrast-dependent (“multiplicative”) noise, and, in supra-threshold conditions, for observers with a nonlinear transducer. 
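The base case of Equation 1 can be checked numerically. This sketch assumes a 2-way ideal observer in additive white Gaussian noise. For such an observer, d′ equals the contrast times the Euclidean norm of the difference image, divided by the noise standard deviation. Contrast sensitivity is therefore proportional to that norm. Orthogonal component difference images add in squared norm (Pythagoras), which gives Φ = 1. Identical, fully overlapping components give Φ = 2. The random vectors are stand-ins, not letter images, and the constants are hypothetical.

```python
import numpy as np

D_PRIME_CRIT = 1.0   # hypothetical criterion d' defining threshold
SIGMA = 1.0          # s.d. of additive white Gaussian noise per sample (assumed)


def contrast_sensitivity(diff_image):
    """2-way ideal observer in white Gaussian noise: d' = c * ||diff|| / SIGMA,
    which is linear in contrast c, so threshold = D_PRIME_CRIT * SIGMA / ||diff||."""
    threshold = D_PRIME_CRIT * SIGMA / np.linalg.norm(diff_image)
    return 1.0 / threshold


rng = np.random.default_rng(1)
n = 512
# Unit-contrast difference images (alternative A minus alternative B) for the
# two components. Disjoint support guarantees a zero dot product; by Parseval,
# disjoint frequency bands do the same.
diff_low = np.zeros(n)
diff_low[:256] = rng.standard_normal(256)
diff_high = np.zeros(n)
diff_high[256:] = rng.standard_normal(256)

cs_low = contrast_sensitivity(diff_low)
cs_high = contrast_sensitivity(diff_high)
cs_comp = contrast_sensitivity(diff_low + diff_high)
phi = cs_comp ** 2 / (cs_low ** 2 + cs_high ** 2)       # Equation 2: equals 1

# Counter-example: identical (fully overlapping) components are not orthogonal.
cs_same = contrast_sensitivity(2 * diff_low)
phi_same = cs_same ** 2 / (2 * cs_low ** 2)             # equals 2
```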
We also note that, insofar as Equation 1 is valid, the interpretations of Φ < 1 and Φ > 1 are unequivocal, as described in Equation 3. However, Φ = 1 does not necessarily imply optimal feature integration. For example, a visual system can have an integration index of 1, or close to 1, by having nonorthogonal spatial mechanisms for the components and a suboptimal integration mechanism to combine their outputs. Of course, in this example, the nonorthogonality would have to precisely balance out the suboptimal integration. Furthermore, if the primary goal of measuring Φ is to compare feature integration across experimental conditions (in our case, foveal vs. peripheral vision), the ambiguity in the interpretation of Φ = 1 is less of a concern. 
To empirically distinguish between the three cases of feature integration described in Equation 3, the choice of the center frequencies f1 and f2 of the component stimuli is important. If there is a large difference in contrast sensitivity between the components, then it follows from Equation 1 that the sensitivity for the composite under optimal integration will be very similar to the sensitivity for the more sensitive component. This makes it difficult to distinguish between optimal and suboptimal integration. We addressed this issue by first measuring each observer's spatial tuning function (contrast sensitivity vs. stimulus center frequency) for letter identification. Previous work (Chung et al., 2002) has shown that such a tuning function is roughly symmetric about the peak tuning frequency when frequency is expressed in log units. Given the tuning function, we chose two components of roughly equal sensitivity by setting their center frequencies one octave below and one octave above the peak tuning frequency: 
$$f_{\text{low}} = f_{\text{peak}}/2, \qquad f_{\text{high}} = 2\,f_{\text{peak}} \qquad (4)$$
The two center frequencies are therefore two octaves apart. To ensure orthogonality at the stimulus level, we used components with a bandwidth of one octave. With this arrangement, we expect the components to be encoded by different spatial frequency channels since the typical channel bandwidth is believed to be between 1 and 2 octaves (Blakemore & Campbell, 1969; Stromeyer & Julesz, 1972). The composite stimulus was constructed by summing the two components. 
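The arithmetic behind Equation 4 and the orthogonality argument can be sketched as follows. The sketch assumes that each component's gain falls to zero one octave from its center frequency, as a 1-octave full-width at half-height in Equation 5 implies. The example peak frequency is ASN's foveal value from Table 2. The function name is hypothetical.

```python
import math


def component_design(f_peak, octave_bw=1.0):
    """Equation 4 plus the passband of each raised-cosine component.
    Each gain is assumed to reach zero at f_ctr * 2**(+/-octave_bw)."""
    f_low, f_high = f_peak / 2, f_peak * 2
    separation = math.log2(f_high / f_low)               # in octaves
    band_low = (f_low / 2 ** octave_bw, f_low * 2 ** octave_bw)
    band_high = (f_high / 2 ** octave_bw, f_high * 2 ** octave_bw)
    return f_low, f_high, separation, band_low, band_high


# ASN, fovea: f_peak = 2.54 cycles/letter (Table 2).
f_low, f_high, sep, band_low, band_high = component_design(2.54)
```

The two passbands meet only at f_peak, where both gains are zero, so the components share no frequency content.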
Methods
Experiments were conducted at the fovea and at 10 deg in the lower right quadrant of the visual field. Two experiments were conducted; in both, 26 lowercase filtered target letters were presented in a letter identification task. The first experiment measured the spatial tuning functions; the second measured the index of integration. In addition to the two main experiments, for the purpose of ideal-observer analysis, we also measured the contrast sensitivity function (CSF) with static sinusoidal gratings at both the fovea and 10 deg. 
Subjects
Four subjects (ASN, BW, PLB, and JS), including one of the authors, participated in the experiments. All subjects had normal or corrected-to-normal vision, and three of the subjects (BW, PLB, and JS) were naïve to the purpose of the study. All had (corrected) acuity of 20/20 or better in both eyes. Subjects viewed the stimuli binocularly in a dark room with a dim night-light. Written informed consent was obtained from the naïve subjects before the commencement of data collection. 
General procedure
All the experiments followed a block design and each experimental condition consisted of 4 blocks with 60 trials per block. The blocks were randomly ordered with the constraint that the nth repeat of a particular center-frequency block occurred only after all frequencies had been tested for at least n − 1 blocks. This was done in order to distribute the blocks of each condition evenly throughout the experiment to prevent practice effects from confounding experimental manipulations. 
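The ordering constraint above amounts to running the experiment in rounds, where each round contains one block of every condition in some order. One simple way to satisfy it is to shuffle each round independently. The original randomization procedure is not described, so this is an assumption, and the function name is hypothetical.

```python
import random


def block_order(conditions, n_repeats=4, seed=None):
    """Concatenate independently shuffled rounds of all conditions.

    The nth block of any condition then appears only after every condition
    has completed n - 1 blocks, as the ordering constraint requires."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_repeats):
        round_blocks = list(conditions)
        rng.shuffle(round_blocks)
        order.extend(round_blocks)
    return order


# Experiment 1: six center frequencies (cycles/letter), 4 blocks of 60 trials each.
freqs = [1.25, 1.77, 2.5, 3.54, 5.0, 7.07]
order = block_order(freqs, n_repeats=4, seed=0)
```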
For Experiments 1 and 2, a filtered letter stimulus (see Stimuli section for details) was presented in each trial, and the subject's task was to identify the presented letter by pressing the appropriate key. Details of the filtering conditions are provided in the respective experiment sections below. The stimuli were presented at a viewing distance of 105 cm. Depending on the eccentricity condition, subjects fixated either a fixation mark at the center of the screen (foveal viewing) or a green LED 10 deg above and to the left of the center of the screen (peripheral viewing). The letter contrast was adjusted using the QUEST procedure (Watson & Pelli, 1983), as implemented in the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997), to estimate the threshold contrast for an accuracy level of 50% (chance is 1/26). Following Chung et al. (2002), we defined the contrast of a filtered letter in units of nominal contrast with respect to an unfiltered letter. Specifically, we assigned a (nominal) contrast of x% to a filtered letter that was derived from an unfiltered letter of x% Weber contrast and a filter with a modulation transfer function of unit peak amplitude. 
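For readers unfamiliar with QUEST, the sketch below shows the core idea of a Bayesian adaptive threshold procedure. The procedure keeps a posterior over the log threshold, tests at a summary of that posterior, and updates with the likelihood of each response. It is a simplified stand-in, not the Psychophysics Toolbox implementation. The Weibull form, slope, lapse rate, prior, and placement rule (posterior mean) are all assumptions. Only the 50% criterion, the 1/26 guess rate, and the 240 trials per condition come from the text.

```python
import numpy as np

GAMMA = 1 / 26   # guess rate for 26-alternative letter identification
DELTA = 0.01     # lapse rate (assumed)
BETA = 3.5       # Weibull slope in log10-contrast units (assumed)
P_TARGET = 0.5   # threshold criterion used in the letter experiments

# Choose the Weibull scale so that p_correct(t, t) == P_TARGET exactly.
_Q = (P_TARGET - GAMMA) / (1 - GAMMA - DELTA)
_K = -np.log(1 - _Q)


def p_correct(log_c, log_t):
    """Probability correct at log10 contrast log_c for log10 threshold log_t."""
    f = 1 - np.exp(-_K * 10 ** (BETA * (log_c - log_t)))
    return GAMMA + (1 - GAMMA - DELTA) * f


def run_quest_like(true_log_t, n_trials=240, seed=0):
    """Simulate one QUEST-style staircase against a model observer and
    return the posterior-mean estimate of the log10 threshold contrast."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-3.0, 0.0, 601)              # candidate log10 thresholds
    log_post = -0.5 * ((grid + 1.5) / 1.0) ** 2     # broad Gaussian prior (assumed)
    for _ in range(n_trials):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        x = float(np.sum(grid * post))              # test at the posterior mean
        correct = rng.random() < p_correct(x, true_log_t)
        p = p_correct(x, grid)                      # likelihood of a correct response
        log_post += np.log(p if correct else 1 - p)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(np.sum(grid * post))


# Hypothetical observer whose true threshold is about 2% nominal contrast.
estimate = run_quest_like(true_log_t=-1.7)
```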
Prior to the main experiment, letter acuity was measured at 10 deg in the inferior right peripheral field. The subjects were asked to identify any of the 26 lowercase letters, while the size of the presented letter was varied using QUEST to achieve an identification accuracy of 79%. 
Stimuli
The 26 lowercase letters in Arial font (Mac OS 9) were generated separately on a background of 256 × 256 pixels. Letter size was set to twice the subject's peripheral letter acuity in both foveal and peripheral viewing conditions. The letter size (the height of a lowercase “x”) used for each of our subjects is shown in Table 1. The letter images were each spatially filtered by a set of raised cosine filters (Alexander, Xie, & Derlacki, 1994; Chung et al., 2002; Chung, Levi, & Legge, 2001; Peli, 1990). Each filter has a bandwidth (full-width at half-height) of 1 octave and is radially symmetric in the log-frequency domain. The transfer function of the filter at radial frequency fr is given by 
$$G(f_r) = \frac{1}{2}\left[1.0 + \cos\left(\pi \cdot \frac{\log(f_r) - \log(f_{ctr})}{\log(f_{cut}) - \log(f_{ctr})}\right)\right], \qquad (5)$$
where f_ctr is the spatial frequency corresponding to the peak amplitude of the filter (center frequency) and f_cut is the frequency at which the amplitude of the filter drops to zero (cutoff frequency). Figure 2 depicts examples of filtered images for the letter “a”. 
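The following is a minimal NumPy sketch of Equation 5 applied to an image via the 2-D FFT. It assumes that f_cut lies one octave from f_ctr, which gives the stated 1-octave full-width at half-height. It also assumes that frequencies are scaled to cycles/letter using the x-height in pixels, and that the gain is zero at DC. The function names are hypothetical.

```python
import numpy as np


def raised_cosine_gain(fr, f_ctr, octave_bw=1.0):
    """Equation 5 evaluated at radial frequencies fr (same units as f_ctr).

    The cutoff is placed at f_cut = f_ctr * 2**octave_bw (and symmetrically
    below the center), which yields a full-width at half-height of octave_bw
    octaves. The gain is zero at DC and beyond the cutoff."""
    fr = np.asarray(fr, dtype=float)
    gain = np.zeros_like(fr)
    pos = fr > 0
    # Log-distance from the center, in units of (log f_cut - log f_ctr).
    u = np.abs(np.log2(fr[pos] / f_ctr)) / octave_bw
    gain[pos] = np.where(u < 1.0, 0.5 * (1.0 + np.cos(np.pi * u)), 0.0)
    return gain


def bandpass_letter(img, f_ctr_cpl, x_height_px, octave_bw=1.0):
    """Filter an image with a radially symmetric raised cosine filter.
    f_ctr_cpl is in cycles/letter; one letter spans x_height_px pixels."""
    n_y, n_x = img.shape
    fy = np.fft.fftfreq(n_y)[:, None]              # cycles/pixel
    fx = np.fft.fftfreq(n_x)[None, :]
    fr = np.hypot(fx, fy) * x_height_px            # cycles/letter
    gain = raised_cosine_gain(fr, f_ctr_cpl, octave_bw)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * gain))
```

Because the filter has unit peak gain, a filtered letter keeps the nominal contrast of the unfiltered letter it came from.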
Table 1
 
Stimuli letter size (x-height) for each subject in units of degrees of visual angle.
Subject Letter size
ASN 0.96°
BW 1.15°
PLB 1.11°
JS 1.18°
Figure 2
 
Examples of stimuli used for measuring the letter tuning function (Experiment 1). The unfiltered letters are passed through a set of unit-gain raised cosine filters (Equation 5) at different center frequencies in half-octave steps.
The stimuli were displayed at the center of a 19-in. CRT monitor (Sony Trinitron CPD-G400) and the monitor was placed at a distance of 105 cm from the subject. After calibration and gamma-linearization, the monitor has 11 bits (2048 levels) of linearly spaced contrast levels. The experiments and the monitor were controlled from a Mac G4 running OS 9.2.2. All 11 bits of the contrast levels were addressable to render the stimulus for each trial. This was achieved by using a passive video attenuator (Pelli & Zhang, 1991) and a custom-built contrast calibration and control software implemented in MATLAB. Only the green channel of the monitor was used to present the stimuli. 
The stimuli were presented according to the following temporal design: (a) a fixation beep immediately followed by a fixation screen for 500 ms, (b) stimulus presentation for 250 ms, (c) subject response period (variable) with positive feedback beep for correct trials, and (d) 500-ms delay before onset of the next trial. 
On each trial, we collected the identity and contrast of the target letter, and the response of the subject for subsequent data analysis. 
Contrast sensitivity functions
For the purpose of an ideal-observer analysis, the contrast sensitivity function (CSF) was measured using a 2IAFC task in which vertically oriented cosine-phase Gabors were presented in one of two intervals. On each trial, the subject's task was to identify the interval in which the grating was presented. Gratings were presented at 0.1, 1.0, 2.0, 4.0, 8.0, and 16.0 cycles/deg for the foveal viewing condition and at 0.5, 1.0, 2.0, 4.0, and 8.0 cycles/deg for peripheral viewing. Subjects were tested with one spatial frequency in each block, and there were 4 blocks of 60 trials each for each spatial frequency. The space constant (size at 1/e) of the Gaussian envelope of the gratings was set to 4.7 deg. The QUEST procedure was used to adjust the contrast of the gratings to achieve an accuracy level of 79%. 
To incorporate the CSFs into a model, the measured data points were fitted with a bi-parabolic function for the fovea and a biphasic function (flat at low frequencies followed by a parabolic roll-off) for the periphery. Appendix A specifies the equations used for the fit. 
Experiment 1: Measuring the letter tuning function (LTF)
The letter tuning function (Chung et al., 2002), or LTF, was measured by presenting letter stimuli filtered with a set of six unit-gain, 1-octave-wide raised cosine filters (Equation 5) whose center frequencies (f_ctr) were logarithmically spaced at 1.25, 1.77, 2.5, 3.54, 5.0, and 7.07 cycles/letter. Figure 2 shows an example of the filtered stimuli. On each trial, the nominal contrast of the filtered letters was adjusted using a QUEST procedure to reach a performance threshold of 50% correct (chance was 1/26 or 3.8%). There were 240 trials per center frequency, broken into 4 blocks of 60 trials each. 
Results
Figure 4A shows the LTF (contrast sensitivity for letter identification vs. filter center frequency) obtained for our subjects in the fovea (blue) and in the periphery (red). Each data point was estimated from 240 trials with QUEST. The 95% confidence intervals were estimated using a bootstrap procedure (Efron & Tibshirani, 1993). The six data points for each viewing condition were well described by a parabolic function in log–log coordinates (r² > 0.97). The equation of the parabolic function is given by 
$$\log(CS_f) = \log(A) - \frac{4}{\sigma^2 \log(2)}\left(\log(f) - \log(f_{peak})\right)^2, \qquad (6)$$
where CS_f is the contrast sensitivity at frequency f, A is the peak sensitivity, f_peak is the peak tuning frequency, and σ is the bandwidth (full-width at half-height) of the function in octaves. 
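Equation 6 is a quadratic in log frequency, so it can be fitted with an ordinary polynomial fit and its parameters read off the coefficients. The sketch below uses base-10 logs, since Equation 6 holds for any base used consistently. It checks itself on a noise-free synthetic tuning curve with made-up parameters and does not reproduce the bootstrap confidence intervals. The function name is hypothetical.

```python
import numpy as np


def fit_ltf(freqs, cs):
    """Least-squares fit of Equation 6: a parabola in log10(CS) vs log10(f).
    Returns (A, f_peak, sigma), with sigma the full bandwidth at half height
    in octaves."""
    x, y = np.log10(freqs), np.log10(cs)
    a, b, c = np.polyfit(x, y, 2)               # y = a*x**2 + b*x + c, a < 0
    x_peak = -b / (2 * a)
    peak_cs = 10 ** (c - b ** 2 / (4 * a))      # A, the value at the vertex
    sigma = np.sqrt(-4 / (a * np.log10(2)))     # from a = -4 / (sigma**2 * log10(2))
    return peak_cs, 10 ** x_peak, sigma


# Experiment 1 center frequencies: half-octave steps from 1.25 cycles/letter.
freqs = 1.25 * 2 ** (np.arange(6) / 2)          # 1.25, 1.77, 2.5, 3.54, 5.0, 7.07

# Noise-free synthetic LTF with hypothetical parameters, to check the fit.
A_true, f_peak_true, sigma_true = 50.0, 2.5, 2.0
cs_synth = 10 ** (np.log10(A_true) - 4 / (sigma_true ** 2 * np.log10(2))
                  * (np.log10(freqs) - np.log10(f_peak_true)) ** 2)
A_fit, f_peak_fit, sigma_fit = fit_ltf(freqs, cs_synth)
```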
The peak tuning frequencies, as determined by the fitted parabolic functions, are shown in Table 2 in units of cycles/letter (1 letter = 1 x-height). For the same letter size, the average peak tuning frequency in the periphery was lower than that in the fovea (t(6) = 3.7258, p < 0.01). The average tuning bandwidth was about 2.05 octaves in the fovea and 1.69 octaves in the periphery (Table 2). The tuning bandwidths in the periphery were slightly but significantly narrower than those in the fovea (t(6) = 7.3294, p < 0.01). As will be shown with the ideal-observer analysis, an LTF with a narrower bandwidth indicates that the observer uses perceptual templates of broader bandwidth. These results are generally consistent with those obtained by Chung et al. (2002). By using the same letter size in both the foveal and peripheral conditions, the current study is more sensitive to differences in spatial tuning properties between foveal and peripheral vision. 
Table 2
 
Peak tuning frequencies (cycles/letter) and bandwidths (full-width at half-height in octaves) of the letter tuning functions obtained in Experiment 1.
Subject   f_peak, fovea   f_peak, 10°   Bandwidth, fovea   Bandwidth, 10°
ASN   2.54   2.07   1.96   1.67
BW   2.99   2.23   2.09   1.72
PLB   2.46   2.29   2.14   1.62
JS   2.82   2.25   2.00   1.74
Experiment 2: Estimating the integration index (Φ)
For each subject in Experiment 1, we generated the stimuli for estimating the integration index (Equation 2) with respect to the subject's peak tuning frequencies, using Equation 4. Specifically, we generated (a) the low-frequency component by setting the center frequency of the raised cosine filter (Equation 5) to one half of f_peak; (b) the high-frequency component by setting the center frequency to twice f_peak; and (c) the composite stimulus by summing the two components at the same contrast ratio as the components would have in an unfiltered letter (i.e., a contrast ratio of 1 in units of nominal contrast). See Figure 3 for an example of the stimuli for this experiment. As in Experiment 1, we measured the nominal contrast thresholds of these three stimulus categories using an adaptive QUEST procedure to attain a performance level of 50% accuracy. The integration index was then calculated according to Equation 2.
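The construction of the three Experiment 2 stimuli can be sketched as below. The sketch repeats the raised-cosine gain of Equation 5, with f_cut one octave from f_ctr. It uses a random image as a hypothetical stand-in for a letter bitmap. It then checks that the low and high components are orthogonal and that their energies add in the composite. Names, image size, and x-height in pixels are illustrative assumptions.

```python
import numpy as np


def raised_cosine_gain(fr, f_ctr, octave_bw=1.0):
    """Equation 5 with f_cut one bandwidth (in octaves) from f_ctr; zero at DC."""
    fr = np.asarray(fr, dtype=float)
    gain = np.zeros_like(fr)
    pos = fr > 0
    u = np.abs(np.log2(fr[pos] / f_ctr)) / octave_bw
    gain[pos] = np.where(u < 1.0, 0.5 * (1.0 + np.cos(np.pi * u)), 0.0)
    return gain


def make_experiment2_stimuli(letter_img, f_peak, x_height_px):
    """Low (f_peak/2), high (2*f_peak), and composite stimuli (Equation 4)."""
    ny, nx = letter_img.shape
    fr = np.hypot(np.fft.fftfreq(nx)[None, :], np.fft.fftfreq(ny)[:, None]) * x_height_px
    spectrum = np.fft.fft2(letter_img)

    def component(f_ctr):
        return np.real(np.fft.ifft2(spectrum * raised_cosine_gain(fr, f_ctr)))

    low = component(f_peak / 2)
    high = component(f_peak * 2)
    composite = low + high        # nominal contrast ratio of 1 between components
    return low, high, composite


rng = np.random.default_rng(0)
img = rng.standard_normal((128, 128))   # hypothetical stand-in for a letter bitmap
low, high, composite = make_experiment2_stimuli(img, f_peak=2.5, x_height_px=32)
```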
Figure 3
 
Examples of stimuli used for estimating the integration index (Experiment 2). The upper row shows the low and high spatial frequency components and the composite stimulus for the letter “c.” The lower row shows the corresponding amplitude spectra; since the components are separated by 2 octaves in frequency space, their spectra do not overlap.
Results
Figure 4B shows the contrast sensitivities for the component and composite stimuli overlaid on the LTF curves obtained in Experiment 1. As expected, the contrast sensitivities of the components lie on or very close to the respective LTF curves, indicating robust test-retest reliability. The sensitivities to the composite stimuli are significantly higher than those to the components for all subjects and all viewing conditions. 
Figure 4
 
(A) The letter tuning functions (LTFs: contrast sensitivity, i.e., 1/threshold contrast, for identifying the 26 letters at 50% correct vs. spatial frequency) obtained from four human observers for the narrow-band letter stimuli (Experiment 1), shown for both the fovea (blue) and the periphery (red). The dotted gray lines mark the peak tuning frequency. (B) The observers' contrast sensitivities for identifying the stimuli used in Experiment 2, overlaid on the corresponding letter tuning functions. The square data points represent the sensitivity to the low (left) and high (right) spatial frequency components, respectively; the horizontal lines represent the sensitivity to the composite stimulus. (C) Integration indices for the fovea and the periphery, shown along with the predicted indices for probability summation (green dots). Error bars are bootstrapped 95% confidence intervals.
Figure 4C shows the integration indices obtained in the fovea (blue) and the periphery (red). The confidence intervals were obtained by bootstrapping. Unlike the typical finding in grating detection or discrimination, integration across spatial frequency components two octaves apart was optimal for all four subjects in the fovea. Even more surprisingly, peripheral integration was optimal for subjects ASN, BW, and PLB. For subject JS, Φ is significantly greater than 1 in the periphery. This could be due to nonorthogonality of the underlying mechanisms or perceptual templates used to encode the composite. We return to this in more detail in the Ideal observer analysis section.
Also shown in Figure 4C (green symbol) are the predicted integration indices for probability summation (see Equation D2). This would be the case if the visual system were to use information from the component it was most sensitive to, as opposed to integrating information from both components. Subjects' performance in both foveal and peripheral viewing conditions was significantly better than that predicted by probability summation. 
To further confirm that these results are qualitatively different from those obtained with grating detection and discrimination, we measured contrast thresholds for subjects ASN, BW, and PLB in a compound-grating orientation discrimination task (Appendix D). The spatial frequencies of the component gratings were the same as the center frequencies of the component letters in Experiment 2. Indeed, integration was suboptimal in this case and in the range predicted by probability summation (Figure D1). 
These results suggest that for a letter-identification task, the visual system can indeed integrate information across spatial frequency channels and do so optimally in both the fovea and periphery. 
Ideal observer analysis
What is the most concise model that can quantitatively account for our letter tuning function (LTF) and index of integration data? Chung et al. (2002) found that for both foveal and peripheral vision, and over a range of letter sizes, the LTFs can be adequately modeled by an ideal observer with a limited spatial resolution defined by the subject's contrast sensitivity function (CSF) at the test eccentricity. In accounting for both the peak tuning frequency and the tuning bandwidth, this CSF-limited ideal-observer model has no free parameters. Chung et al. essentially argued that, to a good approximation, the visual system is optimal in selecting the spatial-frequency features for a letter identification task, after discounting a linear limitation in spatial resolution. Their model provided a mechanistic explanation of the observed letter channel (Solomon & Pelli, 1994) without postulating any specialized channels.¹ 
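To make the structure of such a model concrete, here is a toy sketch of a maximum-likelihood identifier in white Gaussian noise that sees the stimulus only through a fixed linear front-end filter. It is not Chung et al.'s implementation. Random unit vectors stand in for letters, and a linear gain ramp stands in for the CSF. Placing the noise after the front end is an assumption, and all names are hypothetical.

```python
import numpy as np


def ideal_observer_accuracy(templates, front_end_gain, contrast, noise_sd=1.0,
                            n_trials=2000, seed=0):
    """Monte Carlo proportion correct of a maximum-likelihood identifier.

    Each stimulus is a template scaled by contrast and passed through a fixed
    linear front end (a stand-in for the CSF); white Gaussian noise is then
    added (an assumption about where the noise sits). With equal priors and
    white noise, the ML rule picks the filtered template nearest to the
    observation in Euclidean distance."""
    rng = np.random.default_rng(seed)
    filtered = contrast * templates * front_end_gain        # (n_letters, n_dims)
    n_letters, n_dims = filtered.shape
    truth = rng.integers(n_letters, size=n_trials)
    obs = filtered[truth] + noise_sd * rng.standard_normal((n_trials, n_dims))
    # ||obs - t||^2 = ||obs||^2 - 2 obs.t + ||t||^2; the first term is shared.
    scores = (filtered ** 2).sum(axis=1)[None, :] - 2 * obs @ filtered.T
    return float(np.mean(scores.argmin(axis=1) == truth))


rng = np.random.default_rng(42)
n_letters, n_dims = 26, 256
templates = rng.standard_normal((n_letters, n_dims))
templates /= np.linalg.norm(templates, axis=1, keepdims=True)   # unit-norm stand-ins
gain = np.linspace(1.0, 0.2, n_dims)    # hypothetical CSF-like attenuation

acc_low = ideal_observer_accuracy(templates, gain, contrast=0.05)   # near chance (1/26)
acc_high = ideal_observer_accuracy(templates, gain, contrast=20.0)  # near 1
```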
We measured the CSF for subjects ASN, BW, and PLB at the test eccentricities used in Experiments 1 and 2. The CSFs and the fits to an analytic form (Appendix A) are shown in Figure 5. The CSF for subject JS could not be measured because the subject was unavailable. 
Figure 5
 
Observers' contrast sensitivity functions (CSFs) for the fovea and the periphery are shown relative to the corresponding letter tuning functions. The ranges of measurement of the CSFs encompass the letter tuning functions. The analytical functions used to fit the CSFs (Appendix A) are used as the front-end filters in the ideal-observer models. Error bars are bootstrapped 95% confidence intervals.
When applied to the composite letters used in Experiment 2, this CSF-limited ideal-observer model predicts an integration index of 1.0, which matches the results of all of our subjects in the foveal condition and of three of the four subjects in the peripheral condition. The model, however, cannot account for JS's integration index of 1.68 in the periphery. Quantitatively, this model also slightly overestimated the tuning bandwidth for all subjects and underestimated the peak tuning frequency in the foveal condition for subject BW. Similar quantitative deviations were observed in Chung et al. (2002) as well as in Chung and Tjan (2007). 
The CSF-limited ideal-observer model of Chung et al. does not include any front-end spatial frequency channels. The presence of low-level spatial frequency channels followed by a transducer nonlinearity imposes a lower bound on the bandwidth of the perceptual templates utilized by an observer. This is because if, for example, everything is seen through two-octave channels, then the basic features that make up a perceptual template cannot be less than two octaves wide. We can capture this effect of front-end spatial-frequency channels by specifying the bandwidth of the templates used by the ideal-observer model. 
We simulated a family of CSF-limited ideal-observer models, each using internal templates of a different bandwidth, ranging from 1 (same as the stimuli) to 8 octaves. The center frequency of the templates matched the center frequency of the signal. We also tested a model that uses unfiltered letters as templates (i.e., templates of infinite bandwidth). The band-limited templates were generated by filtering the unfiltered letter images with raised cosine filters (Equation 5) at the same center frequencies as the stimuli, but with different cutoff frequencies. 
Our model therefore deviated from the true ideal observer in two respects: (a) its spatial resolution was limited by the subject's CSF, and (b) its perceptual templates had a bandwidth that might not match that of the stimuli. Barring these limitations, the models were optimal and made their decisions by maximizing the posterior probability among the candidate templates known to the system. Figure 6 shows a schematic of the model. Appendix B provides the detailed formulation of the ideal-observer model. 
Figure 6
 
Schematic of the ideal-observer models. The stimulus is first linearly filtered by the human CSF. The result is then perturbed by white Gaussian pixel noise. This noisy and filtered signal is then fed into an ideal observer that chooses among a set of responses (defined by the internal templates) the one that maximizes the posterior probability.
The CSF- and template-limited ideal-observer models performed tasks identical to those in the human experiments (Experiments 1 and 2), using the same stimulus sets as the corresponding human subjects. Figure 7 shows the results of the model simulations overlaid on the corresponding human data. The ideal-observer models' tuning curves were fitted by parabolic functions (Equation 6; r² = 0.97). The internal noise of the model was adjusted such that, at the 50%-correct threshold criterion, the peak amplitude of the model's most sensitive tuning function (obtained with the 1-octave templates that matched the bandwidth of the stimuli) was the same as the peak amplitude of the subject's tuning function. Figure 8 compares the bandwidths and peak tuning frequencies of the tuning functions of our family of ideal-observer models with those of the human observers. 
Figure 7
 
(A) The letter tuning functions from the family of six ideal-observer models with internal template bandwidth of 1, 2, 3, 4, 8, and ∞ octaves are shown overlaid on the corresponding human results. (B) The indices of integration obtained from the ideal-observer models are compared with those obtained from the human observers.
Figure 8
 
(A) Bandwidths of the models' letter tuning functions are plotted as a function of template bandwidth. Letter tuning function bandwidth decreases monotonically with increasing template bandwidth. For comparison, the bandwidths of the human letter tuning functions are shown by the horizontal solid lines (dotted lines representing the bootstrapped 95% confidence intervals). (B) A similar comparison is made for the peak frequencies of the letter tuning functions. The model peak tuning frequencies are invariant to template bandwidth.
Results
As the bandwidth of the templates increases, the model's peak tuning frequency remains unchanged, while its tuning bandwidth decreases (Figures 7 and 8). In general, the bandwidth of the letter tuning function (or the letter channel) is at its widest when the bandwidth of the templates matches the bandwidth of the stimuli. This theoretical result means that, compared to a CSF-limited ideal observer, the efficiency of a CSF- and template-limited observer with templates broader than the signal decreases as the center frequency of the signal moves away from the peak tuning frequency of the CSF-limited ideal observer. This result makes sense because the alternative would mean that at some signal frequency away from the peak tuning frequency, the efficiency would approach 1.0, which is impossible because the templates used by the observer are broader than the signal by a constant number of octaves. 
We found that a CSF- and template-limited ideal-observer model using templates with a bandwidth of about 2 octaves accounts quite well for the human data in the fovea and the periphery, both in matching the human letter tuning functions (Figure 7A) and in reproducing the observed integration indices (Figure 7B).2 Ideal-observer models that use internal templates more than 3 octaves wide produce tuning functions that are significantly narrower than the human curves (Figure 8A) and also lead to integration indices that are significantly greater than 1.0. The latter occurs because the templates used for the component stimuli are no longer orthogonal. The integration indices for these nonorthogonal channels increase monotonically with increasing degree of overlap.3 
These results corroborate a two-stage processing hypothesis: a front-end system consisting of independent feature detectors or spatial frequency channels that are narrowly tuned (about 2 octaves) followed by a back-end system that optimally combines the detected features across spatial frequencies. 
Due to events beyond our control, subject JS was not available for the CSF measurement, and as a result, no model simulation could be performed on his data set. Nevertheless, we can account for JS's peripheral integration index, which is significantly greater than 1, from the simulation results for the other subjects. Ideal-observer models that use 3–4 octave wide internal templates yield indices similar to that of JS in the periphery. This suggests that the peripheral visual field for subject JS may be mediated by spatial frequency channels that have wider bandwidths as compared to the other subjects. 
General discussion
Our results show that while contrast sensitivity to filtered letters was significantly lower in the periphery than in the fovea, the periphery is as efficient at integrating features across spatial frequencies as the fovea. That both fovea and periphery exhibit optimal integration across spatial frequencies in a letter identification task is surprising and is in sharp contrast to the suboptimal probability summation for a compound-grating detection or discrimination task. What might account for this apparent discrepancy? There are at least two possibilities. 
Letters as “natural” stimuli
First, the answer may lie in the overlearning of broadband edges and bars. Compound gratings are not familiar objects, and it is possible that the visual system never develops the mechanisms to optimally combine information across the spatial frequency components in arbitrary phase alignments. In contrast, natural (or overlearned) stimuli like letters are necessarily broadband, and feature detection must be distributed across many narrowly tuned mechanisms (or channels) found in the early stages of visual processing. More importantly, unlike those of a compound grating, the narrowband components extracted from a natural stimulus are often confined to a specific set of phase configurations that lead to sharp edges and contours and other common broadband features. It is reasonable to expect the visual system to develop mechanisms that are efficient detectors of these naturally occurring combinations of spatial frequency and phase arrangements. Using an orientation discrimination task, Thomas and Olzak (2001) showed the existence of an efficient summing circuit for gratings that are phase-aligned. The role of familiarity is also hinted at by a finding in Meinhardt (2001), showing that learning a spatial-frequency discrimination task with gratings over a period of weeks resulted in broadening of the spatial tuning function for grating detection (i.e., better cross-frequency summation). However, to our knowledge, no study to date has shown the emergence of optimal summation following learning a fixed but arbitrary phase alignment between the components of a compound grating. 
One way to test whether the efficient integration with letter stimuli depends on naturally occurring phase alignments common to broadband shapes with sharp edges is to randomly jitter the components in a composite letter stimulus. We would expect suboptimal integration in the jittered condition if specific phase alignment is required. Conversely, one can systematically measure integration indices using gratings with relative amplitude and phase that are either consistent with those found in natural images or are parametrically deviated from the naturally occurring ones. To test whether familiarity with a fixed but unnatural phase alignment plays an important role, we can train participants to identify composite letters and gratings with components that are relatively displaced by a fixed amount and compare the integration indices between the naturally occurring phase alignment and the trained and untrained displacements. 
Letter identification and cue integration
A second explanation for the optimal summation observed with letters is that the sensitivity to the components, against which the sensitivity of the composite was compared, is not limited by the front-end spatial frequency channels but by a second-stage feature integration process. It is worth noting that whereas optimal summation has rarely been observed in simple grating detection and discrimination tasks, optimal cue integration is relatively common in tasks such as 3-D depth and slant perception (Blake, Bülthoff, & Sheinberg, 1993; Ernst & Banks, 2002; Hillis, Watt, Landy, & Banks, 2004), texture-defined pattern detection and identification (Landy & Kojima, 2001; Meinhardt, Persike, Mesenholl, & Hagemann, 2006), and visual search (Shimozaki, Eckstein, & Abbey, 2002). 
A common theme among the tasks showing optimal cue integration is that the low-level features that make up the individual cues are supra-threshold, such that the visibility of the individual cues is not limited by the sensitivity of the front-end feature detectors. Another common theme is that perceiving the quantity of interest (e.g., slant) using any individual cue (e.g., stereo-disparity) requires integration across low-level features (e.g., local disparity in a random dot stereogram). While different cues may be optimally integrated (e.g., disparity and texture), the low-level integration process subserving each individual cue is often suboptimal (e.g., statistical efficiency for detecting disparity-defined edges is below 20%; Harris & Parker, 1992; Wallace & Mamassian, 2004). This last point is akin to our findings. 
Let us assume that oriented narrowband gratings (Gabors) are the features extracted by the early stages of visual processing. To identify a narrowband letter, these oriented Gabors must be integrated spatially. Different spatial configurations of these Gabor features form different letters. The highest reported detection efficiency for Gabors is about 70% (Burgess, Wagner, Jennings, & Barlow, 1981), while the highest reported efficiency for identifying 2-octave band-pass filtered letters is 42% (Parish & Sperling, 1991), with more typical values being in the single digits (Gold, Bennett, & Sekuler, 1999; Pelli, Burns, Farell, & Moore-Page, 2006; Tjan et al., 1995). The large difference in efficiency indicates the presence of additional processing between the stages of Gabor detection and narrowband letter identification. For our tasks, to identify letters at a single spatial frequency band, the Gabor features must be integrated across spatial locations and orientations; to identify letters rendered at two spatial frequency bands, feature integration must also occur across spatial frequencies. Our results suggest that in spite of the inefficiency in cross-location or cross-orientation integration of visual features, the process of integration across spatial frequencies is optimal within the second-stage feature space, irrespective of eccentricity. 
What constitutes a letter channel?
Our results also elaborate the notion of a letter channel (Majaj, Pelli, Kurshan, & Palomares, 2002; Solomon & Pelli, 1994). The finding of optimal summation across letter components widely separated in spatial frequency clearly argues against treating a letter channel as analogous to a spatial frequency channel. Chung et al. (2002) have shown that, to a good approximation, the feature selection process underlying the identification of isolated band-pass filtered letters can be explained by the two factors that limit the performance of their CSF-limited ideal-observer models: the letter-identity information in the spatial frequency spectra of the stimulus, called the letter sensitivity function (LSF), and the observer's contrast sensitivity function (CSF). Chung et al. showed that an observer's letter tuning function (LTF) is proportional to the product of CSF and LSF, that is: LTF(f) = k × CSF(f) × LSF(f). This simple model argues against the need to posit an active channel selection mechanism that scales with the size of the stimulus. This model corresponds to our ideal-observer model with 1-octave templates. Our simulation results show that while the Chung et al. model is able to predict human performance for cross-spatial-frequency integration, a better fit to the human letter tuning functions can be achieved with 2-octave templates at the same center frequency as the test stimulus. With or without modification to the template bandwidth, the view expressed in Chung et al. (2002) against an active scale-dependent channel selection process remains unchanged. Our results also support the general claim of Chung et al. that the peripheral visual system, like the foveal system, uses the appropriate set of spatial-frequency features from the input when performing a letter identification task. 
Feature integration in the periphery
If cross-spatial-frequency selection and integration are optimal in the periphery, why is form vision so much poorer in the periphery than in the fovea? It is likely that the cause of form-vision deficits in the periphery is not about feature integration across spatial frequency, but across spatial locations. It is known that spatial uncertainty is high in the periphery (Hess & Field, 1993; Hess & McCarthy, 1994; Levi & Klein, 1996; Levi, Klein, & Yap, 1987), and there is emerging evidence that crowding in the periphery is due to uncertainty about target or feature locations (Nandy & Tjan, 2007; Strasburger, 2005). This view is consistent with the second explanation of our finding of optimal integration, i.e., from the perspective of cue combination. 
Summary
Our findings suggest that both the fovea and the periphery are equally efficient at integrating information across spatial frequency channels. In fact, this integration is optimal for letter stimuli. In light of the sub-optimal summation that is the norm for compound gratings (also replicated in our study), our results suggest the existence of a stage of feature integration that combines information across spatial frequency channels efficiently, regardless of eccentricity. This stage plays an integral role in the identification of complex broadband stimuli. Taking into account the CSF and the bandwidth of the front-end spatial frequency channels, a simple ideal-observer model explains our data. 
Appendix A
Analytical form for fitting the CSFs
The fovea CSF data points were fitted with the following bi-parabolic function:  
$$
\log(CS_f) =
\begin{cases}
\log(A) - \dfrac{4}{\sigma_1^2 \log(2)}\left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f \le f_{peak} \\
\log(A) - \dfrac{4}{\sigma_2^2 \log(2)}\left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f > f_{peak}
\end{cases}
\tag{A1}
$$
 
The periphery CSF data points were fitted with a biphasic function, which is flat to the left of the peak frequency and has a parabolic roll-off to the right:  
$$
\log(CS_f) =
\begin{cases}
\log(A) & \text{for } f \le f_{peak} \\
\log(A) - \dfrac{4}{\sigma^2 \log(2)}\left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f > f_{peak}
\end{cases}
\tag{A2}
$$
In both Equations A1 and A2, $A$ is the peak sensitivity of the CSF, $f$ is the spatial frequency, $CS_f$ is the contrast sensitivity at frequency $f$, $f_{peak}$ is the spatial frequency at peak sensitivity, and $\sigma$ ($\sigma_1$ and $\sigma_2$ in Equation A1) is the bandwidth of one limb of the function in octaves. Equations A1 and A2 provide accurate interpolation of the CSF within the relevant range of spatial frequencies needed for simulating our ideal-observer model. 
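A minimal NumPy sketch of Equations A1 and A2 follows. It assumes the same logarithm is used throughout; the factor $4/(\sigma^2\log 2)$ makes the result independent of the base chosen, so natural logs are used. The function names and the parameter values in the checks are hypothetical and are not the authors' fitting code.

```python
import numpy as np

def csf_fovea(f, A, f_peak, sigma1, sigma2):
    """Bi-parabolic CSF of Equation A1 (hypothetical helper).

    f and f_peak in cycles/deg; sigma1 and sigma2 are the limb bandwidths
    in octaves for the low- and high-frequency limbs, respectively.
    """
    f = np.asarray(f, dtype=float)
    x = np.log(f) - np.log(f_peak)                  # log-frequency distance from peak
    sigma = np.where(f <= f_peak, sigma1, sigma2)   # pick the limb
    log_cs = np.log(A) - 4.0 / (sigma**2 * np.log(2.0)) * x**2
    return np.exp(log_cs)

def csf_periphery(f, A, f_peak, sigma):
    """Equation A2: flat below f_peak, parabolic roll-off above it."""
    f = np.asarray(f, dtype=float)
    x = np.log(f) - np.log(f_peak)
    roll_off = np.where(f > f_peak, 4.0 / (sigma**2 * np.log(2.0)) * x**2, 0.0)
    return np.exp(np.log(A) - roll_off)
```

With this parameterization, sensitivity falls to $A/2$ at $\sigma/2$ octaves from the peak, for example one octave above $f_{peak}$ when $\sigma_2 = 2$.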
Appendix B
Ideal-observer model
Here we briefly describe the mathematical formulation of the CSF and template limited ideal-observer models used in this study. A schematic of the model is shown in Figure 6, which is an extension of the model used in Chung et al. (2002). As in Chung et al. we assumed that a linear filter derived from the human CSF and an additive white Gaussian noise source were situated between the input signal, S, and the ideal observer (a maximum a posteriori (MAP) classifier). A more detailed description of the decision rule for such a classifier can be found elsewhere (Tjan et al., 1995). Here we restate the derivations that are specific to our current application. 
Let $G(f)$ denote the transfer function of the CSF (Appendix A) and $N_{int}$ be the noise source, with each noise pixel normally distributed with a mean of zero and a standard deviation of $\sigma$. Then the input that is fed into the MAP classifier is given by

$$
I = \mathcal{F}^{-1}\{\mathcal{F}\{S\} \cdot G\} + N_{int},
\tag{B1}
$$

where $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ represent the forward and inverse Fourier transform operations and $S$ is the filtered letter stimulus. 
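Equation B1 describes the model's front end. The sketch below implements it with NumPy FFTs, assuming the transfer function $G$ is supplied already sampled on the image's FFT grid (for instance, a CSF from Appendix A evaluated at each coefficient's radial frequency and normalized to a peak of 1; that normalization and the pixels-per-degree parameter are our assumptions, not details given in the text). The helper names `radial_freq_cpd` and `front_end` are hypothetical.

```python
import numpy as np

def radial_freq_cpd(shape, pix_per_deg):
    """Radial spatial frequency (cycles/deg) of each 2-D FFT coefficient."""
    fy = np.fft.fftfreq(shape[0]) * pix_per_deg   # cycles/pixel -> cycles/deg
    fx = np.fft.fftfreq(shape[1]) * pix_per_deg
    return np.hypot(fy[:, None], fx[None, :])

def front_end(S, G, sigma, rng):
    """Equation B1: I = F^{-1}{F{S} . G} + N_int.

    S: stimulus image; G: real transfer function on the same FFT grid as S;
    sigma: standard deviation of the white Gaussian pixel noise N_int.
    """
    # A radially symmetric real G preserves Hermitian symmetry, so the
    # imaginary part after the inverse FFT is only round-off and is dropped.
    filtered = np.fft.ifft2(np.fft.fft2(S) * G).real
    return filtered + rng.normal(0.0, sigma, size=S.shape)
```

In use, one would build `G` from `radial_freq_cpd(S.shape, pix_per_deg)` and a fitted CSF, then call `front_end(S, G, sigma, np.random.default_rng())` once per simulated trial.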
Templates used by the model were generated by filtering broadband letters with raised cosine filters of various bandwidths at the same center frequency as that of the stimulus. Because the templates can be of a different bandwidth than the stimulus, the contrast of the internal templates is estimated as part of the decision procedure. In other words, the ideal-observer model has an uncertainty in stimulus contrast. 
To be maximally correct on average, the MAP classifier selects the response (corresponding to the internal template $T_j$) that is the most probable given the input $I$:

$$
P(T_j \mid I) = \int P(cT_j \mid I)\, dc = \int \frac{P(I \mid cT_j)\, P(cT_j)}{P(I)}\, dc \quad \text{(Bayes rule)},
\tag{B2}
$$
where $c$ is the internal template contrast, which is not necessarily the same as the stimulus contrast. Since the prior probability $P(cT_j)$ is constant within the range of feasible contrasts (all letters are equally likely), and $P(I)$ does not depend on $T_j$, we can further simplify the posterior probability as follows:

$$
P(T_j \mid I) = k \int P(I \mid cT_j)\, dc = K \int \exp\left(-\frac{1}{2\sigma^2}\|I - cT_j\|^2\right) dc
\tag{B3}
$$
The chosen response of the MAP classifier is then the signal corresponding to the template $T_{j_{MAP}}$, where

$$
j_{MAP} = \arg\max_j P(T_j \mid I)
\tag{B4}
$$
Since taking the integral in Equation B3 is nontrivial, we adopted the following approximation by replacing integration with maximization. That is, we computed:  
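$$
\begin{aligned}
j_{MAP} &= \arg\max_j P(T_j \mid I) = \arg\max_j \int P(I \mid cT_j)\, dc \\
&\approx \arg\max_j \left\{\max_c P(I \mid cT_j)\right\} = \arg\max_j \left\{\max_c \left[\exp\left(-\frac{1}{2\sigma^2}\|I - cT_j\|^2\right)\right]\right\} \\
&= \arg\max_j \left\{-\min_c \|I - cT_j\|^2\right\}
\end{aligned}
\tag{B5}
$$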
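Because the inner minimization in Equation B5 is a one-dimensional least-squares problem, it has a closed form, $c^* = \langle I, T_j\rangle / \|T_j\|^2$. The following is a minimal sketch of the approximate decision rule built on that observation. Clipping $c^*$ to a nonnegative (optionally bounded) range is our assumption, standing in for the "range of feasible contrasts"; `map_classify` is a hypothetical name.

```python
import numpy as np

def map_classify(I, templates, c_max=None):
    """Approximate MAP decision of Equation B5 (hypothetical helper).

    For each template T_j, minimize ||I - c T_j||^2 over contrast c.
    The unconstrained minimizer is c* = <I, T_j> / ||T_j||^2; because the
    objective is a convex quadratic in c, clipping c* to [0, c_max] gives
    the constrained minimizer. Returns (j_MAP, residuals).
    """
    I = np.ravel(I)
    residuals = []
    for T in templates:
        T = np.ravel(T)
        c = np.dot(I, T) / np.dot(T, T)
        c = max(c, 0.0) if c_max is None else min(max(c, 0.0), c_max)
        residuals.append(np.sum((I - c * T) ** 2))
    residuals = np.array(residuals)
    # argmax_j { -min_c ||I - c T_j||^2 } is the argmin of the residuals
    return int(np.argmin(residuals)), residuals
```

The closed form replaces a numerical search over $c$, which keeps each simulated trial to one dot product per template.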
 
Appendix C
Optimal integration for ideal observer and ideal-observer-like visual system
Claim 1: Let $c_f$ be the contrast threshold in nominal contrast units for identifying a narrow-band letter at center frequency $f$. If there is no overlap in the frequency domain between the components with center frequencies $f_1$ and $f_2$ (i.e., the components are orthogonal, with a zero dot product), then for an ideal observer limited by an invariant (stimulus-independent) additive white input noise (at the front end), the contrast sensitivity (reciprocal of the threshold contrast) for the composite can be predicted from those for the components:

$$
\frac{1}{c_{f_1+f_2}^2} = \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2}
\tag{C1}
$$
 
Proof: An ideal observer (a maximum a posteriori classifier) with white additive input noise makes decisions by maximizing the posterior probability given by Equation B3. The posterior is computed in terms of the squared Euclidean distance between the input image ($I$) and the letter templates at the test contrast ($cT_j$), normalized by the noise variance, which is a constant for invariant additive noise. As a result, the pairwise Euclidean distances between the letters at the test contrast jointly determine the average accuracy (Tjan et al., 1995; Appendix B). Hence, without loss of generality, we can confine our derivation to the Euclidean distance between two randomly chosen letters. At a given criterion accuracy, the Euclidean distance between the generic letter pair is a constant. 
Let $A$ and $B$ be the templates of two letters such that at contrast $c$, the signals corresponding to these letters are $cA$ and $cB$, and their Euclidean distance is $c\|A - B\|$. For a composite at contrast $c$, the nominal contrast of its components is also $c$ by definition. This definition of contrast conveniently reflects the equality $cA_{f_1+f_2} = c(A_{f_1} + A_{f_2}) = cA_{f_1} + cA_{f_2}$, where $c$ is the nominal contrast of $A_{f_1}$, $A_{f_2}$, and $A_{f_1+f_2}$. Now consider the Euclidean distance between two composite letters at nominal contrast $c$:

$$
\begin{aligned}
\|cA_{f_1+f_2} - cB_{f_1+f_2}\| &= c\sqrt{(A_{f_1+f_2} - B_{f_1+f_2})^2} \\
&= c\sqrt{A_{f_1}^2 + A_{f_2}^2 - 2A_{f_1}B_{f_1} - 2A_{f_2}B_{f_2} + B_{f_1}^2 + B_{f_2}^2} \\
&= c\sqrt{(A_{f_1} - B_{f_1})^2 + (A_{f_2} - B_{f_2})^2}
\end{aligned}
\tag{C2}
$$
This is because the dot products of the cross-frequency terms (e.g., $A_{f_1} A_{f_2}$ or $A_{f_1} B_{f_2}$) are zero, since the components are orthogonal (in Equation C2, products and squares of images denote dot products). Equivalently:

$$
c^2\|A_{f_1+f_2} - B_{f_1+f_2}\|^2 = c^2\|A_{f_1} - B_{f_1}\|^2 + c^2\|A_{f_2} - B_{f_2}\|^2
\tag{C3}
$$
 
As noted earlier, at a given criterion accuracy, the Euclidean distance between a generic pair of letter templates is a constant. Hence,  
$$
c_{f_1}^2\|A_{f_1} - B_{f_1}\|^2 = c_{f_2}^2\|A_{f_2} - B_{f_2}\|^2 = c_{f_1+f_2}^2\|A_{f_1+f_2} - B_{f_1+f_2}\|^2,
\tag{C4}
$$
where $c_{f_1}$, $c_{f_2}$, and $c_{f_1+f_2}$ are the threshold nominal contrasts for the components $f_1$ and $f_2$ and the composite $f_1+f_2$, respectively. Expressing the composite of Equation C4 in terms of its components using Equation C3, we have:

$$
c_{f_1}^2\|A_{f_1} - B_{f_1}\|^2 = c_{f_2}^2\|A_{f_2} - B_{f_2}\|^2 = c_{f_1+f_2}^2\|A_{f_1} - B_{f_1}\|^2 + c_{f_1+f_2}^2\|A_{f_2} - B_{f_2}\|^2
\tag{C5}
$$
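Solving the set of equations in Equation C5 results in Equation C1, hence proving Claim 1:

$$
\left\{
\begin{aligned}
1 &= \frac{c_{f_2}^2\|A_{f_2} - B_{f_2}\|^2}{c_{f_1}^2\|A_{f_1} - B_{f_1}\|^2} \\
1 &= \frac{c_{f_1+f_2}^2\|A_{f_1} - B_{f_1}\|^2 + c_{f_1+f_2}^2\|A_{f_2} - B_{f_2}\|^2}{c_{f_1}^2\|A_{f_1} - B_{f_1}\|^2}
\end{aligned}
\right.
\;\Rightarrow\;
1 = \frac{c_{f_1+f_2}^2}{c_{f_1}^2} + \frac{c_{f_1+f_2}^2}{c_{f_2}^2}
\;\Rightarrow\;
\frac{1}{c_{f_1+f_2}^2} = \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2}
\tag{C6}
$$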
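The argument of Claim 1 can be checked numerically. The sketch below builds two random "letters" (random images standing in for letter templates, purely for illustration), splits each into two non-overlapping radial frequency bands so that the components are orthogonal, and derives each threshold from a fixed criterion distance as in Equation C4. The band edges and the helper `band_split` are hypothetical.

```python
import numpy as np

def band_split(img, lo_cut, hi_cut):
    """Split img into components in the disjoint radial bands
    [lo_cut, hi_cut) and [hi_cut, inf), in cycles/pixel.
    Disjoint bands make the two components orthogonal (Parseval)."""
    F = np.fft.fft2(img)
    r = np.hypot(np.fft.fftfreq(img.shape[0])[:, None],
                 np.fft.fftfreq(img.shape[1])[None, :])
    low = np.fft.ifft2(F * ((r >= lo_cut) & (r < hi_cut))).real
    high = np.fft.ifft2(F * (r >= hi_cut)).real
    return low, high

rng = np.random.default_rng(1)
A1, A2 = band_split(rng.normal(size=(32, 32)), 0.02, 0.15)
B1, B2 = band_split(rng.normal(size=(32, 32)), 0.02, 0.15)

D = 1.0  # criterion distance at threshold; its value cancels out of Equation C1
c1 = D / np.linalg.norm(A1 - B1)                  # threshold for f1 alone (Eq. C4)
c2 = D / np.linalg.norm(A2 - B2)                  # threshold for f2 alone
c12 = D / np.linalg.norm((A1 + A2) - (B1 + B2))   # threshold for the composite

print("cross-band dot product:", np.sum(A1 * B2))
print("1/c12^2 =", 1 / c12**2, " 1/c1^2 + 1/c2^2 =", 1 / c1**2 + 1 / c2**2)
```

If the bands were allowed to overlap, the cross-band dot products would no longer vanish and the two printed sensitivities would generally differ, which is the nonorthogonal case discussed for the wide-template models.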
 
Corollary 1.1: Equation C1 holds if the additive input noise of the ideal observer is multivariate Gaussian but not necessarily uncorrelated (white).
Proof: A linear transformation (a matrix multiplication) that decorrelates the noise can be applied simultaneously to the input, noise, and letter templates, such that a white-noise ideal-observer formulation is applicable to the transformed input. Let $P$ be the “pre-whitening” matrix. We can follow the derivation of Equation C1 in the proof of Claim 1, replacing $A$ and $B$ by $PA$ and $PB$, respectively. 
Corollary 1.2: Equation C1 holds if a linear filter is placed at the front end of an ideal observer (as in the CSF-limited ideal observer) before the input noise.
Proof: We can filter both the input and the templates and apply a Gaussian-noise ideal observer to the transformed input. If $F$ is a linear filter, we can derive Equation C1 by replacing $A$ and $B$ by $FA$ and $FB$, respectively, in the proof of Claim 1.
Corollary 1.3: Equation C1 holds for an observer that is otherwise a Gaussian-noise ideal observer but with sampling efficiencies $\eta_1$ for $f_1$ and $\eta_2$ for $f_2$.
Proof: If the sampling efficiency is $\eta$, then the effective Euclidean distance between a generic pair of letters is scaled by $\eta$. In the proof of Claim 1, this scaling can be implemented by replacing $\|A_{f_1} - B_{f_1}\|$ by $\eta_1\|A_{f_1} - B_{f_1}\|$ and $\|A_{f_2} - B_{f_2}\|$ by $\eta_2\|A_{f_2} - B_{f_2}\|$, starting at Equation C5.
Corollary 1.4: Equation C1 holds for an observer with a nonlinear contrast transducer situated after the input noise.
Proof: Such a nonlinearity does not affect the pixel-wise signal-to-noise ratio and can be undone by taking its inverse. The effective Euclidean distance between any pair of letters is unaffected; hence the proof of Claim 1 still applies. 
Claim 2: Equation C1 holds if a stimulus-dependent Gaussian noise source is added after the invariant input noise, such that the variance of this second noise source is proportional to the sum of the contrast energy of the signal and the variance of the invariant input noise. Such a noise source is often called a “multiplicative” noise, and the invariant noise is commonly referred to as an “additive” noise. The additive and multiplicative noises are stochastically independent.
Proof: As stated earlier, the relevant quantity in computing the posterior is the ratio between the squared Euclidean distance separating the input from a template and the variance of the noise (Equation B3). In the proof of Claim 1, the variance of the noise is ignored because it is a constant. With a contrast-dependent noise, the proof of Claim 1 can be replayed to arrive at a version of Equation C6 by considering the equality in the “effective” Euclidean distance between a pair of generic letters, which is the Euclidean distance normalized by the total variance of the noise sources. To simplify notation, we consider only the case in which both noise sources are white and independent. By applying Corollary 1.1, the proof can be extended to correlated Gaussian noise sources. 
Let $m$ be the proportionality constant that relates the variance of the multiplicative noise to the sum of the contrast energy and the variance of the additive noise. Let $a$ be the image area in pixels, $X$ be an arbitrary letter, and $E[\cdot]$ denote the expected value. The equality of effective Euclidean distances at threshold nominal contrast can be expressed as:
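$$
\frac{c_{f_1}^2\|A_{f_1} - B_{f_1}\|^2}{\sigma^2 + \left(\sigma^2 + E[\|X_{f_1}\|^2]\, c_{f_1}^2/a\right)m}
= \frac{c_{f_2}^2\|A_{f_2} - B_{f_2}\|^2}{\sigma^2 + \left(\sigma^2 + E[\|X_{f_2}\|^2]\, c_{f_2}^2/a\right)m}
= \frac{c_{f_1+f_2}^2\|A_{f_1+f_2} - B_{f_1+f_2}\|^2}{\sigma^2 + \left(\sigma^2 + E[\|X_{f_1+f_2}\|^2]\, c_{f_1+f_2}^2/a\right)m}
\tag{C7}
$$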
Following the proof of Claim 1, we arrive at a normalized version of Equation C6:
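$$
\frac{\sigma^2 + \left(\sigma^2 + E[\|X_{f_1+f_2}\|^2]\, c_{f_1+f_2}^2/a\right)m}{c_{f_1+f_2}^2}
= \frac{\sigma^2 + \left(\sigma^2 + E[\|X_{f_2}\|^2]\, c_{f_2}^2/a\right)m}{c_{f_2}^2}
+ \frac{\sigma^2 + \left(\sigma^2 + E[\|X_{f_1}\|^2]\, c_{f_1}^2/a\right)m}{c_{f_1}^2}
\tag{C8}
$$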
Dividing both sides of the equality by $(1 + m)\sigma^2$ and rearranging terms, we have:

$$
\frac{1}{c_{f_1+f_2}^2} = \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2} + \frac{m}{(1+m)\sigma^2 a}\left(E[\|X_{f_1}\|^2] + E[\|X_{f_2}\|^2] - E[\|X_{f_1+f_2}\|^2]\right)
\tag{C9}
$$
Moreover, since the $f_1$ and $f_2$ components are orthogonal to each other,

$$
E[\|X_{f_1+f_2}\|^2] = E[\|X_{f_1} + X_{f_2}\|^2] = E[\|X_{f_1}\|^2 + \|X_{f_2}\|^2 + 2X_{f_1}^T X_{f_2}] = E[\|X_{f_1}\|^2] + E[\|X_{f_2}\|^2]
\tag{C10}
$$
 
Hence, the last term of Equation C9 is zero, which gives us Equation C1.
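Each term of Equation C7 can also be solved for its threshold in closed form: setting the term equal to a common effective distance $D$ gives $c^2 = D\sigma^2(1+m) / (\|A-B\|^2 - D m E[\|X\|^2]/a)$. Because $1/c^2$ is then linear in both $\|A-B\|^2$ and $E[\|X\|^2]$, Equation C1 follows whenever both quantities add across orthogonal components (Equations C3 and C10). The following is a minimal numerical check with arbitrary, hypothetical parameter values; `threshold_sq` is a hypothetical helper.

```python
import numpy as np

def threshold_sq(d2, E, sigma2, m, a, D):
    """Squared threshold contrast c^2 solving one term of Equation C7:
    c^2 d2 / (sigma2 + (sigma2 + E c^2 / a) m) = D,
    with d2 = ||A - B||^2 and E = E[||X||^2] for that band."""
    denom = d2 - D * m * E / a
    if denom <= 0:
        raise ValueError("criterion unreachable: multiplicative noise too strong")
    return D * sigma2 * (1 + m) / denom

p = dict(sigma2=0.25, m=0.1, a=1024, D=4.0)   # illustrative values only
c1_sq = threshold_sq(50.0, 400.0, **p)        # component f1
c2_sq = threshold_sq(30.0, 250.0, **p)        # component f2
# Orthogonal components: squared distances and energies both add (C3, C10)
c12_sq = threshold_sq(50.0 + 30.0, 400.0 + 250.0, **p)
# If the energies did not add (nonorthogonal case), C1 would fail
c12_bad_sq = threshold_sq(50.0 + 30.0, 700.0, **p)

print(1 / c12_sq, 1 / c1_sq + 1 / c2_sq, 1 / c12_bad_sq)
```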
Corollary 2.1: Equation C1 holds under supra-threshold conditions for an observer whose front-end consists of an additive “peripheral” noise, followed by a pixel-wise logarithmic compressive nonlinearity, followed by a second additive “central” noise, and is otherwise ideal
Proof: Legge et al. (1987) showed in their 1 that for a contrast discrimination task when the pedestal is above detection threshold (supra-threshold), an observer with a nonlinear transducer situated between two invariant noise sources is equivalent to an observer with contrast-invariant periphery noise followed by a contrast-dependent central noise, without any nonlinearity in between. They did so by expressing the nonlinearity with a Taylor series and consider the net signal-to-noise ratio. They showed that the variance of the contrast-dependent noise is inversely proportional to the squared derivative of the transducer function under supra-threshold conditions. If we are to require the variance of the central noise to be proportional to input contrast energy, as is required by 3, then the derivative of the nonlinearity has to be inversely proportional to input contrast, thus implying that the nonlinearity must be logarithmic (and thus compressive) in contrast. 3 applies under these conditions, implying Equation C1
Although the derivation of Legge et al. concerns a contrast discrimination task, it is equally applicable to a n-way identification task since as we have mentioned in the derivation of 3, all that is relevant for determining the performance of a maximum-a posteriori observer for an n-way discrimination task is the generic pairwise distance ( d′ in the case of a 2-way discrimination) between a pair of alternatives. 
Corollary 2.2: For an observer with an expansive nonlinearity between the peripheral and central noise, the contrast sensitivities for the components and the composite stimulus approach the relationship of Equation C1 under supra-threshold conditions as stimulus contrast increases.
Proof: Recall that Legge et al. (1987) showed that the variance of the equivalent contrast-dependent central noise is inversely proportional to the squared derivative of the transducer function. For an expansive nonlinearity, the derivative increases as a function of contrast. High contrast thus reduces the variance of the central noise and diminishes its influence. In the limit, the central noise becomes irrelevant and performance is limited only by the invariant peripheral noise, meeting the conditions of 3. Hence, Equation C1 holds in the limit of high stimulus contrast for an expansive nonlinearity. 
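Both corollaries turn on the relation summarized above from Legge et al. (1987): the central noise, referred back to the input, has variance proportional to $1/T'(c)^2$, where $T$ is the transducer. The sketch below evaluates that relation numerically for two illustrative transducers, a natural logarithm (Corollary 2.1) and $T(c) = c^2$ as one example of an expansive nonlinearity (Corollary 2.2). The finite-difference derivative, the unit central-noise variance, and the contrast values are assumptions made for illustration only.

```python
import numpy as np


def equivalent_central_variance(transducer, c, sigma_central=1.0, h=1e-6):
    """Central-noise variance referred to the input: sigma_central**2 / T'(c)**2.

    T'(c) is estimated with a central finite difference of step h.
    """
    dT = (transducer(c + h) - transducer(c - h)) / (2.0 * h)
    return sigma_central**2 / dT**2


contrasts = np.array([0.05, 0.1, 0.2, 0.4, 0.8])

# Corollary 2.1: a logarithmic (compressive) transducer gives T'(c) = 1/c, so the
# equivalent variance grows as c**2, i.e., in proportion to contrast energy.
var_log = equivalent_central_variance(np.log, contrasts)

# Corollary 2.2: an expansive transducer, here T(c) = c**2, gives T'(c) = 2c, so
# the equivalent variance 1 / (4 c**2) shrinks as contrast rises.
var_exp = equivalent_central_variance(np.square, contrasts)

print(var_log / contrasts**2)   # approximately constant
print(var_exp)                  # monotonically decreasing
```

The first case reproduces the contrast-energy-proportional central noise required for Equation C1; the second shows the central noise fading at high contrast, leaving only the invariant peripheral noise.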
Appendix D
Grating orientation discrimination task
Our goal in this supplementary experiment was to measure the index of integration for gratings with spatial frequencies and sizes comparable to those of the letter stimuli in Experiment 2. The component gratings were tested at the center frequencies of the letter components used for subjects ASN, BW, and PLB in Experiment 2. For ASN, this corresponded to $f_{\mathrm{low}}$ = 1.63 and $f_{\mathrm{high}}$ = 6.51 cpd in the fovea and $f_{\mathrm{low}}$ = 1.3 and $f_{\mathrm{high}}$ = 5.2 cpd in the periphery. The compound gratings were formed by combining (in sine phase) the corresponding component gratings with contrasts in the ratio of their detection thresholds, as estimated from the subject's CSF. For example, ASN's peripheral detection thresholds were 0.86% (Weber contrast) for $f_{\mathrm{low}}$ and 1.92% for $f_{\mathrm{high}}$; as a result, the contrast ratio of the components in the composite was 0.86 ($f_{\mathrm{low}}$) to 1.92 ($f_{\mathrm{high}}$). Similar in principle to the definition used for letters, we defined the nominal contrast of a component or the composite to be the corresponding Weber contrast of the $f_{\mathrm{low}}$ component in the composite. 
The task was a 2-way discrimination between horizontally and vertically oriented Gabor patches. The space constant of the Gaussian envelope of the gratings was set equal to the letter size used for each subject (0.96° for ASN, 1.15° for BW, 1.11° for PLB). A QUEST procedure (Watson & Pelli, 1983) was used to adjust the contrast of the gratings to achieve an accuracy level of 74%. As in Experiment 2, the component and composite gratings were tested in interleaved blocks. Stimulus presentation timing was identical to that of Experiment 2. There were 180 trials per condition, divided over three blocks. 
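The stimulus construction described in the two paragraphs above can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: it assumes "space constant" means the standard deviation of the Gaussian envelope, expresses images in Weber contrast (0 = background), and invents the display geometry (256 px at 40 px/deg), the nominal contrast, and the 50 cd/m² background. The frequencies, thresholds, and envelope size are ASN's peripheral values quoted above.

```python
import numpy as np


def gabor(size_px, px_per_deg, freq_cpd, sigma_deg, contrast, vertical=True):
    """Sine-phase Gabor patch in Weber-contrast units (0 = background luminance).

    sigma_deg is treated as the standard deviation of the Gaussian envelope,
    which is one possible reading of "space constant".
    """
    coords = (np.arange(size_px) - size_px // 2) / px_per_deg   # degrees
    x, y = np.meshgrid(coords, coords)      # x varies along columns, y along rows
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma_deg**2))
    axis = x if vertical else y             # vertical stripes vary along x
    return contrast * envelope * np.sin(2.0 * np.pi * freq_cpd * axis)


def composite_contrasts(nominal, thr_low, thr_high):
    """Component contrasts in a composite whose nominal contrast is that of f_low.

    The f_high component is scaled by the ratio of the detection thresholds.
    """
    return nominal, nominal * thr_high / thr_low


# ASN, periphery: thresholds 0.86% and 1.92%, f_low = 1.3 cpd, f_high = 5.2 cpd.
c_low, c_high = composite_contrasts(nominal=0.02, thr_low=0.86, thr_high=1.92)
stim = dict(size_px=256, px_per_deg=40.0, sigma_deg=0.96)

composite_v = (gabor(freq_cpd=1.3, contrast=c_low, **stim)
               + gabor(freq_cpd=5.2, contrast=c_high, **stim))
composite_h = (gabor(freq_cpd=1.3, contrast=c_low, vertical=False, **stim)
               + gabor(freq_cpd=5.2, contrast=c_high, vertical=False, **stim))

# Convert Weber contrast to luminance for a hypothetical 50 cd/m^2 background.
luminance_v = 50.0 * (1.0 + composite_v)
```

Because the two orientations differ only by the carrier axis, the horizontal stimulus is the transpose of the vertical one, so the two alternatives are matched in every respect except orientation.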
The integration index was estimated as in Experiment 2:  
$$
\Phi = \frac{CS_{f_{\mathrm{low}}+f_{\mathrm{high}}}^2}{CS_{f_{\mathrm{low}}}^2 + CS_{f_{\mathrm{high}}}^2}
\tag{D1}
$$
Figure D1 shows the obtained integration indices in the fovea and in the periphery. Also shown are the indices predicted by probability summation, which were estimated by considering only the response to the more sensitive component:  
$$
\Phi_{\mathrm{prob.\,sum.}} = \frac{\left(\max\left[CS_{f_{\mathrm{low}}},\, CS_{f_{\mathrm{high}}}\right]\right)^2}{CS_{f_{\mathrm{low}}}^2 + CS_{f_{\mathrm{high}}}^2}
\tag{D2}
$$
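Equations D1 and D2 reduce to a few lines of arithmetic. The sketch below uses hypothetical sensitivities of 100 and 50 (i.e., thresholds of 1% and 2% contrast), not measured values, to show the two reference points: an optimal observer, whose composite sensitivity satisfies $CS^2 = CS_{f_{\mathrm{low}}}^2 + CS_{f_{\mathrm{high}}}^2$, gives Φ = 1, whereas relying only on the more sensitive component gives the probability-summation value.

```python
def integration_index(cs_low, cs_high, cs_composite):
    """Equation D1: squared composite sensitivity over summed squared component sensitivities."""
    return cs_composite**2 / (cs_low**2 + cs_high**2)


def prob_sum_index(cs_low, cs_high):
    """Equation D2: prediction if only the more sensitive component is used."""
    return max(cs_low, cs_high) ** 2 / (cs_low**2 + cs_high**2)


# Hypothetical contrast sensitivities (1 / threshold contrast), not measured values.
cs_low, cs_high = 100.0, 50.0

# An optimal observer's composite sensitivity satisfies CS^2 = CS_low^2 + CS_high^2.
cs_optimal = (cs_low**2 + cs_high**2) ** 0.5

print(integration_index(cs_low, cs_high, cs_optimal))   # approximately 1.0
print(prob_sum_index(cs_low, cs_high))                  # 0.8
```

Note that the probability-summation prediction depends only on the component sensitivities; the more unequal they are, the closer it lies to 1 and the harder it becomes to distinguish from optimal summation.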
 
Figure D1
 
Integration indices of observers ASN, BW, and PLB for the orientation discrimination task with sinusoidal gratings. The human indices are suboptimal for both the fovea and periphery conditions and include within their 95% CIs the predicted indices for probability summation (green data symbols).
As has been shown previously (Graham et al., 1978), integration across the components of compound gratings is suboptimal and in the range predicted by probability summation. We found no practice effect on the integration index across blocks of trials (the largest positive change in integration index between the first and final blocks was 0.15, for BW in the fovea; the largest negative change was 0.22, for PLB in the periphery). 
Footnotes
1  We chose to analyze our results using the ideal-observer model of Chung et al. (2002) because, apart from its front-end CSF filter, the model is the white-noise ideal observer for our octave-bandwidth letter stimuli. We note that the ideal-observer tuning function of Chung et al. (2002, Figure 10) is band-pass, while that of Solomon and Pelli (1994, Figures 4c and 4d) is low-pass, even though both groups used a white-noise ideal observer. This discrepancy in the ideal-observer tuning functions is superficial, since the two groups measured the tuning function in different units. Following standard engineering conventions, Solomon and Pelli measured amplitude gain per unit linear bandwidth per unit radius. In contrast, staying native to the stimulus units, Chung et al. measured “letter sensitivity” (equivalent to amplitude gain) in units of per octave bandwidth per 2π radii. Chung et al. used octave units because they tested their human observers with octave-wide filtered letters. For Chung et al., the linear bandwidth decreases in proportion to decreasing spatial frequency. As a result, letter sensitivity also decreases with frequency, reaching zero at DC (since the linear bandwidth of a one-octave-wide stimulus at DC is zero). We can derive the exact formula equating the letter sensitivity H(f) of Chung et al. to the amplitude gain G(f) of Solomon and Pelli. Let E(f) be the fraction of signal energy utilized by the ideal observer for the letter-identification task over all orientations and for all radial spatial frequencies less than f. The gain of Solomon and Pelli is equivalent to  
$$
G(f) = \sqrt{\frac{1}{2\pi f}\,\frac{dE}{df}}
\tag{N1}
$$
The letter sensitivity of Chung et al. is  
$$
H(f) = \sqrt{\frac{dE}{d(\log_2 f)}}
\tag{N2}
$$
Applying the chain rule and rearranging terms, we have  
$$
G(f) = \sqrt{\frac{1}{2\pi f}\,\frac{dE}{d(\log_2 f)}\,\frac{d(\log_2 f)}{df}} = \frac{H(f)}{\sqrt{2\pi\ln(2)\,f^2}} \;\Longrightarrow\; H(f) = f\,G(f)\sqrt{2\pi\ln(2)} \propto f\,G(f)
\tag{N3}
$$
Solomon and Pelli found that G(f) is low-pass. By Equation N3, the sensitivity of Chung et al., H(f), should be band-pass, which is the case. Furthermore, because H(f) ∝ f G(f) and a low-pass G(f) is approximately constant at low frequencies, Equation N3 suggests that the low-frequency falloff of H(f) should approach a log–log slope of 1.0 at low radial frequencies, which is also the case (Chung et al., 2002, Figure 10).
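Equation N3 can be checked numerically: start from an assumed low-pass G(f), build E(f) through Equation N1, differentiate with respect to log₂ f as in Equation N2, and compare the result with the closed form. The sketch assumes that G and H are amplitude measures (square roots of the corresponding energy densities); the exponential form of G(f), the value f0 = 3 cycles/letter, and the frequency grid are hypothetical choices, not fits to either study.

```python
import numpy as np

# Hypothetical low-pass amplitude gain G(f); illustrative only.
f0 = 3.0                                  # cycles/letter
f = np.logspace(-2, 1.2, 800)             # radial frequency, cycles/letter
G = np.exp(-f / f0)

# Equation N1 rearranged: dE/df = 2*pi*f*G^2. Integrate cumulatively (trapezoid
# rule); E is defined only up to an additive constant, which N2 does not see.
dE_df = 2.0 * np.pi * f * G**2
E = np.concatenate(([0.0], np.cumsum(0.5 * (dE_df[1:] + dE_df[:-1]) * np.diff(f))))

# Equation N2: differentiate E numerically with respect to log2 f.
H_numeric = np.sqrt(np.gradient(E, np.log2(f)))

# Equation N3 in closed form.
H_closed = np.sqrt(2.0 * np.pi * np.log(2.0)) * f * G

# Low-frequency log-log slope of H, and the frequency at which H peaks.
slope_low = np.polyfit(np.log10(f[:20]), np.log10(H_closed[:20]), 1)[0]
f_peak = f[np.argmax(H_closed)]
print(f"low-frequency slope ~ {slope_low:.3f}; H peaks near f = {f_peak:.2f}")
```

With this G(f), H(f) = √(2π ln 2) f e^(−f/f0) rises with a log–log slope near 1.0 at low frequencies and peaks at f0, so a low-pass gain in linear units appears band-pass in octave units.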
2  BW's foveal LTF is peculiar in that it straddles the 4-octave model LTF at low frequencies and the 1-octave model LTF at high frequencies. Our model used a single template bandwidth and underestimated the peak tuning frequency by 0.2 octaves, an error that is smaller than, but comparable to, those observed in the foveal condition in Chung et al. (2002).
3  We observe that the integration index of the model with 2-octave-wide templates is very slightly but consistently lower than that of the model with templates matched to the signal (1 octave wide). Recall that a template for the composite condition is equal to the sum of the templates for the corresponding components. A component template that is 2 octaves wide (full width at half height) can “see” very little of the other signal component in the composite condition, because the 1-octave signal components are two octaves apart. Thus, as far as the narrow-band signals are concerned, these 2-octave templates appear independent. However, the component templates see identical noise in the frequency range where they overlap, so they are not independent with respect to noise. As a result, the signal-to-noise ratio seen by a composite template is lower than the sum of the signal-to-noise ratios seen by the component templates, hence the slight reduction in integration index. The 2-octave templates are not special: if the 1-octave signal components were three octaves apart, we would observe a slightly decreased integration index for models with template bandwidths between one and three octaves.
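The shared-noise argument in this footnote can be illustrated with a toy linear-template observer in white noise. This is a simplified stand-in, not the 26-way ideal-observer model of the paper: a 40-bin vector plays the role of the frequency axis, the signals and template extensions are hypothetical, the composite template is the sum of the component templates (as stated above), and Φ is computed from squared d′ at unit contrast, which is proportional to squared contrast sensitivity for a linear observer.

```python
import numpy as np


def dprime_sq(template, signal, sigma=1.0):
    """Squared d' of a linear template observer in white noise of s.d. sigma."""
    return float(template @ signal) ** 2 / (sigma**2 * float(template @ template))


def template_integration_index(t1, t2, s1, s2):
    """Phi = d'^2(composite) / (d'^2(component 1) + d'^2(component 2)),
    with the composite template taken as t1 + t2."""
    return dprime_sq(t1 + t2, s1 + s2) / (dprime_sq(t1, s1) + dprime_sq(t2, s2))


# Hypothetical layout: bins 0-9 carry signal 1, bins 30-39 carry signal 2,
# and bins 10-29 are signal-free.
n_bins, w = 40, 0.5
s1 = np.zeros(n_bins); s1[0:10] = 1.0
s2 = np.zeros(n_bins); s2[30:40] = 1.0

# Wide templates whose extra (signal-free) bins coincide, so they share noise ...
t1_shared = s1.copy(); t1_shared[15:20] = w
t2_shared = s2.copy(); t2_shared[15:20] = w
# ... versus equally wide templates whose extra bins do not overlap.
t1_disjoint = s1.copy(); t1_disjoint[15:20] = w
t2_disjoint = s2.copy(); t2_disjoint[20:25] = w

print(template_integration_index(t1_shared, t2_shared, s1, s2))      # approximately 0.9
print(template_integration_index(t1_disjoint, t2_disjoint, s1, s2))  # approximately 1.0
```

Both template pairs are equally blind to the other component's signal, yet only the pair that samples the same noise bins falls short of Φ = 1, which is the mechanism the footnote describes.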
Acknowledgments
This research was supported by the National Institutes of Health Grants EY016391 and EY016093 to BST. 
Commercial relationships: none. 
Corresponding author: Bosco S. Tjan. 
Email: btjan@usc.edu. 
Address: Department of Psychology, 3620 South McClintock, SGM 501, University of Southern California, Los Angeles, CA 90089-1061, USA. 
References
Alexander, K. R., Xie, W., & Derlacki, D. J. (1994). Spatial-frequency characteristics of letter identification. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 11, 2375–2382.
Blake, A., Bülthoff, H. H., & Sheinberg, D. (1993). Shape from texture: Ideal observers and human psychophysics. Vision Research, 33, 1723–1737.
Blake, R., Tadin, D., Sobel, K. V., Raissian, T. A., & Chong, S. C. (2006). Strength of early visual adaptation depends on visual awareness. Proceedings of the National Academy of Sciences of the United States of America, 103, 4783–4788.
Blakemore, C., & Campbell, F. W. (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. The Journal of Physiology, 203, 237–260.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436.
Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 214, 93–94.
Campbell, F. W., & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. The Journal of Physiology, 197, 551–566.
Chung, S. T., Legge, G. E., & Tjan, B. S. (2002). Spatial-frequency characteristics of letter identification in central and peripheral vision. Vision Research, 42, 2137–2152.
Chung, S. T., Levi, D. M., & Legge, G. E. (2001). Spatial-frequency and contrast properties of crowding. Vision Research, 41, 1833–1850.
Chung, S. T., & Tjan, B. S. (2007). Shift in spatial scale in identifying crowded letters. Vision Research, 47, 437–451.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Monographs on statistics and applied probability, 57.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39, 3537–3560.
Graham, N., Robson, J. G., & Nachmias, J. (1978). Grating summation in fovea and periphery. Vision Research, 18, 815–825.
Harris, J. M., & Parker, A. J. (1992). Efficiency of stereopsis in random-dot stereograms. Journal of the Optical Society of America A, Optics and Image Science, 9, 14–24.
He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution and the locus of visual awareness. Nature, 383, 334–337.
Hess, R. F., & Field, D. (1993). Is the increased spatial uncertainty in the normal periphery due to spatial undersampling or uncalibrated disarray? Vision Research, 33, 2663–2670.
Hess, R. F., & McCarthy, J. (1994). Topological disorder in peripheral vision. Visual Neuroscience, 11, 1033–1036.
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967–992, http://journalofvision.org/4/12/1/, doi:10.1167/4.12.1.
Landy, M. S., & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 18, 2307–2320.
Legge, G. E., Kersten, D., & Burgess, A. E. (1987). Contrast discrimination in noise. Journal of the Optical Society of America A, Optics and Image Science, 4, 391–404.
Levi, D. M., Hariharan, S., & Klein, S. A. (2002). Suppressive and facilitatory spatial interactions in peripheral vision: Peripheral crowding is neither size invariant nor simple contrast masking. Journal of Vision, 2(2):3, 167–177, http://journalofvision.org/2/2/3/, doi:10.1167/2.2.3.
Levi, D. M., & Klein, S. A. (1996). Limitations on position coding imposed by undersampling and univariance. Vision Research, 36, 2111–2120.
Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27, 581–597.
Majaj, N. J., Pelli, D. G., Kurshan, P., & Palomares, M. (2002). The role of spatial frequency channels in letter identification. Vision Research, 42, 1165–1184.
Meinhardt, G. (2001). Learning a grating discrimination task broadens human spatial frequency tuning. Biological Cybernetics, 84, 383–400.
Meinhardt, G., Persike, M., Mesenholl, B., & Hagemann, C. (2006). Cue combination in a combined feature contrast detection and figure identification task. Vision Research, 46, 3977–3993.
Nandy, A. S., & Tjan, B. S. (2007). The nature of letter crowding as revealed by first- and second-order classification images. Journal of Vision, 7(2):5, 1–26, http://journalofvision.org/7/2/5/, doi:10.1167/7.2.5.
Parish, D. H., & Sperling, G. (1991). Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Research, 31, 1399–1415.
Peli, E. (1990). Contrast in complex images. Journal of the Optical Society of America A, Optics and Image Science, 7, 2032–2040.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Pelli, D. G., Burns, C. W., Farell, B., & Moore-Page, D. C. (2006). Feature detection and letter identification. Vision Research, 46, 4646–4674.
Pelli, D. G., Palomares, M., & Majaj, N. J. (2004). Crowding is unlike ordinary masking: Distinguishing feature integration from detection. Journal of Vision, 4(12):12, 1136–1169, http://journalofvision.org/4/12/12/, doi:10.1167/4.12.12.
Pelli, D. G., & Zhang, L. (1991). Accurate control of contrast on microcomputer displays. Vision Research, 31, 1337–1350.
Quick, R. F., Jr., Mullins, W. W., & Reichert, T. A. (1978). Spatial summation effects on two-component grating thresholds. Journal of the Optical Society of America, 68, 116–124.
Shimozaki, S. S., Eckstein, M. P., & Abbey, C. K. (2002). Stimulus information contaminates summation tests of independent neural representations of features. Journal of Vision, 2(5):1, 354–370, http://journalofvision.org/2/5/1/, doi:10.1167/2.5.1.
Solomon, J. A., & Pelli, D. G. (1994). The visual filter mediating letter identification. Nature, 369, 395–397.
Strasburger, H. (2005). Unfocused spatial attention underlies the crowding effect in indirect form vision. Journal of Vision, 5(11):8, 1024–1037, http://journalofvision.org/5/11/8/, doi:10.1167/5.11.8.
Stromeyer, C. F., & Julesz, B. (1972). Spatial-frequency masking in vision: Critical bands and spread of masking. Journal of the Optical Society of America, 62, 1221–1232.
Thomas, J. P., & Olzak, L. A. (2001). Spatial phase sensitivity of mechanisms mediating discrimination of small orientation differences. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 18, 2197–2203.
Tjan, B. S., Braje, W. L., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Research, 35, 3053–3069.
Wallace, J. M., & Mamassian, P. (2004). The efficiency of depth discrimination for non-transparent and transparent stereoscopic surfaces. Vision Research, 44, 2253–2267.
Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113–120.
Figure 1
 
A broadband stimulus like the letter “a” (A) can be decomposed into narrow-band components (B) by filtering the stimulus with a set of band-pass filters centered at different spatial frequencies. (C) Two narrow-band components that are two octaves apart in frequency space, and their composite sum, are used to measure the index of integration Φ (Equation 2) in Experiment 2. (D) Three possible outcomes for the integration index: optimal summation (left panel), suboptimal summation (middle panel), and nonorthogonal summation (right panel).
Figure 2
 
Examples of stimuli used for measuring the letter tuning function (Experiment 1). The unfiltered letters are passed through a set of unit-gain raised-cosine filters (Equation 5) at different center frequencies in half-octave steps.
Figure 3
 
Examples of stimuli used for estimating the integration index (Experiment 2). The upper row shows the low and high spatial frequency components and the composite stimulus for the letter “c.” The lower row shows the corresponding amplitude spectra; since the components are separated by 2 octaves in frequency space, their spectra do not overlap.
Figure 4
 
(A) The letter tuning functions (LTFs: contrast sensitivity, i.e., 1/threshold contrast, for identifying the 26 letters at 50% correct vs. spatial frequency) obtained from four human observers for the narrow-band letter stimuli (Experiment 1) are shown for both the fovea (blue) and the periphery (red). The dotted gray lines mark the peak tuning frequencies. (B) The observers' contrast sensitivities for identifying the stimuli used in Experiment 2 are shown overlaid on the corresponding letter tuning functions. The square data points represent the sensitivity to the low (left) and high (right) spatial frequency components, respectively; the horizontal lines represent the sensitivity to the composite stimulus. (C) Integration indices for the fovea and the periphery are shown along with the predicted indices for probability summation (green dots). Error bars are bootstrapped 95% confidence intervals.
Figure 5
 
Observers' contrast sensitivity function (CSF) for the fovea and the periphery are shown relative to the corresponding letter tuning functions. The ranges of measurement of the CSFs encompass the letter tuning functions. The analytical functions used to fit the CSF ( 1) are used as the front-end filters in the ideal-observer models. Error bars are bootstrapped 95% confidence intervals.
Figure 6
 
Schematic of the ideal-observer models. The stimulus is first linearly filtered by the human CSF. The result is then perturbed by white Gaussian pixel noise. This noisy and filtered signal is then fed into an ideal observer that chooses among a set of responses (defined by the internal templates) the one that maximizes the posterior probability.
Figure 7
 
(A) The letter tuning functions from the family of six ideal-observer models with internal template bandwidth of 1, 2, 3, 4, 8, and ∞ octaves are shown overlaid on the corresponding human results. (B) The indices of integration obtained from the ideal-observer models are compared with those obtained from the human observers.
Figure 8
 
(A) Bandwidths of the models' letter tuning functions are plotted as a function of template bandwidth. Letter tuning function bandwidth decreases monotonically with increasing template bandwidth. For comparison, the bandwidths of the human letter tuning functions are shown by the horizontal solid lines (dotted lines representing the bootstrapped 95% confidence intervals). (B) A similar comparison is made for the peak frequencies of the letter tuning functions. The model peak tuning frequencies are invariant to template bandwidth.
Table 1
 
Stimulus letter size (x-height) for each subject, in degrees of visual angle.
Subject Letter size
ASN 0.96°
BW 1.15°
PLB 1.11°
JS 1.18°
Table 2
 
Peak tuning frequencies (cycles/letter) and bandwidths (full-width at half-height in octaves) of the letter tuning functions obtained in Experiment 1.
Subject  f_peak, fovea  f_peak, 10°  Bandwidth, fovea  Bandwidth, 10°
ASN 2.54 2.07 1.96 1.67
BW 2.99 2.23 2.09 1.72
PLB 2.46 2.29 2.14 1.62
JS 2.82 2.25 2.00 1.74
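As a rough illustration of how a peak frequency and a full-width-at-half-height bandwidth in octaves can be read off a sampled tuning function, the sketch below is offered under stated assumptions: half height is taken on the linear sensitivity scale, crossings are located by linear interpolation in log₂ frequency, and the peak is the sampled maximum. The function name and the synthetic Gaussian-in-log-frequency curve are hypothetical and are not the data or the fitting procedure behind Table 2.

```python
import numpy as np


def peak_and_fwhh_octaves(freqs, sens):
    """Sampled peak frequency and full width at half height (octaves).

    Half height is taken on the linear sensitivity scale, and each crossing is
    located by linear interpolation in log2 frequency.
    """
    log2f = np.log2(freqs)
    i = int(np.argmax(sens))
    half = sens[i] / 2.0

    j = i                                   # walk left to the half-height crossing
    while j > 0 and sens[j - 1] >= half:
        j -= 1
    k = i                                   # walk right to the half-height crossing
    while k < len(sens) - 1 and sens[k + 1] >= half:
        k += 1
    if j == 0 or k == len(sens) - 1:
        raise ValueError("tuning curve does not fall to half height on both sides")

    # np.interp needs increasing x-values: sens[j-1] < half <= sens[j], and likewise right.
    left = np.interp(half, [sens[j - 1], sens[j]], [log2f[j - 1], log2f[j]])
    right = np.interp(half, [sens[k + 1], sens[k]], [log2f[k + 1], log2f[k]])
    return freqs[i], right - left


# Synthetic LTF: Gaussian in log2 frequency, sampled every 1/64 octave (hypothetical).
mu, s = np.log2(2.5), 0.85
log2_grid = mu + np.arange(-256, 257) / 64.0
ltf = np.exp(-(log2_grid - mu) ** 2 / (2.0 * s**2))
f_peak, bw = peak_and_fwhh_octaves(2.0 ** log2_grid, ltf)
print(f_peak, bw, 2.0 * np.sqrt(2.0 * np.log(2.0)) * s)
```

For a Gaussian in log frequency the half-height width is 2√(2 ln 2) times its standard deviation, which gives a simple check on the interpolation; with coarser (e.g., half-octave) sampling the estimate would be correspondingly less precise.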
© 2008 ARVO