Vision must analyze the retinal image over both small and large areas to represent fine-scale spatial details and extensive textures. The long-range neuronal convergence that this implies might lead us to expect that contrast sensitivity should improve markedly with the contrast area of the image. But this is at odds with the orthodox view that contrast sensitivity is determined merely by probability summation over local independent detectors. To address this puzzle, I aimed to assess the summation of luminance contrast without the confounding influence of area-dependent internal noise. I measured contrast detection thresholds for novel *Battenberg* stimuli that had identical overall dimensions (to clamp the aggregation of noise) but were constructed from either dense or sparse arrays of micro-patterns. The results revealed a three-stage visual hierarchy of contrast summation involving (i) spatial filtering, (ii) long-range summation of coherent textures, and (iii) pooling across orthogonal textures. Linear summation over local energy detectors was spatially extensive (as much as 16 cycles) at Stage 2, but the resulting model is also consistent with earlier classical results of contrast summation (J. G. Robson & N. Graham, 1981), where co-aggregation of internal noise has obscured these long-range interactions.

*μ*m wide (0.2 deg). The human primary visual cortex contains visual neurons with receptive fields that are well suited to this scale of analysis, and these mechanisms (filter elements) are easily capable of resolving the blades of grass. But how might an entire sports field be represented? One possibility is that it is encoded by mechanisms that operate at a coarser scale of analysis, such that the sports field falls within a single large receptive field. An obvious method by which this could be achieved is through neuronal convergence, where higher-order texture mechanisms pool over many lower-order filter elements (Bergen & Adelson, 1988; Lennie, 1998; Motoyoshi, Nishida, Sharan, & Adelson, 2007; Pollen, Przybyszewski, Rubin, & Foote, 2002; Victor & Conte, 2005; Wilson & Wilkinson, 1998). In this scheme, the properties of the filter elements (that respond to luminance contrast) and their variation across the retina characterize the texture mechanisms, which could encode various attributes of the patterns including their form and regularity (Bergen & Adelson, 1988; Wilson & Wilkinson, 1998), their depth gradients (Meese & Holmes, 2004), and other parameters (Kingdom & Keeble, 1996; Li & Zaidi, 2000). However, the orthodox interpretation of typical psychophysical detection studies does not fit well with this idea. When contrast detection threshold is measured as a function of the area of a sine-wave grating, larger areas offer a benefit no greater than that predicted by probability summation (PS) among many small filter elements, each perturbed by independent additive noise (Anderson & Burr, 1991; Foley, Varadharajan, Koh, & Farias, 2007; Howell & Hess, 1978; Meese & Williams, 2000; Meese, Hess, & Williams, 2005; Robson & Graham, 1981) (Figure 1a). Thus, the standard psychophysical model of contrast detection makes no provision for detecting the sports field (large luminance contrast area) *per se*; aggregation at threshold is limited to individual blades of grass (local luminance contrast elements).

*i* = 1 to *n* independent detecting mechanisms as follows:

resp_{overall} = (∑_{*i*=1}^{*n*} *r*_{*i*}^{*mink*})^{1/*mink*},  (1)

where *r*_{*i*} is the response of the *i*th mechanism in the pool, resp_{overall} is the observer's decision variable (Figure 1f), and the Minkowski exponent (*mink*) is typically about 4 (Anderson & Burr, 1991; Bonneh & Sagi, 1998; Robson & Graham, 1981; Meese & Williams, 2000; Meese et al., 2005; Tyler & Chen, 2000). For a sine-wave grating, this predicts that contrast sensitivity should improve in proportion to the fourth-root of its area (*area*^{1/4}). Another way of expressing the benefit of increasing stimulus area is as a summation ratio (SR), where SR = *thresh*(*area*/2)/*thresh*(*area*), or 20 times the log_{10} of this when expressed in dB. The variable *thresh*() is the contrast at detection threshold for each of two stimuli, one of which has twice the area of the other. Thus, in Figure 1, these stimuli would excite either *n*/2 or *n* input lines. Assuming equally sensitive input lines, this gives an SR ∼1.5 dB (a factor of ∼1.2) for PS (and *mink* = 4), consistent with human data (Bonneh & Sagi, 1998; Foley et al., 2007; Meese & Williams, 2000; Meese et al., 2005; Meese & Hess, 2007; Meese & Summers, 2007; Robson & Graham, 1981). This is markedly less than the 6 dB (factor of 2) prediction made by the linear summing model (Figure 1b), where the signals combine linearly against a background of fixed internal noise. Note that in this model (and the next), summation extends over a fixed number of signal lines, some of which might not always carry signal.
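The fourth-root prediction above can be checked numerically. Below is a minimal sketch (the function names and the equal-sensitivity assumption are mine, not from the original report):

```python
import math

def minkowski_resp(responses, mink=4.0):
    """Minkowski summation over a pool of detector responses (cf. Equation 1)."""
    return sum(r ** mink for r in responses) ** (1.0 / mink)

def threshold(n, mink=4.0):
    """Contrast giving a criterion response of 1 from n equal input lines.
    Each line responds linearly to contrast C, so C * n**(1/mink) = 1."""
    return 1.0 / n ** (1.0 / mink)

n = 64
sr = threshold(n // 2) / threshold(n)     # benefit of doubling the signal area
sr_db = 20 * math.log10(sr)
print(round(sr, 3), round(sr_db, 2))      # 1.189 1.51
```

Doubling the number of input lines improves the threshold by 2^{1/4} ≈ 1.2, i.e., the ∼1.5 dB figure quoted in the text.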

*summation* is a relative measure. Thus, the benefit of doubling the number of signals is not as great as in the linear summation model because performance is being compared with a more efficient (less noisy) starting point. The ideal model predicts SR = 3 dB (because it selectively sums both signal and noise), the same prediction as for the energy model. Thus, although linear summation (Figure 1b), energy measures (Figure 1c), and ideal summation (Figure 1d) might each be relevant to our interest in the aggregation of visual contrast texture, none appears consistent with the classical fourth-root empirical result at threshold.

*combination* model in Figure 1e. This involves the square-law transducer, as in the energy model, and a facility for selective pooling, as in the ideal summation model. The cascade of these effects means that the model predicts a fourth-root result (SR = 1.5 dB), exactly the same as the PS model (Meese & Summers, 2007). This offers a solution to the conundrum above: perhaps neuronal convergence for contrast detection does take place over area after all but offers a performance benefit at threshold no better than probability summation. Since the completely different architectures in Figures 1a and 1e make identical predictions, how might we tell them apart?

*Battenbergs* (see figure caption). This stimulus arrangement has two advantages over the conventional sine-wave grating. First, it allows the contrast area to be manipulated without changing the overall stimulus dimensions. This is a desirable property, as I now explain. Let's assume that the observer does not have templates matched to these peculiar stimulus configurations (Meese & Summers, 2007; Näsänen, Kukkonen, & Rovamo, 1994). Let's also assume that summation is uniform over signal and gap regions for Battenbergs (*j* ≥ 1; Figure 3). It then follows that the signal-to-noise ratio (SNR) is given by:

SNR = *nC*^{*p*}/√(*N*),

where *n* is the number of signal elements, *C* is the signal contrast, and the noise term (√(*N*)) is fixed because the total number of elements *N* in the summation region does not change across stimuli. From this, it is easy to show that contrast sensitivity increases with *n* for any positive nonlinearity *p*. Thus, even though the proposed strategy aggregates noise from gap regions that contain no signal, the system still benefits from aggregating over as large a stimulus area as possible. For the combination model, this strategy holds the overall variance of internal noise constant (the summation region is the same for each of the stimuli in Figure 3a), which means that it behaves exactly like the energy model (Figure 1c) and the predicted level of summation exceeds that for PS (Figures 1a and 1e). Of course, if the assumptions above are incorrect and the system is able to perform selective pooling by rejecting the gap regions from aggregation, then the experiment will have failed to achieve its aim and behavior will be indistinguishable from probability summation (Figures 1a and 1e).
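The claim that sensitivity rises with *n* for any positive *p* when the noise pool is clamped can be illustrated directly. A sketch under those assumptions (the function name is mine):

```python
import math

def battenberg_sensitivity(n_signal, n_total, p):
    """Reciprocal of threshold contrast at SNR = 1, where
    SNR = n_signal * C**p / sqrt(n_total): the noise pool is clamped
    to the n_total elements of the fixed summation region."""
    c_thresh = (math.sqrt(n_total) / n_signal) ** (1.0 / p)
    return 1.0 / c_thresh

n_total = 256
for p in (0.5, 1, 2, 4):
    dense = battenberg_sensitivity(n_total, n_total, p)
    sparse = battenberg_sensitivity(n_total // 2, n_total, p)
    assert dense > sparse          # more signal elements always help

# With p = 2 (a square-law transducer and clamped noise), doubling the
# number of signal elements gives the 3 dB summation of the energy model:
sr_db = 20 * math.log10(battenberg_sensitivity(n_total, n_total, 2)
                        / battenberg_sensitivity(n_total // 2, n_total, 2))
print(round(sr_db, 2))             # 3.01
```

This is the experimental logic in miniature: clamping the noise pool converts the combination model into an energy detector, whose summation (3 dB) exceeds the 1.5 dB expected from PS.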

*must* predict the same behavior for each stimulus type. The crucial point for the combination model (Figure 1e), as discussed above, is the *assumption* that the model is able to operate in selective pooling mode for gratings (Summers & Meese, 2007) but not for Battenbergs. Finally, note that if the ideal summation model in Figure 1d were also unable to operate in selective pooling mode, it would revert to the linear summation model of Figure 1b.

^{2}. The experiments were controlled by a PC. Observers were the author (TSM) and four undergraduate optometry students (OS, KM, MN, and IM) who completed the study as part of their course requirement. The experiments were performed in a darkened room and with the aid of a chin and headrest at a viewing distance of 72 cm.

*j* = 0) and each of the patchy stimuli, thereby illustrating the benefit of filling the gaps in the patchy stimuli with additional micro-patterns. For example, when this was done for the first pair of “check” stimuli (*j* = 1; threshold ∼6 dB), detection thresholds halved (*j* = 0; threshold = 0 dB), consistent with full linear summation (Figure 1b). However, this does not imply that linear summation extends over the entire signal region; it could be that behavior is determined by summation only between neighboring micro-patterns. The spatial extent of summation can be assessed by grouping the micro-patterns in blocks of increasing size, as in the stimulus sequence in Figure 3a. By the time the central check region (*j*) is 6 or 8 micro-patterns square (Figure 4, far right), the benefit of the extra micro-patterns in the “full” stimulus has declined but is still ≥3 dB. Thus, between *j* = 1 and *j* = 8, summation falls from perfectly linear to quadratic. This transition is discussed later, but assuming that aggregation is contiguous, and because sensitivity to the full stimulus is at least 3 dB (√2) greater than it is to each of the other stimuli in Figure 3a, the implication is that quadratic summation extends over at least twice as many micro-patterns (Figure 1e) as the width (or height) of the largest cluster in the stimuli on the far right (*j* = 8). That is, at least 16 (2*j*) micro-patterns, or 16 grating cycles: much more extensive than the orthodox view of contrast detection (Anderson & Burr, 1991; Bonneh & Sagi, 1998; Carney et al., 2000; Foley et al., 2007; Meese et al., 2005; Robson & Graham, 1981; Rovamo, Luntinen, & Näsänen, 1993), although within the range indicated by cortical physiology (Pollen et al., 2002; Sclar, Maunsell, & Lennie, 1990; Von der Heydt, Peterhans, & Dursteler, 1992).

*mink* = 4 (4). Note that the combination of spatial filtering and boundary effects (within the stimulus) means that both filter models predict more summation than expected from second (energy)- or fourth-power metrics applied to the raw contrast images (5) (compare thin and thick curves in Figure 4). Note also that for summation studies such as this one—where the analysis is restricted to just a single criterion level of performance—the energy model is identical to Minkowski summation with *mink* = 2 (see "The energy model is equivalent to Minkowski summation (with *mink* = 2) at a single level of performance" section).

*j* > 0 in Figure 3) was 1.88 dB (Figure 5c). This represents the average summation between the orthogonal signals and is very similar to previous estimates for superimposed stimulus pairs where components have differed widely in orientation (Carney et al., 2000; Georgeson & Shackleton, 1994) and spatial frequency (Carney et al., 2000; Graham & Nachmias, 1971). Plotting the results this way suggests a Minkowski exponent (*mink*) of about 1.75 (thin curves, Figure 5c) for summation across orthogonal filter orientations.

*j* = 1) to about a factor of √2, consistent with long-range detection by the energy model (Figure 1c; Figure 4, *j* = 8).

*n* (Meese & Summers, 2007) (Figure 1e). In general, this is a good strategy for detecting a range of stimulus sizes. For example, when the target is small, not only is spatial resolution preserved (i.e., the observer is able to access individual filter elements to help identify individual blades of grass) but the selectivity prevents the aggregation of noise from irrelevant filter elements that would degrade the signal-to-noise ratio. In contrast, when the target is large, the detection process benefits from the aggregation of local signals.

*n*), and the appropriate selection by the observer of the relevant summing mechanism. Each of these methods is easily achieved with the assumption of labeled lines (Watson & Robson, 1981). However, the observer does not always have the information necessary to make the appropriate selection—as in the case where stimulus conditions are interleaved from trial to trial—but this appears to have little effect on the form of area summation (Meese et al., 2005). One explanation for this involves nonlinear pooling (e.g., either fourth-root summation (Tyler & Chen, 2000) or a MAX operation (Summers & Meese, 2007)) over multiple long-range mechanisms that sample the continuum of *n*. This type of scheme can be arranged to produce the summation behavior of selective pooling, even when the observer does not have complete knowledge of the stimulus (Tyler & Chen, 2000) (see also 1).

*mink* ≈ 1.75; Figure 5c) is much too low for this. Most contemporary models of PS would put this around *mink* = 4 (Tyler & Chen, 2000) (see also 8). Even older models that assume the summation exponent can be estimated directly from the slope of the psychometric function (Quick, 1974; Sachs, Nachmias, & Robson, 1971) would fail. For example, for TSM I have estimated the Weibull slope of the psychometric function for Battenbergs to be *β* = 4.06, no different from those for gratings (Mayer & Tyler, 1986; Meese & Williams, 2000) and much higher than the *mink* = *β* = 1.75 that is needed.

*mink* = 1.75 (Figure 5c) represents further nonlinear contrast transduction prior to linear summation at Stage 3. However, there are problems with this, since the model's limiting noise is at Stage 2, placing it before the additional transducer. According to Birdsall's theorem (Klein & Levi, 2009; Lasley & Cohn, 1981), early limiting noise will linearize the effects of subsequent nonlinearities on the *d*′ psychometric function, and computer simulations have shown that this diminishes the transducer's effect on summation (Meese & Summers, 2009).

*mink* = 2, close to the estimate from the results (Figure 5c). Why the estimate of *mink* here is a little less than this (1.75) is not clear, but it might be expected if a further source of additive (though not dominant) noise were injected at a later stage. Furthermore, whether pooling at Stage 3 is an explicit part of a higher-order visual code (e.g., for complex “gingham” textures) or represents a more general-purpose decision strategy by the observer is also unclear. Nonetheless, whatever the details of Stage 3, the experiments and analyses here suggest a strict sequence to the summation stages (Figure 6) owing to the relative sizes of the effects. Short-range summation (Stage 1 filtering; ∼6 dB) precedes long-range summation (Stage 2 aggregation; ∼3 dB), and this precedes summation across filter groups (Stage 3 pooling; ∼1.9 dB).

*d*′ slope of unity) (Klein & Levi, 2009; Lasley & Cohn, 1981). So how could the accelerating contrast nonlinearity—central to the model here—seemingly out-maneuver Birdsall's theorem? There are several possibilities (Klein & Levi, 2009; Lu & Dosher, 2008), but one is “distraction.” The psychophysical performance of a distracted observer is similar to that of an uncertain observer—each produces steep psychometric functions (Kontsevich & Tyler, 1999). Thus, the linearizing effects of Birdsall's theorem would not be observed if the external noise pattern also served to distract the observer's attention away from the relevant target mechanisms. In other words, the usual objection to nonlinear contrast transduction need not apply if the observer is distracted by external noise. See also Burgess and Colborne (1988) and more recent papers by Lu and Dosher (2008) and Klein and Levi (2009).

*j* > 0 (Figure 3). One view of second-order spatial vision is that second-order mechanisms pool information from first-order mechanisms (Henning, Hertz, & Broadbent, 1975). In fact, the contrast mechanisms that we propose can be thought of in exactly this way: they are second-order mechanisms sensitive to the DC (0 c/deg) component of contrast modulation. But could it be that other second-order mechanisms were used to detect the contrast *boundaries* in our Battenbergs? We cannot rule this out, although the low sensitivity to second-order modulation found in other experiments makes this seem unlikely (Schofield & Georgeson, 1999, 2003). Furthermore, if it were the contrast boundaries that were detected in our Battenbergs, this could not apply to the full stimulus (*j* = 0), where there are no contrast boundaries. Thus, the high level of summation that we find in our “gaps” experiment would still need to be explained.

*Battenberg* stimuli developed here (Figure 3) provide a new method by which neuronal convergence can be assessed in human vision. Experiments using these stimuli shed a very different light on the processes of early spatial vision compared with the orthodox interpretation established in the early 1980s (Graham & Nachmias, 1971; Graham, Robson, & Nachmias, 1978; Robson & Graham, 1981; Sachs et al., 1971). In those studies, the performance benefit achieved by either increasing the area of a grating or adding further gratings at different orientations or spatial frequencies was attributed to a single unselective (dumb) process of PS. The work presented here suggests a more strategic arrangement: the long-range process of area summation is more potent than once thought, consistent with a contrast transducer exponent of 2 and energy detection within—but not between—spatial textures. Nonetheless, when the aggregation of internal noise covaries with stimulus size—as I propose it does in conventional studies of area summation—the degree of area summation drops to more modest levels (Figure 1e), typical of those usually attributed to PS. The mechanistic hierarchy proposed here (Figure 6) offers valuable insights into the missing link between early sensory processes of contrast vision and later stages in which substantial neuronal convergence is needed to represent larger shapes, forms, structures, and surfaces. Therefore, understanding the details of stimulus selection by the long-range summation at Stage 2 is a top priority for future research.

*σ* = 1, and let the signal strengths (Michelson contrasts) on each signal line be *C*. I consider summation ratios (SR) where the sensitivity of each of the input lines is equal and where the number of signals is doubled from *n*/2 to *n* for any even *n*. I denote the contrasts in these cases as *C*_{half} and *C*_{full}, respectively, and refer to the conditions as the “half” and “full” conditions. The signal-to-noise ratio (SNR) at a given output is *signal*/*σ*_{tot}, where *signal* is the response to *n*/2 or *n* signals as appropriate and *σ*_{tot} is the standard deviation of the noise on that output line. Without losing generality, we can set the criterion SNR required for detection to unity. Thus, the summation ratio (SR) is given by SR = *C*_{half}/*C*_{full}, for SNR = 1. Note also that, because noise variances sum, the standard deviation of the sum of *n* noise sources is √(*n*).

*β*. For probability summation (PS), this is equal to the exponent (*mink*) in Minkowski summation (Quick, 1974) (Equation 1, main body). Typical empirical estimates give *β* ≈ 4 in contrast detection studies (e.g., see Meese & Williams, 2000), which results in a fourth-root rule where SR = 2^{1/4} ≈ 1.2, or 1.5 dB. This is also equivalent to the vector summation model of Quick (1974).

*nC*_{full}/√(*n*) = 1, which rearranges to give *C*_{full} = 1/√(*n*). For the half condition, we have *nC*_{half}/(2√(*n*)) = 1, which rearranges to give *C*_{half} = 2/√(*n*). Thus, SR = *C*_{half}/*C*_{full} = 2 = 6 dB.

*nC*_{full}^{2}/√(*n*) = 1, which rearranges to give *C*_{full} = 1/*n*^{1/4}. For the half condition, we have *nC*_{half}^{2}/(2√(*n*)) = 1, which rearranges to give *C*_{half} = √(2)/*n*^{1/4}. Thus, SR = *C*_{half}/*C*_{full} = √2 = 3 dB.

*nC*_{full}/√(*n*) = 1, which rearranges to give *C*_{full} = 1/√(*n*). For the half condition, we have *nC*_{half}/(2√(*n*/2)) = 1, which rearranges to give *C*_{half} = √(2)/√(*n*). Thus, SR = *C*_{half}/*C*_{full} = √(2) = 3 dB. This is the summation expected for an ideal observer and is identical to that predicted by the energy model. Strictly speaking, the ideal observer knows the stimulus exactly on each trial and detects it using a linear matched filter, a strategy depicted in Figure 1d and described above. However, as Tyler and Chen (2000) point out (pp. 3130–3131), the same level of summation (though not the same overall sensitivity) might be expected in less specific situations. Suppose that the observer has several mechanisms that sample the size (diameter) dimension of the target gratings but does not know the size of the target to be presented on each trial. In that case, if each mechanism is weighted by the reciprocal of its expected noise level, then fourth-root (Minkowski) summation over the set of mechanisms will also produce the behavior of ideal summation (Tyler & Chen, 2000). Thus, the same summation ratio (3 dB) might be expected in the ideal framework (Figure 1d) regardless of whether the experimental trials are blocked or interleaved across stimulus conditions.

*nC*_{full}^{2}/√(*n*) = 1, which rearranges to give *C*_{full} = 1/*n*^{1/4}. For the half condition, we have *nC*_{half}^{2}/(2√(*n*/2)) = 1, which rearranges to give *C*_{half} = 2^{1/4}/*n*^{1/4}. Thus, SR = *C*_{half}/*C*_{full} = 2^{1/4} ≈ 1.2, or 1.5 dB. This is identical to the PS model with *mink* = 4. Following a similar analysis to Tyler and Chen above, it can be shown that this level of summation is to be expected (under reasonable assumptions) for both blocked and interleaved experimental designs (Summers & Meese, 2007).
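The four closed-form summation ratios derived in this appendix follow from a single SNR = 1 criterion, so they can be verified together. A minimal numeric sketch (function names are mine), assuming equally sensitive input lines:

```python
import math

def thresh(n_signal, noise_lines, p):
    """Contrast at threshold when n_signal lines each carry r = C**p,
    noise is summed over noise_lines lines (sd = sqrt(noise_lines)),
    and the criterion is SNR = n_signal * C**p / sqrt(noise_lines) = 1."""
    return (math.sqrt(noise_lines) / n_signal) ** (1.0 / p)

def sr_db(c_half, c_full):
    """Summation ratio in dB for doubling the number of signal lines."""
    return 20 * math.log10(c_half / c_full)

n = 64
# Linear summation (Fig. 1b): p = 1, noise from all n lines in both conditions.
linear = sr_db(thresh(n // 2, n, 1), thresh(n, n, 1))        # 6.02 dB
# Energy model (Fig. 1c): p = 2, noise from all n lines.
energy = sr_db(thresh(n // 2, n, 2), thresh(n, n, 2))        # 3.01 dB
# Ideal summation (Fig. 1d): p = 1, noise only from the attended lines.
ideal = sr_db(thresh(n // 2, n // 2, 1), thresh(n, n, 1))    # 3.01 dB
# Combination model (Fig. 1e): p = 2 plus selective pooling of noise.
combo = sr_db(thresh(n // 2, n // 2, 2), thresh(n, n, 2))    # 1.51 dB
```

The combination model reproduces the fourth-root (1.5 dB) result; clamping its noise pool (as the Battenberg stimuli are designed to do) would promote it to the 3 dB behavior of the energy model.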

*x* coordinate) and 0.5 dB per cycle in the vertical meridian (*y* coordinate). The attenuated image was then filtered by a pair of quadrature^{1} log-Gabor filters (see 3), with a spatial frequency bandwidth of 1.6 octaves (full-width at half-height) and an orientation bandwidth of ±25° (half-width at half-height). The filters were matched to the spatial frequency (2.5 c/deg) and orientation (horizontal) of the micro-patterns in the “gaps” experiment, and their outputs (2D arrays: Hsfilt and Hcfilt) were full-wave rectified. With this formulation, the basic filter responses did not represent a response to a particular stimulus contrast but the spatial distribution of responses of the filter elements (convolution kernels) across space for a particular stimulus. The contrast response was then derived by multiplying this pattern of filter responses by the Michelson contrast (0:1) of the stimulus. (This describes Stage 1 of the model in Figure 6.)

*n*) for each image, where *n* is the number of mechanisms (pixels) in the stimulus. Thus, the signal-to-noise ratio (SNR) for the summation process is given by:

SNR = *C*_{stim}^{2} ∑_{*i*=1}^{*n*}(Hsfilt_{*i*}^{2} + Hcfilt_{*i*}^{2})/√(*n*),

where *C*_{stim} is the Michelson contrast of the stimulus (in the range 0 to 1), and Hsfilt_{*i*} and Hcfilt_{*i*} are the sine- and cosine-phase filter responses at the *i*th of *n* pixels in the image, scaled to a response range of −1 to 1 for all stimuli. Assuming a criterion signal-to-noise ratio (SNR) of unity at detection threshold (this is arbitrary), it follows that:

*C*_{thresh}^{2} ∑_{*i*=1}^{*n*}(Hsfilt_{*i*}^{2} + Hcfilt_{*i*}^{2})/√(*n*) = 1,

and solving for *C*_{thresh} gives:

*C*_{thresh} = (√(*n*)/∑_{*i*=1}^{*n*}(Hsfilt_{*i*}^{2} + Hcfilt_{*i*}^{2}))^{1/2}.

The units of *C*_{thresh} are arbitrary. These arbitrary units cancel when the model predictions are normalized by the response to the full stimulus (*j* = 0 in Figure 3) to calculate a summation ratio. Note also that for the stimuli used here, the noise level in the model was constant and therefore irrelevant to the calculation of the model summation ratios. This describes the output of the summation box for the horizontal filter at Stage 2 in Figure 6.
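The Stage 2 threshold computation can be sketched compactly, assuming an energy-style criterion of the form SNR = C²·Σ(Hsfilt_i² + Hcfilt_i²)/√n = 1 at threshold (the random arrays here merely stand in for real log-Gabor outputs; names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32 * 32
# Stand-ins for rectified sine- and cosine-phase filter outputs (range 0..1);
# in the real model these would come from the quadrature log-Gabor pair.
Hsfilt = np.abs(rng.standard_normal(n))
Hsfilt /= Hsfilt.max()
Hcfilt = np.abs(rng.standard_normal(n))
Hcfilt /= Hcfilt.max()

def stage2_thresh(Hs, Hc):
    """C_thresh from the criterion C**2 * sum(Hs**2 + Hc**2) / sqrt(n) = 1."""
    energy = (Hs ** 2 + Hc ** 2).sum()
    return (np.sqrt(len(Hs)) / energy) ** 0.5

full = stage2_thresh(Hsfilt, Hcfilt)
# Zeroing half the responses (mimicking a sparser Battenberg while keeping
# the noise pool fixed at n pixels) must raise the predicted threshold:
mask = np.arange(n) % 2
half = stage2_thresh(Hsfilt * mask, Hcfilt * mask)
assert half > full
```

Because the divisor √n is the same for every stimulus, the noise term cancels from the summation ratios, exactly as noted in the text.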

*σ* is the standard deviation of independent zero-mean Gaussian noise added to the output of each contrast-response mechanism after squaring. At the same criterion level of performance as above, Equation C6 rearranges to give:

*C*_{thresh} = *k*(∑_{*i*=1}^{*n*} *r*_{*i*}^{2})^{−1/2},  (C7)

where *k* is the standard deviation of the total noise. This is absorbed by an implicit contrast gain parameter in each model. Thus, there is an equivalence between energy summation (Equation C7) and Minkowski summation (Equation C5) when *mink* = 2. Note that this holds only for a single level of performance owing to the outer exponent (1/*mink*) in Equation C4. Put another way, the slope of the psychometric function (performance as a function of signal strength) is different for the two formulations. See also Meese and Summers (2009).

(*f*, *θ*) are polar coordinates in the Fourier plane (spatial frequency [in c/deg] and orientation [in degrees]). The first term on the right is defined as:

exp(−4ln(2)·[log_{2}(*f*/*f*_{0})]^{2}/*ω*^{2}),

where *f*_{0} and *θ*_{0} are the preferred spatial frequency and orientation of the filter and *ω* is the filter's spatial frequency bandwidth (full-width at half-height, in octaves). This is a Gaussian function on a log spatial frequency axis. The second term on the right is defined as:

exp(−ln(2)·(*θ* − *θ*_{0})^{2}/*h*^{2}),

where *h* is the filter's orientation bandwidth (half-width at half-height, in degrees). The filter is thus specified by the four parameters *f*_{0}, *θ*_{0}, *ω*, and *h*. The phase of the filters is set directly in the phase spectrum of the Fourier domain.

*polar separable* log-Gabor filters that have been described elsewhere: http://www.csse.uwa.edu.au/~pk/research/matlabfns/PhaseCongruency/Docs/convexpl.html.
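A polar-separable log-Gabor amplitude spectrum of this kind can be built along the following lines (a NumPy sketch, not the original implementation; the bandwidth-to-exponent constants follow the full-width and half-width definitions above, and frequency is in cycles/image rather than c/deg):

```python
import numpy as np

def log_gabor(size, f0, theta0, omega_oct, h_deg):
    """Polar-separable log-Gabor amplitude spectrum on a size x size grid.
    f0: preferred frequency (cycles/image); theta0: preferred orientation (rad);
    omega_oct: SF bandwidth (full-width at half-height, octaves);
    h_deg: orientation bandwidth (half-width at half-height, degrees)."""
    freqs = np.fft.fftfreq(size) * size
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    f = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)
    # Radial term: Gaussian on a log2 frequency axis; value 0.5 at f0*2**(±omega/2).
    with np.errstate(divide="ignore"):
        radial = np.exp(-4 * np.log(2) * np.log2(f / f0) ** 2 / omega_oct ** 2)
    radial[f == 0] = 0.0                     # no DC response
    # Angular term: Gaussian in orientation; value 0.5 at theta0 ± h.
    h = np.deg2rad(h_deg)
    dtheta = np.angle(np.exp(1j * (theta - theta0)))   # wrapped difference
    angular = np.exp(-np.log(2) * dtheta ** 2 / h ** 2)
    return radial * angular

# Peak response at the preferred frequency and orientation:
g = log_gabor(64, 8.0, 0.0, 1.6, 25.0)
```

Sine- and cosine-phase (quadrature) versions would then be obtained by assigning the appropriate phase spectrum before inverse transforming, as described in the text.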

*mink* = 4 was used as a model of spatial probability summation (PS). Following the retinal inhomogeneity and spatial filtering described in 3, this gives:

resp = *C*_{stim}(∑_{*i*=1}^{*n*}(Hsfilt_{*i*}^{4} + Hcfilt_{*i*}^{4}))^{1/4}.

Setting resp to unity at threshold and solving for *C*_{thresh} we have:

*C*_{thresh} = (∑_{*i*=1}^{*n*}(Hsfilt_{*i*}^{4} + Hcfilt_{*i*}^{4}))^{−1/4}.

(*p* = 2 and 4, respectively) are calculated by summing over the *i* = 1 to *m* image pixels thus:

Metric(*p*) = (∑_{*i*=1}^{*m*}∣(*L*_{*i*} − *L*_{0})/*L*_{0}∣^{*p*})^{1/*p*},

where *L*_{*i*} is local pixel intensity and *L*_{0} is the mean pixel intensity across the image. For convenience, these metrics are normalized such that Metric(*p*) = 1 at detection threshold. This is equivalent to:

(∑_{*i*=1}^{*m*}∣*C*_{thresh}·image_{*i*}∣^{*p*})^{1/*p*} = 1,

where *C*_{thresh} is the Michelson contrast at detection threshold, and image is the luminance profile of the stimulus scaled to the range −1:1. Solving for *C*_{thresh} gives:

*C*_{thresh} = (∑_{*i*=1}^{*m*}∣image_{*i*}∣^{*p*})^{−1/*p*},

for each metric (*p*). It is equivalent to spatial Minkowski summation across raw local contrasts, where *p* = *mink*.
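These metrics and the threshold they imply can be written directly. A NumPy sketch (function names are mine; the square-wave profile is just a convenient test image):

```python
import numpy as np

def metric(L, p):
    """Metric(p): Minkowski summation of raw local contrasts over the m
    image pixels, with local contrast (L_i - L0)/L0 and L0 the mean."""
    L0 = L.mean()
    c = (L - L0) / L0
    return (np.abs(c) ** p).sum() ** (1.0 / p)

def c_thresh(image, p):
    """Threshold from the normalization Metric(p) = 1, i.e.
    (sum |C * image_i|**p)**(1/p) = 1, for a profile scaled to -1:1."""
    return 1.0 / (np.abs(image) ** p).sum() ** (1.0 / p)

# A 16-pixel profile of unit local contrast: the energy metric (p = 2)
# summates more strongly than the fourth-power metric (p = 4).
image = np.tile([1.0, -1.0], 8)
print(c_thresh(image, 2), c_thresh(image, 4))        # 0.25 0.5

# Consistency check: a luminance image set to that threshold contrast
# does indeed give Metric(p) = 1.
L = 100.0 * (1.0 + c_thresh(image, 2) * image)
assert abs(metric(L, 2) - 1.0) < 1e-9
```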

*mink* = 2 for energy (see "The energy model is equivalent to Minkowski summation (with *mink* = 2) at a single level of performance" section) and *mink* = 4 for PS (1).

*mink* in the Discussion section of the main report. Using the nomenclature above, the system response (resp) for Minkowski summation between vertical and horizontal energy filters following long-range spatial summation is given by:

resp = (resp_{H}^{*mink*} + resp_{V}^{*mink*})^{1/*mink*},

where resp_{H} and resp_{V} are the Stage 2 outputs for the horizontal and vertical filters. Solving for *C*_{thresh} we have an expression containing *mink*, which is the only free parameter. This describes the output of Stage 3 in Figure 6, where the model is operating in mandatory summation mode for each of the filter orientations.

*mink* in Equations F1 and F3. In Equation F1, *mink* operates on the contrast response of each filter element, whereas in Equation F3, the exponent applied to each filter element is set to 2 (consistent with energy detection), and *mink* operates on the responses at each orientation after long-range summation within each of those orientation bands.
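Given the square-law (energy) responses feeding Stage 3, a back-of-envelope prediction for cross-orientation summation follows from this pooling rule: for two equal components, the plaid threshold improves by 2^{1/(2·*mink*)}. A sketch (my arithmetic; the fitted thin curves in Figure 5c also include spatial filtering, so this simple closed form differs slightly from the ∼1.9 dB reported):

```python
import math

def plaid_summation_db(mink):
    """Stage-3 Minkowski pooling of two equal orientation channels, each
    carrying a squared (energy) contrast response r = C**2. At threshold,
    (2 * C**(2*mink))**(1/mink) = C**2 * 2**(1/mink), so the plaid
    threshold drops by a factor 2**(1/(2*mink)) relative to one component."""
    return 20 * math.log10(2 ** (1.0 / (2 * mink)))

print(round(plaid_summation_db(1.75), 2))   # 1.72 dB, near the observed value
print(round(plaid_summation_db(4.0), 2))    # 0.75 dB for fourth-root pooling
```

Note that fourth-root pooling (*mink* = 4) predicts well under half the cross-orientation summation that an exponent near 1.75 does, which is why the low fitted exponent matters.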

*mink*). (This model involves only one stage of indiscriminate pooling beyond the initial filtering.) To illustrate the point, I began by arbitrarily halving the bandwidths (to 0.8 octaves and ±12.5°) and optimizing the exponent by fitting to the data from both experiments. This gave an exponent value of 2.9. With this arrangement, the model performed well and the results are shown in Figure G2b and Table G1 (Line 1).

Line | Filter derivation and extra parameters | SF B/W (octaves) | Orient B/W (±deg) | Transducer exponent (*mink* in Equation F1; fixed at 2 in Equation F3) | RMS error (dB)
---|---|---|---|---|---
1 | Equation F1; bandwidths = half those in main report | 0.8 | 12.5 | 2.9 | 0.383
2 | Optimized fit using Equation F1 | 0.4 | 23 | 2.9 | 0.350
3 | “ | 0.5 | 15 | 2.9 | 0.338
4 | “ | 0.6 | 12 | 2.9 | 0.336
5 | “ | 0.7 | 11 | 2.9 | 0.337
6 | “ | 0.8 | 10 | 2.9 | 0.337
7 | “ | 1.0 | 9 | 2.9 | 0.339
8 | “ | 1.5 | 8 | 2.9 | 0.342
9 | “ | 5.9 | 7 | 2.9 | 0.351
10 | Foley et al. narrow | 1.5 | 16 | - | -
11 | Foley et al. broad | 4.9 | 30 | - | -
12 | Foley et al. mid (Equation F3; *mink* = 1.75) | 2.1 | 22 | 2 | 0.498
13 | Main model in main report (Equation F3; *mink* = 1.75) | 1.6 | 25 | 2 | 0.499
14 | Optimized fit using Equation F3 (*mink* = 1.75) | 1.4 | 20 | 2 | 0.388

*mink*) in steps of 0.1. Local minima (nearest-neighbor comparisons) in the three-dimensional error surface (expressed as the RMS error in dB) are shown in Table G1 (Lines 2–9). The best exponent always had a value of *mink* = 2.9. Essentially, Table G1 (Lines 2–9) describes a contour through spatial frequency/orientation space along which the quality of the fit is almost constant. Put another way, Table G1 implies that a fairly extensive receptive field is needed to produce the level of spatial summation found in the experiments when only a single stage of pooling is used after spatial filtering (i.e., Equation G13 instead of Equation G15). This can be achieved by making the filter element long and thin (narrow orientation bandwidth), short and fat (narrow spatial frequency bandwidth), or anything along a continuum of combinations between the two. The nonlinear exponent must also be set higher than in the energy model but lower than for fourth-root summation (i.e., *mink* = 2.9).

*mink* = 2.9 an unlikely candidate.

*mink* ≈ 2. This theoretical super PS occurs when the target region is doubled so as to fill the attention window—in the situation here, the set of Stage 1 filter elements monitored by the observer. In principle, this could be responsible for the long-range summation in the “gaps” experiment, where the observer's attention window might have been matched to the full stimulus. However, studies that have measured the slope of the psychometric function (Meese & Summers, 2009; Meese & Williams, 2000; Robson & Graham, 1981) have consistently found it to be far too steep to be consistent with super PS, which requires a linear transducer and a *d*′ psychometric slope of unity (equivalent to a Weibull slope parameter *β* ∼ 1.3). A steep empirical slope might be attributed to uncertainty (Pelli, 1985) rather than a nonlinear transducer, but in either case, super PS would be abolished (Meese & Summers, 2009; Tyler & Chen, 2000). Alternatively, the steep slope of the psychometric function is consistent with a second-power (i.e., square-law) contrast transducer (plus a little uncertainty; Meese & Summers, 2009) such as that proposed here and elsewhere (Graham & Sutter, 1998; Klein & Levi, 2009; Lu & Dosher, 2008; Manahilov, Simpson, & McCulloch, 2001). In sum, it seems unlikely that the summation results in the “gaps” experiment here can be attributed to PS of the super PS variety or any other.