To investigate the mechanisms underlying elongated spatial summation with a pattern-masking paradigm, we measured the contrast detection thresholds for elongated Gabor targets situated at 3° eccentricity to either the left or right of the fixation and elongated along an arc of the same radius to access homogeneous retinal sensitivity. The mask was a ring with a Gabor envelope of the same 3° center radius containing either a concentric (iso-orientation mask) or a radial (orthogonal mask) modulation. The task of the observer was to indicate whether the target in each trial was on the left or the right of the fixation. With orthogonal or low-contrast iso-orientation masks, target thresholds first decreased with size with slope −1 on log-log coordinates until the target length reached 45′ (specified as the half-height full-width of the Gabor envelope) and then further decreased according to a slope of −1/2, the latter being the signature of an ideal summation process. When the contrast of the iso-orientation mask was sufficiently high, however, the target thresholds, while still showing a −1 slope up to ∼10′, asymptoted up to about 50′ length, suggesting that the presence of the mask eliminated the ideal summation regime. Beyond about 50′, the data approximated another −1 slope decrease in threshold, suggesting the existence of an extra-long channel that is not revealed by the conventional spatial summation paradigm. The full results could be explained by a divisive inhibition model, in which second-order filters sum responses across local oriented channels, combined with a single extra-long filter at least 300′ in extent. In this model, the local filter response is given by the linear excitation of the local channels raised to a power, and scaled by divisive inhibition from all channels in the neighborhood. With the high-contrast iso-orientation masks, such divisive inhibition swamps the response to eliminate the ideal summation regime until the stimulus is long enough to activate the extra-long filter.

*ideal summation* (Tanner & Jones, 1960; Tyler & Chen, 2000).

This optimal strategy can be implemented as a second-order filter that receives inputs from a set of discrete local sensors with independent and identically distributed noise sources (Chen & Tyler, 2001; Chen et al., 2019). The number of sensors that feed into the matched filter is proportional to the size of its receptive field (Green & Swets, 1966), and the response of the matched filter to the target is proportional to the sum of the responses to the individual samples. Thus, the mean and the variance of the matched filter response increase in proportion to target size. Because the detectability (or *d*′ in the context of signal detection theory) of a target depends on the ratio of the mean to the standard deviation of the response (Green & Swets, 1966), the threshold should decrease with the square root of target size, corresponding to a −1/2 log–log slope summation function. Some formulations thus simply use the square root of the number of local sensors as the denominator of a *d*′ calculation (e.g., Kingdom et al., 2015).
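The square-root law described above can be checked numerically. The following is a minimal simulation sketch, not the authors' code: it assumes unit-variance Gaussian noise on every local sensor and a hypothetical per-sensor signal of 0.5 noise standard deviations, and the function name `dprime_of_sum` is illustrative only.

```python
import numpy as np

def dprime_of_sum(n_sensors, signal=0.5, n_trials=20000, seed=0):
    """Estimate d' for a detector that sums n_sensors independent sensors.

    Each sensor has unit-variance Gaussian noise; `signal` is the assumed
    mean response of one sensor when the target is present.
    """
    rng = np.random.default_rng(seed)
    noise_only = rng.normal(0.0, 1.0, (n_trials, n_sensors)).sum(axis=1)
    with_signal = rng.normal(signal, 1.0, (n_trials, n_sensors)).sum(axis=1)
    pooled_sd = np.sqrt(0.5 * (noise_only.var() + with_signal.var()))
    return (with_signal.mean() - noise_only.mean()) / pooled_sd

sizes = np.array([1, 4, 16, 64])
dprimes = np.array([dprime_of_sum(n) for n in sizes])

# Log-log slope of d' against the number of summed sensors.
slope = np.polyfit(np.log10(sizes), np.log10(dprimes), 1)[0]
print(f"log-log slope of d' vs. size: {slope:.2f}")
```

Under these assumptions the estimated slope should lie close to +1/2, which corresponds to a threshold slope of −1/2 when the signal strength is proportional to contrast.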

*ideal attentional summation* (Tyler & Chen, 2000), because the attentional mechanism monitors only the channels relevant for a given stimulus size. Because the detectability is determined by the channel with the highest SNR, the probability of target detection is then governed by the maximum order statistics of the responses across relevant channels, for which the mean increases and standard deviation decreases with the number of monitored channels (Pelli, 1985; Chen & Tyler, 1999). The SNR in this scenario can be shown by computational simulation to increase initially with the quarter power of target size, with a progressively decreasing power for larger numbers of channels (Tyler & Chen, 2000, Figure 8).
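The behavior of the maximum order statistic mentioned above can be illustrated with a short simulation. This sketch assumes independent, standard-normal channel responses, which is a simplification chosen here for illustration and is not taken from the cited studies; it does not attempt to reproduce the quarter-power result itself.

```python
import numpy as np

def max_response_stats(n_channels, n_trials=50000, seed=1):
    """Mean and SD of the maximum response across n_channels noisy channels."""
    rng = np.random.default_rng(seed)
    maxima = rng.normal(0.0, 1.0, (n_trials, n_channels)).max(axis=1)
    return maxima.mean(), maxima.std()

channel_counts = [1, 4, 16, 64]
stats = [max_response_stats(n) for n in channel_counts]
means = [m for m, _ in stats]
sds = [s for _, s in stats]

for n, m, s in zip(channel_counts, means, sds):
    print(f"{n:3d} channels: mean of max = {m:.2f}, SD of max = {s:.2f}")
```

With more monitored channels the mean of the maximum rises and its standard deviation shrinks, which is the property the text relies on.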

The luminance of the monitor at each output setting was measured with a Photo Research PR655 spectroradiometer (Photo Research, Inc., Chatsworth, CA), and the values were used to construct a linearized lookup table.

*r* and θ are the radius and the planar angle of a pixel in polar coordinates (as noted in Figure 1B); *L* is the mean luminance of the display; *c* is the contrast; *f* is the spatial frequency of 2.5 c/°; σ_{r} and σ_{θ} are the scale parameters (standard deviations) of the Gaussian envelope along the radius and circumference, respectively; and the center of the Gabor ring, *u*, was at −3° or +3° eccentricity for patterns presented on the left or right of screen, respectively. The parameters σ_{r} and σ_{θ} controlled the size of the stimuli. The value of σ_{r} was fixed at 0.14° subtense, and σ_{θ} varied from 0.5° to 40° along the circumference. This arrangement allowed the stimuli to remain at the same cortical magnification factor regardless of their length. For better comparison with results of the previous studies, from here on we specify the length of the targets by their half-height full-width (denoted HHFW in Figure 1A) of the Gaussian envelope, which is 2 × [−ln(0.5) × 2]^{0.5} × *u* × σ_{θ} = 7.06 × σ_{θ}, where σ_{θ} is its width parameter in radians and HHFW is in degrees of visual angle.
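The HHFW conversion above is simple arithmetic and can be written as a small helper. This is a sketch only: the function name `hhfw_deg` is hypothetical, and the example call assumes that the quoted 0.5° value of σ_{θ} is a polar angle that must be converted to radians first.

```python
import math

def hhfw_deg(sigma_theta_rad, eccentricity_deg=3.0):
    """Half-height full-width (deg of visual angle) of the Gaussian envelope.

    Implements 2 * sqrt(-ln(0.5) * 2) * u * sigma_theta, with sigma_theta
    in radians of polar angle and u the eccentricity in degrees.
    """
    return 2.0 * math.sqrt(-math.log(0.5) * 2.0) * eccentricity_deg * sigma_theta_rad

# Factor multiplying sigma_theta at 3 deg eccentricity (about 7.06).
factor = hhfw_deg(1.0)

# Example: an assumed sigma_theta of 0.5 deg of polar angle, in arcmin.
length_arcmin = 60.0 * hhfw_deg(math.radians(0.5))
print(f"factor = {factor:.2f}, example length = {length_arcmin:.2f} arcmin")
```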

σ_{θ} to be infinite. The orientation of the mask was either the same as or orthogonal to the orientation of the target. There were six possible mask contrasts: 20 × log_{10}(*c*) = −∞, −26, −22, −18, −14, or −10 dB. Other than these properties, all other image parameters were the same as those of the target.
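For readers who prefer linear contrast units, the decibel scale used above (20 × log_{10} of contrast) can be inverted as follows. This is a minimal sketch; the helper names are hypothetical.

```python
import math

def db_to_contrast(db):
    """Convert contrast in dB (20 * log10(c)) to linear contrast."""
    if db == float("-inf"):
        return 0.0  # the no-mask condition
    return 10.0 ** (db / 20.0)

def contrast_to_db(contrast):
    """Convert linear contrast to dB; zero contrast maps to -inf."""
    if contrast == 0:
        return float("-inf")
    return 20.0 * math.log10(contrast)

mask_levels_db = [float("-inf"), -26, -22, -18, -14, -10]
mask_contrasts = [db_to_contrast(db) for db in mask_levels_db]
for db, c in zip(mask_levels_db, mask_contrasts):
    print(f"{db:>6} dB -> contrast {c:.3f}")
```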

*SEM*). The sequence of test conditions and their repetitions were all randomized.

∞ and −10 dB parallel masks with the generic three-component model of Tyler and Chen (2006) (also see Kao & Chen, 2012) that considers all three common types (slopes of −1, −1/2, and −1/4) of spatial summation (Figure 4). That is, the threshold, *th*, is given by

*RMSE*) at 0.82 to 1.02 dB, comparable with the mean *SEM* of 0.80 to 1.11 dB for the three observers. However, this model cannot fit the high-contrast mask data (open symbols in Figure 4). It underestimates the target thresholds at small target sizes and overestimates them at large sizes (Figure 4a). The RMSE increased to 1.68 to 2.05 dB (compare, *SEM* = 1.22–1.61) or a 63% to 178% increase from the no-mask conditions. It is noteworthy that removing the −1/2 slope component from this generic model did not affect the fits, as the RMSE was the same up to the second decimal digit, suggesting that the spatial summation curves at high contrast do not have a significant contribution from the −1/2 slope component.

*RMSE* (1.32–1.84) fared somewhat better than in Equation 2. We then added the extra linear summation component at the large target length range:

*th* is defined in Equation 3. This model captured all features in the data and gave a better fit (*RMSE* = 1.18–1.62) than Equation 3. Thus, we can conclude that at high mask contrasts, the summation curves mainly exhibit −1 slope for small lengths, flattening in the middle, and conforming to another −1 slope at large target lengths.

*d*′, which is the ratio between the signal and the noise level in standard deviation units, increases with the square root of the target length. As a result, the detection threshold decreases with target length with a slope of −1/2. The elimination of the −1/2 slope region for parallel masks suggests that the observer was no longer able to adopt this matched-filter strategy. This may occur when the high-contrast mask, as a highly salient stimulus, determines the spatial extent of the filter used by the observers. Now, if the matched filter is a single receptive field over the whole extent of the mask configuration, the increase of target size would simply increase the overlap between the target and this receptive field, conforming to the case of complete summation as discussed in the Introduction. In this case, we would expect thresholds to decrease with target size with −1 slope over all target lengths in place of the −1/2 slope region. Our data did not show this effect.

*target* length increased. Such target-size–dependent inhibition would be sufficient to cancel out the ideal summation. Next, we explain these effects with a quantitative model that provides the data fits for Figures 2 to 6.

*j* has a spatial sensitivity profile *f*_{j}(*x*, *y*). The excitation of this linear filter by the *i*th image component *C*_{i}*g*_{i}(*x*, *y*), where *g*_{i}(*x*, *y*) defines the contrast-independent spatial variation and *i* indexes either the target or the mask in our experiment, is given as

_{θ} with the space constant for the filter, *s*_{θ}

*Se*_{ji} is a constant depending on the spatial properties, such as spatial frequency or orientation, of both the filter and the target, which thus defines the excitatory sensitivity of the mechanism to the target, and *i* = *t* or *m* for target or mask, respectively.
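The excitation of a linear filter by an image component, as described above, is conventionally the inner product of the filter profile with the contrast-scaled component. The sketch below illustrates that generic operation only; it is not the paper's equation, and the Gaussian profiles, grid, and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_2d(x, y, sigma_x, sigma_y):
    """Unnormalized 2D Gaussian profile (illustrative stand-in for f_j or g_i)."""
    return np.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))

def linear_excitation(filter_profile, component_profile, contrast):
    """Inner product of the filter f_j(x, y) with the component C_i * g_i(x, y)."""
    return float(np.sum(filter_profile * contrast * component_profile))

coords = np.linspace(-2.0, 2.0, 81)
x, y = np.meshgrid(coords, coords)

f_j = gaussian_2d(x, y, 0.3, 0.3)      # hypothetical filter profile
g_short = gaussian_2d(x, y, 0.3, 0.2)  # short target envelope
g_long = gaussian_2d(x, y, 0.3, 0.6)   # longer target envelope

e_short = linear_excitation(f_j, g_short, contrast=0.1)
e_long = linear_excitation(f_j, g_long, contrast=0.1)
print(f"excitation: short = {e_short:.2f}, long = {e_long:.2f}")
```

In this toy case the excitation is proportional to contrast and grows as the target envelope covers more of the filter, which is the qualitative behavior the model text relies on.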

*E*_{ji}:

*j*th local filter is given by the excitation of the *j*th filter, *E*_{j}, raised by a power *p*, in which *E*_{j} = Σ_{i}*E*_{ji} is the sum of the excitations produced by all image components, and is then divided by a divisive inhibition term, *I*_{j}, plus an additive constant, *z*:
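The response function described in words above (excitation raised to a power *p*, divided by the inhibition term plus the constant *z*) can be sketched as follows. The inhibition term is passed in as a plain number because its exact form is not reproduced here; all numeric values are hypothetical.

```python
import numpy as np

def local_response(excitation, inhibition, p, z):
    """Response of a local filter: E**p / (I + z).

    `excitation` is the summed excitation E_j, `inhibition` the divisive
    inhibition term I_j, `p` the exponent, and `z` the additive constant.
    """
    excitation = np.asarray(excitation, dtype=float)
    return excitation ** p / (inhibition + z)

# When inhibition is negligible relative to z, the response is about E**p / z.
weak = local_response(2.0, inhibition=0.0, p=2.0, z=4.0)

# When inhibition dominates z, the response is about E**p / I.
strong = local_response(2.0, inhibition=400.0, p=2.0, z=4.0)
print(f"weak inhibition: {float(weak):.3f}, strong inhibition: {float(strong):.4f}")
```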

*I*_{j} is the summation of a nonlinear combination of the excitations of all relevant filters to filter *j*. Thus, this divisive inhibition term, *I*_{j}, can be represented as

*S*_{j}_{i} is the weight of the contributions from each local filter to the inhibition term. Here, we assume that the range of the inhibition sources is stimulus-size dependent.

*k*th second-order detector receiving inputs from *n*_{k} local filters. The overall response of the *k*th second-order detector, *T*_{k}, is then simply

*T*_{k,t+m}) and that to the mask alone (*T*_{k,m}). The observer can detect the target if the difference between the responses in at least one second-order detector is greater than the limitation imposed by the noise. That is, the decision is based on the second-order filter that has the maximum SNR. As shown in Chen et al. (2019), this second-order filter would generally have a receptive field covering the same area as the target. We can thus drop the subscript *k* and the contribution of the first-order filters, making the decision variable *d* based on the second-order filter:

σ_{a} denotes the magnitude of the noise limiting this second-order filter.

σ_{a}^{2}. The variance of the noise experienced by the second-order filter is then *n*σ_{a}^{2}, where *n* is the number of local filters it monitors and in turn is proportional to the target size. Let *n* = *rS*, where *r* is the scaling factor and *S* is the target size. The decision variable based on the second-order filter is then:

σ_{a} to be 1, as any non-unity constant here can be absorbed into other parameters. We also set *n* = 1 if *r* × *S* is smaller than 1.
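The noise scaling described above can be sketched in code. This is an assumption-laden illustration rather than the paper's equation: it takes the decision variable to be the response difference divided by the noise standard deviation, which is the square root of *n* times σ_{a}, with *n* = *rS* clamped at 1. The function name and the numbers in the example are hypothetical.

```python
import math

def decision_variable(resp_target_plus_mask, resp_mask, target_size, r, sigma_a=1.0):
    """Response difference of the second-order filter in noise-SD units.

    The noise variance is n * sigma_a**2 with n = r * target_size,
    and n is set to 1 whenever r * target_size is smaller than 1.
    """
    n = max(1.0, r * target_size)
    return (resp_target_plus_mask - resp_mask) / (math.sqrt(n) * sigma_a)

# If every monitored local filter adds the same response increment (0.1 here),
# the summed difference grows with n while the noise SD grows with sqrt(n).
d_small = decision_variable(4 * 0.1, 0.0, target_size=4.0, r=1.0)
d_large = decision_variable(16 * 0.1, 0.0, target_size=16.0, r=1.0)
print(f"d at size 4: {d_small:.2f}, d at size 16: {d_large:.2f}")
```

Under this assumption a fourfold increase in size doubles the decision variable, which is the square-root improvement associated with the −1/2 slope region.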

*s*_{j} in Equation 1 and *n* = 1 in Equation 7. Let *d*_{L}, replacing *d* in Equation 7, be the contribution of this large detector to the decision. The final decision variable is then:

*Se* to the target, *Se*_{t}, to 100. In total, there were thus just eight free parameters in the model for the summation mechanism to account for the 60 data points for one observer: excitatory sensitivity to the mask (*Se*_{m}), inhibitory sensitivity of the local detector to the target and the mask (*Si*_{t} and *Si*_{m}, respectively, in Equation 8), the exponents and additive constant of the nonlinear response function (*p*, *q*, and *z*, respectively, in Equation 7), the size of the local filter (σ_{θ} in Equation 7), and the scale parameter (*r* in Equation 11). In addition, for parallel masks only, there are five free parameters for the extra-long receptive field (*Se*_{m}, *Si*_{t}, *Si*_{m}, *z*, and σ_{θ}).

*RMSE* across the curves was between 1.40 and 1.72 dB, comparable to the mean *SEM* of 1.24 dB, and it explained 92% to 93% of all variance in the data across the five datasets. Table 1 summarizes the fitted parameter values and goodness of fit of each participant.

*I* in Equation 7, would be quite small and negligible compared to the additive constant *z*. For very small targets, we can ignore the largest filter, so that *n* is about 1. Thus, the target is at threshold if the decision variable *d* = 1 or *E*^{p}/*z* = 1. Plugging Equation 5 into this relationship and rearranging the terms, the threshold *C*_{t} (setting *i* = *t* in Equation 5) occurs when

*s*_{θ} is small, *C*_{t} is approximated by (*z*^{1/p}/*Se*_{ji})*s*_{θ}^{−1}. This expression gives the −1 slope region of the summation function at small target sizes.
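The steps behind this small-size approximation can be written out explicitly. The derivation below is a sketch that assumes, as the stated result implies, that for small *s*_{θ} the excitation given by Equation 5 is approximately proportional to the product of sensitivity, contrast, and filter length; the exact form of Equation 5 is not reproduced here.

```latex
\begin{align}
  d = \frac{E^{p}}{z} = 1
    &\;\Rightarrow\; E = z^{1/p}, \\
  E \approx Se_{ji}\, C_t\, s_\theta
    &\quad (\text{assumed small-}s_\theta\text{ limit of Equation 5}), \\
  C_t &\approx \frac{z^{1/p}}{Se_{ji}}\; s_\theta^{-1}, \\
  \log C_t &\approx \log\!\left(\frac{z^{1/p}}{Se_{ji}}\right) - \log s_\theta .
\end{align}
```

The last line shows that log threshold falls with log length at a slope of −1, with the intercept set by *z*, *p*, and *Se*_{ji}.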

*n* is greater than 1, the threshold is then determined by Equation 11, such that the decision variable increases with the square root of target size. Correspondingly, the threshold decreases with the square root of target size, producing the −1/2 slope region of the summation curves.

*z* is negligible compared to the divisive inhibition *I*, and the response *R*_{j} of Equation 7 is approximately *E*_{j}^{p}/*I*_{j}, or with Equation 8 plugged in,

*I*_{j} in Equation 7) would be negligible compared to the additive constant (*z* in Equation 7) in the denominator. Thus, the response function and, in turn, the decision variable would be dominated by the excitation of the linear filters. Taking size into consideration, the decision variable (Equation 11) is proportional to *n*^{1/2}*C*^{p}. The TvC function would have shown a dipper if *p* was greater than 1. Also, such excitation increases with size (proportional to *n*). We thus observed that the size of the dipper increased with target size. On the other hand, at high contrast, as shown in Equation 15, the response function is dominated by the ratio between the excitation and the divisive inhibition, and the decision variable is approximately invariant with target size. That is, the threshold at high pedestal contrast should be similar regardless of target size. Thus, the threshold at the high pedestal contrast for the larger targets increased considerably to match the threshold level of the smaller targets. As a result, with a smaller dipper and less masking, the TvC functions for smaller targets were rather flat.

*Journal of Vision,* 11(14), pii:14, doi:10.1167/11.14.14.

*Journal of Vision,* 15(15):4, 1–12, https://doi.org/10.1167/15.15.4.

*Journal of Physiology,* 141(2), 337–350.

*Communication in Statistics: Simulation and Computation,* 28, 177–188.

*Spatial Vision,* 12, 267–285.

*Neuroreport,* 12(4), 655–661, https://doi.org/10.1097/00001756-200103260-00008.

*Vision Research,* 46(17), 2691–2702, https://doi.org/10.1016/j.visres.2006.02.009.

*Journal of Vision,* 8(4):10, 1–14.

*Journal of Vision,* 19(9):11, 1–13, https://doi.org/10.1167/19.9.11.

*The Proceedings of the Royal Society (London) Series B,* 268, 509–516.

*Frontiers in Psychology,* 5, 456.

*Journal of Neurophysiology,* 71(1), 347–374.

*Elemente der Psychophysik*. Leipzig: Breitkopf und Härtel.

*Journal of the Optical Society of America A,* 11, 1710–1719.

*Vision Research,* 39, 3855–3872.

*Vision Research,* 40(10–12), 1217–1226.

*Journal of Neuroscience,* 9(7), 2432–2442.

*Signal detection theory and psychophysics*. New York: Wiley.

*The Journal of Physiology,* 195(1), 215–243.

*Vision Research,* 62, 57–65.

*Proceedings of the National Academy of Sciences, USA,* 96(21), 12073–12078.

*Journal of Vision,* 15(5):1, 1–16, https://doi.org/10.1167/15.5.1.

*Vision Research,* 39, 2729–2737.

*Journal of the Optical Society of America,* 70, 1458–1470.

*Vision: A computational investigation into the human representation and processing of visual information*. San Francisco, CA: W.H. Freeman and Company.

*Journal of Vision,* 4, 930–943, https://doi.org/10.1167/4.10.8.

*Neuroscience,* 514, 79–91.

*Proceedings of the Royal Society B: Biological Sciences,* 274(1627), 2891–900.

*Journal of Vision,* 12(11):9, 1–28, https://doi.org/10.1167/12.11.9.

*Vision Research,* 26(11), 1793–1800.

*Vision Research,* 14, 1039–1042.

*Journal of the Optical Society of America A,* 2(9), 1508–32.

*Vision Research,* 39(5), 887–895.

*Nature,* 391(6667), 580–584, https://doi.org/10.1038/35372.

*Numerical recipes in C*. Cambridge, UK: Cambridge University Press.

*Vision Research,* 21, 409–418.

*Proceedings of the National Academy of Sciences, USA,* 45(1), 100–114.

*Vision Research,* 14(12), 1409–1420.

*Visual search techniques* (NAS-NRC Publication No. 712, pp. 59–68). Washington, DC: National Academy of Science.

*SPIE Proceedings,* 2179, 127–141.

*Vision Research,* 40, 3121–3144.

*Journal of Vision,* 6, 1117–1125, https://doi.org/10.1167/6.10.11.

*Nature,* 302(5907), 419–422.

*Extrapolation, interpolation and smoothing of stationary time series, with engineering applications.* Cambridge, MA: MIT Press.