Most contemporary models of spatial vision include a cross-oriented route to suppression (masking from a broadly tuned inhibitory pool), which is most potent at low spatial and high temporal frequencies (T. S. Meese & D. J. Holmes, 2007). The influence of this pathway can elevate orientation-masking functions without exciting the target mechanism, and because early psychophysical estimates of filter bandwidth did not accommodate this, it is likely that they have been overestimated for this corner of stimulus space. Here we show that a transient 40% contrast mask causes substantial binocular threshold elevation for a transient vertical target, and this declines from a mask orientation of 0° to about 40° (indicating tuning), and then more gently to 90°, where it remains at a factor of ∼4. We also confirm that cross-orientation masking is diminished or abolished at high spatial frequencies and for sustained temporal modulation. We fitted a simple model of pedestal masking and cross-orientation suppression (XOS) to our data and those of G. C. Phillips and H. R. Wilson (1984) and found the dependency of orientation bandwidth on spatial frequency to be much less than previously supposed. An extension of our linear spatial pooling model of contrast gain control and dilution masking (T. S. Meese & R. J. Summers, 2007) is also shown to be consistent with our results using filter bandwidths of ±20°. Both models include tightly and broadly tuned components of divisive suppression. More generally, because XOS and/or dilution masking can affect the shape of orientation-masking curves, we caution that variations in bandwidth estimates might reflect variations in processes that have nothing to do with filter bandwidth.

*within-channel* masking. In what follows, we discuss this in terms of response compression, but the arguments apply equally to the multiplicative noise model.

*spatial frequency* bandwidths tend to decrease with preferred spatial frequency (Baker, Thompson, Krug, Smyth, & Tolhurst, 1998; DeValois, Albrecht, & Thorell, 1982; Tolhurst & Thompson, 1981; Yu et al., 2010).

*d*′ slope of unity) owing to the approximately linearizing effect this has for small signal increments (Foley & Legge, 1981; Meese, Georgeson, & Baker, 2006). Overall then, an explanation of XOM in terms of within-channel masking within isotropic (or similar) detecting mechanisms seems unlikely.

*suppression model*, but sometimes, the *contrast gain control model*) can accommodate the failings of the within-channel model and was suggested by Foley (1994). In this model, the response of the detecting mechanism (*resp*) is an accelerating function of its input ($C_t^{p}$, with exponent *p* typically ≥2). This is divided by the sum of a string of weighted terms: a constant (*z*), a copy of the excitatory term raised to an exponent (*q*), and a contribution from *i* = 1 to *λ* mask components (with contrast *M*) outside the excitatory passband, also raised to the exponent *q*, where (*p* − 1) < *q* < *p*. Thus, a typical functional form of the suppression model is

*M*-type terms allow the model to accommodate the empirical results reviewed above. For example, a mask can cause masking without facilitation through this pathway, and it can elevate the region of facilitation produced by a pedestal (Foley, 1994). This pathway also leaves the shape of the psychometric function (largely) intact. This is because suppression from this pathway is equivalent merely to increasing the size of the saturation constant *z*, meaning the contrast response at detection threshold remains an accelerating one (determined by *p*) and the psychometric function remains steep.
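As a worked form of the verbal description of the suppression model, one expression consistent with it can be sketched as follows. This is a reconstruction from the definitions in the text only: the per-component weights $w_i$ and the unit weight on the self-suppression term are notational assumptions.

```latex
\[
\mathit{resp} \;=\; \frac{C_t^{\,p}}{z \;+\; C_t^{\,q} \;+\; \sum_{i=1}^{\lambda} w_i\, M_i^{\,q}},
\qquad (p - 1) < q < p .
\]
```

Written this way, a mask that does not excite the detecting mechanism enters only through the summed $M$ terms, which behave like an increase in $z$ and so leave the accelerating response at threshold in place.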

*cross-channel* masking model. Broadly speaking, this model is also consistent with single-cell work in the primary visual cortex, which finds cross-orientation inhibition between orthogonal gratings and bars (Bonds, 1989; Carandini, Heeger, & Movshon, 1997; Morrone, Burr, & Maffei, 1982; Ramoa, Shadlen, Skottun, & Freeman, 1986; Tolhurst & Heeger, 1997).

*plus* a contribution from broadly tuned suppression. That would involve an equation for sigmoidal transduction divided by a broadly tuned suppressive term. Instead, Equation 1 places the broadly tuned suppression on the denominator of the equation along with self-suppression. Thus, it actually describes a purely accelerating contrast transducer that is susceptible to a variety of suppressive interactions.

*before* the orientation tuning of the detecting mechanism. We refer to this as a *pre-channel* model of masking. One such model involves a general suppressive field in the retina (Shapley & Victor, 1978; Solomon, Lee, & Sun, 2006) or LGN (Alitto & Usrey, 2008; Bonin, Mante, & Carandini, 2005; Nolt, Kumbhani, & Palmer, 2007; Webb, Dhruv, Solomon, Tailby, & Lennie, 2005), which can also accommodate interactions that lie outside the passband or footprint of the subcortical classical receptive field (Meese & Hess, 2004). This might be characterized by an equation with a form similar to Equation 1 but implemented before orientation (cortical) filtering.

^{2}.

$_{10}$(*C*%), where *C*% is Michelson contrast in percent, given by $100(L_{\max} - L_{\min})/(L_{\max} + L_{\min})$, where *L* is luminance. The mask contrast was either 0% (baseline) or 40%. Threshold elevation was calculated by subtracting the baseline threshold from the masked threshold, expressed in dB.
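The contrast and threshold-elevation arithmetic described here can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the factor of 20 in the dB conversion is assumed from the threshold-elevation formula given later in the text.

```python
import math


def michelson_percent(l_max, l_min):
    """Michelson contrast in percent from maximum and minimum luminance."""
    return 100.0 * (l_max - l_min) / (l_max + l_min)


def contrast_db(c_percent):
    """Contrast in dB (ASSUMED: 20*log10 of contrast in percent)."""
    return 20.0 * math.log10(c_percent)


def threshold_elevation_db(masked_percent, baseline_percent):
    """Masked threshold minus baseline threshold, both expressed in dB."""
    return contrast_db(masked_percent) - contrast_db(baseline_percent)


# A masked threshold four times the baseline corresponds to about 12 dB.
print(threshold_elevation_db(4.0, 1.0))
```

Because the subtraction is done in dB, threshold elevation depends only on the ratio of the two thresholds, so a fourfold elevation is about 12 dB whatever the baseline.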

*SE*) of the five threshold estimates.

*G*) for the orientation tuning of the excitatory mechanism. Specifically

*diff* is the unsigned difference between the orientations of the mask and target (in degrees) and *h* is the half-width at half-height of the excitatory tuning function (in degrees). We also assume that there is a detecting mechanism whose preferred orientation is matched to the target stimulus. Thus, the excitatory response (*E*) is given by

$C_t$ and *M* are the contrasts (in percent) of target and mask components, respectively, and *p* is an expansive exponent.

*H* as the half-width at half-height of the suppression function. Our results did not provide strong constraints on *H*, but preliminary analysis showed that good fits were possible for our data and those of Phillips and Wilson (1984) using *H* = ±65°, and this is what we used for the main analysis.

*I* (in Equation 2) is given by

*q* is an expansive exponent, *γ* is the weight of self-suppression, and *w* is the weight of the nonexcitatory route to suppression. Following earlier work (Legge & Foley, 1980), we set *p* = 2.4 and *q* = 2.0. The fact that the inhibitory terms are summed before being raised to the power *q* means that this arrangement is similar to Foley's model 2.

$C_t$ to determine predictions for which

*k*is the model criterion for detection, consistent with a late source of additive noise.

$C_t$ was calculated twice for each orientation, once with the mask contrast (*M*) set to 40% ($C_{\text{mask}}$) and once with the mask contrast (*M*) set to 0% ($C_{\text{no mask}}$) to predict threshold elevation, given by $20\log_{10}(C_{\text{mask}}/C_{\text{no mask}})$. There were four free parameters (*h*, *w*, *γ*, and *k*) for which values were determined by the simplex algorithm. The algorithm was started from several different sets of initial conditions and the reported values are those for which the RMS error of the fit was lowest. In most cases, the different starting conditions produced very similar results.
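The procedure of solving for target contrast with and without the mask can be illustrated with a short sketch. Several details below are assumptions rather than statements from the text: the Gaussian form of the tuning functions, the way mask and target combine in the excitatory and suppressive terms, the saturation constant of 1, and the criterion that the response increment over the mask alone equals *k*. Bisection is used here to solve for threshold; it is not the simplex routine used for fitting.

```python
import math


def gaussian_tuning(diff_deg, half_width_deg):
    """Gaussian that falls to 0.5 at the half-width at half-height (ASSUMED form)."""
    return math.exp(-math.log(2.0) * (diff_deg / half_width_deg) ** 2)


def response(c_t, mask, diff_deg, h, w, gamma, p=2.4, q=2.0, big_h=65.0):
    """Gain-controlled response; the combination rule is ASSUMED, not quoted."""
    drive = c_t + gaussian_tuning(diff_deg, h) * mask
    excitation = drive ** p
    # Inhibitory terms are summed before being raised to the power q.
    inhibition = (gamma * drive + w * gaussian_tuning(diff_deg, big_h) * mask) ** q
    return excitation / (1.0 + inhibition)


def threshold(mask, diff_deg, h, w, gamma, k, lo=1e-6, hi=100.0):
    """Target contrast (%) at which the response increment reaches k.

    ASSUMED criterion: response(mask + target) - response(mask alone) = k.
    With p > q = 2 the response rises monotonically with c_t, so bisection works.
    """
    base = response(0.0, mask, diff_deg, h, w, gamma)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if response(mid, mask, diff_deg, h, w, gamma) - base < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def elevation_db(diff_deg, h, w, gamma, k, mask=40.0):
    """Threshold elevation, 20*log10(C_mask / C_no_mask)."""
    c_mask = threshold(mask, diff_deg, h, w, gamma, k)
    c_no_mask = threshold(0.0, diff_deg, h, w, gamma, k)
    return 20.0 * math.log10(c_mask / c_no_mask)


# Parameter values for DJH at 1 c/deg, taken from Table 1.
params = dict(h=18.22, w=0.63, gamma=6.21, k=0.02)
for angle in (0, 20, 40, 90):
    print(angle, round(elevation_db(angle, **params), 1))
```

Under these assumptions, masking at 90° comes almost entirely from the broadly tuned term weighted by *w*, since the excitatory tuning function is effectively zero at that orientation difference.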

*w*) decreases with spatial frequency, consistent with previous analysis (Meese & Holmes, 2007). Note also that in Figures 3a–3d we show the threshold elevations anticipated by the model for the broader range of mask orientations not considered by Phillips and Wilson (1984).

Obs | SF (c/deg) | RMS error (dB) | γ | w | k | h (±°) |
---|---|---|---|---|---|---|
DJH | 1 | 0.70 | 6.21 | 0.63 | 0.02 | 18.22 |
DJH | 3 | 0.78 | 1.31 | 0.26 | 0.82 | 15.98 |
TSM | 1 | 0.63 | 2.24 | 0.48 | 0.24 | 29.13 |
TSM | 3 | 0.49 | 0.69 | 0.18 | 1.76 | 26.32 |
SL | 0.5 | 0.56 | 1.45 | 0.46 | 0.46 | 25.35 |
SL | 2 | 1.07 | 1.00 | 0.09 | 1.40 | 17.48 |
SL | 8 | 1.19 | 0.49 | 0.06 | 0.57 | 21.70 |
WS | 0.5 | 0.56 | 2.72 | 0.37 | 0.12 | 16.37 |
WS | 2 | 0.68 | 2.57 | 0.14 | 0.15 | 15.00 |
WS | 8 | 1.20 | 0.53 | 0.06 | 0.52 | 20.80 |

*h*) estimated for the observers from Phillips and Wilson's (1984) study (SL and WS). The original study (open symbols; their Figure 4) reported that bandwidth decreases with spatial frequency. In general, this trend was not found here (filled symbols), which tended to be narrower than previous estimates. These results are replotted in Figure 5 (triangles and diamonds), along with those for TSM and DJH (circles and squares). On average, orientation bandwidth is about ±20° (mean = ±20.36°) for a four-octave range of spatial frequencies (0.5 to 8 c/deg; open circles in Figure 5). We do note, however, that although the (interpolated) average across the four observers (two from each study) shows no effect of spatial frequency, there is a slight downward trend for each observer across the lowest two spatial frequencies (for only one of these observers is the decline greater than 3°), suggesting that a weak effect might exist.

*w* = 0) and for mask orientations restricted to the range 0–50° (similar to the range used by Phillips & Wilson, 1984). This increased the estimate of orientation bandwidth markedly in all cases (by an average half-width of 9°) as shown in Figure 6. For the two observers from Phillips and Wilson's study (SL and WS), there is a downward trend of these differences with spatial frequency, consistent with the hypothesis that the spatial frequency dependency of filter bandwidth in Phillips and Wilson's study is, in fact, attributable to a spatial frequency dependency of XOS (Meese & Holmes, 2007). However, the opposite trend is seen for the two observers from the present study. We have no explanation for this difference across studies. However, we suggest that it is probably unwise to attach too much significance to the details of simple functional model fits when the model is clearly wrong (i.e., it contains no XOS). The main message from Figure 6 is that the exclusion of XOS is likely to lead to filter bandwidths being overestimated.

*H* = ±65°, again consistent with the earlier analysis. There were four free parameters (the first three of which relate to the earlier model) as follows: (i) a sensitivity parameter *k*, (ii) the weight of the broadly tuned component of suppression (*w*), (iii) the weight of self-suppression *γ*, and (iv) a parameter, *α*, that controlled the spatial extent of the Gaussian summation template. The first two parameters were optimized; the last two were set by hand. Orientation pooling was performed using Minkowski summation of response differences to mask and mask-plus-target with an exponent of 2, broadly consistent with previous results (Meese, 2010).

*α* = 1), the dashed blue curves are for when it was slightly larger (*α* = 1.5). The first point here is that our earlier analysis and claims are not undermined when the model is extended to include spatial summation and multiple spatial filters. For example, filter bandwidths of *h* = ±20° are consistent with all the results. Furthermore, because the model pools over 18 filter orientations, the analysis is not undermined by the strategy of off-channel looking (Blake & Holopigian, 1985).

Obs | SF (c/deg) | RMS error (dB) | γ | w | k | h (±°) |
---|---|---|---|---|---|---|
α = 1 | | | | | | |
DJH | 1 | 0.758 | 100/*T* | 0.61/$T^{1/q}$ | 0.008*T* | 20 |
DJH | 3 | 0.803 | 10/*T* | 0.22/$T^{1/q}$ | 0.040*T* | 20 |
TSM | 1 | 0.856 | 10/*T* | 0.44/$T^{1/q}$ | 0.098*T* | 20 |
TSM | 3 | 0.473 | 10/*T* | 0.21/$T^{1/q}$ | 0.035*T* | 20 |
α = 1.5 | | | | | | |
DJH | 1 | 0.949 | 100/*T* | 0.42/$T^{1/q}$ | 0.008*T* | 20 |
DJH | 3 | 1.305 | 10/*T* | 0.14/$T^{1/q}$ | 0.019*T* | 20 |
TSM | 1 | 0.398 | 10/*T* | 0.23/$T^{1/q}$ | 0.054*T* | 20 |
TSM | 3 | 0.708 | 10/*T* | 0.13/$T^{1/q}$ | 0.004*T* | 20 |

*α* > 1.0). For the other three data sets, the fits with *α* = 1.0 were slightly better than those with *α* = 1.5, though optimum fits probably lie in between (Table 2).

*α* more directly, we reran the model with fixed parameters, *h* = ±20° and three different values of *α* (Figure 8). To test our functional model's susceptibility to variations in *α* (which was not part of that model), we then fitted our earlier model to the filter model curves in Figure 8 with the four free parameters as before (RMS error was always <0.36 dB; fits not shown). The bandwidths estimated from this fitting are termed *h*′ and shown above each curve in Figure 8. The message is clear: compared to the filter model, the functional model tends to slightly overestimate bandwidth, and this error is worse when the spatial template exceeds the size of the target (*α* > 1). Furthermore, forcing *h*′ = ±20° (the generating bandwidth for the curves in Figure 8) produced a good fit to the curve in Figure 8 for *α* = 1, but the fits were unacceptable for the other two curves where *α* = 1.5 and *α* = 2 (fits not shown). Thus, if real observers use oversized spatial templates and their results are analyzed with a simple functional model, then it seems likely that their filter bandwidths will be overestimated.

*h* = ±30°), though it was based on descriptions of the threshold elevation functions rather than fitting a model. As mentioned in the Introduction section, it is to be expected that this will lead to broader estimates owing to the compressive nature of the psychophysical contrast response (Daugman, 1984; Phillips & Wilson, 1984). Our own estimates (±20°) are consistent with a recent estimate (±19°) that used a reverse-correlation technique (Roeber, Wong, & Freeman, 2008) and a spatial frequency of 2 c/deg. Essock, Haun, and Kim (2009) also measured masking functions using filtered noise masks and reported filter bandwidths of about ±20° at 8 c/deg. Campbell and Kulikowski (1966), Daugman (1984), and Harvey and Doan (1990) all used sine-wave gratings as masks and targets. Respectively, they found orientation bandwidths of ±13.6° at 8 c/deg, about ±12° to ±36° (depending on orientation) at 8 c/deg, and ±12° at 10 c/deg. It is unclear why the estimates from this last group of studies are so much narrower than the others. Overall then, there remains some uncertainty over the issue of orientation bandwidth at the higher spatial frequencies (8 c/deg and above).

*H* = ±65°), which always produced (slightly) better fits than a flat isotropic process in that model (not shown). Similarly, Foley's (1994) broadly tuned suppression function declines gently with orientation difference between target and mask, though his function actually peaks at a 10° difference between the two. Watson and Solomon's (1997) suppression function is also a broadly tuned Gaussian, rather than a purely isotropic function. Nevertheless, these nuances represent minor detail variations across studies; it seems likely that our broad effect is of a similar origin to that of the effects described as isotropic in other studies.

*between* orientation-tuned mechanisms (Blakemore, Carpenter, & Georgeson, 1970; Carpenter & Blakemore, 1973; Heeger, 1992), which represents a third possible route to masking. We did not include it in our models here because (i) good fits to the data were achieved without it and (ii) it would introduce further free parameters. However, further modeling (not shown) confirmed that when such interactions were included, our estimates of orientation bandwidths (*h*) changed, typically increasing a little.

*can* suppress the detection of foveal contrast increments above threshold (Foley & Chen, 1999; Foley, 1994; Meese, 2004; Meese, Hess, & Williams, 2005). Thus, whatever the details of this interaction (e.g., see Meese et al., 2005), it is a potential pathway to suppression for the experiments here.

*dilution masking* (Meese & Summers, 2007). This is a theoretical process that arises when excitation on the numerator of the contrast gain control equation extends to stimulus regions (in real space or Fourier space) that contain mask contrast but little or no target contrast and can result in masking (see Meese & Summers, 2007). As the relevant and irrelevant numerator terms each pass through their own accelerating contrast nonlinearity before summation, dilution masking is formally distinct from conventional within-channel masking (Meese & Summers, 2007). It is also distinct from cross-channel and pre-channel masking (see Introduction section) and surround suppression. The conditions needed for dilution masking could arise if stimulus summation extended into the surrounding mask region. In fact, this was an explicit feature of our filter-based model, determined by the parameter *α* (1). Whether this has been mistaken for pure surround suppression in other studies is not clear. However, in any case, our analysis (Figure 8) shows a relation between the effect of dilution masking and mask orientation (i.e., if dilution masking were involved in our (and other) experiments, then it would influence the shape of orientation-masking functions).

*h*) with some caution. Performing experiments with masks and targets each of a similar size to the underlying receptive fields might avoid the problems associated with surround contrast but (i) the potentially complicating factor from tightly tuned cortical interactions would remain, (ii) dilution masking might still take place in the Fourier domain, and (iii) reducing the size of the mask would increase its bandwidth, thereby blurring the precision of the probe. Furthermore, single-cell studies in cats and monkeys imply that human vision contains mechanisms with a range of orientation bandwidths (DeValois, Albrecht et al., 1982; Ringach, Shapley, & Hawken, 2002; Tolhurst & Thompson, 1981; Yu et al., 2010), in which case, what does it mean to derive a single estimate of orientation bandwidth in this type of study? Presumably, orientation-masking studies will tend to tap detecting mechanisms at the narrower end of the distribution, since these will more readily escape the within-channel (self-suppression) effects of masking. Nevertheless, it seems likely that the estimate will represent some aggregate measure of mechanisms toward that end of the distribution.

*h*, though whether this is placed at, before, or after the stage of binocular summation is not clear (Baker & Meese, 2007; Challinor, Meese, & Holmes, 2008; Moradi & Heeger, 2009).

*H* is even less clear. It remains possible that it involves intra-cortical inhibition from oriented (Heeger, 1992; Ringach, Bredfeldt et al., 2002) or nonoriented mechanisms (Hirsch et al., 2003), but some recent single-cell physiology favors a subcortical isotropic locus (Bonin et al., 2005; Freeman et al., 2002; Priebe & Ferster, 2006, 2008; Webb et al., 2005; see also Meier & Carandini, 2002; though also Ringach & Malone, 2007). One possible way forward is to consider the psychophysical effects of interactions between the eyes, which are usually attributed to cortical processes. Although similarities have been found between monoptic and dichoptic XOM (Cass et al., 2009; Meese et al., 2008), these two forms of masking are clearly not the same in general (Meese & Baker, 2009; Baker, Meese, & Summers, 2007; Gheiratmand, Meese, & Mullen, 2009; Meese & Hess, 2004, 2005; Nichols & Wilson, 2009; Roeber et al., 2008), at least in the fovea. They have different spatiotemporal dependencies (Meese & Baker, 2009), different spatial frequency bandwidths, different time courses, and different susceptibilities to adaptation (Baker, Meese, & Summers, 2007). Although dichoptic XOM is typically stronger than the monoptic variety (Baker & Meese, 2007; Meese & Hess, 2004), there are clear examples where this is the other way around (Baker, Meese, & Summers, 2007; Meese & Baker, 2009). These numerous differences make a common post-binocular site for monoptic and dichoptic XOM extremely unlikely, at least as the sole cause of the broadly tuned effects. Thus, although cortical processes might be involved (this seems probable for dichoptic masking), it seems likely that the broadband/isotropic effect here asserts its influence before full binocular summation (Baker, Meese, & Summers, 2007). This suggests an early cortical stage at the latest (e.g., V1) but leaves open the possibility of a subcortical stage.
On this view, the broadly tuned component might be attributed entirely to pre-channel masking (see Introduction section). We note that Klein, Carney, Barghout-Stein, and Tyler (1997) have made a similar point. If orientation filtering and an accelerating contrast response followed this stage, then this arrangement might also escape the arguments against the within-channel model reviewed in the Introduction section. In fact, recent developments in binocular contrast vision suggest multi-stage architectures (Baker, Meese, & Georgeson, 2007; Meese & Baker, under review; Meese et al., 2006) that might be developed as we propose.

*after* gain control and a sigmoidal output nonlinearity.

$\mathit{resp}_{\Theta}$ is the response of a single orientation-tuned filter with preferred orientation Θ following rectification and spatial pooling. Each of the *n* responses of the local filter elements at each point (*i*) in the image was weighted by a circular phase-insensitive spatial template with a two-dimensional Gaussian profile. The standard deviation of the template was *ασ*, where *σ* is the standard deviation of the target stimulus in the experiment (*σ* = 0.708 cycles; Figure 1a) and *α* is a free parameter. When *α* = 1, the template was matched to the envelope of the signal. When *α* > 1, the template was spatially more extensive than the signal. The template was insensitive to orientation, and the same spatial template was used to weight the responses from each of the 18 oriented filters.
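The spatial template described above can be sketched as a grid of Gaussian weights. This is an illustration under stated assumptions, not the authors' implementation: the unit peak weight, the grid extent (`half_size_px`), and the function names are hypothetical, while the value *σ* = 0.708 cycles and the sampling of 32 pixels per cycle are taken from the text.

```python
import math

PIXELS_PER_CYCLE = 32   # sampling density stated in the text
SIGMA_CYCLES = 0.708    # standard deviation of the target envelope, in cycles


def template_weights(alpha, half_size_px=128):
    """Circular 2-D Gaussian weights with standard deviation alpha * sigma.

    ASSUMED: unit peak weight and a square grid of (2 * half_size_px + 1) pixels.
    """
    sd_px = alpha * SIGMA_CYCLES * PIXELS_PER_CYCLE
    coords = range(-half_size_px, half_size_px + 1)
    return [[math.exp(-(x * x + y * y) / (2.0 * sd_px ** 2)) for x in coords]
            for y in coords]


def template_sum(alpha, half_size_px=128):
    """Sum of all template weights (the quantity called T when alpha = 1)."""
    return sum(sum(row) for row in template_weights(alpha, half_size_px))


matched = template_sum(1.0)     # template matched to the signal envelope
oversized = template_sum(1.5)   # template more extensive than the signal
print(oversized / matched)
```

An oversized template (*α* > 1) gives relatively more weight to regions that contain mask contrast but little target contrast, which is the condition the text associates with dilution masking.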

*i* are given by $r_{S\theta i}$ and $r_{C\theta i}$ for the sine and cosine phase filters, respectively. (To avoid the tedium of using two-dimensional spatial subscripts, the weighting by the template is not explicit in our formal expressions.)

*p* = 2.4 and *q* = 2, as in the functional model described in the main body of the report. The parameter *γ* controls the weight of self-suppression and was a free parameter. The rectified responses of the filter elements were raised to these powers before summation over area (*i* = 1 to *n*) and phase on both the numerator and denominator of the gain control equation (Equation A1). The suppressive gain pool also involved summation of the template-weighted sine and cosine phase filter elements at all *n* locations (*i* = 1 to *n*) and all *m* relative orientations (*τ* = 1 to *m*) each weighted further by $\omega_{\tau}$. These weights were set according to the gentle decline in sensitivity of the broadly tuned component of suppression given by the functional model in the main report (*H* = ±65°).

*m* = 18 pairs of filters (sine and cosine phases) spaced at intervals of 10° and centered on vertical (i.e., they sampled the full orientation domain in 10° steps). The response differences of these 18 filters across the 2AFC intervals were combined using Minkowski summation with an exponent of 2 to produce the decision variable. The model equations were solved for target contrast such that the target was detected when the decision variable exceeded a criterion level *k* that was related to the signal-to-noise ratio. This was performed with and without the mask to calculate threshold elevation.
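The pooling step described above can be sketched as follows. The function names and the example response values are hypothetical, and the orientation labels assume that the 18 filters run from −90° to 80° relative to vertical; only the exponent of 2, the 10° spacing, and the comparison with a criterion *k* come from the text.

```python
def minkowski_sum(values, exponent=2.0):
    """Minkowski summation: (sum of |v|^exponent)^(1/exponent)."""
    return sum(abs(v) ** exponent for v in values) ** (1.0 / exponent)


# ASSUMED labelling: 18 orientations in 10-degree steps, 0 = vertical.
filter_orientations = [10 * i - 90 for i in range(18)]


def detected(resp_target_interval, resp_null_interval, k, exponent=2.0):
    """True when the pooled response difference across filters exceeds k."""
    diffs = [a - b for a, b in zip(resp_target_interval, resp_null_interval)]
    return minkowski_sum(diffs, exponent) > k


# Hypothetical filter responses: the target changes only two filter outputs.
null_interval = [0.0] * 18
target_interval = [0.0] * 18
target_interval[9] = 3.0    # filter at 0 degrees (vertical)
target_interval[10] = 4.0   # filter at 10 degrees
print(minkowski_sum(target_interval), detected(target_interval, null_interval, k=4.9))
```

With an exponent of 2 the decision variable is the Euclidean length of the vector of response differences, so every filter that changes its output contributes, not only the one matched to the target.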

*k*, *γ*, *w*, and *α* (Table 2). The parameters *γ* and *α* were adjusted by hand to achieve acceptable fits. We did not experiment much with either parameter but found that the dilution masking produced by setting *α* > 1 (see General discussion section) helped account for the different breadths of the tightly tuned effect across observers (TSM and DJH) while keeping the filter bandwidths constant. Therefore, we ran the model with *α* = 1.0 and *α* = 1.5. We were able to achieve good fits in most cases with *γ* = 10/*T* but increased this to *γ* = 100/*T* to fit the results for DJH at 1 c/deg, where the masking effects were particularly strong. The constant *T* is the sum of the template weights for *α* = 1.0, where there were 32 pixels per cycle of a sine-wave grating. The remaining parameters *k* and *w* were adjusted by an optimization routine (Matlab's *fmins*) to minimize the RMS error of the fit in dB.

*p*, (ii) the denominator exponent *q*, (iii) the filter orientation bandwidth (*h*), (iv) the filter spatial frequency bandwidth (1.6 octaves), (v) the filter spacing (10°), (vi) the implicit relation between spatial pooling on the numerator and the denominator of the gain control equation (the spatial extent was the same), (vii) the relation between *ω* and *τ* (set according to *H* = ±65°), and (viii) the Minkowski summation exponent across filters (=2).

*f* noise: Contrast response and the horizontal effect. Journal of Vision, 10(10):1, 1–21, http://www.journalofvision.org/content/10/10/1, doi:10.1167/10.10.1.