In human (D. H. Baker, T. S. Meese, & R. J. Summers, 2007b) and in cat (B. Li, M. R. Peterson, J. K. Thompson, T. Duong, & R. D. Freeman, 2005; F. Sengpiel & V. Vorobyov, 2005) there are at least two routes to cross-orientation suppression (XOS): a broadband, non-adaptable, monocular (within-eye) pathway and a more narrowband, adaptable interocular (between the eyes) pathway. We further characterized these two routes psychophysically by measuring the weight of suppression across spatio-temporal frequency for cross-oriented pairs of superimposed flickering Gabor patches. Masking functions were normalized to unmasked detection thresholds and fitted by a two-stage model of contrast gain control (T. S. Meese, M. A. Georgeson, & D. H. Baker, 2006) that was developed to accommodate XOS. The weight of monocular suppression was a power function of the scalar quantity ‘speed’ (temporal-frequency/spatial-frequency). This weight can be expressed as the ratio of non-oriented magno- and parvo-like mechanisms, permitting a fast-acting, early locus, as befits the urgency for action associated with high retinal speeds. In contrast, dichoptic-masking functions superimposed. Overall, this (i) provides further evidence for dissociation between the two forms of XOS in humans, and (ii) indicates that the monocular and interocular varieties of XOS are space/time scale-dependent and scale-invariant, respectively. This suggests an image-processing role for interocular XOS that is tailored to natural image statistics—very different from that of the scale-dependent (speed-dependent) monocular variety.

^{2} and mirror stereoscope arrangement (previously described in Baker et al., 2007b) was the same for all observers. Neutral density filters were used to reduce the maximum luminance at the eye to 28 cd/m^{2}.

_{10}(*C*%), where *C*% is Michelson contrast in percent, defined as 100(*L*_{max} − *L*_{min})/(*L*_{max} + *L*_{min}), where *L* is luminance.
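The contrast definition above can be sketched in a few lines. This is only an illustration: the function names and the example luminances are hypothetical, and the factor of 20 in the decibel conversion is an assumption (the conventional dB re 1% scale), since the start of the original expression is not preserved in this text.

```python
import math


def michelson_percent(l_max: float, l_min: float) -> float:
    """Michelson contrast in percent: 100 (Lmax - Lmin) / (Lmax + Lmin)."""
    return 100.0 * (l_max - l_min) / (l_max + l_min)


def contrast_db(c_percent: float) -> float:
    """Contrast in dB re 1%. ASSUMPTION: dB = 20 log10(C%)."""
    return 20.0 * math.log10(c_percent)


# Hypothetical luminances (cd/m^2), chosen only to illustrate the arithmetic.
c = michelson_percent(3.0, 1.0)
c_db = contrast_db(c)
```

Under this assumed convention a contrast of 1% sits at 0 dB and each tenfold increase in contrast adds 20 dB.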

^{2}. When viewed through the stereoscope, these apertures were a strong aid to fusion, and appeared as a single central display region. A ‘quad’ arrangement of four points equidistant from the center of each aperture was used to aid fixation. The points were placed 3 cycles of the carrier grating away from the center, so their positions varied with the spatial frequency of the stimulus (Meese & Holmes, 2007). We avoided using a central fixation point so as not to confound the suppression that we wished to measure here with the suppression that can arise from that type of fixation point (Meese & Hess, 2007; Summers & Meese, 2007).

*cmpnt*) in the left eye is given by:

*C*_{L} and *C*_{R} are the component contrasts (target or mask) that drive the stage, and *X*_{L} and *X*_{R} are the orthogonal contrasts (mask or target) for the two eyes. Note that for the experiments here one of the *C* terms and one of the *X* terms was always zero. The parameters *S*, *m*, *ω*_{M} and *ω*_{D} are the saturation constant of the gain control, the excitatory exponent, and the weights of the two cross-oriented terms, respectively. The main aim of the modeling was to establish the values of the two weight parameters for each spatiotemporal condition.

*Z* is the saturation constant of this stage, *p* and *q* are excitatory and suppressive response exponents, and *α* is a parameter that controls the weight of facilitation (Meese & Holmes, 2007; Meese, Holmes, et al., 2007; Meese, Summers, et al., 2007). To solve the model equations, the target contrast was adjusted until *resp(target)* = *k*, where *k* is a free parameter and is proportional to the standard deviation of late, additive, performance-limiting noise. Thus, the model contains a total of nine free parameters. However, for the experiments here, precise values of several of these parameters were not important and we were able to round five of them from previous results (Meese et al., 2006), giving: *m* = 1.3, *p* = 8, *q* = 6.5, *S* = 1 and *k* = 0.2. The saturation constant *Z* was then adjusted (*Z* = 0.0085) so that the model produced the correct intercept (0 dB) on the normalized axes.
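The solving procedure described above can be illustrated with a minimal sketch. The model equations themselves are not preserved in this text, so the forms used below are assumptions chosen to be consistent with the parameter definitions: a monocular stage of the form *C*^{m}/(*S* + *C*_{L} + *C*_{R} + *ω*_{M}*X*_{same eye} + *ω*_{D}*X*_{other eye}), simple summation across the eyes, and a second stage of the form *b*^{p}/(*Z* + *b*^{q}). The facilitation term (*α*) is left out because its form cannot be recovered here. All function names are hypothetical, and bisection is used simply as a convenient root finder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    # Fixed values quoted in the text.
    m: float = 1.3
    S: float = 1.0
    p: float = 8.0
    q: float = 6.5
    Z: float = 0.0085
    k: float = 0.2


def stage1(c_same, c_other, x_same, x_other, w_m, w_d, prm):
    """ASSUMED monocular gain control: the same-eye mask is weighted by w_m,
    the other-eye mask by w_d. Contrasts are in percent."""
    denom = prm.S + c_same + c_other + w_m * x_same + w_d * x_other
    return c_same ** prm.m / denom


def response(c_left, c_right, x_left, x_right, w_m, w_d, prm):
    """ASSUMED second stage applied to the summed monocular outputs
    (facilitation omitted)."""
    binsum = (stage1(c_left, c_right, x_left, x_right, w_m, w_d, prm)
              + stage1(c_right, c_left, x_right, x_left, w_m, w_d, prm))
    return binsum ** prm.p / (prm.Z + binsum ** prm.q)


def threshold_db(mask_pct, w_m, w_d, dichoptic=False, prm=ModelParams()):
    """Left-eye target contrast (dB re 1%) at which the response equals k.
    The response rises monotonically with target contrast, so bisection
    on the dB axis is sufficient."""
    x_left, x_right = (0.0, mask_pct) if dichoptic else (mask_pct, 0.0)
    lo, hi = -40.0, 60.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        c = 10.0 ** (mid / 20.0)
        if response(c, 0.0, x_left, x_right, w_m, w_d, prm) < prm.k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


unmasked = threshold_db(0.0, w_m=0.274, w_d=0.187)
monoptic = threshold_db(31.6, w_m=0.274, w_d=0.187)
dichoptic = threshold_db(31.6, w_m=0.274, w_d=0.187, dichoptic=True)
```

With the fixed parameter values quoted above, these assumed forms place the unmasked threshold at very nearly 0 dB, which agrees with the stated role of *Z* = 0.0085. The example weights are the DHB values for 0.5 c/deg at 15 Hz from Table 1; the mask contrast of 31.6% is an arbitrary illustration.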

*α*, *ω*_{M} and *ω*_{D}) were determined using a simplex algorithm to optimize the fits of the model to the results. The model was fitted simultaneously to the monoptic and dichoptic results from all eight spatiotemporal conditions. Following Meese and Holmes (2007), *α* was yoked across conditions (for each observer, a single value was fitted for the entire experiment), whereas the two suppressive weight parameters were allowed to vary across spatiotemporal conditions. Thus, for observers DHB and WS, we fitted 16 masking functions (to the normalized results) with 17 free parameters. For KP we fitted 14 functions with 15 free parameters. The fits are shown in Figure 5 and the free parameters and figures of merit are shown in Table 1.
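The parameter counts quoted above follow from one yoked *α* plus one monoptic and one dichoptic weight per spatiotemporal condition. The tally below is a worked restatement, with the symbol *N* introduced here for illustration; the count of seven conditions for KP is inferred from the 14 fitted functions.

```latex
\begin{aligned}
\text{DHB, WS:}\quad & N_{\text{functions}} = 8 \times 2 = 16, \qquad
  N_{\text{free}} = \underbrace{1}_{\alpha} + \underbrace{8}_{\omega_M} + \underbrace{8}_{\omega_D} = 17 \\
\text{KP:}\quad & N_{\text{functions}} = 7 \times 2 = 14, \qquad
  N_{\text{free}} = 1 + 7 + 7 = 15
\end{aligned}
```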

SF (c/deg) | TF (Hz) | Parameter | DHB | WS | KP
---|---|---|---|---|---
– | – | *α* | 0 | 0.70 | 1.20
0.5 | 4 | *ω*_{M} | 0.118 | 0.076 | 0.126
0.5 | 15 | *ω*_{M} | 0.274 | 0.292 | 0.477
1 | 4 | *ω*_{M} | 0.067 | 0.076 | 0.103
1 | 15 | *ω*_{M} | 0.183 | 0.254 | 0.292
2 | 4 | *ω*_{M} | 0.043 | 0.031 | 0.045
2 | 15 | *ω*_{M} | 0.200 | 0.103 | 0.176
4 | 4 | *ω*_{M} | 0.054 | 0.014 | 0.022
4 | 15 | *ω*_{M} | 0.206 | 0.034 | –
0.5 | 4 | *ω*_{D} | 0.228 | 0.140 | 0.471
0.5 | 15 | *ω*_{D} | 0.187 | 0.102 | 0.461
1 | 4 | *ω*_{D} | 0.200 | 0.193 | 0.545
1 | 15 | *ω*_{D} | 0.197 | 0.209 | 0.359
2 | 4 | *ω*_{D} | 0.192 | 0.171 | 0.362
2 | 15 | *ω*_{D} | 0.257 | 0.144 | 0.342
4 | 4 | *ω*_{D} | 0.224 | 0.278 | 0.471
4 | 15 | *ω*_{D} | 0.321 | 0.077 | –
– | – | RMSe (dB) | 1.008 | 1.267 | 1.100

*α* for each observer (Table 1). This is because the facilitatory influence of *α* is apparent in the masking functions only when the masking effect is weak (Meese & Holmes, 2007). For the study here, the masking is weakest for the slow speeds and monoptic masking, and this is where facilitation is seen in the masking functions (Figure 5).

*ω*_{D}) was independent of temporal frequency, spatial frequency and the ratio of these two parameters (TF/SF) (see Table 2, and green diamonds and dashed lines in Figure 6). There was some evidence for relations between the monoptic weights (*ω*_{M}) and each of temporal frequency and spatial frequency (Table 2). However, the greatest variance (97% for the average) was accounted for by the relation between *ω*_{M} and speed (TF/SF) (Table 2).

Masking | Observer | SF (r^{2}) | TF (r^{2}) | TF/SF (r^{2})
---|---|---|---|---
Monoptic | DHB | 0.093 | 0.812** | 0.666*
Monoptic | WS | 0.581* | 0.353 | 0.932**
Monoptic | KP | 0.580* | 0.641* | 0.980**
Monoptic | Average | 0.436 | 0.553* | 0.970**
Dichoptic | DHB | 0.347 | 0.105 | 0.057
Dichoptic | WS | 0.011 | 0.308 | 0.193
Dichoptic | KP | 0.091 | 0.274 | 0.014
Dichoptic | Average | 0.013 | 0.396 | 0.104
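The relation between monoptic weight and speed can be checked directly from Table 1. The sketch below assumes that the regression was an ordinary least-squares fit of log weight on log speed, which is what a power-function relation implies but is not stated explicitly in the surviving text. It uses observer WS's *ω*_{M} values; the function and variable names are hypothetical.

```python
import math

# (SF in c/deg, TF in Hz, omega_M) for observer WS, copied from Table 1.
WS_MONOPTIC = [
    (0.5, 4, 0.076), (0.5, 15, 0.292),
    (1, 4, 0.076), (1, 15, 0.254),
    (2, 4, 0.031), (2, 15, 0.103),
    (4, 4, 0.014), (4, 15, 0.034),
]


def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, r^2)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


speeds = [tf / sf for sf, tf, _ in WS_MONOPTIC]   # speed = TF/SF (deg/s)
weights = [w for _, _, w in WS_MONOPTIC]
slope, r2 = loglog_fit(speeds, weights)
```

Under that assumption the r^{2} for WS comes out close to the 0.932 listed in the TF/SF column of Table 2, which supports reading the tabulated values as log-log regressions. The slope for a single observer need not equal the slope fitted to the average.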

^{0.78}. This is somewhat steeper than the regression slope of 0.51 found by Meese and Holmes (2007) for binocular XOS, using a simpler version of the model and different observers. But we caution against attributing too much significance to the precise value of the monoptic regression slope here (Figure 6; blue circles). We found that we were able to achieve almost equally good fits to the masking functions in Figure 5 by using much lower values of *p* and *q*. For example, with *p* = 3.33 and *q* = 2 (and *Z* adjusted accordingly), we found an average regression slope (cf. Figure 6) of 0.51, the same as that in the Meese & Holmes study.^{1} However, as several of the (nine) model parameters (Equations 1, 2, and 3) were poorly constrained by the present data set, we preferred the simplicity (and transparency) of the main method used here, where the parameters were set according to those determined from previous data sets. Nonetheless, it is possible that future work might converge on a (slightly) different regression slope from that reported here^{2}.

*m*-cells are responsive to low spatial and high temporal frequencies, fast *m*-type stimuli might produce the strong masking we found because of their distinct contrast-response nonlinearity (Derrington & Lennie, 1984). Similarly, slow *p*-type stimuli would drive the more linear *p*-cells and therefore produce lower levels of masking. This hypothesis might also go some way towards explaining the weak monoptic facilitation by within-channel summation if there were gentle response acceleration for the initial part of the contrast response of *p*-cells (cf. Legge & Foley, 1980).

*m*-type phenomenon (Medina & Mullen, 2009). Second, both monoptic and dichoptic suppression have been found for (parallel and cross-oriented) annular surrounds using contrast matching (Cai, Zhou, & Chen, 2008; Meese & Hess, 2004), indicating that monoptic masking does not derive purely from excitatory drive. Third, the pooling rule across mask orientations is the same for monoptic and dichoptic masks (Meese, Challinor, & Summers, 2008). This latter result suggests that masking from the two distinct ocular pathways (within and between the eyes) involves similar processes. It is unlikely that this is excitatory drive (as posited by the present hypothesis), since the spatial frequency and orientation differences between Meese et al.'s dichoptic mask and target components (a factor of 3 and 45°, respectively) were too great for their binocular summation within a single detecting mechanism (Holmes & Meese, 2004; Meese et al., 2008). Fourth, the nonlinearity that causes conventional pedestal facilitation (Legge & Foley, 1980) survives cross-orientation masking for the binocular (Foley, 1994; Holmes & Meese, 2004) and monoptic cases (unpublished observations). If this facilitation is to be attributed to an accelerating (cortical) transducer (Chirimuuta & Tolhurst, 2005; Kontsevich & Tyler, 1999; Lu & Dosher, 1999, 2008; Legge, Kersten, & Burgess, 1987; Meese & Summers, 2007), then this poses a challenge to any model that attributes masking to an earlier injection of excitatory drive.

*TF*_{m} and *TF*_{p} respectively. Further, an exact measure of spatial frequency (SF) can be derived from the ratio of a similar pair of spatial frequency filters. We will call these *SF*_{p} and *SF*_{m} for the high and low SF filters respectively. In fact, for neither TF nor SF is it essential that one of the filters is strictly low-pass; a low frequency tuned band-pass filter will do (see Harris, 1986). In any case, we have TF = *TF*_{m}/*TF*_{p} and SF = *SF*_{p}/*SF*_{m}. As the scalar quantity speed is given by TF/SF, it follows that this can be derived from the ratio of a pair of filters with the spatiotemporal selectivity of *TF*_{m} *SF*_{m} and *TF*_{p} *SF*_{p}. While space-time separability is not a property of the entire spatiotemporal contrast sensitivity function (Kelly, 1979), it is much more so for individual cortical cells (Friend & Baker, 1993; Mazer, Vinje, McDermott, Schiller, & Gallant, 2002) and LGN cells (Derrington & Lennie, 1984; Wolfe & Palmer, 1998), and at least some *m*- and *p*-cells in the retina and LGN have the sorts of tuning properties that we require (Derrington & Lennie, 1984; Merigan & Maunsell, 1993). Thus, in principle at least, the scalar quantity, speed, can be derived from the ratio of (linear) filters (*magno*/*parvo*) similar to those found sub-cortex. Furthermore, this computation could be used to modulate the suppressive process that we propose by delivering *ω*_{M} (that term would then be raised to an appropriate power, here 0.78), offering a neurophysiological underpinning for our monoptic masking result. The details of this in human remain unclear, though we note that suppressive fields have been identified in the LGN and retina of cat (Bonin et al., 2005; Shapley & Victor, 1978) and monkey (Alitto & Usrey, 2008; Webb et al., 2005). Nonetheless, we cannot rule out the possibility that the monoptic effects that we have measured arise at a slightly later stage such as layer 4 of V1 (Hirsch et al., 2003). However, a stage after binocular summation would not be consistent with the masking results of Baker et al. (2007b), suggesting that speed-dependent XOS for monoptic stimulation asserts its influence no later than V1 (and possibly before).
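The filter-ratio argument above can be written out in one line. This is a restatement of the relations given in the text under the stated separability assumption; the constant of proportionality in the last step is not specified in this section and is left implicit.

```latex
\text{speed} \;=\; \frac{\mathrm{TF}}{\mathrm{SF}}
\;=\; \frac{TF_m / TF_p}{SF_p / SF_m}
\;=\; \frac{TF_m \, SF_m}{TF_p \, SF_p}
\;=\; \frac{\textit{magno}}{\textit{parvo}},
\qquad
\omega_M \;\propto\; \left(\frac{\textit{magno}}{\textit{parvo}}\right)^{0.78}
```

Here the numerator is the output of a unit tuned to high temporal and low spatial frequencies, and the denominator is the output of a unit tuned to low temporal and high spatial frequencies.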

*mask contrast* is normalized to detection threshold, as in our analysis here. Thus, what might appear as contradictory conclusions across the two studies are in fact completely consistent: dichoptic cross-orientation masking depends on the contrast sensitivity to the mask component.

*magno* system in the speed computation. (Recall from above, that the weight of suppression in the monocular pathway is a power function of TF/SF, or *magno*/*parvo*, which would be zero for isoluminant stimuli.) In fact, our preliminary re-analysis of Medina, Meese, and Mullen's (2007) data is consistent with this. For binocular chromatic XOS we found that the weight of suppression was indifferent to speed after normalization (unpublished observations), in stark contrast to the achromatic variety (Meese & Holmes, 2007). A more detailed account of this result awaits future elaboration.

*magno* and *parvo* units is used to compute monoptic (mask) speed and that this controls suppression in order to emphasize ecologically significant fast-moving stimuli. The interocular cortical stage of suppression is scale invariant in space and time and might be involved in contrast gain control, including that of the color-system. In any case, it is clear that contrast masking will not be fully understood by attempts to attribute a single process to its various manifestations.

*p* and *q*) were chosen as follows. In typical model formulations for contrast discrimination (Legge & Foley, 1980), the log-log slope of the dipper handle is equal to 1 − *A*, where *A* is the difference between the overall exponents on the numerator and denominator of the gain control equation. The effective exponent of the contrast term at the output of stage 1 in our model here is *m* − 1. It follows that the overall exponents on the numerator and denominator at stage 2 are *p*(*m* − 1) and *q*(*m* − 1) respectively. In our model, *m* = 1.3, and assuming a typical log-log dipper handle of 0.6 (Legge & Foley, 1980), we have 0.6 = 1 − (0.3*p* − 0.3*q*), which gives us 1.333 = *p* − *q*. We chose an arbitrarily low denominator exponent of *q* = 2, which gives us *p* = 3.333, as we used in the illustration. Note also that as the overall numerator exponent is *p*(*m* − 1), then for *m* = 1.3 and *p* = 8, as in our main model analysis, the effective numerator exponent is 2.4, consistent with Legge and Foley (1980).
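The exponent arithmetic above can be verified with a short sketch. It only re-expresses the relations stated in the paragraph (handle slope = 1 − (*m* − 1)(*p* − *q*) and numerator exponent = *p*(*m* − 1)); the function names are hypothetical.

```python
def dipper_handle_slope(m: float, p: float, q: float) -> float:
    """Log-log slope of the dipper handle: 1 - A, with A = (m - 1)(p - q)."""
    return 1.0 - (m - 1.0) * (p - q)


def numerator_exponent(m: float, p: float) -> float:
    """Overall exponent on the stage-2 numerator: p(m - 1)."""
    return p * (m - 1.0)


def p_for_handle(m: float, q: float, handle: float) -> float:
    """Solve handle = 1 - (m - 1)(p - q) for p."""
    return q + (1.0 - handle) / (m - 1.0)


p_illustration = p_for_handle(m=1.3, q=2.0, handle=0.6)   # about 3.333
main_numerator = numerator_exponent(m=1.3, p=8.0)         # about 2.4
```

The same relation applied to the main-analysis values (*p* = 8, *q* = 6.5) gives a handle slope of about 0.55, which is a direct consequence of the formula rather than a figure reported in the text.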

*p* and *q* in the model. With the existing parameters it can be shown that the model here predicts a psychometric function with a Weibull *β* = 4. This is slightly steeper than the average *β* = 2.7 that we found in the experiment in the absence of a mask (the horizontal dashed line in Figure 4). Reducing the value of *p* reduces the model psychometric slope. However, as several other model parameters also influence this function, we chose not to set *p* according to this constraint. Nevertheless, we emphasize that these model details have little bearing on our main conclusions.