In human vision, the response to luminance contrast at each small region in the image is controlled by a more global process where suppressive signals are pooled over spatial frequency and orientation bands. But what rules govern summation among stimulus components within the suppressive pool? We addressed this question by extending a pedestal plus pattern mask paradigm to use a stimulus with up to three mask components: a vertical 1 c/deg pedestal, plus pattern masks made from either a grating (orientation = −45°) or a plaid (orientation = ±45°), with component spatial frequency of 3 c/deg. The overall contrast of both types of pattern mask was fixed at 20% (i.e., plaid component contrasts were 10%). We found that both of these masks transformed conventional dipper functions (threshold vs. pedestal contrast with no pattern mask) in exactly the same way: The dipper region was raised and shifted to the right, but the dipper handles superimposed. This equivalence of the two pattern masks indicates that contrast summation between the plaid components was perfectly linear prior to the masking stage. Furthermore, the pattern masks did not drive the detecting mechanism above its detection threshold because they did not abolish facilitation by the pedestal (Foley, 1994). Therefore, the pattern masking could not be attributed to within-channel masking, suggesting that linear summation of contrast signals takes place within a suppressive contrast gain pool. We present a quantitative model of the effects and discuss the implications for neurophysiological models of the process.

^{2}) or Sony Trinitron Multiscan 200PS (mean luminance of 65 cd/m^{2}). Both monitors had a frame rate of 120 Hz. Contrast is expressed in dB, given by 20 times the log (base 10) of Michelson contrast (*c*), where $c = 100\,(L_{\max} - L_{\min})/(L_{\max} + L_{\min})$ and *L* is luminance. Gamma correction used lookup tables and ensured that the monitor was linear over the entire luminance range used in the experiments. A frame interleaving technique was used for test and mask stimuli, giving a picture refresh rate of 60 Hz. Observers sat in a darkened room with their heads in a chin and head rest at a viewing distance of 114 cm. A small dark fixation point (4 pixels square) was visible throughout the experiment.
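The contrast convention above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example luminances (78 and 52 cd/m^{2} around a 65 cd/m^{2} mean) are assumptions chosen to give a 20% contrast.

```python
import math


def michelson_percent(l_max: float, l_min: float) -> float:
    """Michelson contrast in percent: 100 * (Lmax - Lmin) / (Lmax + Lmin)."""
    if l_max + l_min <= 0:
        raise ValueError("luminances must sum to a positive value")
    return 100.0 * (l_max - l_min) / (l_max + l_min)


def contrast_db(c_percent: float) -> float:
    """Contrast in dB: 20 * log10(c), with c in percent."""
    if c_percent <= 0:
        raise ValueError("contrast must be positive to take its log")
    return 20.0 * math.log10(c_percent)


def db_to_percent(c_db: float) -> float:
    """Inverse of contrast_db: percent contrast from dB."""
    return 10.0 ** (c_db / 20.0)


if __name__ == "__main__":
    c = michelson_percent(78.0, 52.0)   # illustrative luminances, mean 65 cd/m^2
    print(c, contrast_db(c))            # 20% contrast is about 26.02 dB
    print(contrast_db(10.0))            # 10% contrast is 20 dB
```

On this scale, halving the contrast lowers it by about 6 dB, so each 10% plaid component sits about 6 dB below the 20% grating mask.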

*SE*s were estimated by performing probit analysis on the data gathered during the test stages and collapsed across the two staircases. This resulted in individual estimates based on around 100 trials (McKee, Klein, & Teller, 1985).

If the *SE* of a threshold estimate was greater than 3 dB, the data for that condition were discarded and the mini-block was rerun.

*K*: , where *K* is a free parameter of the model. The observer’s response is given by , where the constant *Z* and exponent *q* are free parameters of the model and *E* and *POOL* are functions of stimulus component contrasts as follows: where *C*_{ped} and *C*_{test} are the pedestal and test contrasts (in %), respectively, and the exponent *p* is a free parameter of the model.

*POOL* was formulated differently for each of four versions of the model and always included at least two free parameters: an exponent *q*, introduced above, and a weight *w*.

*β* is a free parameter and 0 ≤ *β* ≤ 1.

For *RESP*_{MASK}, *C*_{test} was equal to zero. For *RESP*_{MASK+TEST}, *C*_{test} was solved numerically. The model was fit simultaneously to all three masking functions (33 data points) for both observers using a downhill simplex algorithm (Press, Flannery, Teukolsky, & Vetterling, 1989). The algorithm was initialized with 100 pseudo-randomly selected initial values, and the fits reported are those that achieved the lowest root mean square (RMS) error (in dB). For the first three models, there are five free parameters (*K*, *Z*, *w*, *p*, and *q*), and for the fourth, there is one additional parameter, *β*. Parameter values and RMS errors are shown for both observers and all four versions of the model in Table 1, and the fits are shown in Figure 3 for TSM and Figure 4 for DJH.
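The numerical step described above (finding the test contrast at which the response to mask plus test exceeds the response to the mask alone by the criterion *K*) can be sketched with a simple bisection. Everything about the response function here is an assumption for illustration: the divisive form, the way the mask enters the pool, and the parameter values are hypothetical and are not the equations or fitted values of the four model versions. The sketch also omits the simplex fitting itself and shows only the threshold solver and an RMS error in dB.

```python
import math


def resp(c_ped, c_test, c_mask, p, q, z, w):
    """Hypothetical divisive gain-control response (assumed form, for illustration only)."""
    drive = c_ped + c_test                  # pedestal and test share one channel
    excitation = drive ** p
    pool = (drive + w * c_mask) ** q        # assumed: contrasts sum linearly in the pool
    return excitation / (z + pool)


def threshold(c_ped, c_mask, k, p, q, z, w, tol=1e-9):
    """Solve RESP(mask + test) - RESP(mask) = k for the test contrast by bisection."""
    base = resp(c_ped, 0.0, c_mask, p, q, z, w)   # mask alone: test contrast is zero

    def excess(c_test):
        return resp(c_ped, c_test, c_mask, p, q, z, w) - base - k

    lo, hi = 0.0, 1.0                       # excess(0) = -k < 0, so lo is a valid bracket
    while excess(hi) < 0.0:                 # grow the bracket until the sign changes
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no threshold found below 1e6")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rms_error_db(model_thresholds_pct, data_thresholds_db):
    """RMS difference in dB between model thresholds (in %) and data (in dB)."""
    sq = [(20.0 * math.log10(m) - d) ** 2
          for m, d in zip(model_thresholds_pct, data_thresholds_db)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    params = dict(k=0.2, p=2.4, q=2.0, z=5.0, w=0.5)   # illustrative values only
    for c_ped in (0.0, 1.0, 4.0, 16.0):
        print(c_ped, threshold(c_ped, c_mask=20.0, **params))
```

A fitting routine would wrap `rms_error_db` in an objective over the free parameters and minimize it from many starting points, keeping the best result.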

| Model | K | p | q | Z | w | β | RMS error (dB) |
|---|---|---|---|---|---|---|---|
| **TSM** | | | | | | | |
| Nonlin | 0.21 | 1.93 | 1.58 | 2.71 | 0.76 | − | 2.08 |
| Lin | 0.27 | 4.12 | 3.69 | 2.09 | 0.15 | − | 1.42 |
| Hybrid | 0.21 | 2.04 | 1.69 | 2.62 | 0.64 | − | 1.89 |
| Compound | 0.27 | 3.34 | 2.91 | 2.15 | 0.28 | 0.53 | 1.34 |
| **DJH** | | | | | | | |
| Nonlin | 0.28 | 2.63 | 2.26 | 2.13 | 0.72 | − | 2.04 |
| Lin | 0.24 | 3.59 | 3.22 | 2.08 | 0.24 | − | 1.81 |
| Hybrid | 0.30 | 3.24 | 2.85 | 2.02 | 0.53 | − | 1.51 |
| Compound | 0.28 | 3.7 | 3.32 | 2.04 | 0.43 | 0.15 | 1.33 |
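As a quick reading aid, the RMS errors in the table can be ranked per observer. The snippet below is only a convenience sketch: the dictionary layout and the function name are assumptions, and the values are copied from the table.

```python
# RMS errors (dB) copied from the table, keyed by observer and model version.
rms_db = {
    "TSM": {"Nonlin": 2.08, "Lin": 1.42, "Hybrid": 1.89, "Compound": 1.34},
    "DJH": {"Nonlin": 2.04, "Lin": 1.81, "Hybrid": 1.51, "Compound": 1.33},
}


def rank_models(errors):
    """Return model names ordered from lowest to highest RMS error."""
    return sorted(errors, key=errors.get)


if __name__ == "__main__":
    for observer, errors in rms_db.items():
        print(observer, rank_models(errors))
```

The compound version has one more free parameter (*β*) than the other three, so its lower raw RMS error does not by itself show that it is the better account.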

*p* in Table 1, which influences the size of the dip.) In fact, it fits the results for TSM quite well. However, for DJH it fails to capture the depth of the cross-over of the pattern mask functions and the pedestal dipper function, a behavior that was actually well described previously by the nonlinear summation model.

*linearize* the psychometric function (produce a *d*′ slope of one) for the vertical 1 c/deg test component (Georgeson & Meese, 2004; Meese et al., 2004). Third, contrast matching experiments (e.g., Meese & Hess, 2004) have shown that an oblique 3 c/deg mask attenuates the perceived contrast of a superimposed 1 c/deg test grating. These four lines of evidence provide a very strong case that the 3 c/deg mask components were not exciting the detection mechanism for the 1 c/deg target stimulus and rule out a within-channel masking account of the linear summation of pattern mask components. Instead, we conclude that masking arises from cross-channel suppression and that pattern mask contrasts sum linearly within a suppressive pathway, at least for the stimulus configuration used here. We refer to this as linear suppression. Note, however, that this does not disallow an output nonlinearity after summation. In fact, the models considered here work exactly this way, though other possibilities also exist (see the early adaptation model of Meese & Holmes, 2002).
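The distinction between linear summation followed by an output nonlinearity and nonlinear summation of the individual components can be illustrated numerically. In the sketch below, the exponent of 2 and the function names are assumptions for illustration; only the mask contrasts (a 20% grating versus a plaid with two 10% components) come from the experiment.

```python
import math


def linear_then_power(contrasts, q):
    """Sum component contrasts linearly, then apply an output nonlinearity."""
    return sum(contrasts) ** q


def power_then_sum(contrasts, q):
    """Apply the nonlinearity to each component first, then sum."""
    return sum(c ** q for c in contrasts)


if __name__ == "__main__":
    grating = [20.0]          # single-component mask, 20% contrast
    plaid = [10.0, 10.0]      # two-component mask, 10% contrast each
    q = 2.0                   # illustrative exponent, not a fitted value
    print(linear_then_power(grating, q), linear_then_power(plaid, q))
    print(power_then_sum(grating, q), power_then_sum(plaid, q))
    assert math.isclose(linear_then_power(grating, q), linear_then_power(plaid, q))
```

Under linear summation the two masks produce the same suppressive signal whatever the later exponent, which matches the finding that grating and plaid masks were equally effective. Under component-wise nonlinear summation with an exponent above one, the plaid would contribute less than the grating.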

*RESP*) is thought of as the magnitude of mechanism response, then the model parameter *K* relates to constant variance Gaussian noise added at the output stage of the model (i.e., after filtering and interactions, but before the decision variable). However, other models have supposed that noise is multiplicative (e.g., Itti et al., 2000), in which case *RESP* is better thought of as the signal-to-noise ratio (Foley, 1994). These two possibilities have prompted some recent debate (Tyler & Chen, 2000; Mortensen, 2002, 2003; Gorea & Sagi, 2001, 2002; Kontsevich, Chen, & Tyler, 2002a; Kontsevich, Chen, Verghese, & Tyler, 2002b), but the picture remains unclear (Georgeson & Meese, 2004). Quite possibly, both types of noise are involved.