A fundamental problem for any visual system with binocular overlap is the combination of information from the two eyes. Electrophysiology shows that binocular integration of luminance contrast occurs early in visual cortex, but a specific systems architecture has not been established for human vision. Here, we address this by performing binocular summation and monocular, binocular, and dichoptic masking experiments for horizontal 1 cycle per degree test and masking gratings. These data reject three previously published proposals, each of which predicts too little binocular summation and insufficient dichoptic facilitation. However, a simple development of one of the rejected models (the twin summation model) and a completely new model (the two-stage model) provide very good fits to the data. Two features common to both models are gently accelerating (almost linear) contrast transduction prior to binocular summation and suppressive ocular interactions that contribute to contrast gain control. With all model parameters fixed, both models correctly predict (1) systematic variation in psychometric slopes, (2) dichoptic contrast matching, and (3) high levels of binocular summation for various levels of binocular pedestal contrast. A review of evidence from elsewhere leads us to favor the two-stage model.

*c*) is expressed as $c^{q}$, and this predicts a $2^{1/q}$ improvement for binocular contrast detection. When *q* = 2, this provides a convenient expression for computing monocular contrast energy and, of course, describes the widespread belief that binocular summation is quadratic (e.g., Blake et al., 1981; Smith et al., 1997; Strasburger, 2001).

*two-stage model* of contrast gain control, where the first and second stages receive monocular and binocular excitation, respectively, but divisive suppression is binocular at both stages. The second model was motivated by very recent work of Maehara and Goryo (2005), published while this article was in preparation. The *twin summation model* here is a generalization of their model, modified for the stimuli used in our experiments. This model incorporates monocular nonlinearities and binocular summation within parallel excitatory and inhibitory pathways prior to a single stage of contrast gain control. The two-stage model and the twin summation model both correctly predict several novel effects in the psychometric functions we have measured and explain a previously puzzling pattern of results for contrast matching.

^{2}. The system was carefully gamma corrected (linearized), and contrast was controlled via lookup tables. Pseudo-14-bit grayscale resolution was obtained by using the CRS Bits++ graphical interface in Bits++ mode.

^{2} and was gamma corrected using lookup tables. The experiments were run under the control of a PC. Stimuli were viewed through a mirror haploscope (four pairs of front-surfaced mirrors, set at ±45°) affording a square field size of 11.5° × 11.5° and an effective viewing distance of 52 cm. The visible region of the display consisted of a 256-pixel square array for each eye. The frame rate of the monitor was 120 Hz, which gave a picture refresh rate of 60 Hz due to frame interleaving across eyes. This was done to allow fine control over the contrast presented to each eye.

$c = 100(L_{\max} - L_{\min})/(L_{\max} + L_{\min})$, where $L_{\max}$ and $L_{\min}$ are maximum and minimum luminance values, respectively, and also as decibel contrast relative to 1%, equal to $20\log_{10}(c)$.
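The two contrast definitions above can be sketched in a few lines of Python. This is an illustrative sketch only; the function names `michelson_contrast_percent` and `contrast_db` are hypothetical and do not come from the original work.

```python
import math


def michelson_contrast_percent(l_max, l_min):
    """Michelson contrast in percent: c = 100 (Lmax - Lmin) / (Lmax + Lmin)."""
    return 100.0 * (l_max - l_min) / (l_max + l_min)


def contrast_db(c_percent):
    """Contrast in dB relative to 1% contrast: 20 log10(c), with c in percent."""
    return 20.0 * math.log10(c_percent)


# Example: a grating whose luminance swings between 40 and 60 (arbitrary units)
c = michelson_contrast_percent(60.0, 40.0)
print(c, contrast_db(c))
```

On this scale, 1% contrast is 0 dB and every tenfold increase in contrast adds 20 dB.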

*ψ*()) to proportion correct using a Simplex algorithm and maximum likelihood estimation. In Experiment 1, the psychometric function was fit to the data from each experimental session. In all cases, the psychometric function was a Weibull function given by

$$\psi(C) = 0.5 + (0.5 - \lambda)\left(1 - \exp\left[-(C/\alpha)^{\beta}\right]\right),$$

where *C* is the test contrast, *α* is the detection threshold (81.6% correct when *λ* = 0), *β* is the slope of the psychometric function, and *λ* is a lapse rate parameter constrained to be ≤0.04 to allow for finger errors. The lapse parameter can be important when estimating the slope of the psychometric function (Wichmann & Hill, 2001). In Experiment 1, 13 of the 16 psychometric fits gave an estimated lapse rate of 0. For Experiment 3, *λ* was fixed at a small value (0.01).
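As a rough illustration of the psychometric function described above, the sketch below assumes the standard two-alternative Weibull form with a 0.5 guess rate and lapse rate *λ*. That form is an assumption, chosen because it gives 81.6% correct at *C* = *α* when *λ* = 0. The function name `weibull_2ifc` is hypothetical.

```python
import math


def weibull_2ifc(c, alpha, beta, lam=0.0):
    """Proportion correct for a 2IFC Weibull function (assumed standard form).

    c     : test contrast
    alpha : threshold (81.6% correct when lam == 0)
    beta  : slope parameter
    lam   : lapse rate (upper asymptote is 1 - lam)
    """
    return 0.5 + (0.5 - lam) * (1.0 - math.exp(-((c / alpha) ** beta)))


# At threshold with no lapses, performance is 1 - 0.5 * exp(-1)
print(weibull_2ifc(1.0, alpha=1.0, beta=3.0))
```

With a nonzero lapse rate the curve saturates at 1 − *λ* rather than at 1, which is why an unmodeled lapse can bias the estimated slope.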

$\log_{10}$ and multiplying by 20.) Binocular summation ratios are shown in Figure 2 for the four different stimulus durations. In all cases, summation is greater than the quadratic prediction of √2 (3 dB) but less than a perfect linear summation ratio of 2 (6 dB) and has an average of 1.70 (4.6 dB). In particular, this result is inconsistent with the quadratic summation model of Legge and the ideal linear summation model of Campbell and Green, both of which predict binocular summation ratios of √2 (3 dB). Psychometric slopes did not differ significantly between monocular and binocular testing. Mean values (±1 *SE*, *n* = 8) for slope parameter *β* were 3.00 ± 0.27 (monocular) and 3.31 ± 0.27 (binocular).
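The ratio-to-decibel conversions quoted above (√2 ≈ 3 dB, 2 ≈ 6 dB, 1.70 ≈ 4.6 dB) follow from taking 20 log10 of the monocular-to-binocular threshold ratio. A minimal sketch, with hypothetical helper names:

```python
import math


def ratio_to_db(ratio):
    """Convert a threshold ratio (monocular / binocular) to decibels."""
    return 20.0 * math.log10(ratio)


def db_to_ratio(db):
    """Convert decibels back to a threshold ratio."""
    return 10.0 ** (db / 20.0)


for r in (math.sqrt(2.0), 1.70, 2.0):
    print(f"ratio {r:.3f} -> {ratio_to_db(r):.2f} dB")
```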

Model name | Equation | Data figure (Figure 3 panel) | Model figure (Figure 4 panel) |
---|---|---|---|
Legge-type | $\mathrm{resp}(L,R) = \dfrac{(L^{q} + R^{q})^{p/q}}{z + L^{q} + R^{q}}$ (1) | A and B | A |
Late summation | $\mathrm{resp}(L,R) = \dfrac{L^{p} + R^{p}}{z + L^{q} + R^{q}}$ (2) | C | B |
Two-stage gain control | $\mathrm{Stage1}(L) = \dfrac{L^{m}}{s + L + R}$, $\mathrm{Stage1}(R) = \dfrac{R^{m}}{s + L + R}$; $\mathrm{resp}(L,R) = \dfrac{(\mathrm{Stage1}[L] + \mathrm{Stage1}[R])^{p}}{z + (\mathrm{Stage1}[L] + \mathrm{Stage1}[R])^{q}}$ (3) | D* | C |
Twin summation/Maehara and Goryo | $\mathrm{resp}(L,R) = \dfrac{(L^{m} + R^{m})^{p}}{z + (L^{n} + R^{n})^{q}}$ (4) | E and F* | D |
Decision variable Δ*r* (all models) | $\Delta r = \mathrm{resp}(L_{\mathrm{mask}} + L_{\mathrm{test}}, R_{\mathrm{mask}} + R_{\mathrm{test}}) - \mathrm{resp}(L_{\mathrm{mask}}, R_{\mathrm{mask}})$ (5) | | |
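The model equations (1) to (5) in the table above translate directly into code. The following is a sketch rather than the authors' implementation: the function names and the keyword-argument interface of `delta_r` are hypothetical, and the contrast units of `L` and `R` are whatever the fitted parameters assume.

```python
def legge_type(L, R, p, q, z):
    """Equation 1: binocular sum of L^q and R^q, then gain control."""
    s = L ** q + R ** q
    return s ** (p / q) / (z + s)


def late_summation(L, R, p, q, z):
    """Equation 2: separate excitatory (p) and suppressive (q) sums."""
    return (L ** p + R ** p) / (z + L ** q + R ** q)


def two_stage(L, R, m, s, p, q, z):
    """Equation 3: monocular excitation with binocular suppression (stage 1),
    binocular summation, then a second gain-control stage."""
    stage1_left = L ** m / (s + L + R)
    stage1_right = R ** m / (s + L + R)
    b = stage1_left + stage1_right
    return b ** p / (z + b ** q)


def twin_summation(L, R, m, n, p, q, z):
    """Equation 4: parallel excitatory and suppressive binocular sums."""
    return (L ** m + R ** m) ** p / (z + (L ** n + R ** n) ** q)


def delta_r(resp, L_mask, R_mask, L_test, R_test, **params):
    """Equation 5: response difference between test and null intervals."""
    return (resp(L_mask + L_test, R_mask + R_test, **params)
            - resp(L_mask, R_mask, **params))


# Binocular detection (no mask), with two-stage parameter values taken
# purely for illustration
print(delta_r(two_stage, 0.0, 0.0, 1.0, 1.0,
              m=1.28, s=0.985, p=7.99, q=6.59, z=0.077))
```

Passing the response function as an argument keeps the decision rule identical across models, mirroring the "all models" row of the table.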

*q* to be a free parameter. The current formulation is the same as the treatment by Meese and Hess (2004; where *q* = 2) and, hereafter, is called the Legge-type model.

*q* = 3.03 (see Table 2) results in far too little binocular summation at low mask contrasts. To achieve the level of summation found in these detection data (mask contrast = 0%), the exponent must be reduced to *q* = 1.34. However, this adjustment results in the unfortunate side effects shown in Figure 3B. Specifically, the value of *q* controls (1) the depth of the dip in monocular and binocular masking and (2) the separation between monocular and dichoptic masking at high contrasts. With the low value of *q* needed as described above, both of these features are underestimated. (The second effect occurs because as *q* approaches 1, the monocular and dichoptic conditions become increasingly similar for this model.) Clearly, the Legge-type model cannot survive in its present form.

Model | Data figure (Figure 3 panel) | Model figure (Figure 4 panel) | Free parameters | RMS error (dB) | *m* | *n* | *s* | *p* | *q* | *z* | *σ* | B_{sum} (dB) = 4.5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Legge-type | A | A | 4 | 1.75 | – | – | – | 3.47 | 3.03 | 4.76 | 0.250 | 2.0 |
Legge-type (constrained) | B | A | 3 | 3.68 | – | – | – | 1.67 | 1.34 (Fixed) | 3.38 | 0.207 | 4.5 |
Late summation | C | B | 4 | 1.66 | – | – | – | 2.76 | 2.34 | 4.59 | 0.212 | 2.1 |
Two-stage | D* | C | 6 | 0.873 | 1.28 | – | 0.985 | 7.99 | 6.59 | 0.077 | 0.194 | 3.5 |
Maehara and Goryo (constrained) | E | D | 4 | 1.17 | = p | = q | – | 1.88 | 1.75 | 6.036 | 0.255 | 3.2 |
Twin summation | F* | D | 6 | 0.664 | 1.43 | 1.28 | – | 2.47 | 2.4 | 7.06 | 0.259 | 4.0 |

*m* = 1.28) and, thus, binocular summation is high, but dichoptic masking remains severe due to the inclusion of interocular suppression (with the interocular suppression removed, the model reverts to behavior like that in Figure 3B when refit to the data; Georgeson, Meese, & Baker, 2005). The model also captures all the other main features of the data including the different levels of facilitation in each of the three conditions and the convergence of monocular and binocular masking functions (see Figure 4C). In particular, the second-stage excitatory exponent (*p*) allows for deeper regions of facilitation than would be seen with only a first-stage transducer, where *m* = 1.28.

*m* = *p* and *n* = *q*, and a fit of the model with this constraint is shown in Figure 3E. Of the three different four-parameter models we have tried, this model undoubtedly produces the best overall fit (see Table 2). However, it does not produce a sufficiently marked dip in the dichoptic condition and it tends to underestimate binocular summation at low mask contrasts. This model's behavior falls somewhere between the Legge-type model and the two-stage model. We were able to improve upon this with the more general, six-parameter, *twin summation model* (Table 1). The fit is shown in Figure 3F and captures all features of the data very well. (We also tried a five-parameter version with the constraint *p* = *q*. This performed very well [RMS error = 0.76 dB] but slightly underestimated the level of dichoptic facilitation.)

*m*, *n*, *s*, *p*, *q*, and *z*, as appropriate) were adjusted so that the difference in responses between test and nontest intervals (Δ*r*) was equal to a constant (across mask contrasts and test conditions) related to the standard deviation of late additive noise (*σ*). As the value of *σ* was unknown, it was a free parameter in the model (see Tables 1 and 2). For 2IFC, signal detection theory shows that percent correct is equal to Φ(*d*′/√2), where Φ() is the standard normal integral. From this, it follows that 81.6% correct (“threshold”) corresponds to *d*′ = 1.273. Because, by definition, *d*′ = Δ*r*/*σ*, it follows that *σ* = Δ*r*/1.273. To produce the model psychometric functions, Ψ(Δ*C*), we calculated percent correct as Φ(Δ*r*/(*σ*√2)) for a range of values of Δ*C* (threshold, ±15 dB). Slopes of these model psychometric functions were derived by fitting Weibull functions, Ψ(Δ*C*), given by

$$\Psi(\Delta C) = 1 - 0.5\exp\left[-(\Delta C/\alpha)^{\beta}\right],$$

where *α* is the test contrast corresponding to 81.6% correct (threshold) and *β* is the slope of the psychometric function.
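The link between the decision variable Δ*r*, the noise *σ*, and 2IFC percent correct can be checked numerically. This sketch uses only the relations stated above (*d*′ = Δ*r*/*σ* and proportion correct = Φ(*d*′/√2)); the function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def proportion_correct_2ifc(delta_r, sigma):
    """2IFC proportion correct for response difference delta_r and
    late additive noise with standard deviation sigma."""
    d_prime = delta_r / sigma
    return phi(d_prime / math.sqrt(2.0))


# d' = 1.273 should give roughly 81.6% correct
print(proportion_correct_2ifc(1.273, 1.0))
```

Sweeping the test contrast, computing Δ*r* from a model at each contrast, and passing it through `proportion_correct_2ifc` would yield a model psychometric function to which a Weibull could then be fitted.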

*β* ∼ 6). This is broadly consistent with previous evidence for super-steep psychometric slopes for high-contrast dichoptic masks (Meese et al., 2004). Psychometric slopes also become very steep when stimulus uncertainty is a strong factor. For example, Meese, Hess, and Williams (2001) found an average *β* = 5.7 for three uncertain observers in a contrast discrimination task well above detection threshold. However, as the dichoptic mask should reduce, rather than increase, uncertainty, it is unlikely that uncertainty is responsible for the effects we have observed here. Instead, we look for this as an emergent property of the models we are testing.

*R*(*c*), can account for both contrast discrimination and contrast matching. The neural codes for these two tasks are not necessarily closely related, but a model for contrast coding that did account for both would have greater power and generality. In a unified scheme, we should expect discrimination to be related to both the derivative d*R*/d*c* and the noise *σ*, as above, whereas matching could depend solely on the mean *R*. We thus challenged our models, based on discrimination data, to predict the contrast-matching data of Legge and Rubin (1981). In their experiment, observers adjusted the overall contrast of a dichoptic test stimulus, which had different contrasts in the two eyes, to match a binocular standard of fixed contrast (see figure caption for experimental details). Typical results for one of their observers (G.R.) are replotted in Figure 6 for four different standard contrasts. Note that the left- and right-eye contrast axes for the dichoptic test stimulus are normalized by the contrast of the standard. There are two notable features of the data. First, proportionally higher dichoptic contrasts are needed to match the low-contrast binocular standards (open symbols), as compared with the high-contrast binocular standards (filled symbols). Second, there is a tendency for the matching functions to curve back in on themselves as they approach the monocular axes, although this effect is clearer for the right eye than the left eye.

*q* = 3.03) are shown in Figure 6A. The model loosely describes the general curvature in the data but fails to capture the dependence on standard contrast level. Predictions for the quadratic summation model (*q* = 2 in the Legge-type model)^{1} are shown in Figure 6B. As recognized by Legge and Rubin (1981), the curvature is less tight for lower exponents of the monocular transducer (cf. Figures 6A and 6B), but there is no exponent that will cause the curves to distinguish between the different binocular contrasts or to fold back on themselves.

*binocular averaging rule*, equivalent to the linear sum of the contrasts in the two eyes (and equivalent to *q* = 1 in the Legge-type model here). This predicts that all of the data should fall on a straight oblique line from the top left to the bottom right of the plots (not shown). Clearly, this model does not describe the data. This is noteworthy because the threshold model of Campbell and Green (1965) also assumes linear summation of monocular contrasts (see Introduction section), and thus, that model does not extend to suprathreshold conditions in a straightforward way.

*d*′) in the initial region of the 2IFC psychometric function. That is, the psychometric function should be nonmonotonic for dichoptic masking. This is unlikely to have compromised our results in Experiment 2 because the staircase procedure would have steered the test stimulus levels away from the paradoxical effect. However, we have tested the prediction by using the method of constant stimuli to measure psychometric functions over a wide range of test contrasts. Our preliminary investigation has revealed good evidence for this paradoxical effect for high-contrast dichoptic masks (Meese, Georgeson, & Baker, 2005), and this issue is continuing to receive our attention.

Model | Data figure (Figure 7 panel) | Model figure (Figure 3 panel) | Free parameters | RMS error (dB), 2 funcs | RMS error (dB), 3 funcs | *m* | *n* | *s* | *p* | *q* | *z* | *σ* | B_{sum} (dB) = 4.8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Legge-type | – | A | 4 | 1.65 | 1.54 | – | – | – | 2.61 | 2.21 | 6.41 | 0.252 | 2.5 |
Late summation | – | B | 4 | 1.92 | 1.72 | – | – | – | 3.13 | 2.66 | 7.08 | 0.233 | 2.0 |
Two-stage | A, C, and E | C | 6 | 1.22 | 1.36 | 1.24 | – | 1.42 | 7.67 | 6.10 | 0.062 | 0.158 | 4.2 |
Maehara and Goryo (constrained) | – | D | 4 | 1.33 | 1.27 | = p | = q | – | 1.81 | 1.67 | 8.68 | 0.247 | 3.55 |
Twin summation | B, D, and F | D | 6 | 0.95 | 1.01 | 1.37 | 1.28 | – | 2.9 | 2.71 | 7.06 | 0.259 | 4.7 |

$20\log_{10}(2^{1/\beta})$ dB of binocular summation, where *β* is the Weibull slope parameter of the psychometric function (Quick, 1974), assuming equal sensitivity of the two eyes. From Experiment 2, we estimate slope *β* = 3.61 at detection threshold, predicting 1.66 dB of summation, considerably less than the 4.5 dB of summation found empirically (Table 1). Within this framework, a much lower value of *β* = 1.33 would be required to meet the observed level of summation. However, the assumptions of high threshold theory have long been discredited (e.g., Nachmias, 1981), and in any case, they do not readily extend to suprathreshold conditions (Experiment 3); hence, this account will not suffice on any grounds.
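The arithmetic in this paragraph can be reproduced with a short sketch, assuming the prediction is 20 log10(2^{1/β}) dB for two equally sensitive eyes (which matches the figures quoted). The function names are hypothetical.

```python
import math


def summation_db_from_slope(beta):
    """Predicted binocular summation (dB) for Weibull slope beta,
    assuming equal sensitivity in the two eyes."""
    return 20.0 * math.log10(2.0 ** (1.0 / beta))


def slope_for_summation_db(db):
    """Weibull slope that would be needed to predict a given summation (dB)."""
    return 20.0 * math.log10(2.0) / db


print(summation_db_from_slope(3.61))   # close to the 1.66 dB quoted above
print(slope_for_summation_db(4.5))     # close to the slope of 1.33 quoted above
```

The prediction falls as the slope steepens, which is why the steep slopes measured at detection threshold leave this account well short of the observed summation.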

$20\log_{10}(2^{1/4}) = 1.5$ dB for binocular summation by probability summation (or “attentional summation” as Tyler and Chen prefer to call it). Changing the base level of uncertainty or, equivalently, introducing an accelerating nonlinearity, does not change this conclusion appreciably. Some situations that did predict substantially higher levels of summation than the fourth-root rule were found (see Figure 11a in Tyler & Chen, 2000), but they produced psychometric slopes that were far too shallow compared with our data. Thus, probability summation cannot account for binocular summation.

*direct effect* of suppression from the mask on the test is not the only factor involved; there is also an *indirect effect*, caused by suppression from the test on the mask, because the test contrast is so high. This causes the response to the mask to be less in the test interval (test + mask stimulus) than in the null interval (mask alone), and hence, greater test contrast is needed to overcome this. Explorations with the model confirm that both of these factors are important. If the pathway for the eye carrying the dichoptic mask is “lesioned” just before binocular summation, then binocular summation of mask and test does not occur and the indirect effect cannot be a factor. When this is done, dichoptic masking remains but (1) there is no facilitation, (2) masking occurs at much lower mask contrasts, and (3) the masking function is much less steep (the log–log slope drops from 1 to about 0.6). The masking in the lesioned model is due to the direct effect, and the drop in masking at the higher mask contrasts illustrates the contribution of the indirect effect when the model is intact.

*p* = *q*) or four (*m* = *p*, *n* = *q*), with only a small drop in overall performance (see Results section) and, hence, might be preferred over the six-parameter, two-stage model on those grounds.

*c* is contrast, *m* is the transducer exponent, and *σ* is the standard deviation of the noise for each monocular signal. Thus, $d'_{\mathrm{bin}} = \sqrt{2}\,d'_{\mathrm{mon}}$, for any value of the exponent, *m*. Assuming *d*′ is some constant *k* at threshold, we obtain from Equation A1 the contrast thresholds: $c_{\mathrm{mon}} = (k\sigma)^{1/m}$ and $c_{\mathrm{bin}} = (k\sigma/\sqrt{2})^{1/m}$. Thus, the binocular advantage (threshold ratio) is $c_{\mathrm{mon}}/c_{\mathrm{bin}} = 2^{1/(2m)}$ irrespective of *k* and *σ*. The Weibull slope parameter *β* ≈ 1.3*m* (Pelli, 1987; Tyler & Chen, 2000) is typically around 3.5 in our data set and that of others, implying *m* ≈ 2.7. Hence, the expected binocular threshold improvement for the ideal observer in this case is $c_{\mathrm{mon}}/c_{\mathrm{bin}} = 2^{1/5.4} = 1.14$, equivalent to 1.2 dB, compared with 4.5 dB observed in our experiments. If contrast transduction were linear (*m* = 1), then the threshold ratio predicted from this approach is √2 (3 dB), but this implies psychometric slopes *β* of about 1.3, far shallower than those observed at detection threshold. Thus, this analysis shows that ideal binocular combination of the noisy responses, after either linear or accelerating transduction, cannot account for the high (4.5 dB) level of binocular summation that we observed.
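The ideal-observer argument can be checked numerically. The sketch below assumes that ideal combination of two independent, equally noisy monocular signals improves *d*′ by a factor of √2, which is what the stated threshold ratio $2^{1/(2m)}$ implies. The function names and the `n_eyes` parameter are hypothetical.

```python
import math


def ideal_threshold(k, sigma, m, n_eyes=1):
    """Contrast threshold for an ideal observer: d' = sqrt(n_eyes) * c**m / sigma = k."""
    return (k * sigma / math.sqrt(n_eyes)) ** (1.0 / m)


def ideal_summation_ratio(m):
    """Monocular / binocular threshold ratio, 2 ** (1 / (2 m))."""
    return 2.0 ** (1.0 / (2.0 * m))


# The ratio does not depend on k or sigma; the values below are arbitrary.
k, sigma = 1.273, 0.2
for m in (1.0, 2.7):
    ratio = ideal_threshold(k, sigma, m, 1) / ideal_threshold(k, sigma, m, 2)
    print(m, ratio, 20.0 * math.log10(ratio))
```

For a linear transducer (*m* = 1) the ratio is √2, and it shrinks toward 1 as the transducer becomes more accelerating.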

*q*.) Any nonlinearities after the binocular sum will transform the dichoptic and binocular stimuli in identical ways and, thus, cannot affect the match. The matter is slightly more complicated for the twin summation model because binocular summation takes place in parallel excitatory and suppressive pathways. For this model, matching depends on the final output after combination of the two pathways.