Stereo depth perception depends on the fact that objects project to different positions in the two eyes. Because our eyes are offset horizontally, these retinal disparities are mainly horizontal, and horizontal disparity suffices to give an impression of depth. However, depending on eye position, there may also be small vertical disparities. These are significant because, given both vertical and horizontal disparities, the brain can deduce eye position from purely retinal information and, hence, derive the position of objects in space. However, we show here that, to achieve this, the brain need measure only the *magnitude* of vertical disparity; for physically possible stimuli, the sign then follows from the stereo geometry. The magnitude of vertical disparity—and hence eye position—can be deduced from the response of purely horizontal-disparity sensors because vertical disparity moves corresponding features off the receptive fields, reducing the effective binocular correlation. As proof, we demonstrate an algorithm that can accurately reconstruct gaze and vergence angles from the population activity of pure horizontal-disparity sensors and show that it is subject to the induced effect. Given that disparities experienced during natural viewing are overwhelmingly horizontal and that eye position measures require only horizontal-disparity sensors, this work raises two questions: Does the brain in fact contain sensors tuned to nonzero vertical disparities, and if so, why?

$P$ in the left retina (Figure 1). The object that caused this image could lie anywhere along the ray that projects to the point $P$ (red dashed line in Figure 1). The image of this ray in the right eye defines a one-dimensional (1D) line on the right retina. This *epipolar line* is the locus of all possible matches in the right eye for point $P$ in the left eye. Objects at different distances fall at different places along the epipolar line. For any given eye position, therefore, disparity can be described with a purely 1D measure. However, changes in eye position shift the epipolar lines on the retina, making disparity genuinely 2D. The two dimensions of disparity thus carry different information: The component along the epipolar line carries information about the outside world (the location of objects in space), whereas the orientation of epipolar lines carries information about the observer (the current position of the eyes).

*and* whose vertical-disparity tuning is taken into consideration in the readout. Thus, in this picture, the brain would contain no vertical-disparity detectors. If, as we propose, the precise vertical-disparity tuning of individual neurons is ignored, then any scatter in vertical-disparity tuning simply represents noise: For a given vertical disparity, the sensor's response is either slightly larger or slightly smaller than the visual system would have expected. Thus, for simplicity, we shall consider a model stereo system in which all disparity detectors are tuned to *exactly* zero vertical disparity. That is, their receptive fields are identical in profile and are located at the same vertical position in both retinae. Such pure horizontal-disparity neurons (Figure 2) can still sense binocular correlation between the two eyes' images, even when there is a small amount of vertical disparity: They tolerate vertical disparities that are small compared to the receptive-field size. When the vertical disparities are too large, of course, these neurons simply register the images in the two eyes as uncorrelated. This mirrors the psychophysical evidence that stereo performance declines as vertical disparity increases (Duwaer & van den Brink, 1982; Farell, 2003; McKee et al., 1990; Westheimer, 1978, 1984).

^{1}). At first sight, this appears a fatal flaw, ruling out almost all the well-known illusions of vertical disparity, such as the induced effect. But in fact, these illusions depend on the interaction between vertical disparity applied to the stimulus and vertical disparity due to eye position. These reinforce or cancel, depending on their sign, outside the organism, resulting in a characteristic pattern of vertical disparity magnitude—and hence binocular correlation—across the visual field. A visual system containing only horizontal-disparity sensors can deduce gaze angle and vergence from this pattern, and we demonstrate that such a system is subject to the induced effect. Thus, in fact, our model visual system can experience all the illusions of vertical disparity demonstrated to date. Furthermore, the fact that such information has to be derived from the pattern of sensor response across large regions of the visual field provides a possible reason why vertical disparities, unlike horizontal ones, are pooled over large regions of the visual field (Adams et al., 1996; Howard & Pierce, 1998; Kaneko & Howard, 1996; Stenton et al., 1984). Hence, both classes of psychophysical phenomena could potentially be mediated solely by activity in horizontal-disparity sensors.

*horizontal*-disparity detectors? We shall show that, for all experiments published to date, the answer seems to be yes. Vertical disparity in the stimulus reduces the effective binocular correlation sensed by a population of horizontal-disparity detectors like the one sketched in Figure 2. This allows one to deduce a local map of the unsigned magnitude of vertical disparity. Given this map of magnitudes, we show that the global constraints on stereo geometry make it possible to infer the signs; thus, the full vertical-disparity field (at least for disparities generated by physically possible stimuli) could potentially be deduced from the activity of purely horizontal-disparity detectors. We conclude that the existing evidence does not conclusively demonstrate that the visual system contains detectors tuned to nonzero vertical disparity. However, if there were no such detectors, it should be possible to recreate the effects of vertical disparity by suitably manipulating the binocular correlation. So far, we have not been able to achieve this. We suggest that this failure is the most compelling evidence to date that the visual system really does encode vertical disparity.

*anatomical correspondence*. We define anatomical correspondence such that, when the eyes are in primary position, objects at infinity project to anatomically corresponding points in the two eyes. We define our retinal coordinate frame as drawn in Figure 3A. This employs a Cartesian coordinate system $(x, y)$ on an imaginary plane tangent to the retina at the fovea. Any point $P$ on the retina can be mapped onto this planar retina by drawing a line from the nodal point through the point $P$ and seeing where it intersects the plane (red line in Figure 3A). To describe the position of any point on the retina in angular coordinates, we use $(\hat{x}, \hat{y})$, as shown in Figure 3B. For example, the blue lines in Figure 3A show

*stereo correspondence* are viewing the same object in space. The retinal disparity is the difference between the retinal coordinates of stereoscopically corresponding points. For example, if an object projects to (

| Symbol | Description | Where used |
|---|---|---|
| $C_{\mathrm{stim}}(x_c, y_c)$ | Binocular correlation of the stimulus, as a function of position on the cyclopean retina | Equations 17 and 19 |
| $C$ | Effective binocular correlation sensed on average by a cell | Equations 18 and 19 |
| $D$ | Vergence angle, $H_R - H_L$ | Figure A1(B) and Equation 1 |
| $D_{1/2}$ | Half the vergence angle, $(H_R - H_L)/2$ | |
| $\Delta x$ | Horizontal position disparity, in distance on a planar retina, $x_R - x_L$ | Equation 13 |
| $\Delta\hat{x}$ | Horizontal angular disparity, in degrees, $\hat{x}_R - \hat{x}_L$ | Equation 10 |
| $\Delta y$ | Vertical position disparity, in distance on a planar retina, $y_R - y_L$ | |
| $\Delta\hat{y}$ | Vertical angular disparity, in degrees, $\hat{y}_R - \hat{y}_L$ | Equation 10 |
| $f$ | Focal length of the eyes | Figure A1(B) and Equation 7 |
| $H_c$ | Cyclopean gaze direction, $(H_R + H_L)/2$ | Equation 2 |
| $H$, $H_L$, $H_R$ | Helmholtz azimuthal angle (generic, left eye, right eye), in degrees to the left | Figure A1(B) and Equation 3 |
| $I_{1/2}$ | Half the interocular distance | Figure A1(B) |
| $V$, $V_L$, $V_R$ | Helmholtz elevation (generic, left eye, right eye), in degrees downward | Equation 3 |
| $X$ | Horizontal position in head-centered space, in Cartesian coordinates | Figure A1(A) and Equation 8 |
| $\hat{X}$ | Horizontal position in head-centered space, in degrees to the left | Figure A1(A) and Equation 8 |
| $x$ | Horizontal retinal position, in distance on a planar retina | Figures 3, A1(B), and A1(C) and Equations 4 and 7 |
| $x_c$ | Horizontal cyclopean location, in distance on a planar retina, $(x_R + x_L)/2$ | |
| $\hat{x}$ | Angular horizontal retinal position, in degrees | Figures 3, A1(B), and A1(C) and Equation 7 |
| $\hat{x}_c$ | Horizontal angular cyclopean location, in degrees, $(\hat{x}_R + \hat{x}_L)/2$ | Equation 11 |
| $Y$ | Vertical position in head-centered space, in Cartesian coordinates | Figure A1(A) and Equation 8 |
| $\hat{Y}$ | Vertical position in head-centered space, in degrees above the horizontal | Figure A1(A) and Equation 8 |
| $y$ | Vertical retinal position, in distance on a planar retina | Figures 3, A1(B), and A1(C) and Equations 4 and 7 |
| $\hat{y}$ | Angular vertical retinal position, in degrees | Figures 3, A1(B), and A1(C) and Equation 7 |
| $\hat{y}_c$ | Vertical angular cyclopean location, in degrees, $(\hat{y}_R + \hat{y}_L)/2$ | Equation 11 |
| $Z$ | Distance in front of observer, in Cartesian head-centered coordinates | Figure A1(A) |

*cyclopean retina*. To begin with (Figure 9), we consider simplified units with Gaussian receptive fields, representing the overall activity of neurons with many different orientations and phases. Later (Figures 10, 11, and 12), we consider more realistic units with Gabor receptive fields, in which different orientations and phases are explicitly represented.

$p$ of being the same contrast (both black or both white) and probability $1 - p$ of being opposite contrasts (one black, one white). The binocular correlation of the stimulus is $C_{\mathrm{stim}} = 2p - 1$. To see how $C_{\mathrm{stim}}$ can be estimated from the output of an energy-model neuron, recall that the response of an energy-model unit is $(L + R)^2$, where $L$ and $R$ are the outputs from the left- and right-eye receptive fields, respectively (see the Appendix for details). This can be divided into two components: a sum of monocular terms $M = L^2 + R^2$ and a binocular component $B = 2LR$. We assume that the visual system is able to keep track of both these components separately. This could be done, for example, by differencing the outputs of matched tuned-excitatory and tuned-inhibitory neurons to estimate $B$ and summing the same outputs to estimate $M$.
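The bookkeeping just described can be checked with a few lines of arithmetic. This is a minimal sketch assuming idealized, noise-free units: a tuned-excitatory unit is modeled as $(L + R)^2$ and a tuned-inhibitory unit as $(L - R)^2$, with illustrative input values of our own choosing.

```python
import numpy as np

def energy_components(L, R):
    """Split energy-model responses into binocular (B) and monocular (M) parts.

    L, R: outputs of the left- and right-eye receptive fields.
    """
    tuned_excitatory = (L + R) ** 2   # = M + B
    tuned_inhibitory = (L - R) ** 2   # = M - B
    # Differencing isolates the binocular term; summing isolates the monocular term.
    B = (tuned_excitatory - tuned_inhibitory) / 2   # = 2LR
    M = (tuned_excitatory + tuned_inhibitory) / 2   # = L^2 + R^2
    return B, M

L = np.array([0.8, -0.3, 1.2])
R = np.array([0.6, 0.4, 1.2])
B, M = energy_components(L, R)
print(B / M)  # the last entry is 1: identical inputs to the two eyes
```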

$B/M$ provides a measure of the binocular correlation sensed by the neuron. For example, if $\langle B\rangle$ and $\langle M\rangle$ represent, respectively, the expected values of the binocular and monocular components, averaged over many random-dot patterns with the cell's preferred disparity, then the ratio $\langle B\rangle/\langle M\rangle$ will be equal to the binocular correlation $C_{\mathrm{stim}}$ of the stimulus. In this article, we consider only stimuli with 100% correlation. In this case, if the disparity of the stimulus perfectly matches the disparity tuning of the cell, $\langle B\rangle/\langle M\rangle$ will be 1. If there is a mismatch between the cell's preferred disparity and the disparity of the stimulus, then the value of $\langle B\rangle/\langle M\rangle$ will be smaller, reflecting the smaller effective binocular correlation of the stimulus within the receptive field (although the stimulus correlation at the correct disparity is still 100%). For a sensor with Gaussian receptive fields, like that shown in Figure 2, the value of $\langle B\rangle/\langle M\rangle$ falls off as a Gaussian function of the difference between the cell's preferred disparity and that of the stimulus, with a standard deviation equal to $\sqrt{2}$ times the standard deviation of the receptive field (Equation 19; Figure 9D).
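The $\sqrt{2}$ factor can be checked numerically. This sketch assumes 1D Gaussian receptive fields of SD $\sigma$, identical in the two eyes and displaced by the disparity mismatch, and a white-noise stimulus as a stand-in for dense random dots. Under those assumptions $\langle LR\rangle$ is the overlap integral of the two receptive fields and $\langle L^2\rangle = \langle R^2\rangle$ is the self-overlap, so the ratio should match $\exp(-d^2/4\sigma^2)$, a Gaussian with SD $\sqrt{2}\sigma$.

```python
import numpy as np

def effective_correlation(offset, sigma, half_width=10.0, n=20001):
    """<B>/<M> for identical Gaussian RFs displaced by `offset` (same units as sigma).

    Assumes a 100%-correlated white-noise stimulus, so <B>/<M> reduces to the
    RF overlap divided by the RF self-overlap. The grid spacing cancels in the ratio.
    """
    x = np.linspace(-half_width, half_width, n)
    rf = np.exp(-x**2 / (2 * sigma**2))
    rf_shifted = np.exp(-(x - offset)**2 / (2 * sigma**2))
    return np.sum(rf * rf_shifted) / np.sum(rf * rf)

sigma = 0.5  # deg, illustrative RF size
for d in (0.0, 0.2, 0.5, 1.0):
    numeric = effective_correlation(d, sigma)
    closed_form = np.exp(-d**2 / (2 * (np.sqrt(2) * sigma) ** 2))  # SD = sqrt(2)*sigma
    print(f"mismatch {d:.1f} deg: numeric {numeric:.4f}, closed form {closed_form:.4f}")
```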

*B*〉/〈*M*〉, where $\langle B\rangle$ and $\langle M\rangle$ represent the *expected* value of the binocular and monocular components, respectively, averaged over all possible random-dot stimuli. Obviously, this is not available to the brain when it views a single stimulus. For any individual neuron responding to a single random-dot image, the value of $B/M$ is extremely “noisy,” reflecting random variations in the pattern of black and white dots. This means that an estimate of eye position that uses only one neuron at each point in the visual field is noisy and unreliable. However, at each position in the visual field, the brain contains a multitude of neurons tuned to a range of orientations, spatial frequencies, patterns of ON/OFF regions, and so forth. Combining information from all these neurons greatly improves the estimate of binocular correlation and, hence, of eye position. To demonstrate this, in our later simulations (Figures 10, 11, and 12), we calculate the responses of 30 neurons at each point on the cyclopean retina, covering three preferred orientations and 10 preferred phases (see the Appendix for details). We calculate the total binocular component, $\sum_n B_n$, and monocular component, $\sum_n M_n$, for these neurons and estimate the binocular correlation from their ratio, $(\sum_n B_n)/(\sum_n M_n)$. This is far less noisy than the ratio $B_n/M_n$ for any one neuron (Figure 10) and approximates the expected value $\langle\sum_n B_n\rangle/\langle\sum_n M_n\rangle$.
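The benefit of pooling can be illustrated with a small simulation. This is a sketch under simplifying assumptions of our own: 1D random-dot patterns, 30 hypothetical units with random Gaussian-weight receptive fields (not the article's three orientations by ten phases of Gabors), and a partially correlated stimulus ($C_{\mathrm{stim}} = 0.8$) standing in for the reduced effective correlation that a disparity mismatch produces. The pooled ratio of sums should scatter far less than the ratio for any single unit.

```python
import numpy as np

def binocular_components(w, img_left, img_right):
    """B and M for energy units whose RFs are identical in the two eyes.

    w: (n_neurons, n_pixels) receptive-field weights (hypothetical random RFs).
    """
    L = img_left @ w.T                 # left-eye RF outputs, shape (n_neurons,)
    R = img_right @ w.T                # right-eye RF outputs
    return 2 * L * R, L**2 + R**2      # (B, M) for each neuron

def random_dot_pair(rng, n_pixels, p_same):
    """Left/right dot patterns (+1 white, -1 black) with correlation 2*p_same - 1."""
    left = rng.choice([-1.0, 1.0], size=n_pixels)
    same = rng.random(n_pixels) < p_same
    right = np.where(same, left, -left)
    return left, right

def compare_noise(n_neurons=30, n_pixels=200, p_same=0.9, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_neurons, n_pixels))
    single, pooled = [], []
    for _ in range(n_trials):
        left, right = random_dot_pair(rng, n_pixels, p_same)
        B, M = binocular_components(w, left, right)
        single.append(B[0] / M[0])          # one neuron on its own
        pooled.append(B.sum() / M.sum())    # ratio of population sums
    return np.array(single), np.array(pooled)

single, pooled = compare_noise()
print(f"single neuron: SD {single.std():.3f}")
print(f"pooled (30):   mean {pooled.mean():.3f}, SD {pooled.std():.3f}")
```

Pooling as a ratio of sums, rather than averaging the per-neuron ratios, mirrors the $(\sum_n B_n)/(\sum_n M_n)$ estimator described above.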

*horizontal* correspondence problem can still be solved from a population of purely horizontal-disparity detectors. Roughly speaking (ignoring the problem of false matches, which arises when the stimulus disparity is not constant), the horizontal disparity of the stimulus can be deduced from the preferred horizontal disparity of the units that are reporting the largest binocular correlation. Any vertical disparity in the stimulus will reduce the size of this peak binocular correlation but will not affect which sensor is reporting the peak. In practice, for a realistic visual scene containing objects at different depths, the false-matching problem is nontrivial and requires additional constraints such as a prior preference for smooth surfaces. However, this need not concern us here. It is clear that the brain is able to solve this correspondence problem with great accuracy, and the important point is that any vertical disparity in the stimulus need not affect the solution of the horizontal correspondence problem. Thus, we can assume that the brain has access to the horizontal-disparity field of the stimulus.
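The claim that vertical disparity lowers the peak without moving it can be sketched under the Gaussian-receptive-field approximation. We assume, as a simplification, that horizontal mismatch and vertical disparity combine in one separable Gaussian falloff with SD $\sqrt{2}\sigma$; the disparity values are illustrative.

```python
import numpy as np

def correlation_profile(pref_h, stim_h, stim_v, sigma):
    """Effective correlation across sensors with preferred horizontal disparity pref_h.

    All sensors are tuned to zero vertical disparity; stim_h and stim_v are the
    stimulus's horizontal and vertical disparities (deg). Separable Gaussian
    falloff with SD sqrt(2)*sigma in both directions (assumed Gaussian RFs).
    """
    return np.exp(-((stim_h - pref_h)**2 + stim_v**2) / (4 * sigma**2))

pref_h = np.linspace(-1.0, 1.0, 201)   # preferred horizontal disparities (deg)
stim_h, sigma = 0.3, 0.5
peaks, heights = [], []
for stim_v in (0.0, 0.2, 0.4):
    c = correlation_profile(pref_h, stim_h, stim_v, sigma)
    peaks.append(pref_h[np.argmax(c)])   # which sensor reports the peak
    heights.append(c.max())              # how large that peak is
    print(f"vertical disparity {stim_v:.1f}: peak at {peaks[-1]:.2f}, height {heights[-1]:.3f}")
```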

*expected* value of $\langle\sum_n B_n\rangle/\langle\sum_n M_n\rangle$, where the angle brackets represent averaging over all possible random-dot patterns with the given disparity field, and compare this to the actual value $(\sum_n B_n)/(\sum_n M_n)$, which our neuronal population gave us for the particular random-dot pattern to which it was exposed. Our fitting routine searches for the eye position that best predicts the observed pattern of response magnitudes.
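The fitting step can be sketched as a brute-force least-squares grid search. The forward model below is a hypothetical small-angle stand-in, not the article's Equation 17 or 23. We assume that vertical disparity is proportional to vergence times elevation times horizontal offset from the gaze line, so that it vanishes along the vertical line $\hat{x} = H_c$ and grows faster for larger vergence. Gaussian receptive fields then convert it to correlation. The grid ranges, the noise level, and the names `expected_correlation` and `fit_eye_position` are illustrative choices of ours.

```python
import numpy as np

DEG = np.pi / 180.0

def expected_correlation(x, y, gaze, vergence, sigma=0.5):
    """Hypothetical forward model (assumption, not the article's equations).

    Vertical disparity dv = vergence * y * (x - gaze), computed in radians and
    returned in degrees; Gaussian RFs of SD sigma (deg) turn it into an
    effective correlation exp(-dv^2 / (4 sigma^2)).
    """
    dv = (vergence * DEG) * (y * DEG) * ((x - gaze) * DEG) / DEG
    return np.exp(-dv**2 / (4 * sigma**2))

def fit_eye_position(x, y, observed, gazes, vergences, sigma=0.5):
    """Return the (gaze, vergence) grid point minimizing the squared error."""
    best, best_err = None, np.inf
    for g in gazes:
        for d in vergences:
            err = np.sum((observed - expected_correlation(x, y, g, d, sigma))**2)
            if err < best_err:
                best, best_err = (g, d), err
    return best

# Synthetic "observed" map over the central 20 deg of the cyclopean retina.
x, y = np.meshgrid(np.linspace(-10, 10, 41), np.linspace(-10, 10, 41))
rng = np.random.default_rng(1)
observed = expected_correlation(x, y, gaze=5.0, vergence=8.0)
observed = observed + rng.normal(0.0, 0.002, observed.shape)  # stand-in for stimulus noise
gaze_hat, vergence_hat = fit_eye_position(
    x, y, observed, np.arange(-10, 10.01, 0.25), np.arange(1, 15.01, 0.25))
print(gaze_hat, vergence_hat)
```

A gradient-based optimizer would be faster; the grid search is used here only because it is transparent and has no dependencies beyond NumPy.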

*not* evidence that the visual system contains a population of vertical-disparity detectors. Before we demonstrate this, it will be helpful to review the current literature on how the induced effect produces its depth illusion.

^{2} The vergence can also be deduced from the rate at which vertical disparity increases away from this locus. Numerous psychophysical studies show that the brain makes some use of the vertical-disparity field in calibrating the information available from horizontal disparity.

*stimulus*. This combines with the vertical-disparity field caused by the viewing geometry (if the eyes are not in primary position) to yield the vertical disparity actually experienced on the retina. Once this is realized, the similarities between the induced effect and oblique gaze become clear. This is illustrated first of all in Figure 5C. Here, as in Figure 5A, the eyes are fixating the center of the square, on the midline. But now, the square presented to the left eye has been magnified vertically: Each $Y$ coordinate has been multiplied by 1.08. The plot at the bottom of Figure 5C shows the retinal image in the two eyes; the red dotted lines show the original, unmagnified image for comparison. Note that the vertical magnification has shifted the locus of zero vertical disparity. Whereas before, the red and blue lines crossed on the vertical meridian, now they cross to the right of the vertical meridian, just as if the eyes were gazing to the left (Figure 5B). Thus, it is already clear that vertically magnifying one eye's image, as in the induced effect, mimics oblique gaze (Mayhew, 1982; Ogle, 1964).

$y = 0$. However, where the vertical bar crosses the $X$-axis depends on gaze angle. As we saw in Figures 5 and 6, the horizontal location of this locus of zero vertical disparity reveals whether the eyes are looking to the right or left of center. This is clearly visible in Figure 9: Compare the location of the peak response in the top row, where the eyes are looking left, with that in the bottom row, where they are looking right. The convergence state is also encoded in this correlation field. When the eyes are strongly converged, as in the bottom row of Figure 9, the correlation falls off steeply from its peak; where the convergence is less, the falloff is slower. Note that in Figure 9D, the color scales are different for the two rows. However, the contour lines marking vertical disparity are drawn at the same values (multiples of 0.1°) in both cases. The fact that the contour lines are much closer in the bottom row shows that the rate of change is steeper where the eyes are more converged. The rate of falloff depends both on vergence and on receptive-field size: Vergence determines the rate of increase of vertical disparity (Figure 9C), whereas receptive-field size determines how much a particular vertical disparity reduces correlation (Equation 20). However, if we know the sensors' receptive-field size, we can read off both gaze angle and vergence state from the correlation field in Figure 9D.
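A direct read-off, as opposed to a fit, can be sketched when the receptive-field size is known. We assume Gaussian receptive fields, so $C = \exp(-\Delta y^2/4\sigma^2)$ can be inverted to an unsigned vertical disparity. We also assume the same hypothetical small-angle geometry as a stand-in for the article's equations: $|\Delta y| \propto D\,|\hat{y}|\,|\hat{x} - H_c|$. Then $(\Delta y/\hat{y})^2$ is a parabola in $\hat{x}$ whose vertex gives the gaze angle and whose curvature gives the vergence. The function names are illustrative.

```python
import numpy as np

K = (np.pi / 180.0) ** 2  # turns vergence*y*(x - gaze) in deg^3 into deg (small angles)

def vertical_disparity_from_correlation(c, sigma):
    """Invert C = exp(-dv^2 / (4 sigma^2)) to get the unsigned vertical disparity (deg)."""
    c = np.clip(c, 1e-12, 1.0)  # noisy estimates can stray slightly above 1
    return 2.0 * sigma * np.sqrt(-np.log(c))

def read_off_gaze_and_vergence(x, y, c, sigma):
    keep = np.abs(y) > 2.0  # skip points near the horizontal meridian, where dv ~ 0
    dv = vertical_disparity_from_correlation(c[keep], sigma)
    # Assumed model: (dv / y)^2 = (K * vergence)^2 * (x - gaze)^2, a parabola in x
    # with curvature a = (K * vergence)^2 and its vertex at x = gaze.
    a, b, _ = np.polyfit(x[keep], (dv / y[keep]) ** 2, 2)
    return -b / (2.0 * a), np.sqrt(a) / K

# Noise-free example: gaze -2 deg, vergence 3.5 deg, Gaussian RF SD 0.5 deg.
x, y = np.meshgrid(np.linspace(-10, 10, 41), np.linspace(-10, 10, 41))
true_dv = K * 3.5 * y * (x - (-2.0))
c = np.exp(-true_dv**2 / (4 * 0.5**2))
gaze_hat, vergence_hat = read_off_gaze_and_vergence(x, y, c, 0.5)
print(gaze_hat, vergence_hat)
```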

*average* response over many random-dot patterns, of which Figure 8 is just one example. In reality, the visual system usually has only one stimulus available from which to deduce eye position. Thus, we have yet to demonstrate that eye position can be reliably recovered under these circumstances. In practice, neither of these shortcomings is serious. The Gaussian receptive fields used in Figure 9D can be regarded as representing the sum of receptive fields tuned to many different orientations and phases. Rather than averaging the response of a single sensor over many images, the visual system can reduce variation by averaging the response of many sensors to a single image. Hence, including a realistic range of neuronal receptive fields also solves the problem of noise.

*disparity* was in every case zero). As explained in the Appendix, the response of energy-model units can be divided into a “binocular” component $B$ and a “monocular” component $M$. To obtain a measure of binocular correlation corresponding to that shown in Figure 9D, but for a single random-dot pattern, we calculate the values of $B$ and $M$ for every neuron in the population and then divide the sum of all the $B$s by the sum of all the $M$s (see the Appendix, Equation 24). The result is shown in Figure 10B. As in Figure 9, the top row is for a gaze angle of −2° and a vergence angle of 3.5°, whereas the bottom row is for a gaze angle of 5° and a vergence angle of 8°. The color scales are the same for all panels in the same row. For comparison, Figure 10A shows the $B/M$ ratio for a single neuron in the population. Because of the chance pattern of black and white dots in the stimulus, this is so noisy that it carries very little information about the eye posture. In contrast, Figure 10C shows the value we would expect to obtain if we averaged over all possible random-dot patterns (Equation 17), completely removing all stimulus-related variation. Clearly, summing over 30 neurons (Figure 10B) has greatly reduced the variability experienced with just 1 neuron (Figure 10A). The response to a single random-dot pattern (Figure 10B) is now very similar to the expected result of averaging the responses to all possible random-dot patterns (Figure 10C) and, importantly, it allows us to deduce gaze direction and vergence.

*SEM*), respectively. For the example where the true values were 5.0° and 8.0°, the fitted values were 5.0° ± 0.2° and 8.5° ± 0.1°, respectively. The accuracy of the gaze angle measurement was largely limited by the receptive-field size (*SD* of Gabor envelope was 1°).

$D = 15^\circ$, gaze angle is recovered to better than 0.5°. With small vergence angles, small gaze angles can still be recovered accurately: When $D = 3.5^\circ$, the gaze angle of −2° is recovered with a mean absolute error of 0.7°. However, for large gaze angles, there are significant errors: The two gaze angles >10° are recovered with a mean error of 4° for this small vergence. This is because, as the vergence approaches zero with a large gaze angle, the locus of zero vertical disparity no longer falls within the central 20° simulated here. Vertical disparity, and hence effective correlation, varies progressively less as a function of horizontal position on the retina, so the fit becomes less and less constrained. However, there is no evidence that the visual system can recover large gaze angles with this accuracy from retinal information; hence, this way of extracting gaze parameters is certainly accurate enough to explain the available psychophysics. In Figure 11 (right panel), the fitted vergence is shown as a function of the actual vergence for four different gaze angles. Vergence is recovered to within 0.5° or so. There is a slight bias: Vergence is systematically overestimated. This may reflect inaccuracies in the fitting assumptions (the least squares fit assumes that errors above and below the expected value are equally likely, which is not the case), as well as the deficiencies of the approximate expression used in the fitting algorithm (Equation 23 in place of the correct expression, Equation 17). Nevertheless, these results clearly demonstrate that both gaze angle and vergence can be accurately estimated from the activity in a realistic population of neurons, all tuned to zero vertical disparity.

$Y$) of the dots on the screen was then multiplied by $\sqrt{M}$ in the stimulus presented to the right eye and divided by $\sqrt{M}$ in the left eye. We then calculated the response of the sensor population to this stimulus and passed this to the fitting algorithm. Sample results are shown in Figures 12A and 12B, where $M = 1.01$. As in Figure 10, at each point on the cyclopean retina, the color shows the response of the sensor that is tuned to the horizontal disparity of the stimulus (although the stimulus here is frontoparallel, its disparity is nonzero in the periphery due to the curvature of the horopter). The heavy black lines show the retinal horizontal and vertical meridians, whereas the dashed line marks the locus of zero vertical disparity on the retina. Figure 12A shows the correlation calculated from the response of 30 neurons, tuned to different orientations and phases, to a single random-dot pattern. Figure 12B shows the expected correlation that would be obtained if we averaged over all random-dot induced-effect stimuli. Because of the magnification, the region of peak response is shifted away from the vertical meridian, mimicking the effect of oblique gaze. Accordingly, given the population response shown in Figure 12A, our fitting algorithm returned a value of $H_c = -6.9^\circ$, although the actual gaze angle was zero.
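A first-order sketch shows why a magnified stimulus can be mistaken for oblique gaze. Assume, purely for illustration (this is not an equation from this article), that near the fixation distance the geometric vertical disparity is approximately $D\,\hat{y}\,(\hat{x} - H_c)$, with angles in radians and the zero locus on the vertical line $\hat{x} = H_c$. Scaling the right image by $\sqrt{M}$ and the left by $1/\sqrt{M}$ adds a term proportional to elevation:

$$
\Delta\hat{y} \approx D\,\hat{y}\,(\hat{x} - H_c) + \hat{y}\left(\sqrt{M} - \frac{1}{\sqrt{M}}\right)
\approx D\,\hat{y}\left[\hat{x} - \left(H_c - \frac{\ln M}{D}\right)\right],
$$

using $\sqrt{M} - 1/\sqrt{M} \approx \ln M$ for $M$ close to 1. Under this approximation, the induced-effect stimulus has the same vertical-disparity pattern as an unmagnified stimulus viewed with gaze $H_c - \ln M / D$, so the zero locus shifts by $\ln M / D$ radians. The direction of the shift in any particular figure depends on the sign conventions adopted for $\hat{x}$ and for vertical disparity.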

$H_c = 0^\circ$). Of course, this gives the actual location of the simulated dots in space: on a frontoparallel screen. Figure 12D, on the other hand, shows the visual scene reconstructed using the erroneous estimated gaze angle, $H_c = -6.9^\circ$. The dots now lie on a plane that is slanted away from frontoparallel. This explains the slanted percept experienced in the induced effect.

*this* ability demonstrates that the visual system contains a significant population of vertical-disparity detectors, tuned to a range of vertical disparities.

$x = 0$. But when the eyes are misaligned vertically, the intersections move away from the vertical meridian. For right hypervergence (Figure 13B), the top intersection moves to the left of the retina, whereas the bottom intersection moves to the right. However, for left hypervergence (Figure 13C), this pattern is reversed. Now, the locus of zero vertical disparity occurs on the top right and bottom left of the retina. Thus, from tracking the locus of zero vertical disparity, we can deduce the sign of the vertical vergence error.

$C_{\max} = 0.96$ for sensors tuned to the horizontal disparity of the stimulus. From Equation 21, we deduce that the vergence error is causing a vertical disparity of $2\sigma\sqrt{\ln(C_{\max}^{-1})} = 0.2^\circ$, where $\sigma$ is the standard deviation of the Gaussian RFs used in the simulation, 0.5°. Thus, we have correctly obtained the magnitude of the vergence error. Its sign can be deduced from the location of the peaks in the population response: If they are on the top left and bottom right of the retina, the vergence is negative. Gaze angle and vergence can also be deduced. To obtain gaze angle, we locate the vertical line along which the response is approximately constant at the same value, 0.96, as it had on the horizontal meridian. The position of this line, here −2°, gives the azimuthal gaze angle. Vergence can be deduced from the rate of change of response away from this cross-shaped contour of constant activation. We have not considered an example with elevation, but it is easy to see qualitatively how this would work. With elevation, the horizontal contour along which correlation is approximately constant would be shifted upward or downward from the horizontal meridian. The amount of this shift would indicate the elevation, and the rest of the calculation would proceed in an analogous way.
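For concreteness, the arithmetic behind the 0.2° figure, using only the values stated above ($C_{\max} = 0.96$, $\sigma = 0.5^\circ$), is:

$$
2\sigma\sqrt{\ln\left(C_{\max}^{-1}\right)} = 2\,(0.5^\circ)\sqrt{\ln(1/0.96)} \approx 1^\circ \times \sqrt{0.0408} \approx 0.202^\circ .
$$

Conversely, a vertical disparity of $0.2^\circ$ with $\sigma = 0.5^\circ$ gives $\exp\left(-0.2^2/(4 \times 0.5^2)\right) = e^{-0.04} \approx 0.96$, which recovers the observed peak correlation.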

*retinal* vertical disparities significantly different from zero. Thus, for the moment at least, our model must stand or fall by psychophysical evidence.

*perceptual* consequences of vertical disparity could all be due to its effect on these detectors, produced via an effective reduction in binocular correlation. Vertical vergence eye movements are supported by a very small population of vertical-disparity detectors at the fovea, which are of little use for perception because vertical disparity is always zero at the fovea once correct alignment has been achieved. This accords with evidence that vertical disparity is more potent at eliciting vergence movements if it is closer to the fovea (Howard et al., 2000). It also explains the different pattern of saccades to peripheral targets with horizontal versus vertical disparity. Under normal viewing conditions, the vertical disparity at each location in the visual field can be predicted from a knowledge of the eyes' position and stereo geometry. The brain takes advantage of this and programs saccades to peripheral targets with the appropriate vertical vergence, based on the vertical disparity that is expected at that location in the visual field. If this vertical vergence turns out to be incorrect, a new “expected vertical disparity map” can be learnt quite rapidly (McCandless, Schor, & Maxwell, 1996). In contrast, no such open-loop programming exists for saccades to horizontally disparate peripheral targets. Here, the horizontal vergence (prior to a saccade) is appropriate to the individual target and does not have to be learnt (Collewijn, Erkelens, & Steinman, 1997; Rashbass & Westheimer, 1961). This strongly suggests that the oculomotor system has access to a detailed local map of horizontal disparity, measured instantaneously across the whole visual field. In contrast, for vertical disparity, the oculomotor system has access only to a remembered map, built up gradually from measurements made at the fovea. While doubtless an oversimplification, this version of our model explains all existing psychophysical and physiological data in a very economical way.

$\sigma$, a vertical disparity of $\Delta y$ is roughly equivalent to reducing the binocular correlation by a factor of $\exp(0.25\,\Delta y^2/\sigma^2)$. Thus, it is not possible to reproduce the effects of vertical disparity with binocular correlation in a broadband image because the reproduction will not agree across scales. Even if the image is filtered, it is impossible to stimulate just one spatial frequency/orientation channel; hence, one would expect the illusion to be less compelling than in the real induced effect. Therefore, failing to mimic the induced effect in this way still leaves open the possibility that vertical disparity is equivalent to decorrelation within a single channel.
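The scale dependence is easy to see numerically. This sketch evaluates the correlation factor $\exp(-\Delta y^2/4\sigma^2)$ that would mimic a fixed vertical disparity in channels of different receptive-field size. The particular $\sigma$ values are illustrative, not taken from the article. Because the required factor differs from channel to channel, no single binocular correlation applied to a broadband image can match all of them.

```python
import math

def equivalent_decorrelation(dv, sigma):
    """Correlation factor exp(-dv^2 / (4 sigma^2)) mimicking vertical disparity dv
    in a channel whose Gaussian RF has SD sigma (same units as dv)."""
    return math.exp(-dv**2 / (4 * sigma**2))

dv = 0.2  # deg of vertical disparity
factors = {sigma: equivalent_decorrelation(dv, sigma) for sigma in (0.1, 0.25, 0.5, 1.0)}
for sigma, factor in factors.items():
    print(f"RF SD {sigma:>4} deg: equivalent correlation {factor:.3f}")
```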

*not* simply mediated by binocular correlation. Because we have shown here that the other perceptual effects of vertical disparity can be explained by pure horizontal-disparity detectors, this null result, if confirmed, would be the first conclusive perceptual evidence that the stereo system does contain vertical-disparity detectors. It therefore warrants further investigation.

$X, Y, Z$), whose origin is at the midpoint between the two eyes' nodal points (Figure A1). $Z$ is the depth axis ($Z$ increases with distance from the observer), $Y$ is the vertical axis ($Y$ increases as the object moves upward), and $X$ is the horizontal axis ($X$ increases as the object moves leftward).

$H$ is the angle by which the eye's optic axis is rotated about an axis passing through the nodal point and parallel to the $Y$-axis (Figure A1(B)). Positive values of $H$ indicate that the eye is turned to the left. When the eyes are converged, $H$ will be different for the two eyes. We use subscripts to denote the value for individual eyes: $H_L$, $H_R$. In expressions that could apply to either eye, we shall write $H$ without any subscript; it should then be understood that $H$ should be replaced with $H_L$ to obtain an expression valid for the left eye and with $H_R$ for the right eye.

$XZ$ plane. However, when considering vertical vergence errors, we shall need the Helmholtz elevation angles, $V_L$ and $V_R$, describing the angle by which each eye's axis is rotated about the $X$-axis. Positive values of $V$ indicate that the eye is looking down. In the Helmholtz coordinates we use, this elevation is applied *after* the azimuthal rotation. For the eyes to be correctly fixated, their Helmholtz elevations must be the same: $V_L = V_R$. If the Helmholtz elevations are different, then the gaze rays of the eyes do not intersect (even at infinity), and there is a vertical vergence error. In our previous article (Read & Cumming, 2004), we did not allow for this possibility and only considered the case $V = V_L = V_R$.

$R_H$ represents the eye's azimuthal rotation about the $Y$-axis, and $R_V$ represents its elevation about the $X$-axis. Their product $R$ represents the final position of the eye (the order is important; as mentioned above, the elevation in our coordinate system is applied after the azimuthal rotation). To obtain the matrices for each eye, $H$ and $V$ in these expressions must be replaced with $H_L, V_L$ or $H_R, V_R$ as appropriate. As an example of how these matrices are used, consider finding the direction of the optic axis. In primary position, the eye's optic axis is parallel to the $Z$-axis and may be represented by the vector $\mathbf{Z} = (0, 0, 1)$. With azimuth $H$ and elevation $V$, the optic axis is parallel to the vector $R\mathbf{Z}$.
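A minimal NumPy sketch of these matrices follows. Equation 3 is not reproduced in this section, so the explicit matrix entries are an assumption, chosen to match the sign conventions stated above (positive $H$ turns the optic axis toward $+X$, i.e., leftward; positive $V$ turns it toward $-Y$, i.e., downward). The function names are hypothetical.

```python
import numpy as np

def rot_azimuth(h):
    """Assumed R_H: rotation by azimuth h (radians) about the Y-axis.
    Positive h swings the optic axis toward +X (leftward)."""
    c, s = np.cos(h), np.sin(h)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rot_elevation(v):
    """Assumed R_V: rotation by elevation v (radians) about the X-axis.
    Positive v swings the optic axis toward -Y (downward)."""
    c, s = np.cos(v), np.sin(v)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def helmholtz_rotation(h, v):
    """R = R_V @ R_H: azimuth is applied first, then elevation about the head-fixed X-axis."""
    return rot_elevation(v) @ rot_azimuth(h)

Z_AXIS = np.array([0.0, 0.0, 1.0])

def optic_axis(h, v):
    """Direction of the optic axis, R Z, for azimuth h and elevation v."""
    return helmholtz_rotation(h, v) @ Z_AXIS

h, v = np.deg2rad(20.0), np.deg2rad(10.0)
axis = optic_axis(h, v)
# Multiplying the two rotations in the other order gives a different direction.
axis_wrong_order = rot_azimuth(h) @ rot_elevation(v) @ Z_AXIS
```

Under these assumed conventions the optic axis comes out as $(\sin H, -\sin V\cos H, \cos V\cos H)$; reversing the product changes the result, which is why the order matters.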

$x, y$). When the eye is in primary position ($H = V = 0$), the $x$- and $y$-axes are parallel to the $X$- and $Y$-axes, respectively. An object at $\mathbf{P} = (X, Y, Z)$, such as the red point in Figure A1(B), projects to the point $(x_L, y_L)$ on the left retina and to the point $(x_R, y_R)$ on the right. The image coordinates $(x, y)$ may be expressed very simply in terms of the eye's rotation matrix.

$\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ are unit vectors along each of the axes. $\mathbf{N}$ is a vector representing the nodal point of the eye. The rotation matrix $R$ was given in Equation 3. When evaluating this expression for a particular eye, the appropriate values of $\mathbf{N}$ and $R$ must be used. For the left eye, $\mathbf{N}_L = (I_{1/2}, 0, 0)$, and for the right, $\mathbf{N}_R = (-I_{1/2}, 0, 0)$ (compare Figure A1(B)).
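As a sketch of how image coordinates follow from the rotation matrix, the code below assumes the standard pinhole form on a planar retina, $x = f\,[(\mathbf{P}-\mathbf{N})\cdot R\mathbf{X}]/[(\mathbf{P}-\mathbf{N})\cdot R\mathbf{Z}]$, with $R\mathbf{Y}$ in place of $R\mathbf{X}$ for $y$. This form is our assumption (the expression referred to above is not reproduced here), chosen because it makes the retinal axes parallel to $X$ and $Y$ in primary position. The explicit rotation-matrix entries, the values of $f$ and $I_{1/2}$, and the function names are likewise hypothetical.

```python
import numpy as np

F = 1.0          # focal length (hypothetical; retinal coordinates scale with it)
I_HALF = 0.032   # half interocular distance in metres (hypothetical value)
N_LEFT = np.array([I_HALF, 0.0, 0.0])    # left nodal point (+X is leftward)
N_RIGHT = np.array([-I_HALF, 0.0, 0.0])  # right nodal point
X_AXIS, Y_AXIS, Z_AXIS = np.eye(3)

def helmholtz_rotation(h, v):
    """Assumed R = R_V R_H (azimuth about Y first, then elevation about X)."""
    ch, sh, cv, sv = np.cos(h), np.sin(h), np.cos(v), np.sin(v)
    r_h = np.array([[ch, 0.0, sh], [0.0, 1.0, 0.0], [-sh, 0.0, ch]])
    r_v = np.array([[1.0, 0.0, 0.0], [0.0, cv, -sv], [0.0, sv, cv]])
    return r_v @ r_h

def project(p, n, h, v, f=F):
    """Assumed pinhole image (x, y) of world point p for an eye with nodal point n."""
    r = helmholtz_rotation(h, v)
    d = np.asarray(p, dtype=float) - n
    depth = d @ (r @ Z_AXIS)              # distance along the optic axis
    x = f * (d @ (r @ X_AXIS)) / depth
    y = f * (d @ (r @ Y_AXIS)) / depth
    return x, y

# Turn the left eye so that it fixates a point 1 m straight ahead of the head.
target = np.array([0.0, 0.0, 1.0])
h_left = np.arctan2(target[0] - N_LEFT[0], target[2])   # negative: turned rightward
x_fix, y_fix = project(target, N_LEFT, h_left, 0.0)      # fixated point lands at (0, 0)

# With V = 0 the general expression reduces to a closed form in H.
p = np.array([0.1, 0.2, 0.8])
x_gen, y_gen = project(p, N_LEFT, h_left, 0.0)
dx = p[0] - N_LEFT[0]
denom = dx * np.sin(h_left) + p[2] * np.cos(h_left)
x_closed = F * (dx * np.cos(h_left) - p[2] * np.sin(h_left)) / denom
y_closed = F * p[1] / denom
```

The closed form in the last lines is what this assumed projection gives when elevation is zero; it is shown only as a consistency check, not as a reproduction of the paper's numbered equations.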

$f$ is the focal length of the eye, and $I_{1/2}$ is half the interocular distance. To obtain $R$ for the left eye, replace $H, V$ with $H_L, V_L$ in Equation 3.

$V_L = V_R = 0$. In this case, $R = R_H$, and the retinal coordinates of the images of an object at $(X, Y, Z)$ are:

$M$, and shrinking the left eye's image by a factor $1/\sqrt{M}$. For induced-effect stimuli, therefore, $Y$ in Equation 6 should be replaced with $Y/\sqrt{M}$ for the left eye and $Y\sqrt{M}$ for the right eye.
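A minimal sketch of this substitution follows, assuming $V_L = V_R = 0$ and a pinhole planar retina on which an object's vertical image position is $y = fY/d$, where $d$ is the object's distance along the eye's optic axis (our assumption; Equation 6 is not reproduced here). The function name and the numbers are hypothetical.

```python
import numpy as np

def vertical_images(X, Y, Z, h_left, h_right, M=1.0, f=1.0, i_half=0.032):
    """Vertical retinal positions (y_L, y_R) for an induced-effect stimulus of magnification M.

    Assumes zero Helmholtz elevation and y = f * Y / d, where d is the point's
    distance along each eye's optic axis (nodal points at X = +i_half and -i_half).
    """
    d_left = (X - i_half) * np.sin(h_left) + Z * np.cos(h_left)
    d_right = (X + i_half) * np.sin(h_right) + Z * np.cos(h_right)
    root = np.sqrt(M)
    y_left = f * (Y / root) / d_left     # left image shrunk by 1/sqrt(M)
    y_right = f * (Y * root) / d_right   # right image magnified by sqrt(M)
    return y_left, y_right

point = dict(X=0.2, Y=0.1, Z=1.0, h_left=np.deg2rad(-2.0), h_right=np.deg2rad(2.0))
yl0, yr0 = vertical_images(**point)            # ordinary stimulus (M = 1)
yl1, yr1 = vertical_images(**point, M=1.04)    # 4% vertical magnification
ratio_change = (yr1 / yl1) / (yr0 / yl0)       # equals M
```

Under this sketch the ratio of right to left vertical image size changes by exactly $M$ wherever the object is, which is the defining property of the induced-effect manipulation.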

$X, Y, Z$) and retinal coordinates $(x, y)$ are in units of distance. As we shall see below, these are mathematically convenient. However, it is more usual in visual science to present results in degrees. Figure 3 showed how retinal coordinates could be expressed as angles:

$f$ is the focal length of the eye. Similarly, the direction to an object in space can be expressed as

$M$, we obtain

$x_c$ and horizontal disparity $\Delta x$ of each point. We do this by inverting Equation 5, expressing $X$ and $Z$ in terms of $x_L$ and $x_R$. We obtain (where $x_c = (x_L + x_R)/2$, $\Delta x = x_R - x_L$, $H_c = (H_L + H_R)/2$, and $D = H_R - H_L$):

$H_c$ and vergence $D$, we of course reconstruct the actual position in space of the object whose images fell at $x_L$, $x_R$ in the two retinae (Figure 12A). If we use the estimates of $H_c$ and $D$ derived from fitting the neuronal responses (cf. Equation 23), we can reconstruct the position as it would be perceived by the visual system (Figure 12D).
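The reconstruction can be sketched by intersecting the two lines of sight in the $XZ$ plane rather than by writing out the inverted equations. The sketch assumes zero Helmholtz elevation and the pinhole form $x = f[(X - N_X)\cos H - Z\sin H]/[(X - N_X)\sin H + Z\cos H]$ for horizontal image position, with nodal points at $X = \pm I_{1/2}$. These forms, the parameter values, and the function names are our assumptions rather than the equations cited in the text.

```python
import numpy as np

F, I_HALF = 1.0, 0.032   # hypothetical focal length and half interocular distance (m)

def project_x(X, Z, h, n_x, f=F):
    """Assumed horizontal planar-retina position for an eye at X = n_x with azimuth h (V = 0)."""
    dx = X - n_x
    return f * (dx * np.cos(h) - Z * np.sin(h)) / (dx * np.sin(h) + Z * np.cos(h))

def ray_direction(x, h, f=F):
    """(X, Z) direction of the line of sight through retinal position x for azimuth h."""
    return np.array([x * np.cos(h) + f * np.sin(h), -x * np.sin(h) + f * np.cos(h)])

def reconstruct(x_c, delta_x, h_c, d, f=F, i_half=I_HALF):
    """Recover (X, Z) from cyclopean position, horizontal disparity, gaze, and vergence."""
    x_l, x_r = x_c - delta_x / 2, x_c + delta_x / 2
    h_l, h_r = h_c - d / 2, h_c + d / 2        # from H_c = (H_L + H_R)/2, D = H_R - H_L
    u_l, u_r = ray_direction(x_l, h_l, f), ray_direction(x_r, h_r, f)
    # Solve N_L + s*u_l = N_R + t*u_r, with N_L = (i_half, 0) and N_R = (-i_half, 0),
    # for s by Cramer's rule.
    det = u_r[0] * u_l[1] - u_l[0] * u_r[1]
    s = 2 * i_half * u_r[1] / det
    return i_half + s * u_l[0], s * u_l[1]

h_c, d = np.deg2rad(5.0), np.deg2rad(4.0)
X_true, Z_true = 0.15, 0.9
x_l = project_x(X_true, Z_true, h_c - d / 2, I_HALF)
x_r = project_x(X_true, Z_true, h_c + d / 2, -I_HALF)
x_c, delta_x = (x_l + x_r) / 2, x_r - x_l

X_hat, Z_hat = reconstruct(x_c, delta_x, h_c, d)                     # true eye position
X_bad, Z_bad = reconstruct(x_c, delta_x, h_c, d + np.deg2rad(1.0))   # vergence overestimated
```

With the true eye position the object is recovered; overestimating vergence while keeping the same retinal images places it nearer, which illustrates how a misestimated eye position distorts perceived position.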

$XZ$ plane; that is, Helmholtz elevation is zero for both eyes, and we work in positional, rather than angular, coordinates. We define positional vertical disparity to be

$y_L$ and $y_R$ from Equation 6 and then eliminating the object's vertical position $Y$, Equations 13 and 14 yield the following relationship between vertical cyclopean position and disparity in positional coordinates:

$X$ and $Z$ as a function of $x_L$ and $x_R$ on the planar retina, we obtain

$x_L$, $x_R$ with the cyclopean location and disparity: $x_L = x_c - \Delta x/2$, $x_R = x_c + \Delta x/2$. Then, substituting these expressions into Equation 15 and simplifying, we obtain

$x_c, y_c$), given that the horizontal disparity at that position is $\Delta x(x_c, y_c)$, the Helmholtz elevation is zero, and the Helmholtz azimuths are $H_L$, $H_R$.

$XZ$ plane project to the horizontal meridian on the retina, irrespective of the eyes' azimuthal gaze directions or the position of the object within the $XZ$ plane. Thus, vertical disparity is zero along the horizontal retinal meridian,

$H_c$. This is a slight approximation, and we here do a more rigorous analysis.
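The statement that vertical disparity vanishes on the horizontal meridian can be checked numerically. The sketch assumes zero Helmholtz elevation and the pinhole relation $y = fY/d$, where $d$ is the object's distance along each optic axis. Under that assumption, eliminating $Y$ gives $\Delta y = 2y_c(d_L - d_R)/(d_L + d_R)$, which the code also verifies; this is our own derivation for illustration, not a reproduction of Equation 16. The function names and scene values are hypothetical.

```python
import numpy as np

def depths(X, Z, h_left, h_right, i_half=0.032):
    """Distance of a point along each eye's optic axis (zero Helmholtz elevation)."""
    d_left = (X - i_half) * np.sin(h_left) + Z * np.cos(h_left)
    d_right = (X + i_half) * np.sin(h_right) + Z * np.cos(h_right)
    return d_left, d_right

def vertical_disparity(X, Y, Z, h_left, h_right, f=1.0):
    """Return cyclopean vertical position y_c and vertical disparity y_R - y_L."""
    d_left, d_right = depths(X, Z, h_left, h_right)
    y_left, y_right = f * Y / d_left, f * Y / d_right
    return (y_left + y_right) / 2, y_right - y_left

rng = np.random.default_rng(1)
X = rng.uniform(-0.4, 0.4, 500)
Y = rng.uniform(-0.3, 0.3, 500)
Z = rng.uniform(0.6, 1.5, 500)
h_left, h_right = np.deg2rad(3.0), np.deg2rad(7.0)   # gaze 5 deg left, vergence 4 deg

y_c, delta_y = vertical_disparity(X, Y, Z, h_left, h_right)
# Points in the XZ plane (Y = 0) image onto the horizontal meridian.
_, delta_y_meridian = vertical_disparity(X, np.zeros_like(Y), Z, h_left, h_right)

d_left, d_right = depths(X, Z, h_left, h_right)
delta_y_formula = 2 * y_c * (d_left - d_right) / (d_left + d_right)
```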

*zero* disparity will be the same for both types of disparity; thus, we can exploit the relatively simple expression we were able to derive in positional coordinates to deduce the conditions under which angular vertical disparity is zero. From Equation 16, we find that vertical disparity is zero when either

*not* use this approximation. The predicted vertical-disparity field was calculated exactly, using Equation 16, and optimization was performed on the whole field, not just the locus of zero disparity.

$x, y$) rather than the more intuitive angular coordinates (

$x_c, y_c$), say $(1, 2)$, the correlation is $C_{\text{stim}} = 0.8$ and the 2D disparity is $\Delta x_{\text{stim}} = 0.4$, $\Delta y_{\text{stim}} = 0.02$. The disparity means that the pixel at $(x_c - \Delta x_{\text{stim}}/2,\ y_c - \Delta y_{\text{stim}}/2) = (0.8, 1.99)$ in the left eye corresponds to the pixel at $(x_c + \Delta x_{\text{stim}}/2,\ y_c + \Delta y_{\text{stim}}/2) = (1.2, 2.01)$ in the right eye. If the stereogram were perfectly correlated, these two pixels would be identical, either both white or both black; thus, their product would always be 1 (taking white to be +1 and black to be −1). In fact, the correlation is only 80% at that point in the image; hence, the expected value of their product is only 0.8 (i.e., there is a 90% chance that both pixels have the same polarity and a 10% chance that they have opposite polarities, giving 0.9 × (+1) + 0.1 × (−1) = 0.8).
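The arithmetic of this example can be reproduced with a short simulation. The sketch assumes one simple way of generating dot pairs with a given correlation: each right-eye dot copies its left-eye partner with probability $(1 + C_{\text{stim}})/2$ and is inverted otherwise. The function names are hypothetical.

```python
import numpy as np

def matched_pixels(x_c, y_c, dx, dy):
    """Left- and right-eye pixel positions that correspond at cyclopean point (x_c, y_c)."""
    return (x_c - dx / 2, y_c - dy / 2), (x_c + dx / 2, y_c + dy / 2)

def correlated_pairs(n, corr, rng):
    """n pairs of binary dot values (+1 white, -1 black) whose expected product is corr."""
    left = rng.choice([-1, 1], size=n)
    same = rng.random(n) < (1 + corr) / 2    # P(same polarity) = (1 + C) / 2
    right = np.where(same, left, -left)
    return left, right

left_px, right_px = matched_pixels(1.0, 2.0, 0.4, 0.02)   # approximately (0.8, 1.99), (1.2, 2.01)
rng = np.random.default_rng(0)
left, right = correlated_pairs(200_000, 0.8, rng)
mean_product = np.mean(left * right)    # close to 0.8 for this many pairs
```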

$x_{\text{pref}}, y_{\text{pref}}$), and their horizontal position disparity defines their preferred stimulus disparity, $\Delta x_{\text{pref}}$. The left and right RFs always have the same vertical location $y$; thus, their preferred vertical disparity is zero.

$L + R)^2$. This full-squared output can be thought of as the combined outputs of a push–pull pair of simple cells, each of which computes a half-squared output. We used tuned-excitatory units, for which the receptive-field profiles are identical in the two eyes, differing only in their horizontal position. Thus, the inputs from the two eyes are

$I_L(x, y)$ and $I_R(x, y)$ are the images on the two retinae. These are expressed relative to the mean luminance, so that $I(x, y)$ is positive for bright features and negative for dark ones.
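The equivalence between the full-squared output and a push–pull pair of half-squaring simple cells holds because at most one member of the pair is active for any input. A minimal check, with hypothetical function names:

```python
import numpy as np

def energy_output(L, R):
    """Full-squared binocular output (L + R)^2."""
    return (L + R) ** 2

def push_pull_output(L, R):
    """Sum of two half-squared simple cells of opposite sign (a push-pull pair)."""
    s = L + R
    return np.maximum(s, 0.0) ** 2 + np.maximum(-s, 0.0) ** 2

rng = np.random.default_rng(2)
L, R = rng.normal(size=1000), rng.normal(size=1000)   # arbitrary monocular inputs
```

Expanding $(L + R)^2 = (L^2 + R^2) + 2LR$ also separates the output into a purely monocular part and a binocular cross term.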

$\rho(x, y)$ is a receptive-field profile centered on zero. For an individual unit, this profile is displaced on the retina depending on the unit's preferred horizontal disparity and cyclopean position. $(x_{\text{pref}}, y_{\text{pref}})$ is the unit's preferred cyclopean location on the retina. $\Delta x_{\text{pref}}$ is its preferred horizontal disparity; the centers of the left and right receptive fields feeding into the unit are offset horizontally from one another by $\Delta x_{\text{pref}}$, giving the unit its disparity tuning. In our simulations, we consider only units tuned to the stimulus horizontal disparity, so that $\Delta x_{\text{pref}} = \Delta x_{\text{stim}}(x_{\text{pref}}, y_{\text{pref}})$.
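The following sketch computes $L$ and $R$ for one tuned-excitatory unit on a discretized retina. It assumes a Gaussian profile $\rho$, a binary random-dot image expressed relative to mean luminance, and a right-eye image shifted horizontally by a whole number of pixels. When $\Delta x_{\text{pref}}$ matches the stimulus disparity, the two receptive fields see the same dots and $L = R$. The grid size, $\sigma$, and function names are hypothetical.

```python
import numpy as np

def gaussian_rf(xx, yy, cx, cy, sigma):
    """Receptive-field profile rho centred on (cx, cy)."""
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))

def monocular_outputs(img_left, img_right, x_pref, y_pref, dx_pref, sigma):
    """L and R for a unit at cyclopean location (x_pref, y_pref) with disparity dx_pref."""
    yy, xx = np.indices(img_left.shape, dtype=float)   # row index = y, column index = x
    rf_left = gaussian_rf(xx, yy, x_pref - dx_pref / 2, y_pref, sigma)
    rf_right = gaussian_rf(xx, yy, x_pref + dx_pref / 2, y_pref, sigma)
    return np.sum(rf_left * img_left), np.sum(rf_right * img_right)

rng = np.random.default_rng(3)
img_left = rng.choice([-1.0, 1.0], size=(64, 64))   # +1 bright, -1 dark
shift = 4                                           # stimulus disparity in pixels
img_right = np.roll(img_left, shift, axis=1)        # right image = left image moved rightward

# Unit tuned to the stimulus disparity versus a unit tuned to zero disparity.
L_tuned, R_tuned = monocular_outputs(img_left, img_right, 32.0, 32.0, shift, sigma=3.0)
L_off, R_off = monocular_outputs(img_left, img_right, 32.0, 32.0, 0.0, sigma=3.0)
```

The receptive fields sit far from the image border, so the wrap-around introduced by `np.roll` has a negligible effect on these sums.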

$M = L^2 + R^2$, and a binocular term $B = 2LR$. When the stimulus is 100% correlated and the unit is viewing corresponding regions of the image in its two receptive fields, then $L = R$, and thus these two terms become equal: $M = B$. In general, for images with arbitrary disparity and correlation, we can calculate the expected values $\langle B \rangle$ and $\langle M \rangle$, where the average is taken over many different random-dot patterns with the same disparity and correlation fields:

$x_c, y_c$ represent position on a cyclopean retina. $\Delta x_{\text{stim}}(x_c, y_c)$ and $\Delta y_{\text{stim}}(x_c, y_c)$ are the horizontal- and vertical-disparity fields of the stimulus, and $C_{\text{stim}}(x_c, y_c)$ is its binocular correlation. Note that all three are allowed to vary as a function of position on the cyclopean retina; that is, these expressions are not restricted to frontoparallel stimuli. Similar expressions were derived in Prince, Pointon, Cumming, and Parker (2002, p. 206) and Read and Cumming (2003, p. 2814). Although we have generalized to allow for varying vertical- and horizontal-disparity fields and for varying binocular correlation, the details of the derivation are sufficiently similar that it does not seem worth reproducing them.

$C = \langle B \rangle / \langle M \rangle$, for Gaussian receptive fields:

$C$ has a particularly simple form:

$C = C_{\text{stim}}(x_{\text{pref}}, y_{\text{pref}})$. However, notice that any mismatch between the sensor's preferred disparity and that of the stimulus reduces the response. The response falls off as a Gaussian function of the disparity mismatch, with an SD equal to √2 times that of the Gaussian RF. A population of these correlation detectors, which included all possible horizontal and vertical disparities, would encode both the local 2D disparity and the local correlation of the stimulus. Roughly speaking (ignoring the complexities of the correspondence problem), at each position on the cyclopean retina $(x_c, y_c)$, the local correlation $C_{\text{stim}}(x_c, y_c)$ would be given by the response of the maximally responding sensor tuned to that cyclopean position (i.e., with $x_{\text{pref}} = x_c$, $y_{\text{pref}} = y_c$), and the local disparity would be given by the disparity tuning of that maximally responding sensor (i.e., $\Delta x(x_c, y_c) = \Delta x_{\text{pref}}$, $\Delta y(x_c, y_c) = \Delta y_{\text{pref}}$). The model stereo system considered here falls short of this in that the population contains only horizontal-disparity sensors. Thus, the horizontal disparity of the stimulus can still be deduced from the response of the maximally responding sensor, but the vertical disparity and binocular correlation are confounded. A maximal response of

$C_{\text{stim}} = C_{\max}$ or that it has 100% binocular correlation and a vertical disparity of magnitude

*average* response, averaged over all binary noise stimuli. For any individual noise stimulus, the energy-model components $B$ and $M$ may differ considerably from these averages. This leads to considerable noise in the field $B/M$ for any individual stimulus. This noise affects only regions of the image where there is significant vertical disparity. Along the locus of zero vertical disparity, because we consider only sensors tuned to the horizontal disparity of the stimulus, the receptive fields in each eye view corresponding regions of the visual scene. This means that although the output from each eye, $L$ and $R$, fluctuates depending on the particular pattern of black and white dots, $L$ is always equal to $R$, because each eye always sees the same dot pattern as the other eye. Thus, $B/M = 2LR/(L^2 + R^2)$ is always equal to 1. Consequently, the locus of zero vertical disparity and, hence, the gaze angle can be reliably deduced even from the response of a single sensor to a single image. However, estimates of vergence are much more seriously affected. The estimate of vergence depends on measuring how rapidly the effective binocular correlation falls off from its peak value of 1, which it attains along the locus of zero vertical disparity. Away from this locus, vertical disparity in the stimulus means that the receptive fields do not, in general, see exactly corresponding regions of the image. This means that $L$ and $R$ are not quite equal, even if the sensor's horizontal disparity is matched to that of the stimulus. Not only is the mean value $\langle B \rangle / \langle M \rangle$ less than 1, but the actual value $B/M$ for any individual image is very noisy. This makes the vergence estimates returned by fitting very unreliable.
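For Gaussian receptive fields and stimuli whose disparity and correlation are constant across the receptive field, the Gaussian fall-off with SD $\sqrt{2}\,\sigma$ (where $\sigma$ is the RF SD) can be written as $C = C_{\text{stim}}\exp[-(\delta_x^2 + \delta_y^2)/(4\sigma^2)]$, with $\delta_x, \delta_y$ the horizontal and vertical mismatches between stimulus and preferred disparity. The sketch below uses this expression (our rewriting under those assumptions) to illustrate the confound described above: a horizontal-disparity sensor cannot distinguish reduced correlation from vertical disparity, and it cannot tell the sign of the vertical disparity. Function names and numbers are hypothetical.

```python
import numpy as np

def effective_correlation(c_stim, mismatch_x, mismatch_y, sigma):
    """<B>/<M> for an isotropic Gaussian RF of SD sigma.

    The fall-off with disparity mismatch is Gaussian with SD sqrt(2) * sigma:
    exp(-d**2 / (2 * (sqrt(2) * sigma)**2)) = exp(-d**2 / (4 * sigma**2)).
    """
    d2 = mismatch_x ** 2 + mismatch_y ** 2
    return c_stim * np.exp(-d2 / (4 * sigma ** 2))

def vertical_disparity_magnitude(c_max, sigma):
    """|Delta y| that would explain peak response c_max if the stimulus were 100% correlated."""
    return 2 * sigma * np.sqrt(-np.log(c_max))

sigma = 1.0   # RF SD in degrees (hypothetical)
# The sensor is tuned to the stimulus horizontal disparity, so the horizontal mismatch is zero.
c_a = effective_correlation(0.8, 0.0, 0.0, sigma)    # 80% correlation, no vertical disparity
dy = vertical_disparity_magnitude(0.8, sigma)        # about 0.94 deg
c_b = effective_correlation(1.0, 0.0, dy, sigma)     # 100% correlation, vertical disparity +dy
c_c = effective_correlation(1.0, 0.0, -dy, sigma)    # 100% correlation, vertical disparity -dy
```

All three cases give the same response, so such a population can at best report the magnitude of vertical disparity, and only if the stimulus correlation is known.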

$\langle B \rangle$ and $\langle M \rangle$ over all the receptive fields used in the population. In practice, these expressions are too slow to evaluate for use in a fitting algorithm, because they involve an integration over the entire cyclopean retina. However, excellent results are obtained if we make the approximation that the stimulus disparity remains constant across the receptive field (the stimulus correlation is assumed to be constant at 1). We use receptive fields that are 2D Gabor functions with an isotropic Gaussian envelope. Thus, for the $n$th unit in the population:

$\theta_n$ is the preferred orientation of the $n$th neuron, and $\phi_n$ is its overall phase (note that the phase of the Gabor is the same in both eyes; thus, the phase disparity is always zero). Under the assumption of constant stimulus disparity, it can be shown that the expected monocular and binocular components of this energy unit's response are:

$I_n$ and $J_n$ need to be calculated only once for each neuron in the population; the expected value $\langle B_n \rangle$ for different eye positions can then be calculated very quickly from Equation 23 (recall that different eye positions imply different vertical-disparity fields $\Delta y_{\text{stim}}$, according to Equation 16).
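To illustrate why a magnitude-only signal can suffice, the sketch below fits gaze and vergence by grid search to a noise-free "observed" correlation field. It substitutes simpler stand-ins for the equations cited in the text: the Gaussian fall-off $C = \exp[-\Delta y^2/(4\sigma^2)]$ for a fully correlated stimulus in place of Equation 23, and a pinhole planar-retina model with zero Helmholtz elevation, in which eliminating $Y$ gives $\Delta y = 2y_c(d_L - d_R)/(d_L + d_R)$ ($d$ being distance along each optic axis), in place of Equation 16. Because $C$ depends on $\Delta y^2$, the fit sees only the magnitude of vertical disparity. The grid is restricted to convergent eye positions ($D > 0$); all names, parameter values, and the grid itself are hypothetical.

```python
import numpy as np

F, I_HALF, SIGMA = 1.0, 0.032, 0.02   # focal length, half interocular distance (m), RF SD (retinal units)

def azimuths(h_c, d):
    return h_c - d / 2, h_c + d / 2    # H_L, H_R from H_c = (H_L + H_R)/2 and D = H_R - H_L

def optic_depths(X, Z, h_l, h_r):
    """Distance along each optic axis; nodal points at X = +I_HALF (left) and -I_HALF (right)."""
    d_l = (X - I_HALF) * np.sin(h_l) + Z * np.cos(h_l)
    d_r = (X + I_HALF) * np.sin(h_r) + Z * np.cos(h_r)
    return d_l, d_r

def project(X, Y, Z, h_c, d):
    """Assumed planar-retina positions (x_L, y_L, x_R, y_R) with zero Helmholtz elevation."""
    h_l, h_r = azimuths(h_c, d)
    d_l, d_r = optic_depths(X, Z, h_l, h_r)
    x_l = F * ((X - I_HALF) * np.cos(h_l) - Z * np.sin(h_l)) / d_l
    x_r = F * ((X + I_HALF) * np.cos(h_r) - Z * np.sin(h_r)) / d_r
    return x_l, F * Y / d_l, x_r, F * Y / d_r

def predicted_dy(x_c, y_c, delta_x, h_c, d):
    """Vertical disparity implied at (x_c, y_c, delta_x) by a candidate eye position."""
    h_l, h_r = azimuths(h_c, d)
    x_l, x_r = x_c - delta_x / 2, x_c + delta_x / 2
    # Lines of sight in the XZ plane, intersected by Cramer's rule.
    ul = np.array([x_l * np.cos(h_l) + F * np.sin(h_l), -x_l * np.sin(h_l) + F * np.cos(h_l)])
    ur = np.array([x_r * np.cos(h_r) + F * np.sin(h_r), -x_r * np.sin(h_r) + F * np.cos(h_r)])
    s = 2 * I_HALF * ur[1] / (ur[0] * ul[1] - ul[0] * ur[1])
    X, Z = I_HALF + s * ul[0], s * ul[1]
    d_l, d_r = optic_depths(X, Z, h_l, h_r)
    return 2 * y_c * (d_l - d_r) / (d_l + d_r)

# Hypothetical scene viewed with gaze H_c = 5 deg and vergence D = 4 deg.
rng = np.random.default_rng(4)
X, Y, Z = rng.uniform(-0.4, 0.4, 300), rng.uniform(-0.3, 0.3, 300), rng.uniform(0.6, 1.5, 300)
true_hc, true_d = np.deg2rad(5.0), np.deg2rad(4.0)
x_l, y_l, x_r, y_r = project(X, Y, Z, true_hc, true_d)
x_c, y_c, delta_x = (x_l + x_r) / 2, (y_l + y_r) / 2, x_r - x_l
c_obs = np.exp(-(y_r - y_l) ** 2 / (4 * SIGMA ** 2))   # depends only on |Delta y|

hc_grid = np.deg2rad(np.arange(-15.0, 16.0, 1.0))
d_grid = np.deg2rad(np.arange(1.0, 10.5, 0.5))          # convergent eye positions only
sse = np.full((hc_grid.size, d_grid.size), np.inf)
with np.errstate(all="ignore"):
    for i, hc in enumerate(hc_grid):
        for j, dv in enumerate(d_grid):
            c_pred = np.exp(-predicted_dy(x_c, y_c, delta_x, hc, dv) ** 2 / (4 * SIGMA ** 2))
            err = np.sum((c_pred - c_obs) ** 2)
            if np.isfinite(err):                         # skip degenerate candidates
                sse[i, j] = err
i_best, j_best = np.unravel_index(np.argmin(sse), sse.shape)
best_hc_deg, best_d_deg = np.rad2deg(hc_grid[i_best]), np.rad2deg(d_grid[j_best])
```

Reproducing the actual fits would require Equation 23 and Equation 16 in place of these stand-ins; the sketch only shows that, with vergence constrained to be convergent, a signal that ignores the sign of vertical disparity can still single out both parameters.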

$x_{Lj}, y_{Lj}$) and $(x_{Rj}, y_{Rj})$ at which the $j$th dot struck each retina. For each sensor, the output from each eye's receptive field was calculated by summing the values of the receptive field at each white-dot position and subtracting the values of the receptive field at each black-dot position. Thus, for the $n$th sensor:

$c_j$ is +1 for white dots and −1 for black dots. The monocular and binocular components for each sensor were then computed as $B_n = 2L_nR_n$ and $M_n = L_n^2 + R_n^2$. At each cyclopean location shown in Figure 10, we calculated $B_n$ and $M_n$ for 30 simple cells with Gabor receptive fields (Equation 22). The 30 units comprised three different orientations ($\theta$ = 0°, 60°, 120°) and 10 different receptive-field phases ($\phi$ = 0°, 36°, …, 288°, 324°). In each case, the spatial-frequency full-width half-power bandwidth was 1.5 octaves, the preferred spatial frequency was 0.3 cpd, and the envelope was an isotropic Gaussian with an SD of 1°. For every binocular unit, the receptive-field profiles were identical in the two eyes, and the receptive-field positions differed only horizontally. Each unit was given a horizontal position disparity equal to the stimulus horizontal disparity at the center of its cyclopean receptive field.
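The dot-based calculation for the 30-unit population can be sketched as follows. We assume a Gabor of the form $\exp[-(x^2+y^2)/(2\sigma^2)]\cos[2\pi f_0(x\cos\theta + y\sin\theta) + \phi]$ with $f_0$ = 0.3 cpd and $\sigma$ = 1°; Equation 22 is not reproduced here, so this parameterization is our assumption, and the 1.5-octave bandwidth is not imposed separately (in this sketch it follows from $\sigma$ and $f_0$). Dot positions, dot count, and function names are hypothetical. With zero vertical disparity every unit has $L_n = R_n$, so $B_n/M_n = 1$; adding vertical disparity lowers the pooled ratio.

```python
import numpy as np

SF, SIGMA = 0.3, 1.0                               # cycles/deg, envelope SD in deg
THETAS = np.deg2rad([0.0, 60.0, 120.0])            # 3 orientations
PHASES = np.deg2rad(np.arange(0.0, 360.0, 36.0))   # 10 phases: 0, 36, ..., 324 deg

def gabor(x, y, theta, phi):
    """Assumed Gabor RF: isotropic Gaussian envelope times a cosine carrier along theta."""
    carrier = np.cos(2 * np.pi * SF * (x * np.cos(theta) + y * np.sin(theta)) + phi)
    return np.exp(-(x ** 2 + y ** 2) / (2 * SIGMA ** 2)) * carrier

def population_components(xl, yl, xr, yr, c, x_pref, y_pref, dx_pref):
    """B_n and M_n for the 30 units at one cyclopean location.

    Each eye's output sums the RF over white dots (c = +1) minus black dots (c = -1);
    the left and right RFs are offset horizontally by dx_pref only.
    """
    b, m = [], []
    for theta in THETAS:
        for phi in PHASES:
            L = np.sum(c * gabor(xl - (x_pref - dx_pref / 2), yl - y_pref, theta, phi))
            R = np.sum(c * gabor(xr - (x_pref + dx_pref / 2), yr - y_pref, theta, phi))
            b.append(2 * L * R)
            m.append(L ** 2 + R ** 2)
    return np.array(b), np.array(m)

rng = np.random.default_rng(5)
n_dots = 2000
xl, yl = rng.uniform(-5, 5, n_dots), rng.uniform(-5, 5, n_dots)   # left-eye dot positions (deg)
c = rng.choice([-1.0, 1.0], size=n_dots)
dx_stim = 0.4

# Zero vertical disparity: each unit sees corresponding dots in the two eyes.
B0, M0 = population_components(xl, yl, xl + dx_stim, yl, c, 0.0, 0.0, dx_stim)
# 0.5 deg of vertical disparity: the receptive fields no longer see matching dots.
B1, M1 = population_components(xl, yl, xl + dx_stim, yl + 0.5, c, 0.0, 0.0, dx_stim)
pooled_zero, pooled_shifted = B0.sum() / M0.sum(), B1.sum() / M1.sum()
```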

$B_1/M_1$ for one sensor in the population, with orientation $\theta = 0^\circ$ and phase $\phi = 0^\circ$. This is very noisy, reflecting the wide variation depending on the particular pattern of black and white dots experienced by sensors in different parts of the retina. Figure 10B shows what happens if we first sum the binocular and monocular components over all sensors in the population before taking the ratio, that is, $(\sum_n B_n)/(\sum_n M_n)$. This surface is much smoother. For comparison, Figure 10C shows the expected values, $(\sum_n \langle B_n \rangle)/(\sum_n \langle M_n \rangle)$, which we would expect to obtain if we averaged the binocular and monocular components over many different random-dot stimuli. Because we have summed over 30 units with different receptive-field properties, the value obtained from just one random-dot pattern (Figure 10B) is very similar to the value expected from averaging over all possible random-dot patterns (Figure 10C).
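The smoothing effect of pooling can be illustrated by repeating such a calculation over many random-dot patterns and comparing the spread of a single unit's ratio $B_1/M_1$ with that of the pooled ratio. This sketch assumes the same Gabor form as before ($f_0$ = 0.3 cpd, $\sigma$ = 1°, 3 orientations × 10 phases, with an assumed parameterization), a fixed 0.5° vertical disparity, and hypothetical pattern and dot counts; it is an illustration, not a reproduction of Figure 10.

```python
import numpy as np

SF, SIGMA = 0.3, 1.0
theta, phi = np.meshgrid(np.deg2rad([0.0, 60.0, 120.0]),
                         np.deg2rad(np.arange(0.0, 360.0, 36.0)), indexing="ij")
theta, phi = theta.ravel()[:, None], phi.ravel()[:, None]   # 30 units; unit 0 has theta = phi = 0

def gabor_bank(x, y):
    """Assumed Gabor RFs for all 30 units (rows) evaluated at dot positions x, y (columns)."""
    carrier = np.cos(2 * np.pi * SF * (x * np.cos(theta) + y * np.sin(theta)) + phi)
    return np.exp(-(x ** 2 + y ** 2) / (2 * SIGMA ** 2)) * carrier

def one_pattern(rng, n_dots=2000, dx=0.4, dy=0.5):
    """B_n and M_n for all units, for one random-dot pattern with disparity (dx, dy)."""
    xl, yl = rng.uniform(-5, 5, n_dots), rng.uniform(-5, 5, n_dots)
    c = rng.choice([-1.0, 1.0], size=n_dots)
    L = gabor_bank(xl + dx / 2, yl) @ c              # left RF centred at x = -dx/2
    R = gabor_bank(xl + dx - dx / 2, yl + dy) @ c    # right dots shifted by (dx, dy); RF at +dx/2
    return 2 * L * R, L ** 2 + R ** 2

rng = np.random.default_rng(6)
single, pooled = [], []
for _ in range(300):
    B, M = one_pattern(rng)
    single.append(B[0] / M[0])          # unit with theta = 0, phi = 0
    pooled.append(B.sum() / M.sum())    # sum components first, then divide
single_sd, pooled_sd = np.std(single), np.std(pooled)
```

In this sketch the pooled ratio varies less from pattern to pattern than the single-unit ratio, qualitatively like the contrast between Figure 10A and Figure 10B.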

$H_c$ and vergence $D$, the predicted vertical-disparity field, $\Delta y_{\text{pred}}$, can be obtained from Equation 16. Because the properties of each sensor ($\sigma, \theta, \phi$) are known, approximate expressions for the expected components for each sensor, $\langle B_n \rangle$ and $\langle M_n \rangle$, can be calculated from Equation 23. Recall that this ignores variation in stimulus disparity across a receptive field. The predicted correlation