In the Pulfrich effect, an illusion of depth is produced by introducing differences in the times at which a moving object is presented to the two eyes. In the classic form of the illusion, there is a simple explanation for the depth percept: the interocular delay introduces a spatial disparity into the stimulus. However, when the moving object is viewed stroboscopically, this simple explanation no longer holds. In recent years, depth perception in the stroboscopic Pulfrich effect has been explained by invoking neurons that are sensitive both to stereo disparity and to direction of motion. With such joint motion/disparity encoders, interocular delay causes a perception of depth by causing a shift in each neuron’s preferred disparity. This model has been implemented by N. Qian and R. A. Andersen (1997). Here we show that this model’s predictions for perceived disparity are quantitatively at odds with psychophysical measures. The joint-encoding model predicts that the perceived disparity is the virtual disparity implied by the apparent motion; in fact, the perceived disparity is smaller. We show that the percept can be quantitatively explained on the basis of spatial disparities present in the stimulus, which could be extracted from pure disparity sensors. These results suggest that joint encoding of motion and depth is not the dominant neuronal basis of depth perception in this stimulus.

In the classic version of the effect, a target moving at speed *v* has position *x* when it is first seen by the right eye. By the time this same image reaches the left eye, after an interocular delay Δ*t*, the image in the right eye will have moved to a new position, *x* + *v*Δ*t*. At this moment, the left eye's image is at *x*, while the right eye's image is at *x* + *v*Δ*t*, so there is a spatial disparity, Δ*x* = *v*Δ*t*, which is interpreted as depth in the usual way. Thus, the illusion of depth in the classic Pulfrich effect is a consequence of the stimulus and, while confirming that spatial disparity results in a percept of depth, does not inform us further about the neuronal mechanisms of depth perception.

In the stroboscopic version of the effect, at each flash time *t*, both eyes see an image of the target in the position it occupied at time *t*; the right eye sees this image immediately, while the left eye does not see it until time *t* + Δ*t*. The difference from the classic stimulus is that here the right eye is not presented with an image at time *t* + Δ*t*. The brain must therefore "remember" the right eye's image from time *t* to match it with the image that occurs later in the left eye. This stimulus does have the potential to inform us about the neuronal mechanisms involved. For example, the perception of depth in this stimulus depends on the temporal integration period over which stereo matches are possible (Lee, 1970a; Morgan, 1979).

cd/m^{2}, contrast 99%, frame rate 96 Hz) viewed via a Wheatstone stereoscope. At the viewing distance used (89 cm), each pixel in the 1280×1024 display subtended 1.1 arcmin, and anti-aliasing was used to render with sub-pixel accuracy. The subjects were all experienced in stereo psychophysics; two were authors and one was a colleague without detailed knowledge of the experiments. No specific instructions were given; in particular, subjects were not instructed to avoid tracking the target.
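As a quick sanity check of the quoted geometry, the angular subtense of one pixel can be computed from the viewing distance. The pixel pitch is not stated in the text; a typical CRT dot pitch of 0.28 mm is assumed here purely for illustration.

```python
import math

viewing_distance_mm = 890.0   # 89 cm, as stated in the text
pixel_pitch_mm = 0.28         # assumed dot pitch, not given in the text

# Angular subtense of one pixel, converted from degrees to arcmin
subtense_arcmin = math.degrees(math.atan2(pixel_pitch_mm, viewing_distance_mm)) * 60
print(round(subtense_arcmin, 2))  # ~1.08 arcmin, consistent with the quoted 1.1
```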

We write *T* for the inter-flash interval of the notional stroboscope, and *X* for the inter-flash distance. The speed of the apparent motion is therefore *v* = *X*/*T*. It will be convenient to express the interocular delay Δ*t* as a fraction of the inter-flash interval *T*, and the perceived disparity Δ*x* as a fraction of the inter-flash distance *X*; thus, the results (Figures 5–7) show Δ*x*/*X* as a function of Δ*t*/*T*. We define positive interocular delay to mean that the right eye sees a given stimulus first, and positive disparity to be far (uncrossed).

The joint-encoding model predicts that Δ*x*/*X* = Δ*t*/*T*. In contrast, we postulate that the perceived disparity is the weighted average of all disparities physically present in the stimulus, considering all appearances of the stimulus in the left eye as possible matches for a given appearance of the stimulus in the right eye. For example, consider the three possible matches in the left eye for the appearance of one target in the right eye shown in Figure 3A. The zero-disparity match (brown) has the shortest interocular time difference, and hence is given greatest weight. As shown in Figure 3B, we assign a weight that falls off as a Gaussian function of the temporal separation δ*t* of the match:

*w*(δ*t*) = exp(−δ*t*^{2}/2*τ*^{2}),  (1)

where the time-constant *τ* represents the binocular integration time. This is a simple and reasonably realistic approximation to the autocorrelation function of the temporal impulse response, which is what should control the binocular response. In the example of Figure 3, the match with a negative disparity (pink) has smaller temporal separation, and hence greater weight, than the match with a positive disparity (green), so the weighted average of all three matches will yield a negative disparity. Averaging over all possible matches, we predict that the disparity perceived in a strobe Pulfrich stimulus is, as a fraction of the strobe inter-flash distance *X*,

Δ*x*/*X* = Σ_{j} *j* *w*(*jT* + Δ*t*) / Σ_{j} *w*(*jT* + Δ*t*),  (2)

where the index *j* describes each possible match, and *j* = 0 corresponds to the match that is most nearly simultaneous. In Figure 3, the brown, pink, and green matches are *j* = 0, *j* = −1, and *j* = +1, respectively. Note that because this equation results in a negative (near) perceived disparity when the interocular delay is positive (right eye sees a given stimulus first), it applies when the target is moving to the left. For target motion to the right, a similar derivation yields the same equation with an overall minus sign.
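The three-match example can be sketched numerically. The Gaussian weight with *τ* = 16 ms is taken from the text; the inter-flash interval and delay below are illustrative choices, not values from Figure 3 itself.

```python
import math

tau = 0.016   # binocular integration time, s (value suggested in the text)
T = 0.063     # inter-flash interval, s (illustrative choice)
dt = 0.020    # interocular delay, s (illustrative choice)

def w(sep):
    """Gaussian weight assigned to a match with temporal separation sep."""
    return math.exp(-sep ** 2 / (2 * tau ** 2))

# The three matches of Figure 3A: j = 0 (brown), j = -1 (pink), j = +1 (green),
# with temporal separations j*T + dt and disparities of j inter-flash distances.
weights = {j: w(j * T + dt) for j in (-1, 0, 1)}

# The j = -1 (negative-disparity) match is closer in time than j = +1, so it
# receives more weight, pulling the weighted-average disparity negative.
avg_disparity_fraction = sum(j * wt for j, wt in weights.items()) / sum(weights.values())
print(avg_disparity_fraction < 0)  # True
```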

Figure 5 shows perceived disparity Δ*x*, as a fraction of inter-flash distance *X*, against interocular delay Δ*t*, as a fraction of inter-flash interval *T*. The dots lie close to the identity line Δ*x*/*X* = Δ*t*/*T*, indicating that the perceived disparity Δ*x* is close to the virtual disparity *v*Δ*t* implied by the apparent motion *v* = *X*/*T* of the target, just as would be the case in the classic Pulfrich effect if the target were illuminated continuously. These results agree with those reported by Burr and Ross (1979), and with the predictions of the joint-encoding model (Qian & Andersen, 1997).

If the subject's eyes track the target, each appearance of the target is seen by the right eye and then, a time Δ*t* later, by the left eye. Although both appearances are presented at the same position on the screen, because the eyes have moved through an angle *v*Δ*t* in this time, the target appears at different positions on the two retinas (Figure 2B). Eye movements would thus introduce a real relative disparity of *v*Δ*t* between target and background, which would naturally be perceived as a difference in depth. Thus, the perceived disparity would equal the virtual disparity *v*Δ*t*, independently of the neuronal mechanisms encoding depth, and in particular independently of whether interocular delay can influence depth perception. Burr and Ross felt that it was unlikely that subjects could track a stroboscopically presented target with the required accuracy. However, detailed recordings of eye movements by Morgan and colleagues (Morgan & Turnbull, 1978; Morgan & Watt, 1982, 1983; Ward & Morgan, 1978) suggest that, in fact, such accuracy may be possible.
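The geometric argument can be checked with a few lines of arithmetic. The speed and delay below are illustrative values, not stimulus parameters from the experiments.

```python
# Illustrative numbers: tracking/apparent-motion speed and interocular delay
v = 5.0          # deg per s (illustrative)
dt = 0.020       # interocular delay, s (illustrative)

# One flash appears at the same screen position for both eyes.
x_screen = 1.0   # deg

# Gaze direction at the moment each eye receives the flash, assuming
# the eyes track the apparent motion perfectly.
gaze_right = x_screen            # right eye sees the flash immediately
gaze_left = x_screen + v * dt    # left eye sees it dt later; gaze has moved on

# Retinal position = screen position relative to the current gaze direction.
retinal_right = x_screen - gaze_right   # 0
retinal_left = x_screen - gaze_left     # -v*dt

disparity = retinal_left - retinal_right
print(disparity)  # a real disparity of magnitude v*dt (here 0.1 deg)
```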

In the modified stimulus of Experiment 2, even if the eyes track the target, the *same* retinal disparity is applied to both target and background (because both the retinal slip caused by the eye movement and the interocular delay in the stimulus are the same for target and background). The relative spatial disparity between target and background thus remains zero (Figure 2D). Any perceived difference in their depth is due only to the movement of the target across the static background. Thus, the results are unaffected by eye movements. This stimulus therefore allows us to measure the depth that would be perceived in the stroboscopic Pulfrich stimulus if the eyes remained stationary.

At the shortest inter-flash interval (*T* = 31 ms, Figure 6A), the perceived disparity is the virtual disparity, so points lie on the identity line. However, at the longer inter-flash intervals, the perceived disparity falls below the virtual disparity at the larger interocular delays, so points fall on a sigmoid curve. This strongly suggests that the results of Experiment 1 were due to tracking eye movements. Traces of this sigmoid pattern are visible in Experiment 1 for *T* = 125 ms (Figure 5C), perhaps because subjects had difficulty in smoothly tracking the apparent motion of the target at this long inter-flash interval. The results of Experiment 2 suggest that if subjects could keep their eyes perfectly still while viewing the strobe Pulfrich stimulus, they would in general perceive less than the virtual disparity. Note that the difference between the stimuli in Experiments 1 and 2 is very subtle: only the relative timing (between the eyes) of the appearance of the stationary background dots is altered. It is therefore hard to see any explanation for the different results other than the effect of tracking eye movements.

(1) If the inter-flash interval *T* of the stroboscope is very short compared to the binocular integration time, then the apparent motion of the flashing target is indistinguishable from true continuous motion. The perceived disparity Δ*x* is thus equal to *v*Δ*t*, where *v* is the speed of the apparent motion, as in the classic Pulfrich effect. (2) If, on the other hand, the inter-flash interval is much longer than the integration time, then a given appearance of the target in the delayed eye will either go unmatched, or will be matched only with the immediately preceding appearance in the leading eye, all previous appearances having been "forgotten"; either way, no depth will be perceived and Δ*x* = 0. From Figure 6, *T* = 31 ms and *T* = 125 ms seem to be examples of these respective extremes. We sought to develop this argument to see whether it can quantitatively explain the perceived disparity.
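These two limiting cases can be verified numerically with a small sketch of the weighted-average prediction, assuming the Gaussian weight with *τ* = 16 ms described in the text (the 20% delay fraction below is an illustrative choice).

```python
import math

def w(sep, tau=0.016):
    """Gaussian weight on a match with temporal separation sep (s)."""
    return math.exp(-sep ** 2 / (2 * tau ** 2))

def predicted_fraction(delay, T, tau=0.016, jmax=20):
    """Disparity-averaging prediction of perceived disparity / X
    (sign convention for leftward target motion)."""
    js = range(-jmax, jmax + 1)
    num = sum(j * w(j * T + delay, tau) for j in js)
    den = sum(w(j * T + delay, tau) for j in js)
    return num / den

short = predicted_fraction(delay=0.2 * 0.031, T=0.031)   # T = 31 ms
long_ = predicted_fraction(delay=0.2 * 0.125, T=0.125)   # T = 125 ms
print(short, long_)
# Short interval: prediction lies near the virtual disparity, -0.2.
# Long interval: prediction collapses toward zero.
```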

*τ*of the Gaussian weight function represents the integration time of binocular disparity sensors. Our experiments in macaque V1 (Read & Cumming, 2005) suggest that this is around 16 ms.

The orange curves in Figure 6 show this model's prediction, with *τ* = 16 ms derived from the physiology, for the dependence of perceived disparity on interocular delay and inter-flash interval. With no free parameters, the model successfully captures the change from a linear relationship between perceived disparity Δ*x* and interocular delay Δ*t* at short inter-flash intervals, to a sigmoid relationship at long intervals. In contrast, the joint-encoding model (Qian & Andersen, 1997) predicts that the perceived disparity Δ*x* is always proportional to the interocular delay Δ*t* (identity lines in Figure 6), which is clearly not supported by the data.

Fitting *τ* as a free parameter results in a marginally better fit, with *τ* = 21 ms, shown with the purple curves in Figure 6; the improvement in log likelihood was not significant given the extra parameter.

We therefore also computed the model's prediction allowing a contribution from joint motion/disparity encoding at the level suggested by the physiology (*λ* = 10% and *τ* = 16 ms). This prediction is shown with the black curve in Figure 6. Allowing for a contribution from joint encoding slightly improves the match to the experimental data, especially at the longest inter-flash interval, *T* = 125 ms. Note that again the black curve has not been fitted to the experimental data; the parameters *τ* and *λ* in Equation 3 were taken from the physiology. If we do fit *λ* and *τ* as free parameters, the improvement over the original values (*λ* = 0, *τ* = 16 ms, orange curve) is not worth the additional two free parameters (*χ*^{2} log-likelihood test); similarly, if we keep either *λ* or *τ* at the original value and fit the other as a free parameter, there is no significant improvement in fit.
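Equation 3 itself is not reproduced in this copy; a plausible reading, consistent with the description (a fraction *λ* of joint motion/disparity encoders contributing the virtual disparity, the remainder performing pure disparity averaging), can be sketched as follows. The functional form of the mixture is an assumption.

```python
import math

def w(sep, tau):
    """Gaussian weight on a match with temporal separation sep (s)."""
    return math.exp(-sep ** 2 / (2 * tau ** 2))

def averaging_fraction(delay, T, tau, jmax=20):
    """Pure disparity-averaging prediction of perceived disparity / X."""
    js = range(-jmax, jmax + 1)
    return (sum(j * w(j * T + delay, tau) for j in js)
            / sum(w(j * T + delay, tau) for j in js))

def hybrid_fraction(delay, T, tau=0.016, lam=0.10):
    """Assumed hybrid: a fraction lam of joint encoders reports the virtual
    disparity (-delay/T, leftward-motion sign convention); the remaining
    sensors report the weighted disparity average."""
    return lam * (-delay / T) + (1 - lam) * averaging_fraction(delay, T, tau)
```

With `lam = 0` this reduces to the pure averaging model, and with `lam = 1` to the joint-encoding (identity-line) prediction.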

For the *T* = 63 ms stimulus, at the higher interocular delays, subject BC showed a direction-dependent bias: that is, he was more likely to judge the rightward-moving half of the pattern to be in front, and the leftward-moving half behind. Our trick of removing a constant bias, by looking at the difference in subjects' judgments when the target is moving left versus when it is moving right, obviously fails to deal with this direction-dependent bias. This is why BC's results lie above the identity line for the largest interocular delays in Figure 7. Fortunately, this problem has little effect on the estimates of integration time (the largest interocular delays for *T* = 63 ms, Δ*t*/*T* = 0.5, have no effect on the fit at all, because the fit always intersects the identity line at that point, regardless of the parameters *τ* and *λ*). Because the depth percept was weaker, it was impossible to get reliable results at *T* = 125 ms, so the experiment was performed only at inter-flash intervals of 31 ms and 63 ms. These are in any case the most informative in constraining the integration time.

The curves in Figure 7 show the model's predictions, first with no contribution from joint encoding (*λ* = 0%, orange curve) and second with a contribution at the level suggested by the physiological incidence of joint motion/disparity sensors (*λ* = 10%, black curve). Once again, the agreement is excellent. The estimates of binocular integration time obtained in the present study are also in reasonable agreement with the results of a previous study (Read & Cumming, 2005). There, we measured the decline in stereoacuity as interocular delay was added to dynamic random-dot stereograms with the same spatial structure as the stimuli used in Experiment 3. Stereoacuity declined as a Gaussian function of interocular delay, with an SD of 11 ms (mean for the two subjects BC and HN). Thus, the two independent psychophysical measures in humans are in reasonable agreement with each other and with the results of monkey physiology: all suggest a binocular integration time of around 15 ms.

Thresholds are highest at the shortest inter-flash interval (*T* = 31 ms, green). This at first seems paradoxical: surely perception should be clearest when the stimulus is presented most frequently. To understand this effect, we need to consider the activity in a population of disparity sensors for different inter-flash intervals (Figure 9). At zero interocular delay, the most strongly activated sensors are those tuned to zero disparity. At long inter-flash intervals (Figure 9C and 9F), these are essentially the only active sensors, because the other disparities present in the stimulus are so widely separated in time that they do not cause significant activation. At short inter-flash intervals (Figure 9A and 9D), sensors tuned to many different disparities (integer multiples of the inter-flash distance) on either side of zero become activated as well, although less strongly.
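The population-activity argument can be illustrated with a short sketch, assuming the Gaussian temporal weight with *τ* = 16 ms: the activity of a sensor tuned to *j* inter-flash distances is taken to follow the weight of the corresponding match.

```python
import math

def w(sep, tau=0.016):
    """Gaussian weight on a match with temporal separation sep (s)."""
    return math.exp(-sep ** 2 / (2 * tau ** 2))

def population_activity(T, delay=0.0, jmax=3, tau=0.016):
    """Activity of sensors tuned to disparities of j inter-flash distances,
    at zero interocular delay unless stated otherwise."""
    return {j: round(w(j * T + delay, tau), 3) for j in range(-jmax, jmax + 1)}

print(population_activity(T=0.125))  # long interval: only the j = 0 sensor responds
print(population_activity(T=0.031))  # short interval: flanking sensors respond too
```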

The model uses the integration time obtained by fitting the perceived-disparity data for all subjects simultaneously (*τ* = 21 ms, purple curve in Figure 6). Two further parameters, representing the amount of noise and its dependence on neuronal activity, are fitted to the threshold data for each subject individually. This model gives a good fit to the observed thresholds for each subject, demonstrating that a reasonably simple and plausible way of implementing disparity averaging with noisy neurons can account for the stereoacuity data.

Thresholds increase with interocular delay at all three inter-flash intervals (*T* = 31 ms, 63 ms, and 125 ms, shown in green, blue, and red, respectively). Interestingly, the rate of increase seems to be steeper for the middle value, *T* = 63 ms, than for the shorter and longer inter-flash intervals. As indicated by the curves in Figure 8, this effect can also be explained by the implementation of disparity averaging developed above. Here we give an intuitive account of how this works.

At the shortest inter-flash interval (*T* = 31 ms, Figure 8), the threshold is already high for zero interocular delay, but does not increase much further when an interocular delay is introduced.

We take the perceived disparity to be the firing-rate-weighted average of preferred disparities across the population:

Δ*x*_{perc} = Σ_{k} Δ_{k}*r*_{k} / Σ_{k} *r*_{k},  (4)

where the sum over *k* is over all members of the population; *r*_{k} represents the firing rate of the *k*^{th} sensor and Δ_{k} its preferred disparity.

The firing rate *r*_{k} includes some noise. In real neurons, the variance of firing is not constant, but increases with the mean firing rate (Dean, 1981). We model this by taking the noise on each disparity sensor to be a Gaussian random variable with variance equal to (*b*_{k} + *c* *r*_{k}^{p}), where *b*_{k} is the sensor's baseline noise level, Δ_{k} is its preferred disparity, *r*_{k} is its mean firing rate for this stimulus, and *c*, *p*, *d* are constants (the same for all sensors and stimuli). We assume that *p* > 0, so that the second term in the variance represents noise that increases with neuronal firing.
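This signal-dependent noise model can be simulated directly; the parameter values below are illustrative, not the fitted per-subject values.

```python
import math
import random

random.seed(1)

def noisy_response(rate, b=0.01, c=0.05, p=1.5):
    """One noisy sample of a sensor's response: Gaussian noise whose variance
    (b + c * rate**p) grows with the mean rate, cf. Dean (1981)."""
    return rate + random.gauss(0.0, math.sqrt(b + c * rate ** p))

def empirical_variance(rate, n=20000):
    """Monte Carlo estimate of the response variance at a given mean rate."""
    samples = [noisy_response(rate) for _ in range(n)]
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / n

print(empirical_variance(0.1))  # close to 0.01 + 0.05 * 0.1**1.5
print(empirical_variance(1.0))  # close to 0.06: noisier when strongly driven
```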

Summed over the population, the baseline terms *b*_{k} contribute a constant to the variance of the population estimate, which we write *B*^{2}. In contrast, the second term depends on the stimulus; it is contributed to only by those disparity sensors that are activated by a particular stimulus, because *r*_{k} = 0 for the other sensors. For a strobe Pulfrich stimulus presented with a real spatial disparity Δ*x*_{expt}, as in our nulling experiments, the most active disparity sensors are those tuned to the stimulus disparity Δ*x*_{expt} itself. However, because of the periodic nature of the stimulus, sensors tuned to Δ*x*_{expt} plus or minus a whole number of strobe inter-flash distances *X* are also active, although more weakly, because the left-eye and right-eye members of the matches they are responding to are separated by longer times. Thus, the active sensors are those for which Δ_{k} = (*jX* + Δ*x*_{expt}), and for these *r*_{k} = *w*(*jT* + Δ*t*), where *w* is the weight function describing how the activity of disparity sensors decays as a function of interocular delay (Equation 1). Note that this formulation implicitly assumes that each sensor responds only to its own preferred disparity. A more realistic model would allow neurons to respond, less strongly, to disparities close to their preferred disparity, implementing spatial as well as temporal filtering. This would be important for tracking the response of the population as the inter-flash interval decreased toward the continuous limit, when the stimulus would contain many closely spaced disparities. However, because in our stimuli the disparity spacing *X* is at least 0.1°, the simplification is acceptable. The variance *V* of the numerator of Equation 4 therefore reduces to

*V* = *B*^{2} + *c* Σ_{j} (*jX* + Δ*x*_{expt})^{2} *w*(*jT* + Δ*t*)^{p},

and the perceived disparity itself is a Gaussian random variable with mean

<Δ*x*_{perc}> = Δ*x*_{expt} + *X* Σ_{j} *j* *w*(*jT* + Δ*t*) / Σ_{j} *w*(*jT* + Δ*t*)  (5)

and standard deviation

σ = √*V* / Σ_{j} *w*(*jT* + Δ*t*).  (6)

Subjects report whether the perceived disparity Δ*x*_{perc} is positive or negative. In our nulling paradigm, we find how much disparity it is necessary to apply to make the subject answer "near" as often as "far" (i.e., the value of Δ*x*_{expt} for which <Δ*x*_{perc}> = 0). Equation 5 shows that, as required, this nulling disparity is minus the perceived disparity given in Equation 2. We are now finally in a position to extract the disparity threshold θ. This is the amount by which the disparity must be increased beyond the nulling value to make the subject answer "far" 84% of the time. But this is just the standard deviation of the perceived disparity when the stimulus disparity Δ*x*_{expt} is Δ*x*_{null} + θ. For simplicity, we shall ignore the small change in standard deviation caused by changing the stimulus disparity from Δ*x*_{null} to (Δ*x*_{null} + θ), and take the disparity threshold to be the standard deviation of the perceived disparity when the stimulus disparity Δ*x*_{expt} is Δ*x*_{null}. Thus, finally, the disparity threshold is approximately

θ ≈ √(*B*^{2} + *c* Σ_{j} (*jX* + Δ*x*_{null})^{2} *w*(*jT* + Δ*t*)^{p}) / Σ_{j} *w*(*jT* + Δ*t*),  (7)

where *B* represents the value of the threshold when the interocular delay is zero and the inter-flash interval *T* is long compared to the integration time (to see this, note that under these conditions, the *j* ≠ 0 terms in the sums in Equation 7 become negligible).
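The threshold expression is not fully legible in this copy; the sketch below implements one form consistent with the limiting behavior described in the text (with *c* = 0 it reduces to *B* divided by the summed weights; with *B* = 0 it vanishes at long inter-flash intervals). All parameter values are illustrative, not the per-subject fits.

```python
import math

def w(sep, tau=0.021):
    """Gaussian weight on a match with temporal separation sep (s)."""
    return math.exp(-sep ** 2 / (2 * tau ** 2))

def threshold(delay, T, X, dx_null=0.0, B=10.0, c=0.05, p=1.5, tau=0.021, jmax=20):
    """Disparity threshold (same units as B and X, e.g. arcsec), taken as the
    standard deviation of the noisy population read-out at the nulling disparity."""
    js = range(-jmax, jmax + 1)
    den = sum(w(j * T + delay, tau) for j in js)
    var = B ** 2 + c * sum((j * X + dx_null) ** 2 * w(j * T + delay, tau) ** p
                           for j in js)
    return math.sqrt(var) / den

# With c = 0 and a long inter-flash interval, the threshold reduces to B:
print(round(threshold(0.0, T=0.125, X=360.0, c=0.0), 3))   # 10.0
# Shortening the interval with c = 0 lowers the threshold:
print(threshold(0.0, T=0.031, X=360.0, c=0.0) < 10.0)      # True
```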

The signal-dependent noise, controlled by *c* and *p*, is critical in making the threshold at zero interocular delay increase as the inter-flash interval is reduced, in accordance with the psychophysics. If the noise level were independent of neuronal activity (*c* = 0), then Equation 7 for zero delay would reduce to *θ* = *B*/Σ_{j}*w*(*jT*). As the inter-flash interval *T* shortens, the sum in this expression increases, and so the threshold falls. If the noise were entirely signal-dependent (*B* = 0), then the threshold would be zero for large *T*, and would increase above zero as *T* decreased. Thus, the dependence of the noise on neuronal firing is critical in enabling the model to reproduce the increase in zero-delay threshold as the inter-flash interval decreases.

The model thus has parameters *B*, *c*, *p*, and *d*, plus the integration time *τ* implicit in the function *w* (Equation 1). The value of *τ* was chosen by fitting the perceived disparity for all subjects simultaneously (*τ* = 21 ms, purple curve in Figure 6). Thus, *τ* was set without reference to the threshold data shown in Figure 8, and was the same for all subjects. The value of *p* was set equal to 1.5. This was designed to capture experimental findings that the variance of neuronal spike counts increases in proportion to the mean. The exponent of 1.5 is a little higher than the experimental evidence suggests, but was chosen because it gave a better fit to the threshold data. Again, this value was the same for all subjects. The remaining parameters, *B*, *c*, and *d*, were fitted to the observed thresholds for each subject independently.

The parameter *B* comes from the baseline noise in neuronal firing; as noted above, it sets the minimum value of the threshold, for long inter-flash intervals and no interocular delay. *B* was 8 arcsec for HN, 11 arcsec for BC, and 32 arcsec for JR. The parameter *c* represents the amount of signal-dependent noise relative to baseline; *c* was 0.043 for HN, 0.028 for BC, and 0.326 for JR (the units of *c* are not meaningful because they are in terms of the "firing rate" of the notional sensors).