In the Pulfrich effect, an interocular time delay results in the perception of depth. Two modified versions, the stroboscopic Pulfrich effect and dynamic visual noise with a delay, are generally explained by postulating an early stage of space/time-inseparable filtering, encoding motion and disparity jointly. However, most disparity sensors in monkey V1 do not show joint motion/disparity encoding, and we recently showed that depth perception in the stroboscopic Pulfrich effect is equally compatible with space/time-separable filtering. Here, we demonstrate that this filtering can be implemented with a population of physiologically plausible energy model units. Similar results are obtained whether the neurons are pure disparity sensors (like most V1 neurons) or joint motion/disparity sensors (like MT). We also demonstrate that the dynamic noise stimulus produces correlations between the activity in pure disparity sensors, and in a separate population of pure motion sensors. These correlations are sufficient to explain the percept. Thus, joint encoding of motion and disparity is not required to explain depth perception in Pulfrich-like stimuli: a brain which encoded motion and disparity in entirely separate neuronal pathways could still experience all of these illusions.

Δ*t*, and that the object, moving with speed *v*, has a position *x* when it is first seen by the left eye. By the time this same image reaches the right eye, the image in the left eye will have moved to a new position, *x* + *v*Δ*t*. At this moment, the right eye's image is at *x* whereas the left eye's image is at *x* + *v*Δ*t*, so there is a spatial disparity *v*Δ*t*. Because any neuronal mechanism that produces depth from binocular disparity will also produce depth in the classic Pulfrich effect, we learn nothing new about brain mechanisms.

*inseparable* functions of space and time); the logic behind this conclusion is laid out with particular clarity by Anzai, Ohzawa, & Freeman (2001). In this view, the receptive fields are tilted relative to the space/time axes, so the neuron is sensitive to stimulus direction of motion (Anzai et al., 2001; Carney, Paradiso, & Freeman, 1989; Morgan & Fahle, 2000; Pack, Born, & Livingstone, 2003; Qian, 1997). Binocular neurons with such receptive fields would jointly encode both motion and disparity. A signature property of such joint motion/disparity sensors is their distinctive tilted tuning profile when probed with stimuli containing both interocular delay and binocular disparity (Figure 2A). Their preferred disparity changes as a function of interocular delay; they “cannot distinguish an interocular time delay from a binocular disparity” (Qian & Andersen, 1997).

- For the stroboscopic Pulfrich stimulus, we consider the response of pure disparity sensors (binocular neurons with space/time-separable receptive fields, built according to the energy model of Ohzawa, DeAngelis, & Freeman, 1990). These neurons are not sensitive to direction of motion, and their preferred disparity remains constant as interocular delay changes, as in Figure 2B. We show that our disparity-averaging model (Read & Cumming, 2005b) can be simply implemented by averaging the response of these neurons, weighted by their disparity preference. This produces a value for effective disparity which is in excellent agreement with psychophysics experiments.
- For the dynamic visual noise stimulus, we examine the correlation between a population of pure disparity sensors and a population of pure motion sensors (monocular neurons with space/time-inseparable receptive fields). We show that the activity of pure disparity sensors is correlated with the activity of pure motion sensors. If the right eye experiences a delay Δ*t*, then near-preferring disparity sensors are correlated with rightward-preferring motion sensors, whereas far disparity sensors are correlated with leftward motion sensors. Motion sensors tuned to speed *v* are most strongly correlated with disparity sensors tuned to a disparity of ∼*v*Δ*t*. We argue that this correlation is sufficient to explain why motion is perceived in opposite directions on either side of the fixation plane, why speed increases with distance from fixation, and why the percept reverses when the noise is anti-correlated (Tyler, 1977).

*ρ*(*x*, *t*) represents the response to a stimulus at retinal position *x* that occurred at time *t* relative to the present moment. We adopt the convention that negative values of *t* represent times before the present moment. In accordance with causality, we set *ρ*(*x*, *t*) = 0 for all *t* > 0 because the cell cannot be influenced by the stimuli that have not yet occurred. Because the experimental stimulus contains only horizontal motion, we need include only one spatial dimension. For our model disparity sensors, except where otherwise specified, we use space/time-separable receptive fields, where the function *ρ*(*x*, *t*) can be expressed as the product of a spatial component *ρ*_{x}(*x*) and a temporal component *ρ*_{t}(*t*). Neurons with space/time-separable receptive fields are not sensitive to direction of motion.

*ρ*_{0}(*x*, *t*), as a template; a receptive field at position *x*_{L0}, for example, can be written as *ρ*_{0}(*x* − *x*_{L0}, *t*). Except where otherwise noted, we model the spatial component of the receptive field profile as a Gabor function:

*σ* = 0.1° and *f* = 2 cycles per degree, corresponding to a full-width half-maximum power bandwidth of about 2.3 octaves. In fact, this choice is irrelevant because we prove in the Appendix that, with the read-out rule of Equation 15, the same effective disparity is obtained whatever function is chosen for the spatial component. Except where otherwise stated, the temporal component is modeled as a Gaussian:

*t*_{lag}, is 50 ms. This means that the cell's response gradually rises after the appearance of a stimulus, reaching a peak 50 ms after stimulus onset, and decaying thereafter. This receptive field is shown in Figure 3A. A more realistic temporal kernel would be biphasic, reflecting the band-pass temporal tuning of most real V1 cells. However, this would generate problems with the binocular temporal integration. The energy model predicts that the response to interocular delay should be governed by the cross-correlation of the temporal kernels. Band-pass temporal kernels would therefore generate a biphasic response to interocular delay, yet this is not observed in the responses of V1 neurons (Anzai et al., 2001; Read & Cumming, 2005a). This is a known problem of the binocular energy model, which has yet to be addressed. It causes particular difficulties for our model of the strobe Pulfrich effect. Here, the cross-correlation of the temporal kernels also acts as a weight function controlling the weight given to different interocular delays when disparities are averaged (see Appendix). If this weight function is biphasic, then matches at some interocular delays have the effect of repelling the effective disparity away from the disparity of the match, a phenomenon with no psychophysical support. For all these reasons, we restricted ourselves to monophasic temporal kernels when modeling the strobe Pulfrich effect. In the dynamic noise simulation, we do also consider Gabor receptive fields with band-pass temporal frequency tuning.
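As a rough illustration of the receptive fields described above, the following sketch builds a space/time-separable profile from a spatial Gabor and a causal Gaussian temporal kernel, and contrasts it with a tilted (inseparable) profile. The values of *σ*, *f*, and the 50 ms lag are those quoted in the text; the cosine phase of the Gabor, the 10 ms temporal SD, the drift speed of the tilted profile, the sampling grid, and all function names are assumptions made for illustration only. A separable profile sampled on a grid is an outer product, so essentially all of its singular-value energy sits in one component.

```python
import numpy as np

SIGMA_X = 0.1    # spatial SD in degrees (quoted in the text)
FREQ = 2.0       # spatial frequency in cycles/degree (quoted in the text)
T_LAG = 0.050    # temporal lag in seconds (quoted in the text)
SIGMA_T = 0.010  # temporal SD in seconds (assumed for this sketch)


def spatial_gabor(x):
    """Cosine-phase Gabor; the phase is an assumption."""
    return np.exp(-x**2 / (2 * SIGMA_X**2)) * np.cos(2 * np.pi * FREQ * x)


def temporal_gaussian(t):
    """Gaussian peaking T_LAG before the present; zero for t > 0 (causality)."""
    k = np.exp(-(t + T_LAG)**2 / (2 * SIGMA_T**2))
    return np.where(t > 0, 0.0, k)


def separable_rf(x, t):
    """rho(x, t) = rho_x(x) * rho_t(t), sampled as an outer product."""
    return np.outer(spatial_gabor(x), temporal_gaussian(t))


def tilted_rf(x, t, speed=3.6):
    """Hypothetical inseparable profile: the Gabor drifts at `speed` deg/s."""
    xx, tt = np.meshgrid(x, t, indexing="ij")
    return spatial_gabor(xx - speed * (tt + T_LAG)) * temporal_gaussian(tt)


def separability_index(rf):
    """Fraction of squared singular-value energy in the first component."""
    s = np.linalg.svd(rf, compute_uv=False)
    return s[0]**2 / np.sum(s**2)


x = np.linspace(-0.5, 0.5, 201)   # degrees
t = np.linspace(-0.1, 0.02, 121)  # seconds; t > 0 lies in the future
print(separability_index(separable_rf(x, t)))
print(separability_index(tilted_rf(x, t)))
```

An index of 1 indicates a separable profile; the tilted profile falls measurably below 1 because its spatial phase changes with time.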

*θ* = 3.6 deg/s, the apparent velocity of the strobe stimulus.

*σ*_{1} = 0.025 and *σ*_{2} = 0.008, where *t* is in seconds and *x* in degrees.

*t*:

*I*(*x*, *t*) represents the image.

*I*(*x*, *t*) is the luminance at retinal position *x* and time *t*, relative to the mean luminance. Thus, values of 0 represent gray, positive values represent bright features and negative values represent dark features. The function *ρ*(*x*, *t*) represents the space/time receptive field, as described in the preceding section.

*δ*. Without loss of generality, we have also assumed that one of the flashes occurs at *t* = 0, and that the image in the left eye is then at position *x* = 0.

*T* is the interflash interval of the stroboscope; *X* is the distance the target moves in this period; Δ*t* is the interocular delay. Positive values of Δ*t* mean that the right eye's image is delayed relative to the left eye's; negative Δ*t* means that the left is delayed relative to the right.

*t*:

*j*, although terms with *j* > *t*/*T*, representing appearances of the target which have not yet occurred, make no contribution (recall that the receptive field function is zero for positive values of its time argument).

*x*_{L0} and *x*_{R0}. The difference between these two defines the preferred disparity Δ*x*_{pref} = *x*_{L0} − *x*_{R0}, controlling the distance from the observer of stimuli which optimally drive the cell. Their mean value gives the neuron's preferred cyclopean position *x*_{pref} = (*x*_{L0} + *x*_{R0})/2, controlling the visual direction of optimal stimuli. Thus, we can write each neuron's left- and right-eye receptive fields, *ρ*_{L}, *ρ*_{R}, as a shifted version of the reference receptive field *ρ*_{0}, which is centered on the origin. We write *ρ*_{L}(*x*, *t*) = *ρ*_{0}(*x* − *x*_{L0}, *t*), *ρ*_{R}(*x*, *t*) = *ρ*_{0}(*x* − *x*_{R0}, *t*). In terms of the neuron's preferred disparity Δ*x*_{pref} and cyclopean position *x*_{pref}, we have *ρ*_{L}(*x*, *t*) = *ρ*_{0}(*x* − *x*_{pref} − Δ*x*_{pref}/2, *t*), *ρ*_{R}(*x*, *t*) = *ρ*_{0}(*x* − *x*_{pref} + Δ*x*_{pref}/2, *t*). Substituting into Equation 9, we find that, for the neuron tuned to disparity Δ*x*_{pref} and cyclopean position *x*_{pref}, the response from the left eye at time *t* is

*x*_{pref} and disparities Δ*x*_{pref}, *C*(*t*, *x*_{pref}, Δ*x*_{pref}). We now need a read-out rule relating the activity of this population to perceptual judgments performed in psychophysics experiments.

*A*(Δ*x*_{pref}) is the (unnormalized) time-averaged activity of the pool of neurons with preferred disparity Δ*x*_{pref}. Note that, in computing the time average, we need only integrate over one strobe interflash interval *T* because the integral over cyclopean position is periodic with period *T*. As the target moves, the activity moves across the population: If at time *t* the most active cells are those tuned to some particular cyclopean position *x*_{pref}, then at time *t* + *T* the most active cells will be those tuned to *x*_{pref} + *X*, but the distribution of activity across sensors tuned to different disparities will be the same.

*M* and *B* (Equation 7). In this stimulus, when averaged over a population of cells tuned to different cyclopean positions but the same disparity, the sum of the monocular terms is independent of the cells' preferred disparity: It simply indicates the presence of a stimulus somewhere in the visual field. Equation 12 can thus be rewritten as

*M* is the “baseline” contribution from the monocular components *M*, which is independent of the preferred disparity, and the integral represents the contribution from the binocular component *B*, which does depend on the preferred disparity. It is the binocular component that endows the energy model with its key property of disparity tuning even for stimuli which contain no monocular cues to disparity, such as random-dot patterns; the monocular terms contribute only a baseline response that is observed even with binocularly uncorrelated patterns. In the simpler stimuli considered here (bars), the distinct image features that carry the disparity are visible monocularly. However, the monocular stimulus location gives no reliable information about the disparity of the target, so the binocular component of the response is the only part that is useful for estimating disparity. We therefore examine the disparity-dependent term in Equation 13:

*D*(Δ*x*_{pref}) is the amount by which the total response of the pool of neurons tuned to disparity Δ*x*_{pref}, averaged over time, exceeds the baseline response of all pools. Obviously this will be larger for pools whose preferred disparity, Δ*x*_{pref}, corresponds to a disparity present in the stimulus. We now wish to choose a neuronal read-out rule that implements disparity averaging, because this is what appears to happen psychophysically. We shall use the set of responses *D*(Δ*x*_{pref}) as if it were a probability distribution. For example, if there were two pools whose responses were above baseline, disparity averaging means that the effective disparity lies between the preferred disparities of the two pools. This can be achieved by postulating that the effective disparity is the mean of the disparity distribution implied by *D*(Δ*x*_{pref}):
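The read-out rule just described, which treats the above-baseline responses as a probability distribution over preferred disparity and takes its mean, can be sketched as follows. This is a minimal illustration rather than the simulation code: the function name and the two-pool example values are hypothetical, and the responses are assumed to be sampled at discrete, evenly weighted preferred disparities.

```python
import numpy as np


def effective_disparity(pref_disparity, d_response):
    """Mean of the distribution implied by the disparity-dependent response D."""
    pref_disparity = np.asarray(pref_disparity, dtype=float)
    d_response = np.asarray(d_response, dtype=float)
    total = d_response.sum()
    if total == 0:
        raise ValueError("no pool responds above baseline")
    return float(np.sum(pref_disparity * d_response) / total)


# Two pools above baseline, tuned to disparities 0 and X (here X = 1),
# with the zero-disparity pool responding three times more strongly.
print(effective_disparity([0.0, 1.0], [3.0, 1.0]))  # 0.25
```

The result lies between the two preferred disparities and closer to the more active pool, which is the behavior that disparity averaging requires.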

*D* (Equation 14) was evaluated at 151 different values of preferred disparity Δ*x*_{pref} between ±(4*σ* + *X*(4*σ*_{t} + |Δ*t*|)/*T*). The resulting distribution *D* was used to calculate effective disparity as in Equation 15. The limits, notionally infinite, were chosen to make sure of including all neuronal pools whose time-averaged activity is above baseline. The integration limits on *x*_{pref} were set to ±(4*X* + *σ*), centered on the most active neuronal population, again to make sure of including all the members of the neuronal population which would be activated above baseline during one stimulus temporal period. All integrals were performed by the rectangle rule, using 61 steps in the integral over cyclopean position and 151 steps in the integral over time. The sums in Equations 10 and 11 were evaluated by initially performing the sum from *j* = −15 to *j* = 15, and then continuing to add pairs of *j* on either side of zero until the fractional change was less than 2 parts in a million. To check that these accuracy parameters were fine enough, we redid the simulation using 101 values of disparity, 41 steps in cyclopean position, 101 in time and evaluated the sums in Equations 10 and 11 to an accuracy of 5 parts in a million. The results did not change appreciably.
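The two numerical procedures described above can be sketched in a few lines. The function names, the use of the midpoint variant of the rectangle rule, the safety cap `j_max`, and the Gaussian test term are assumptions for illustration; the text specifies only the starting range of ±15, the pairwise extension, and the tolerance of 2 parts in a million.

```python
import math


def symmetric_sum(term, j_start=15, rel_tol=2e-6, j_max=100000):
    """Sum term(j) over j = -j_start..j_start, then add +/-j pairs until converged."""
    total = sum(term(j) for j in range(-j_start, j_start + 1))
    j = j_start
    while j < j_max:
        j += 1
        increment = term(j) + term(-j)
        total += increment
        # Stop once the latest pair changes the total by less than rel_tol.
        if total != 0 and abs(increment / total) < rel_tol:
            break
    return total


def rectangle_rule(f, lower, upper, steps):
    """Rectangle rule with equal-width steps, sampled at step midpoints."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))


# Example: a broad Gaussian series, whose sum is close to sqrt(200 * pi).
series = symmetric_sum(lambda j: math.exp(-j * j / 200.0))
print(series, math.sqrt(200 * math.pi))

# Example: 151 steps, as used for the integral over time.
print(rectangle_rule(math.sin, 0.0, math.pi, 151))
```

Note that a stopping rule based on a single small increment can terminate early if one pair of terms happens to be near zero while later terms are not, so it suits sums whose terms decay monotonically away from zero, as here.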

*σ*_{1} = 0.06° and a short axis of *σ*_{2} = 0.02°, centered on *x* = Δ*x*_{pref}/2 in the left eye and *x* = −Δ*x*_{pref}/2 in the right. That is, the left-eye receptive field is

*σ*_{x} = *σ*_{1} and *σ*_{y} = *σ*_{2}, whereas for vertical orientations *σ*_{x} = *σ*_{2} and *σ*_{y} = *σ*_{1}. The right-eye receptive field is similar with Δ*x*_{pref} replaced by −Δ*x*_{pref}. The standard deviation along the temporal axis, *σ*_{t}, is 10 ms.

*θ*_{pref}. The single receptive field, in the left eye only, is centered on the cyclopean location of the disparity sensors' receptive fields, that is, the origin.

*σ*_{1} and *σ*_{2} are the same as for the disparity sensors: 0.06° and 0.02°, respectively. We show results for ν = 5 deg/s (*σ*_{3} = 0.004, *σ*_{4} = 0.046 in units where time is in seconds and distance in degrees), and ν = 10 deg/s (*σ*_{3} = 0.002, *σ*_{4} = 0.098).

*x*, *y*, and *t*, only slices through the receptive fields can be shown. Figure 4A shows the spatial profile of the receptive field at the moment of the cell's peak response (50 ms after the onset of a stimulus). This is the same for both disparity and motion sensors. The example shown here is tuned to horizontal orientations. Figures 4B–C show the vertical space/time profile of the receptive field, at the retinal position *x* = 0. Figure 4B is for a disparity sensor. The receptive field is space/time separable, as for the disparity sensor in Figure 3. Figure 4C is for a downward motion sensor. Here, the receptive field is space/time inseparable, meaning that the cell is tuned to a particular speed and direction of motion. The pixelation in this figure reflects the detail with which the receptive fields were sampled in the simulation: receptive field functions were evaluated on a grid of 117 *x*, 49 *y*, and 80 *t* values. Due to the horizontal position disparity, the population of disparity sensors included members whose receptive fields were centered on a range of *x* positions (whereas the receptive fields for the motion sensors were all centered on *x* = 0, like the example in Figure 4A). This is why the grid had to extend further in the *x* direction than in the *y* direction. The sampling was the same for both *x* and *y*: one pixel represented 0.45 arcmin in both directions.

Δ*x*_{pref} replaced by −Δ*x*_{pref}), and the motion sensors are

*θ*_{pref} is the preferred orientation of each neuron and also defines the preferred direction of motion for motion sensors. All neurons, both disparity and motion sensors, had the same tuning to spatial and temporal frequency: *f* = 2 cycles per degree, ν = 10 Hz, *σ* = 0.1°, *σ*_{t} = 10 ms. The results were essentially the same as those shown in Figure 12 for the Gaussian receptive fields.

ν_{L}, ν_{R} from each eye's receptive field in response to an image *I*(*x*, *y*, *t*) was calculated as in Equation 5, with an additional integral over all values of vertical retinal position *y*. The response of each disparity sensor, *C*_{D}(*t*, Δ*x*_{pref}, *θ*_{pref}), is given by the squared sum of the inputs from the two eyes (Equation 6), whereas the response of each motion sensor *C*_{M}(*t*, *θ*_{pref}) is given by the squared input from the left eye. We then calculated the correlation coefficient *r* between the 500 responses of the motion sensor tuned to an orientation *θ*_{pref}, and the corresponding responses of disparity sensors tuned to the same orientation *θ*_{pref} and different disparities Δ*x*_{pref}:

*j*. A single 633 ms presentation yields curves with the same features as are visible in Figure 12, but with noise. To obtain the smooth curves shown in Figure 12, we repeated this process 500 times and took the average correlation coefficient.

Δ*t* is less than half the strobe period *T*, then this match has zero spatial disparity.

*T*/2, there are other possible matches, separated by longer periods of time, which do contain spatial disparity. We have previously developed a simple quantitative model that, while granting that matches separated by the shortest amount of time have the greatest influence on perception, also allows more widely separated matches to influence perception (Morgan, 1979; Read & Cumming, 2005b; Tyler, 1977). We refer to this as the disparity-averaging model. This model assumes that the disparity assigned to an object is made up of a weighted average of all possible matches between appearances of the target in the left and right eyes. The disparity of each match is weighted by the time delay between the left- and right-eye image in each match, so that matches between appearances which occur at nearly the same time in the two eyes influence perception more than matches between appearances which occur at very different times. The effective disparity in the stroboscopic stimulus is:

*w* is the weight function describing how the weight given to a potential match falls off as a function of the interocular delay between the left and right members of the pair, *T* is the interflash interval of the stroboscope, and *X* is the distance traveled by the target during this period.

*t* is short enough, the illumination is effectively continuous, so the strobe Pulfrich stimulus must produce the same depth as the classic Pulfrich effect. That is, the effective disparity becomes equal to *v*Δ*t*, the “virtual disparity” between the apparent motion trajectories of the target in the two eyes (Figure 1; Burr & Ross, 1979). It is easy to verify that Equation 21 satisfies this. Equation 21 also correctly predicts that the effective disparity falls below the virtual disparity as the interflash interval increases. When the weight function is a Gaussian with mean 0 and standard deviation ∼15–20 ms, Equation 21 provides an excellent account of human perception (Read & Cumming, 2005b).
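The behavior described for the disparity-averaging model can be checked with a short sketch. The exact form of Equation 21 is not reproduced here, so the code assumes the form implied by the verbal description: left-eye flashes at positions *jX* and times *jT*, right-eye flashes at positions *kX* and times *kT* + Δ*t*, and each pairing with disparity (*j* − *k*)*X* weighted by a Gaussian function of its time separation Δ*t* − (*j* − *k*)*T*. The function names, the 15 ms SD, and the truncation `m_max` are illustrative assumptions.

```python
import math


def gaussian_weight(delay, sd=0.015):
    """Weight for a match whose two halves are separated by `delay` seconds."""
    return math.exp(-delay**2 / (2 * sd**2))


def disparity_average(T, X, dt, weight=gaussian_weight, m_max=200):
    """Weighted mean disparity over all left/right pairings of strobe flashes.

    Assumed geometry: pairing index m = j - k has disparity m * X and an
    interocular time separation of dt - m * T.
    """
    num = den = 0.0
    for m in range(-m_max, m_max + 1):
        w = weight(dt - m * T)
        num += m * X * w
        den += w
    return num / den


v = 3.6  # apparent velocity in deg/s

# Short interflash interval: the result approaches the virtual disparity v * dt.
T_short = 0.002
print(disparity_average(T_short, v * T_short, 0.4 * T_short), v * 0.4 * T_short)

# Long interflash interval: the result falls well below the virtual disparity.
T_long = 0.100
print(disparity_average(T_long, v * T_long, 0.4 * T_long), v * 0.4 * T_long)
```

With the delay fixed at 40% of the interflash interval, the short-interval case reproduces *v*Δ*t*, whereas the long-interval case is dominated by the zero-disparity match and stays close to zero.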

Δ*x*_{pref} and cyclopean position, *x*_{pref}.

Δ*t* equal to 40% of the interflash interval. The stimulus is represented by the space/time diagrams along the top row (A–D). Dots indicate appearances of the stimulus in the left (red) and right (blue) eyes. Some of these are labeled for convenience in discussing the stimulus. The four columns show the response of the population at four different times in one period of the stimulus. The current time in each column is indicated by the yellow vertical line in the space/time plots A–D. A small complication is that because the neurons have a temporal lag in their response, they are not driven by the stimulus currently displayed, but rather by the stimulus as it was at previous times. The background of the space/time plot is shaded to show the temporal kernel of the neurons. The darker the shading, the less responsive the neurons are to stimuli at that time. The maximum responsiveness, indicated by the bright region, occurs 50 ms before the current time.

*x* = 0 in the left eye and will shortly make an appearance in the right. However, these appearances have not yet begun to influence the neurons. The neurons are responding optimally to the second-to-last appearance of the target in the left eye (L1, at *x* = −*X*, *t* = −*T*), as shown by the fact that L1 falls in the middle of the bright band indicating the temporal kernel in the space/time diagram. This appearance has activated all the neurons with a left-eye receptive field close to −*X*. These neurons lie along a downwards diagonal stripe in the population plots, because the preferred disparity Δ*x*_{pref} and cyclopean position *x*_{pref} compatible with a particular left-eye location *x*_{L} are given by *x*_{pref} + Δ*x*_{pref}/2 = *x*_{L}, which defines a downward diagonal stripe on axes of (Δ*x*_{pref}, *x*_{pref}).

*x* = −*X* in the right eye are now responding to the most recent appearance of the target in the right eye, R1 (at *x* = −*X* and *t* = −0.6*T*; due to the interocular delay, this is 0.4*T* later than the corresponding appearance in the left eye). This response shows in Figure 5F as an upward diagonal stripe, *x*_{pref} − Δ*x*_{pref}/2 = *x*_{R}. Naturally, the neurons which are firing most are those whose receptive fields are at *x* = −*X* in both eyes because these receive excitation from both eyes. This explains the peak in the population activity at cyclopean position *x*_{pref} = −*X* and Δ*x*_{pref} = 0. Figure 5J shows only the binocular component (Equation 7) of the cells' response. This has removed the stripes due to monocular activation and focuses attention on the peak. Figure 5N shows the disparity distribution at this moment, that is, the binocular component averaged across cyclopean position. The distribution is symmetric and centered on Δ*x*_{pref} = 0.

*x* = −*X* in the right eye. Because the input is essentially monocular at this moment, the binocular component in Figure 5K is very weak. However, weak activation is visible at disparities of 0 and *X*, corresponding to the pairings L1↔R1 and L2↔R1, respectively, marked with green arrows in Figure 5C. The disparity distribution, Figure 5O, represents the average of these, and peaks at Δ*x*_{pref} = *X*/2.

*X*, corresponding to the match L2↔R1. This is visible in Figures 5L and 5P, where the binocular component shows a peak for detectors tuned to a disparity of *X*, not zero. Any sensible read-out rule will therefore predict the perception of a nonzero disparity at this moment. However, this peak at disparity *X* is weaker than the response at zero disparity in the second column. Consequently, one would expect a single disparity judgment based on this activity over time to lie closer to zero than to *X*. The exact value of the disparity judgment will depend on what rule is used to combine these population distributions over time, which we explore below.

*T* is now 20 ms, which is only twice the *SD* of the receptive field temporal kernel. The delay is once again 0.4*T*, that is, 8 ms. Because the stimulus period is now so much shorter relative to the neurons' temporal integration period, the population response varies very little with time. Instead of seeing strong peaks at zero disparity at some times, and weak peaks at disparity = *X* at others, as in Figure 5, the long integration time averages these peaks out. That is, the time averaging performed by the spatiotemporal filters themselves is sufficient that the peak in the population activity is nearly constant, located slightly to one side of zero disparity, at a preferred disparity of 0.4*X*. It drifts up the vertical axis over time (Figure 6I–L), reflecting the apparent motion of the target, which stimulates neurons with different preferred cyclopean positions each time it appears. However, the disparity distribution remains constant (Figure 6M–P).

*X*, as in Figure 5L.

*X* respond weakly. (The stimulus contains other possible matches, with disparities of 2*X*, 3*X*, etc., but the halves of these matches are separated by so long a time that the neurons do not respond to them.) Under our disparity-averaging read-out rule, the effective disparity lies in between the two peaks, but closer to the stronger peak. This is shown with the black line in Figure 7. The effective disparity is thus a weighted average of the disparities present in the stimulus, with the weight depending on the temporal delay between the different possible matches. In Figure 7B, the strobe interflash interval is short relative to the integration time of the neurons, so the most active pool is always the same, namely, the pool with preferred disparity 0.4*X*.

Δ*t* and interflash intervals *T*. The solid curves show the effective disparity obtained with the original disparity-averaging equation, Equation 21, when the weight function is a Gaussian with *SD* equal to τ√2 (inset in Figure 8; this is the cross-correlation of the temporal receptive fields, which are Gaussians with *SD* τ). The results are the same. Thus, the read-out rule presented here represents a simple way to implement the weighted disparity-averaging equation with a population of physiologically plausible model neurons.
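The statement that the cross-correlation of two Gaussian temporal kernels with *SD* τ is a Gaussian with *SD* τ√2 is easy to confirm numerically. In this sketch τ = 10 ms and the 50 ms lag follow the values given for the model kernels; the sampling step, the time window, and the helper name `distribution_sd` are assumptions.

```python
import numpy as np

TAU = 0.010    # SD of each temporal kernel, in seconds
T_LAG = 0.050  # kernel lag, in seconds
DT = 0.0005    # sampling step in seconds (assumed)

# Causal kernel: defined only for t <= 0, peaking T_LAG before the present.
t = np.arange(-0.15, 0.0 + DT / 2, DT)
kernel = np.exp(-(t + T_LAG)**2 / (2 * TAU**2))

# Cross-correlation of the left- and right-eye kernels (identical here).
xcorr = np.correlate(kernel, kernel, mode="full")
lags = (np.arange(xcorr.size) - (kernel.size - 1)) * DT


def distribution_sd(values, weights):
    """Standard deviation of `values` under non-negative `weights`."""
    mean = np.sum(values * weights) / np.sum(weights)
    return np.sqrt(np.sum(weights * (values - mean)**2) / np.sum(weights))


weight_sd = distribution_sd(lags, xcorr)
print(weight_sd, TAU * np.sqrt(2))
```

The lag itself drops out of the cross-correlation, so the resulting weight function is centered on zero interocular delay regardless of the kernel's latency.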

*x*/*X* vs. Δ*t*/*T* lie along the identity line). Figure 9 shows an example of the latter case. In this simulation, instead of averaging V1 activity over time and then extracting a single disparity for the whole stimulus, an instantaneous disparity, Δ*x*_{inst}, is assigned at every moment based on the preferred disparity of the most active V1 cells (winner takes all). The resulting disparities were then averaged over time. Formally, this rule is

*X* 40% of the time, and so assigns effective disparity 0.4*X* (the virtual disparity, Figure 9). With this read-out rule, the effective disparity is always the virtual disparity, even for long interflash intervals where human subjects report disparities much closer to zero. Although this rule does not therefore match experimental data, it is nevertheless of interest. It demonstrates that it is possible for a population of pure disparity sensors to encode the virtual disparity implicit in the apparent motion of the strobe stimulus, although the sensors do not respond to motion. As we shall see below (Figure 14), the reason for this paradoxical result is that even pure disparity sensors become sensitive to direction of motion in stimuli with an interocular delay, due to the geometrical equivalence of motion and disparity in such stimuli (Pulfrich, 1922).

Δ*x*_{inst} (Equation 22) by the height of the peak:

Δ*x*_{pref}, because the receptive fields in the two eyes are centered on different positions. Note that the preferred disparity was always horizontal, irrespective of the orientation preference of the disparity sensor. In contrast, all the motion sensors are monocular, with a single receptive field centered on the origin of the left retina. All neurons in the simulation have identical spatial and temporal frequency tuning.

*θ*_{pref} and disparity Δ*x*_{pref} in turn, we calculated the correlation *r*(*θ*_{pref}, Δ*x*_{pref}) between the 500 successive responses of the motion sensor tuned to *θ*_{pref} and the 500 successive responses of the disparity sensor tuned to *θ*_{pref} and Δ*x*_{pref} (Equation 20). To remove noise, we repeated this process 500 times with different random noise stimuli and averaged the correlation coefficient. The curves in Figure 12A show this average correlation coefficient as a function of preferred orientation and disparity. The four colors correspond to the four motion sensors, tuned to motion leftward (red), rightward (blue), downward (green dashed), or upward (pink, largely obscured under the green curve). Each point on the curve shows the correlation between the activity of that motion sensor and the activity of the disparity sensor with the same preferred orientation and with preferred disparity indicated on the horizontal axis.

Δ*x*_{pref} = 0, resulting in a strong correlation between the activity of the motion sensor and the zero-disparity sensor. Importantly, this correlation is identical for motion detectors sensing opposite directions (up vs. down), so it provides no basis for any sensation of motion in any one direction.

*v*, a rightward sensor tuned to −*v*, a far disparity sensor tuned to disparity *v*Δ*t*, and a near disparity sensor tuned to −*v*Δ*t*. Whatever the precise form of the spatiotemporal filters, provided that the image pairs 1 and 2 activate the leftward motion sensor more than its rightward partner, then they must activate the far disparity sensor more than its near disparity partner. This is the origin of the correlation in Figure 12 between activity in leftward motion sensors and in far disparity sensors.

*v*Δ*t*, marked with vertical lines in Figures 12C–D. Similar results were also obtained with Gabor receptive fields, Equations 18 and 19, comparing channels tuned to different spatial frequencies and hence different speeds (not shown). Again, the reason for this is clear from Figure 13. A pair of frames which optimally stimulates a leftward motion sensor tuned to speed *v* must also stimulate the disparity sensor tuned to disparity *v*Δ*t*. This means that larger disparities will be associated with higher speeds, in accordance with the reported shearing percept of this stimulus (Tyler, 1974, 1977).

Δ*t*, then far disparity sensors are more strongly correlated with leftward motion sensors than rightward ones (and vice versa for near disparity sensors). If we consider only motion sensors tuned to speed *v*, then this difference between leftward and rightward is most pronounced for disparity sensors tuned to *v*Δ*t*. The reason for this was explained by Tyler (1974; 1977). Suppose the right eye is delayed by one frame, and suppose two successive frames presented to the left eye happen to contain motion to the left. Then, when the left eye is viewing the second frame, the right eye is viewing the first frame, so the binocular pair must contain far disparity. The very structure of the stimulus means that rightward motion inevitably co-occurs with near disparity, and leftward motion with far disparity (Figure 13).
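Tyler's argument can be illustrated with a deliberately simplified toy simulation. It does not use the energy model units of the paper: the "sensors" here are plain shifted-product correlators on one-dimensional binary noise, the delay is exactly one frame, and all names and parameter values are hypothetical. Under those assumptions the right eye's current frame is the left eye's previous frame, so a monocular motion signal at a given signed offset and a binocular disparity signal at the same signed offset are computed from the same pair of images.

```python
import numpy as np


def shifted_product(a, b, shift):
    """Sum over x of a(x) * b(x - shift), wrapping around at the edges."""
    return float(np.sum(a * np.roll(b, shift)))


def simulate(n_frames=2000, n_pixels=64, shift=2, seed=0):
    """Motion and disparity signals for noise with a one-frame interocular delay."""
    rng = np.random.default_rng(seed)
    frames = rng.choice([-1.0, 1.0], size=(n_frames, n_pixels))
    motion, disp_same, disp_opposite = [], [], []
    for n in range(1, n_frames):
        left_now, left_prev = frames[n], frames[n - 1]
        right_now = frames[n - 1]  # right eye shows the left eye's previous frame
        # Monocular motion signal: left-eye pattern displaced by `shift` between frames.
        motion.append(shifted_product(left_now, left_prev, shift))
        # Binocular disparity signals at the same and at the opposite signed offset.
        disp_same.append(shifted_product(left_now, right_now, shift))
        disp_opposite.append(shifted_product(left_now, right_now, -shift))
    return np.array(motion), np.array(disp_same), np.array(disp_opposite)


motion, disp_same, disp_opposite = simulate()
r_same = np.corrcoef(motion, disp_same)[0, 1]
r_opposite = np.corrcoef(motion, disp_opposite)[0, 1]
print(r_same, r_opposite)
```

In this extreme case the matched motion and disparity signals are identical by construction, while the mismatched pair is uncorrelated apart from sampling noise. With realistic spatiotemporal filters the correlation is partial rather than perfect, but it has the same origin in the structure of the stimulus.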

*n* frames before being replaced by a new dot in a random position. They reported that perceived depth increases with *n*. Many authorities still consider this compelling evidence in favor of joint motion/disparity encoding. However, when one considers the changes that occur between any pair of frames in this stimulus, it is easy to see why our model accounts for the results equally well. On each frame only 100/*n*% of the dots are replaced, whereas in a standard noise stimulus 100% of the dots are replaced on each frame. The dots that are not replaced move coherently and hence produce a spatial disparity as in the classic Pulfrich effect. The dots that are replaced behave just like the dynamic noise we simulated above. Thus, the stimulus of Morgan and Ward is a simple sum of a standard interocularly delayed noise stimulus (to which it reduces when *n* = 1) and a random-dot strobe Pulfrich stimulus (to which it asymptotes as *n* → ∞, Figure 15). Morgan and Ward asked their subjects to match a probe to the depth of the dots. They do not report results for *n* = 1, but it seems clear that in this case, the matching depth must be zero because there is symmetry about fixation. Our own results with this stimulus confirm that interocular delay in zero-disparity noise does not bias depth perception (see Figure 10 of Read & Cumming, 2005a). As *n* increases above 1, the symmetry about fixation is broken, because there is now more power in horizontal motion to the right than in other directions. In terms of depth, this places more power at the virtual disparity (blue arrows in Figure 15). It therefore does not seem surprising that the matching depth reported by Morgan and Ward's subjects moved away from zero towards this virtual disparity. For large *n*, the matching disparity was equal to the virtual disparity implied by the apparent motion of the stimulus, as expected because the stimulus is now a strobe Pulfrich stimulus with a short interflash interval (25 ms) (Morgan, 1979; Read & Cumming, 2005b). Thus, the results of Morgan and Ward follow naturally from the percepts elicited by dynamic noise and the Pulfrich effect, both of which can be explained with separate motion/disparity encoding. We are not aware of any stimuli that require joint motion/disparity encoding to explain them.

*D* from Equations 7, 10, 11, and 14, we obtain the expanded form:

*ρ*_{0}(*x*, *t*) is space/time separable, that is, it can be written as the product of a spatial component *ρ*_{0x}(*x*) and a temporal component *ρ*_{0t}(*t*). Note that, for convenience, the sums over *j* and *k* are written as extending to infinity, rather than terminating at *j* = ⌊*t*/*T*⌋ etc. This does not alter the result, because, due to causality, *ρ*_{0t}(*t*) is zero for *t* > 0 (future time). Thus, terms for which *j* exceeds ⌊*t*/*T*⌋ contribute nothing to the sum anyway.

*x*_{pref}, is performed first:

*S* for the cross-correlation of the spatial components of the receptive fields in the two eyes: $S(d) = \int_{-\infty}^{+\infty} dx\, \rho_{0x}(x)\, \rho_{0x}(x - d)$. Then the integrals over *x*_{pref} in Equation A1 are simply

Δ*x*_{pref} to Δ*x*_{pref}′ = (*j* − *k*)*X* − Δ*x*_{pref}, the integral over Δ*x*_{pref} can be rewritten as

*S*(*d*) is even-symmetric about zero. The second term therefore vanishes, and we have

$\int_{-\infty}^{+\infty} d\Delta x'_{\mathrm{pref}}\, S(\Delta x'_{\mathrm{pref}})$ therefore cancels out between the numerator and denominator, and we are left with

*t* within the summation over *j*, and replace the integration variable *t* with *t*′ = *jT* − *t*:

*j* and *k* in both numerator and denominator with sums over *n* = (*j* + *k*)/2 and *m* = (*j* − *k*)/2. To do this, we note that for any function *f*,

*t*′ in Equation A2). Now, different values of *n* simply select different ranges of time to integrate over. Thus, we can replace the summation over *n* and finite integral over *t* with a single infinite integral over *t*:

*m* into a single sum over a variable *p*:

*p*, and the second over odd values, and we have now unified these.