Article  |   November 2011
Speed judgments of three-dimensional motion incorporate extraretinal information
Journal of Vision November 2011, Vol.11, 1. doi:10.1167/11.13.1
      Arthur J. Lugtigheid, Eli Brenner, Andrew E. Welchman; Speed judgments of three-dimensional motion incorporate extraretinal information. Journal of Vision 2011;11(13):1. doi: 10.1167/11.13.1.

Abstract

When tracking an object moving in depth, the visual system should take changes of eye vergence into account to judge the object's 3D speed correctly. Previous work has shown that extraretinal information about changes in eye vergence is exploited when judging the sign of 3D motion. Here, we ask whether extraretinal signals also affect judgments of 3D speed. Observers judged the speed of a small target surrounded by a large background. To manipulate extraretinal information, we varied the vergence demand of the entire stimulus sinusoidally over time. At different phases of vergence pursuit, we changed the disparity of the target relative to the background, leading observers to perceive approaching target motion. We determined psychometric functions for the target's approach speed when the eyes were (1) converging, (2) diverging, (3) maximally converged (near), and (4) maximally diverged (far). The target's motion was reported as faster during convergence and slower during divergence but perceived speed was little affected at near or far vergence positions. Thus, 3D speed judgments are affected by extraretinal signals about changes in eye rotation but appear unaffected by the absolute orientation of the eyes. We develop a model that accounts for observers' judgments by taking a weighted average of the retinal and extraretinal signals to target motion.

Introduction
How do we perceive the speed of objects moving in depth? If the eyes are fixating, an object's three-dimensional (3D) speed toward and away from the observer is signaled by the change in position of the object on the left and right eyes' retinas across time. An estimate of the object's 3D speed could, therefore, be derived from retinal cues, such as changes in binocular disparity (Cumming & Parker, 1994; Harris & Watamaniuk, 1995; Regan & Gray, 2009) or differences in interocular velocity (Beverley & Regan, 1973; Rokers, Cormack, & Huk, 2009; Shioiri, Saisho, & Yaguchi, 2000). Under typical viewing conditions, however, observers track the object's movement with their eyes. The resulting change in eye vergence minimizes the absolute binocular disparity of the object, reducing the magnitude of the retinal signal. Thus, it would be sensible for the visual system to combine information about retinal motion and eye vergence pursuit to estimate the object's true motion. 
The visual system could derive information about vergence pursuit from the retinal slip of static scene structures and/or from extraretinal signals related to eye movement. However, the idea that extraretinal signals contribute to 3D motion perception is contentious (Brenner, van den Berg, & van Damme, 1996; Erkelens & Collewijn, 1985b; Harris, 2006; Regan, Erkelens, & Collewijn, 1986), and it is widely held that observers are insensitive to large changes in eye vergence when these are not accompanied by changes in relative disparity. For example, Erkelens and Collewijn (1985b) and Regan et al. (1986) reported that large changes in the absolute disparity of an extensive (30 × 30 deg) stimulus did not give rise to sensations of 3D motion, even though they induced vergence pursuit (Erkelens & Collewijn, 1985a). As a result, they concluded that extraretinal signals about changes in eye vergence provide poor information about motion in depth. Further, Brenner et al. (1996) showed that observers did not perceive the 3D motion of a large object whose absolute disparity changed by 3 deg. In these studies, changing vergence signals conflicted with the absence of looming. An indication that changes in size are critical to motion in depth is that some 3D motion is perceived without retinal slip when the target does not convey looming information, for example, when small targets are used (Brenner et al., 1996; González, Allison, Ono, & Vinnikov, 2010; Harris, 2006; Howard, 2008; Regan et al., 1986). These studies suggest that judgments of 3D motion are informed by extraretinal cues when the cue conflict is less evident. 
Nefs and Harris (2008) investigated the effect of vergence pursuit eye movements on induced motion (the perception that a stationary target moves in the presence of a moving inducer). They showed that when participants pursued a fast moving inducer, induced motion of the target was tenfold higher than when they were asked to track the target. They accounted for their findings on the basis that the visual system estimates 3D motion by taking the sum of retinal and extraretinal signals, with a gain factor attenuating the influence of the extraretinal signal (also see Nefs & Harris, 2007). Welchman, Harris, and Brenner (2009) showed that the retinal slip that initiates ocular pursuit is not responsible for the extraretinal contribution to judgments of 3D motion sign (approaching or receding). Such judgments are best explained on the basis that observers combine the instantaneous retinal slip with extraretinal (vergence) signals. 
Here, we extend the technique developed by Welchman et al. (2009) to have observers make judgments about the motion of a small target that is surrounded by a large, moving background (Figure 1). As in the previous study, we vary the position of the background continuously over time, creating a sinusoidal vergence demand (Figure 2A) that induces pursuit with a high gain (Erkelens & Collewijn, 1985a). This stimulus ensures that the observers' eyes are smoothly pursuing the target (and could therefore provide extraretinal information) during the test portion of the experiment in which the target starts to move relative to the background (Figures 1 and 2B). By using a large background stimulus that has a constant retinal size, we ensure that observers cannot perceive their eye vergence pursuit due to the conflict between vergence changes and the absence of looming (Erkelens & Collewijn, 1985b; Regan et al., 1986; Welchman et al., 2009). Thus, any influence of vergence pursuit on the interpretation of changes in relative disparity in terms of judged speed would indicate that extraretinal signals contribute directly to such judgments. 
Figure 1
 
Schematic showing the lateral motion information available to the left eye for 3D motion judgments. At time 1, when the target (red dot) has not yet started to “move,” the target is at the center of the moving background (horizontal black line) and the eyes are fixated on F1 (note that this is not necessarily representative but is assumed for clarity). At time 2, the background has moved by extent B (black arrow) and the target by extent T (red arrow), so the target has moved relative to the background by extent T − B (green arrow). The eyes have moved by P (blue arrow) and are now fixating on F2, slightly behind the center of the background (B2). This “lag” causes a retinal slip of the background, B′, and a retinal slip of the target, T′. The inset shows the interpretation of such lateral target motion in terms of motion in depth, assuming that the right eye sees a mirror symmetrical image.
Figure 2
 
An illustration of how movement of the target and background were used to separate retinal and extraretinal cues to motion estimation. (A) An illustration of the motion in depth of the target and background in each condition. The background moved back and forth sinusoidally, in opposite directions in the two eyes, throughout the entire experiment (frequency = 0.25 Hz). In terms of binocular cues, this corresponds with oscillations in depth, but due to the absence of looming, these oscillations are not perceived. Target motion was perceived when we changed the relative disparity of the target with respect to the background. We did so at four phases of the background's oscillation (orange arrows): far, near, converging, or diverging. (B) Velocity of the target with respect to the background (upper panel) and of the targets and the background (lower panel) for the left eye. The same relative velocity corresponds to different target velocities in the four conditions.
In our experiment, we briefly move the target with respect to the background—thereby introducing a relative retinal motion component (and thus changing relative disparity) in addition to the absolute motion of the whole stimulus (i.e., the sinusoidal displacement of the target and background). We measure judgments of approach speed under four conditions: when the eyes are pursuing in converging (approaching) or diverging (receding) directions and when the eyes are at the maximum (near) and minimum (far) vergence excursions. Thus, we contrive that the same magnitude of the retinal cue (changing relative disparity) is combined with different magnitudes of the extraretinal cue (Figure 2B). 
Given the presence of both retinal and extraretinal cues to 3D motion, we consider two potential models for the visual system's use of this information. First, observers might ignore all extraretinal information, so that judgments of the target's approach speed depend only on retinal velocity and are unaffected by differences in vergence pursuit. This relative velocity model would take as its inputs the retinal slip velocities of the background (B′) and of the target (T′). Speed judgments would be calculated as the difference between T′ and B′. If so, eye pursuit velocity (P) has no bearing on the observer's judgment, because the difference between T′ and B′ is unaffected by adding a constant to each. This retinal model would predict psychometric functions from our four experimental conditions that lie on top of each other when expressed in terms of relative speed (Figure 3A). Alternatively, observers might judge the velocity of the target by combining the pursuit velocity (P) with the retinal velocity of the target (T′). This absolute velocity model describes the total vergence demand of the target and predicts that judgments of speed will be faster during convergence and slower during divergence (see Figure 3D). Both of these models ignore the fact that the perceived speed should depend on the static convergence of the eyes (i.e., the viewing distance). We will examine this scaling issue in the Discussion section. 
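The two candidate models can be written out as a minimal sketch. The sign convention (positive = rightward in the left eye = approaching), the background velocities, and the pursuit gain below are illustrative assumptions, not values taken from this section; the point is only that neither prediction depends on how well the eyes pursue.

```python
def retinal_slip(screen_velocity, pursuit):
    """Retinal slip of a stimulus: its screen velocity minus the eye's pursuit velocity."""
    return screen_velocity - pursuit


def relative_velocity_model(t_slip, b_slip):
    """Retinal-only judgment: T' - B'."""
    return t_slip - b_slip


def absolute_velocity_model(t_slip, pursuit):
    """Retinal plus extraretinal judgment: P + T'."""
    return pursuit + t_slip


RELATIVE_SPEED = 2.5  # mm/s, target relative to background (illustrative)
PURSUIT_GAIN = 0.9    # hypothetical; any value gives the same predictions

# Hypothetical left-eye background velocities (mm/s) at the four phases.
background_velocity = {"converging": 7.85, "diverging": -7.85, "near": 0.0, "far": 0.0}

predictions = {}
for condition, b in background_velocity.items():
    t = b + RELATIVE_SPEED          # target velocity on the screen (T)
    p = PURSUIT_GAIN * b            # eye pursuit velocity (P)
    t_slip = retinal_slip(t, p)     # T'
    b_slip = retinal_slip(b, p)     # B'
    predictions[condition] = (
        relative_velocity_model(t_slip, b_slip),
        absolute_velocity_model(t_slip, p),
    )
    print(condition, predictions[condition])
```

Under these assumptions the relative velocity model returns the same value in all four conditions, whereas the absolute velocity model separates the converging and diverging conditions by the background velocity.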
Figure 3
 
Model predictions for (A, B) a relative velocity model and (C, D) an absolute velocity model. The relative velocity could be judged by taking the difference between the retinal slip velocity of the target (T′) and the retinal slip velocity of the background (B′). The absolute velocity of the target could be judged by summing the pursuit velocity (P) and the retinal velocity of the target (T′). Note that since T′ = T − P and B′ = B − P, the predictions for the perceived velocity are T − B and T, respectively, irrespective of the pursuit velocity. Predictions are shown both in terms of the relative speed (A, C) and in terms of the absolute speed (B, D) on the screen. Whenever curves overlap, only the red curve is visible.
Methods
Observers
Two of the authors and four naive observers who were recruited from staff of the Faculty of Human Movement Sciences at the VU University of Amsterdam took part in the study. They all had normal or corrected-to-normal vision and were screened to ensure that they could discriminate 1 arcmin of disparity in a briefly (300 ms) presented random dot stereogram. 
Apparatus
Images were presented stereoscopically on a mirror stereoscope with two 24″ CRT (Sony GDM-FW900) monitors, each seen by one eye through a mirror. The monitors displayed 1096 by 686 pixels at a refresh rate of 160 Hz (an individual pixel subtended about 3.1 arcmin). The distance from the observer's eyes to the monitors was about 50 cm. Observers responded by pressing keys on a keyboard. Binocular eye movements were recorded using an Eyelink II eye tracker (SR Research) at a sampling rate of 500 Hz. 
Stimulus
Observers fixated a small blue target dot (diameter = 7 arcmin), surrounded by a large background (20 cm/22 deg wide, 30 cm/31 deg high) of randomly positioned green triangles (side length 1.7 cm/2 deg), avoiding a small region (3 cm/3.4 deg wide, 1.5 cm/1.7 deg high) around the target. We masked visible changes in the position of the background by rotating the triangles around their centers at a speed of 45 deg/s. Half of the triangles rotated clockwise and the other half rotated anti-clockwise. To measure the influence of extraretinal signals, the observers' eyes had to be smoothly pursuing the background in depth. We induced these vergence pursuit eye movements by continuously varying the lateral positions of the left and right eyes' images in counter-phase following a sinusoidal profile (frequency = 0.25 Hz). The amplitude of the lateral movement of the background was ±5 mm (34 arcmin), corresponding to a movement in depth from about 9 cm behind the screen to 7 cm in front of the screen. This corresponds to a vergence change of about 1.14 deg (or 0.57 deg in each eye; Movie 1). 
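The stimulus geometry quoted above can be checked with a short sketch. The viewing distance of 50 cm comes from the Apparatus section; the interocular distance of 64 mm is an assumption (it is not stated in the text), so the simulated depths are approximate.

```python
import math

VIEWING_DISTANCE_MM = 500.0  # "about 50 cm" (Apparatus)
IOD_MM = 64.0                # assumed interocular distance, not given in the text
FREQ_HZ = 0.25               # background oscillation frequency
AMPLITUDE_MM = 5.0           # lateral amplitude of each eye's image


def lateral_to_arcmin(shift_mm, distance_mm=VIEWING_DISTANCE_MM):
    """Visual angle (arcmin) subtended by a lateral shift on the screen."""
    return math.degrees(math.atan2(shift_mm, distance_mm)) * 60.0


def simulated_depth_mm(shift_per_eye_mm, distance_mm=VIEWING_DISTANCE_MM, iod_mm=IOD_MM):
    """Signed distance from the screen of the simulated point.

    Each eye's image is shifted by shift_per_eye_mm in opposite directions;
    positive shifts are crossed (in front of the screen), negative uncrossed.
    """
    separation = 2.0 * shift_per_eye_mm
    return distance_mm * separation / (iod_mm + separation)


angle_arcmin = lateral_to_arcmin(AMPLITUDE_MM)            # per-eye amplitude
vergence_change_deg = 2.0 * angle_arcmin / 60.0           # both eyes combined
depth_front_mm = simulated_depth_mm(AMPLITUDE_MM)         # nearest simulated position
depth_behind_mm = simulated_depth_mm(-AMPLITUDE_MM)       # farthest simulated position
peak_speed_mm_s = 2.0 * math.pi * FREQ_HZ * AMPLITUDE_MM  # peak lateral speed of a sinusoid

print(round(angle_arcmin, 1), round(vergence_change_deg, 2))
print(round(depth_front_mm / 10.0, 1), round(depth_behind_mm / 10.0, 1))
print(round(peak_speed_mm_s, 2))
```

With the assumed interocular distance, the asymmetry between the front and back excursions (roughly 7 cm versus 9 cm) follows directly from the projection geometry.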
To ensure that the modulations of absolute disparity in the background were imperceptible, we kept the retinal size of the background constant. This created a conflict between monocular and binocular cues for the background's simulated position in depth (looming cues signaled no 3D motion although binocular cues signaled motion). While looming information is significant for the large background, for our small target its influence is negligible (the maximum change in retinal size of the target is 2% of 7 arcmin, which is well below our rendering resolution). In previous studies, we (Welchman et al., 2009) and others (Erkelens & Collewijn, 1985b; Regan et al., 1986) have shown that keeping the retinal size of the background constant successfully prevents observers from discriminating approaching or receding motion (neither was there any perception of change in the apparent size of the stimulus). To further ensure that there was no relative retinal motion from static objects—and thus that eye movement was indicated by extraretinal signals rather than relative retinal slip from static structures—we took a number of precautions. Specifically, the experiment was conducted in full darkness and any residual light from objects within the field of view was removed by surrounding the CRTs and mirrors in dark cloth. In addition, we reduced the luminance of the CRTs to ensure that observers could not see the “black” background illumination of the displays nor could they see the mirrors. Finally, we ensured that movement of the background stimulus did not reach the edges of the displays, ensuring that there were no cues from occlusion that would arise if the stimulus moved off the screen. 
Procedure
On each trial, the target's disparity relative to the background changed at one of five rates. The target always moved to the right across the background in the left eye and to the left across the background in the right eye, consistent with approaching 3D motion. Once the target started moving relative to the background, its color also changed from blue to red. After the target approached the observer for 300 ms, it disappeared, but the moving background remained visible. The blue target reappeared after 1 s, moving on its sinusoidal profile with zero disparity with respect to the background. We used five rates of relative disparity change, spaced in steps of 0.5 mm/s of lateral movement around the mean rate of 2.5 mm/s. This corresponds to rates of change of disparity of about 21 to 48 arcmin/s. In the interest of consistency, we will report our results in units of mm/s on the screen, as we experimentally manipulate the changing disparity information by means of lateral motion on the screen (in opposite directions in the left and right eyes). On each trial, observers judged whether the speed of the target was faster or slower than the mean speed of the stimulus set (cf. McKee, 1981). We measured psychometric functions for approach speed in four interleaved conditions: when the eyes were moving to (a) converge or (b) diverge and when the eyes were at the endpoints of their trajectories in (c) near and (d) far vergence positions (see Figure 2A for a cartoon). 
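As a rough check on the conversion between on-screen speed and rate of disparity change, the sketch below assumes that the five speeds are 1.5 to 3.5 mm/s (steps of 0.5 mm/s around 2.5 mm/s), that each eye's image moves at that speed in opposite directions, and that the small-angle approximation holds at a 50 cm viewing distance.

```python
import math

VIEWING_DISTANCE_MM = 500.0  # "about 50 cm" (Apparatus)

# Five lateral speeds in steps of 0.5 mm/s around the 2.5 mm/s mean.
lateral_speeds_mm_s = [1.5, 2.0, 2.5, 3.0, 3.5]


def disparity_rate_arcmin_per_s(lateral_speed_mm_s, distance_mm=VIEWING_DISTANCE_MM):
    """Rate of change of relative disparity for a given per-eye lateral speed.

    The images move in opposite directions in the two eyes, so their
    separation changes at twice the per-eye speed (small-angle approximation).
    """
    separation_rate_rad = 2.0 * lateral_speed_mm_s / distance_mm
    return math.degrees(separation_rate_rad) * 60.0


disparity_rates = [disparity_rate_arcmin_per_s(v) for v in lateral_speeds_mm_s]
for v, rate in zip(lateral_speeds_mm_s, disparity_rates):
    print(v, "mm/s ->", round(rate, 1), "arcmin/s")
```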
We prevented dark adaptation by presenting a white screen every 10 trials (approximately every 80 s) for 5 s. During this interval, we also calibrated the eye tracker by displaying an isolated black fixation target that jumped back and forth laterally every 500 ms by ±0.57 deg (the amount that the eyes were to move in opposite directions while pursuing the background). The median version response for each eye was considered to correspond to this distance. Each observer completed 400 trials (4 conditions, 5 stimulus levels, 20 trials) in two sessions. As it was unlikely that observers formed a reliable criterion for the mean speed within 10 trials, we discarded the first 10 trials from each session. 
Eye movement recording and analysis
During the experiment, observers were instructed to maintain fixation on the target. We recorded the left and right eye positions. To analyze these eye movement data, we first calibrated raw gaze positions by manually selecting fixations in calibration blocks and then converted these to degrees of visual angle. Preprocessing of eye movements involved the removal of trials in which blinks or saccades occurred during or shortly (200 ms) before or after target presentation (2% of trials) and trials in which the eye position data were excessively noisy (5% of trials) and did not resemble fixations or saccades, potentially due to instability in the eye tracker's estimate of the eye position. Eye trace signals were screened “blind” in that neither the experimental condition nor the observer's psychophysical response was known when inspecting the eye movement traces. The removal of individual trials due to blinks and noise required the agreement of two of the authors. We calculated horizontal vergence as the right minus the left horizontal eye position, with each position being related to the positions when fixating the screen center (so negative values for vergence correspond to positions that are nearer than the screen). 
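The vergence definition above can be illustrated with a small sketch. The function name and the convention that rightward rotations are positive are assumptions made for illustration; the 0.57 deg value is the per-eye excursion mentioned for the calibration target.

```python
def version_and_vergence(left_deg, right_deg, left_center_deg=0.0, right_center_deg=0.0):
    """Return (version, vergence) from horizontal eye positions in degrees.

    Positions are expressed relative to each eye's position when fixating the
    screen center. Rightward rotation is taken as positive (an assumption).
    """
    left = left_deg - left_center_deg
    right = right_deg - right_center_deg
    version = (left + right) / 2.0   # mean of the two eyes
    vergence = right - left          # right minus left, as in the text
    return version, vergence


# Converging on a nearer point: left eye rotates rightward, right eye leftward.
version, vergence = version_and_vergence(0.57, -0.57)
print(version, vergence)  # vergence is negative, i.e. nearer than the screen
```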
Results
Eye movements
Our first analysis investigated how well our observers made vergence pursuit eye movements in response to the large moving stimulus. Figure 4A shows the eye position data for the critical part of a single trial in which the target was presented while the eyes were converging. To characterize the gain and the phase lag of vergence pursuit, we first combined the parts of the trajectory that we considered the critical portion of each trial (i.e., the eye trace during target presentation ±200 ms) and then fit a sine function to the average vergence response across observers with the function's amplitude and phase as free parameters. Vergence pursuit gain was then calculated as the peak amplitude of the fit sine function divided by the peak amplitude of the vergence demand. Phase lag was calculated as the difference in phase between the best fitting sine and the vergence demand of the stimulus. 
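A sine fit of this kind can be sketched as follows. Because the frequency is fixed at 0.25 Hz, fitting amplitude and phase is equivalent to a linear least-squares fit of sine and cosine components. The data here are synthetic (a response with a gain of 0.97 and a lag of 100 ms is simulated), and the function names are illustrative rather than the authors' analysis code.

```python
import math

FREQ_HZ = 0.25                     # background oscillation frequency
OMEGA = 2.0 * math.pi * FREQ_HZ
DEMAND_AMPLITUDE_DEG = 0.57        # per-eye amplitude of the vergence demand


def fit_sine(times, values, omega=OMEGA):
    """Least-squares fit of values ~ A * sin(omega * t - phi); returns (A, phi)."""
    sum_ss = sum_cc = sum_sc = sum_ys = sum_yc = 0.0
    for t, y in zip(times, values):
        s, c = math.sin(omega * t), math.cos(omega * t)
        sum_ss += s * s
        sum_cc += c * c
        sum_sc += s * c
        sum_ys += y * s
        sum_yc += y * c
    det = sum_ss * sum_cc - sum_sc * sum_sc
    a = (sum_ys * sum_cc - sum_yc * sum_sc) / det  # coefficient of sin
    b = (sum_yc * sum_ss - sum_ys * sum_sc) / det  # coefficient of cos
    return math.hypot(a, b), math.atan2(-b, a)


# Synthetic vergence response sampled at 500 Hz over one 4 s cycle.
times = [i * 0.002 for i in range(2000)]
response = [0.97 * DEMAND_AMPLITUDE_DEG * math.sin(OMEGA * (t - 0.1)) for t in times]

amplitude, phase_lag_rad = fit_sine(times, response)
gain = amplitude / DEMAND_AMPLITUDE_DEG   # fitted amplitude / demand amplitude
lag_s = phase_lag_rad / OMEGA             # phase lag expressed as a time delay
print(round(gain, 3), round(math.degrees(phase_lag_rad), 1), round(lag_s, 3))
```

At 0.25 Hz, a delay of 100 ms corresponds to a phase lag of 9 deg, which is how a phase difference and a time delay can be related.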
Figure 4
 
Measured eye positions from 200 ms before the target started moving until 200 ms after the target disappeared. The dashed lines indicate the background position. (A) Eye movements from a single trial in which the eyes were converging when the target was presented. Top row: Left and right eye traces. Bottom row: Version and vergence traces (i.e., mean of and difference between the left and right eye traces). (B) Average vergence response for each condition (expressed in terms of half the lateral distance between where the two eyes were directed on the screen).
We found that observers made accurate vergence pursuit movements in response to the changing absolute disparity of the stimulus (Figure 4B), with a vergence gain of about 0.97. This is in line with previous studies that used a frequency of 0.25 Hz, which reported pursuit gains that approached unity for velocities of up to 1.5 deg/s (Erkelens & Collewijn, 1985a). We found that there was an average delay (pursuit lag) of approximately 100 ms between the changing disparity of the background and the vergence response. 
Perceived 3D speed
We next examined speed judgments under the four experimental conditions (convergence, divergence, near vergence, and far vergence). We obtained psychometric functions for approach speed as a function of the target's motion relative to the background (Figure 5). Fitting these data with a cumulative Gaussian yielded the point of subjective equality (PSE; 50% point on the curve) that provides a measure of the perceived speed of the target. We defined speed discrimination thresholds (Δv) as the standard deviation of the fitted Gaussian and we defined increment thresholds (i.e., Weber fractions) as the ratio of threshold speed to the mean speed (Δv/v). 
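The fitting step can be sketched with a maximum-likelihood grid search over the mean and standard deviation of a cumulative Gaussian. The response counts below are hypothetical, and the grid search is a stand-in for whatever fitting routine was actually used; only the definitions of PSE, threshold, and Weber fraction follow the text.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Probability of a "faster" response at speed x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def log_likelihood(mu, sigma, speeds, n_faster, n_trials):
    """Binomial log-likelihood of the observed response counts."""
    total = 0.0
    for x, k, n in zip(speeds, n_faster, n_trials):
        p = min(max(cumulative_gaussian(x, mu, sigma), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


def fit_psychometric(speeds, n_faster, n_trials, step=0.01):
    """Grid search for the (mu, sigma) pair that maximizes the likelihood."""
    lo, hi = min(speeds), max(speeds)
    best = None
    for i in range(int(round((hi - lo) / step)) + 1):
        mu = lo + step * i
        for j in range(10, 201):          # sigma from 0.10 to 2.00 mm/s
            sigma = step * j
            ll = log_likelihood(mu, sigma, speeds, n_faster, n_trials)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


speeds = [1.5, 2.0, 2.5, 3.0, 3.5]   # mm/s on the screen
n_trials = [20, 20, 20, 20, 20]      # trials per stimulus level
n_faster = [1, 4, 10, 16, 19]        # hypothetical "faster" counts

pse, threshold = fit_psychometric(speeds, n_faster, n_trials)
weber_fraction = threshold / 2.5     # threshold relative to the mean speed
print(round(pse, 2), round(threshold, 2), round(weber_fraction, 2))
```

A PSE below the mean speed would indicate that the target looked faster in that condition, since a physically slower target is then needed to match the internal standard.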
Figure 5
 
Psychophysical results. (A) Psychometric functions for the fraction of trials in which observers responded that the target approached faster than the average approach speed as a function of the relative velocity on the screen. Separate functions for the conditions in which the eyes were in near (red) and far (blue) vergence positions and when the eyes were converging (green) and diverging (cyan). (B) Points of subjective equality with standard errors. The solid horizontal line represents the mean speed of the target with respect to the background. (C) The individual subjects' data.
We found that judgments of 3D speed differed significantly between conditions (repeated measures ANOVA on the PSEs: F(3,15) = 18.5, p < 0.001). The most striking feature of the data was that a target with the same retinal speed was seen as faster during convergence than during divergence (Figure 5, green vs. cyan data series). The difference between the PSEs was 1.42 mm/s of lateral speed (consistent with a 3D speed of about 2.2 cm/s, about 60% of the standard speed of the target; t(5) = 6.03, p < 0.01). We found no significant shift between the psychometric functions for the nearly stationary eyes in the far and near vergence positions (the difference was 0.14 mm/s; t(5) = 0.69, p = 0.52). 
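The conversion from the PSE difference in lateral speed to an approximate 3D speed can be sketched as below. It assumes a 50 cm viewing distance, an interocular distance of 64 mm (not stated in the text), and a target near the screen plane, where depth changes by roughly D/I mm per mm of change in image separation.

```python
VIEWING_DISTANCE_MM = 500.0  # "about 50 cm" (Apparatus)
IOD_MM = 64.0                # assumed interocular distance, not given in the text
STANDARD_SPEED_MM_S = 2.5    # mean lateral speed of the stimulus set


def approach_speed_cm_per_s(lateral_speed_mm_s):
    """Approximate 3D approach speed for a per-eye lateral speed on the screen."""
    separation_rate = 2.0 * lateral_speed_mm_s          # both eyes, opposite directions
    depth_rate_mm_s = VIEWING_DISTANCE_MM / IOD_MM * separation_rate
    return depth_rate_mm_s / 10.0


pse_difference_mm_s = 1.42
speed_3d_cm_s = approach_speed_cm_per_s(pse_difference_mm_s)
fraction_of_standard = pse_difference_mm_s / STANDARD_SPEED_MM_S
print(round(speed_3d_cm_s, 1), round(fraction_of_standard, 2))
```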
Increment thresholds did not differ significantly between the conditions (F(3,15) = 2.36, p = 0.113). This result was expected, given that judgments were made with respect to an internal standard that was the mean of the entire stimulus set. We found that the average increment threshold for approach speed was 0.22 (across observers and conditions). This is consistent with previously reported thresholds for the speed of motion in depth (0.20; Harris & Watamaniuk, 1995) and lateral motion (0.25; McKee, 1981). 
Discussion
When judging 3D motion of a target, observers tend to change vergence to track the object. To judge 3D motion correctly, the visual system should, therefore, take account of eye movements. Here, we isolated extraretinal cues to vergence from the retinal cues that would normally accompany vergence eye movements and tested how extraretinal information is combined with retinal signals to judge 3D speed. Our results show that extraretinal cues to vergence pursuit movements systematically affected judgments of 3D speed: An object's approach speed is reported to be faster during convergence and slower during divergence. This must be due to the rotation of the eyes rather than to distance scaling because judgments were not sensitive to whether the eyes were at near or far vergence positions. Specifically, the transformation of a changing retinal disparity signal to 3D speed depends on knowing the viewing distance: From binocular geometry, the same rate of change in disparity at a far distance should result in a faster perceived 3D speed than when it is presented at a closer distance. Interestingly, in our experiment, we did not find evidence for such scaling. 
In the Introduction section, we outlined sets of predictions for performance in these different conditions under two different models. When we compare our psychophysical results (Figure 5) with these models' predictions (Figure 3), it is clear that neither provides a good account of our results. We will consider two possible reasons for this: that the perceived speed is determined by a combination of the two sources of information and that the perceived speed is scaled by ocular convergence despite the background appearing to remain at a constant distance. However, first, we will briefly discuss the eye movements themselves. 
Effects of pursuit lag
Although observers pursued the background almost perfectly, we found that the eyes lagged the background by about 100 ms (Figure 4B). Do we need to consider this lag when interpreting our results? Both the models we outlined in the Introduction section are insensitive to lag. The relative velocity model considers the difference between the retinal velocities of the background (B′) and the target (T′), so introducing an equal increment or decrement to both has no effect on their difference. The absolute velocity model is independent of the magnitude of the pursuit signal (P) because it depends only on the target velocity (T): lag will affect pursuit and the retinal slip of the target (T′) to equal and opposite extents, so their sum will not change. We can formulate modified models that consider that retinal slip is combined with a later eye movement (Rotman, Brenner, & Smeets, 2004, 2005), in which case the target will appear to move faster in the far than in the near condition, because a later eye movement will include more convergence for the far target and more divergence for the near target. We see some indication of this in Figure 5A, but the individual subjects' data in Figure 5C suggest that this is all due to one subject (S2). For the converging and diverging conditions, this effect should be minimal because the velocity of the background hardly changes near the time at which the target is presented. 
Scaling angular velocity by viewing distance
From our results, it is clear that changes in the orientation of the eyes affect judgments of 3D speed. In seeking to explain our data, we have so far only considered models that use angular velocity measures. As the retinal projection of movement depends on the viewing distance, it is reasonable to ask whether observers recovered the real-world velocity, scaling the angular retinal velocities by the viewing distance. In our displays, there are two potential sources of information about the viewing distance: the vergence position of the eyes (cf. Backus & Matza-Brown, 2003; Brenner & van Damme, 1998; Collett, Schwarz, & Sobel, 1991; Enright, 1991; Foley, 1980; Frisby, Catherall, Porrill, & Buckley, 1997; Taroyan, Buckley, Porrill, & Frisby, 2000) and the gradient of vertical disparities in the projection of the background (Backus, Banks, van Ee, & Crowell, 1999; Bradshaw, Glennerster, & Rogers, 1996; Brenner, Smeets, & Landy, 2001). These two sources were always consistent with each other. The difference in vergence between the near and far conditions was about 1.14 deg. Changes in the vertical extent of the stimulus (i.e., the vertical separation between triangles at the top and bottom of the background in the two eyes) were 4.6 arcmin (0.2%) at most. Note, moreover, that changes in vertical disparity are negligible for the target (the object the observers were judging), so vertical disparity can only contribute to scaling (or judging eye rotation on the basis of the background's retinal image deformation). Considering the data from the near and far vergence conditions allows us to assess the extent to which observers scaled angular velocity estimates to judge 3D speed. In particular, if observers scaled the retinal velocities by the vergence distance, we would expect that the same retinal velocity is perceived as faster at the far vergence position. 
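To give a sense of the size of the effect that distance scaling would predict, the sketch below compares how much simulated depth changes per unit change in image separation at the near and far background positions. It assumes a 50 cm viewing distance, an interocular distance of 64 mm (not stated in the text), and image separations of ±10 mm at the two extremes; it is a geometric illustration, not the model used to generate the predictions in the figures.

```python
VIEWING_DISTANCE_MM = 500.0  # "about 50 cm" (Apparatus)
IOD_MM = 64.0                # assumed interocular distance, not given in the text


def depth_per_separation(separation_mm, distance_mm=VIEWING_DISTANCE_MM, iod_mm=IOD_MM):
    """Change in simulated depth (mm) per mm change in image separation.

    Derivative of z = D * s / (I + s) with respect to s, where s is the
    crossed separation of the two eyes' images on the screen.
    """
    return distance_mm * iod_mm / (iod_mm + separation_mm) ** 2


near_gain = depth_per_separation(10.0)    # background nearest (crossed separation)
far_gain = depth_per_separation(-10.0)    # background farthest (uncrossed separation)
predicted_ratio = far_gain / near_gain    # far / near 3D speed for equal lateral speed
print(round(near_gain, 2), round(far_gain, 2), round(predicted_ratio, 2))
```

Under these assumptions, full scaling by vergence distance would make the same on-screen speed correspond to a substantially faster 3D speed at the far position than at the near position.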
Figure 6 shows what our data would look like if observers recovered the 3D speed of the target by scaling the angular measurements by eye vergence. Comparing the predictions of this real-world (3D) speed model (Figure 6A) with our psychophysical results (Figure 5A) shows that observers did not recover the real-world speed. In the individual subjects' data (Figure 5C), only one participant (Subject S2) shows a difference between judgments in the near and far vergence positions, and even this observer's data do not match the predictions of the real-world speed model (Figure 6A). We conclude that judgments of 3D speed are affected by extraretinal signals about changes in eye orientation (i.e., the differences between the converging and diverging conditions) but are unaffected by extraretinal signals relating to the baseline vergence of the eyes (i.e., the lack of significant differences between the near and far conditions). This lack of scaling conforms to previous work on the scaling of velocity in the frontoparallel plane (McKee & Welch, 1989), suggesting that the visual system codes 3D velocity signals in angular dimensions, uncorrected for viewing distance. 
Figure 6
 
Predictions for a model that recovers the real-world speed of the target, both (A) in terms of the relative speed on the screen and (B) in terms of the real-world speed that would coincide with the simulated target positions on the screen. Note that this model scales the lateral motion by the viewing distance when judging speed, in contrast to our original models.
Relative contribution of retinal and extraretinal signals
It is clear that none of the models we have considered so far can account for our results: Observers do not base their judgments on the relative speed of the target with respect to the background or the absolute angular speed of the target. In addition, we have shown that observers do not recover the target's real-world velocity. What estimate of speed were our subjects using? Both of our original models exploit angular velocity measurements and both would normally (if the background were not “moving”) provide estimates of the target's motion in depth. Observers may, therefore, exploit a weighted average of both signals. To test whether this could account for observers' performance, we constructed a model that calculates a weighted average of information about the retinal slip of the target relative to the background (Model 1; T′ − B′) and information about the absolute speed of the target (Model 2; P + T′). The weight term (w) determines the relative contributions of both sources of information. Specifically, we described the estimated speed (V) as 
$$V = w\,(P + T') + (1 - w)\,(T' - B'), \tag{1}$$
which can be simplified to 
$$V = wP + T' - (1 - w)\,B'. \tag{2}$$
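For readers who want the intermediate step, the simplification can be written out term by term. This assumes, following the definitions of Model 1 (T′ − B′) and Model 2 (P + T′) in the text, that both retinal terms are the slips T′ and B′.

```latex
\begin{aligned}
V &= w\,(P + T') + (1 - w)\,(T' - B') \\
  &= wP + wT' + T' - wT' - (1 - w)\,B' \\
  &= wP + T' - (1 - w)\,B' .
\end{aligned}
```

Substituting T′ = T − P and B′ = B − P (Figure 3) removes the pursuit term altogether and leaves V = (T − B) + wB, so in this form the weighted estimate is the relative velocity plus a fraction w of the background velocity.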
The best way to combine the two measures is for the weight term w to be chosen such that the precision of V is maximized (Cochran, 1937). This is so when w is related to the variances associated with the retinal slip of the background ($\sigma_B^2$) and of the pursuit signal ($\sigma_P^2$) by 
$$w = \frac{\sigma_B^2}{\sigma_P^2 + \sigma_B^2}. \tag{3}$$
Note that although the variance in the percept also depends on the variance associated with the retinal slip of the target ($\sigma_T^2$), the latter contribution is independent of w so it does not influence the weight. Using previously published data (Experiment 1 from Welchman et al., 2009), we estimated the relative sensitivity of observers to retinal and extraretinal signals to be a factor of 2.86. Thus, we estimate that $\sigma_P = 2.86\,\sigma_B$. Substituting this in Equation 3 gives $w = 1/(1 + 2.86^2) \approx 0.11$. 
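The value of the weight can be reproduced with a few lines of code. This is a sketch under one labeled assumption, namely that the noise on the pursuit signal, the background slip, and the target slip is independent; the function names and the arbitrary unit for the standard deviations are hypothetical.

```python
def optimal_weight(sigma_p, sigma_b):
    """Equation 3: weight given to the pursuit (extraretinal) signal."""
    return sigma_b ** 2 / (sigma_p ** 2 + sigma_b ** 2)


def variance_of_estimate(w, sigma_p, sigma_b, sigma_t):
    """Variance of V = wP + T' - (1 - w)B', assuming independent noise sources."""
    return w ** 2 * sigma_p ** 2 + sigma_t ** 2 + (1 - w) ** 2 * sigma_b ** 2


sigma_b = 1.0             # arbitrary unit
sigma_p = 2.86 * sigma_b  # ratio reported in the text
sigma_t = 1.0             # arbitrary; it shifts the variance but not the best w

w = optimal_weight(sigma_p, sigma_b)
print(f"w from Equation 3: {w:.2f}")

# Brute-force check: the same weight minimizes the variance over a grid of candidates.
candidates = [i / 1000 for i in range(1001)]
w_grid = min(candidates,
             key=lambda c: variance_of_estimate(c, sigma_p, sigma_b, sigma_t))
print(f"w from grid search: {w_grid:.3f}")
```

Changing sigma_t leaves both results unchanged, which is the numerical counterpart of the statement that the target's variance does not influence the weight.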
This weighted model provides an excellent fit to the psychophysical results (Figure 7). We therefore conclude that the visual system estimates the speed of motion in depth by taking a weighted average of changes in relative retinal disparity (Model 1) and changes in the target's angular rate of convergence as estimated from the sum of retinal slip and extraretinal eye velocity (Model 2). This is consistent with studies that have shown that observers use extraretinal signals when judging speed in the frontoparallel plane (Brenner & van den Berg, 1994; Champion & Freeman, 2010; Freeman, 2001; Freeman & Banks, 1998; Freeman, Champion, & Warren, 2010; Freeman & Fowler, 2000; Turano & Massof, 2001). The weight that we find for the extraretinal component is lower than the extraretinal gain terms found for the estimation of lateral motion, which are typically between 0.6 and 0.8 (Freeman & Banks, 1998; Freeman & Fowler, 2000; Turano & Massof, 2001). However, looming is likely to provide a substantial contribution to judgments of motion in depth, for which there is no equivalent for lateral motion. Alternatively, the different weights may be due to the much larger angular velocity relative to each eye required for lateral motion than for a similar amount of motion in depth. 
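The direction of the biases predicted by the weighted model can be illustrated as follows. This is a hedged sketch, not the fit shown in Figure 7: the relative velocity and the peak background velocity are hypothetical numbers, positive values denote motion in the converging (approaching) direction, and pursuit is taken to be perfect for simplicity.

```python
W = 0.11  # extraretinal weight estimated in the text


def weighted_speed(target, background, pursuit, w=W):
    """Weighted average of Model 2 (P + T') and Model 1 (T' - B')."""
    t_slip = target - pursuit
    b_slip = background - pursuit
    return w * (pursuit + t_slip) + (1 - w) * (t_slip - b_slip)


relative = 0.5  # hypothetical target velocity relative to the background (deg/s)
peak = 1.0      # hypothetical peak background velocity (deg/s)

# Background velocity at the four phases of the oscillation.
background_velocity = {
    "converging": +peak,
    "diverging": -peak,
    "near": 0.0,
    "far": 0.0,
}

predicted = {}
for condition, b in background_velocity.items():
    # The target moves at the background velocity plus the relative velocity.
    predicted[condition] = weighted_speed(b + relative, b, pursuit=b)
    print(f"{condition:>10}: {predicted[condition]:.3f} deg/s")
```

With identical relative velocity in all four conditions, the sketch gives a higher estimate while converging, a lower one while diverging, and the same estimate at the near and far positions, which is the qualitative pattern reported in the psychophysical data.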
Figure 7
 
Predictions for a model based on a weighted average of the absolute speed of the target and the speed of the target relative to the background. The solid curves show model predictions. Data points show the actual fractions of judgments. (A) Predictions and judgments in terms of the relative speed on the screen. (B) Predictions and judgments in terms of the weighted model. Note that the curve for far (blue) is always obscured by the curve for near (red).
Other work also suggests a lower weight (ca. 0.4) for extraretinal information when judging 3D motion (Howard, 2008). Moreover, the weight is likely to depend on the speeds involved. For instance, consider that under our paradigm the target appears static until it starts moving relative to the background. From Equation 2, this might appear unexpected. Specifically, the only way to estimate V = 0 when there is no relative disparity and some angular motion is to assume that the weight given to angular motion is zero during this portion of the trial. In fact, giving zero weight to the angular motion in this situation is quite reasonable as the variance associated with the estimate of speed from relative motion is likely to be very low (i.e., the lack of relative retinal motion in the display is quite certain). For two objects moving in depth at slightly different speeds, as is the case in the critical conditions of our experiment (approaching and receding), the certainty in the estimate of their relative motion is likely to be large because the difference in speed is small, whereas the certainty in the estimate of each of their absolute speeds is likely to be small because both objects are moving fast. In light of this, our study probably underestimates the role of extraretinal information in judging motion in depth because relative motion was probably given more weight than would normally be the case. 
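The argument that the weight should depend on the reliability of the relative-motion signal can be made concrete with Equation 3. In this sketch the standard deviations are hypothetical values in arbitrary units, chosen only to show the trend; they are not estimates from the experiment.

```python
def extraretinal_weight(sigma_p, sigma_b):
    """Equation 3 evaluated for given standard deviations of pursuit and background slip."""
    return sigma_b ** 2 / (sigma_p ** 2 + sigma_b ** 2)


sigma_p = 1.0  # pursuit-signal noise, held fixed (arbitrary unit)

# From a perfectly certain relative-motion signal to one as noisy as the pursuit signal.
sigma_b_values = [0.0, 0.1, 0.35, 0.7, 1.0]
weights = [extraretinal_weight(sigma_p, s) for s in sigma_b_values]

for s, w in zip(sigma_b_values, weights):
    print(f"sigma_B = {s:.2f} -> w = {w:.3f}")
```

When the relative-motion estimate is effectively noise free (sigma_B near zero) the weight on the extraretinal term vanishes, and the weight rises as that estimate becomes less reliable.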
In conclusion, by independently manipulating the extraretinal signal and measuring the resulting biases in speed perception, we demonstrate that the human visual system exploits extraretinal signals when judging the speed of motion in depth, just as it does for judgments of lateral speed. We present a model that uses a weighted combination of changes in relative disparity and differences in angular motion to reproduce measured perceptual judgments. Finally, we show that perceived speed is not scaled by the orientation of the eyes, indicating that under our paradigm people judge angular speeds rather than velocities in 3D space. 
Supplementary Materials
Supplementary Movie 
Movie 1. A stereoscopic movie that illustrates a single trial of the experiment. The background (rotating triangles) and target (small dot) move sinusoidally across the display in counter phase for the two eyes. At a critical point in the sequence, in this movie while the eyes were converging, the target dot changed colour (here illustrated by an increase in brightness) and moved towards the participant. Participants judged the approach speed of the dot during this period. The movie is designed for red-green anaglyphic viewing (red filter over the left eye). With all other visible references extinguished, participants have no reliable sensation of the large changes in the absolute disparity of the stimulus. 
Acknowledgments
This work was funded in part by the European Community's Seventh Framework Programme FP7/2007-2013 under Grant Agreement Number 214728-2. We thank two anonymous reviewers for their comments and suggested improvements. 
Commercial relationships: none. 
Corresponding author: Arthur J. Lugtigheid. 
Email: lugtigheid@gmail.com. 
Address: School of Psychology, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. 
References
Backus B. T. Banks M. S. van Ee R. Crowell J. (1999). Horizontal and vertical disparity, eye position, and stereoscopic slant perception. Vision Research, 39, 1143–1170.
Backus B. T. Matza-Brown D. (2003). The contribution of vergence change to the measurement of relative disparity. Journal of Vision, 3, (11):8, 737–750, http://www.journalofvision.org/content/3/11/8, doi:10.1167/3.11.8.
Beverley K. I. Regan D. (1973). Evidence for the existence of neural mechanisms selectively sensitive to the direction of movement in space. The Journal of Physiology, 235, 17–29.
Bradshaw M. F. Glennerster A. Rogers B. J. (1996). The effect of display size on disparity scaling from differential perspective and vergence cues. Vision Research, 36, 1255–1264.
Brenner E. Smeets J. B. J. Landy M. (2001). How vertical disparities assist judgements of distance. Vision Research, 41, 3455–3465.
Brenner E. van Damme W. J. (1998). Judging distance from ocular convergence. Vision Research, 38, 493–498.
Brenner E. van den Berg A. V. (1994). Judging object velocity during smooth pursuit eye movements. Experimental Brain Research, 99, 316–324.
Brenner E. van den Berg A. V. van Damme W. J. (1996). Perceived motion in depth. Vision Research, 36, 699–706.
Champion R. A. Freeman T. C. A. (2010). Discrimination contours for the perception of head-centred velocity. Journal of Vision, 10, (6):14, 1–9, http://www.journalofvision.org/content/10/6/14, doi:10.1167/10.6.14.
Cochran W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society, 4, 102–118.
Collett T. S. Schwarz U. Sobel E. C. (1991). The interaction of oculomotor cues and stimulus size in stereoscopic depth constancy. Perception, 20, 733–754.
Cumming B. G. Parker A. J. (1994). Binocular mechanisms for detecting motion-in-depth. Vision Research, 34, 483–495.
Enright J. T. (1991). Exploring the third dimension with eye movements—Better than stereopsis. Vision Research, 31, 1549–1562.
Erkelens C. J. Collewijn H. (1985a). Eye movements and stereopsis during dichoptic viewing of moving random-dot stereograms. Vision Research, 25, 1689–1700.
Erkelens C. J. Collewijn H. (1985b). Motion perception during dichoptic viewing of moving random-dot stereograms. Vision Research, 25, 583–588.
Foley J. M. (1980). Binocular distance perception. Psychological Review, 87, 411–433.
Freeman T. C. A. (2001). Transducer models of head-centred motion perception. Vision Research, 41, 2741–2755.
Freeman T. C. A. Banks M. S. (1998). Perceived head-centric speed is affected by both extra-retinal and retinal errors. Vision Research, 38, 941–945.
Freeman T. C. A. Champion R. A. Warren P. A. (2010). Bayesian analysis of perceived speed during smooth eye pursuit. Current Biology, 20, 757–762.
Freeman T. C. A. Fowler T. A. (2000). Unequal retinal and extra-retinal motion signals produce different perceived slants of moving surfaces. Vision Research, 40, 1857–1868.
Frisby J. P. Catherall C. Porrill J. Buckley D. (1997). Sequential stereopsis using high-pass spatial frequency filtered textures. Vision Research, 37, 3109–3116.
González E. G. Allison R. S. Ono H. Vinnikov M. (2010). Cue conflict between disparity change and looming in the perception of motion in depth. Vision Research, 50, 136–143.
Harris J. M. (2006). The interaction of eye movements and retinal signals during the perception of 3-D motion direction. Journal of Vision, 6, (8):2, 777–790, http://www.journalofvision.org/content/6/8/2, doi:10.1167/6.8.2.
Harris J. M. Watamaniuk S. N. (1995). Speed discrimination of motion-in-depth using binocular cues. Vision Research, 35, 885–896.
Howard I. P. (2008). Vergence modulation as a cue to movement in depth. Spatial Vision, 21, 581–592.
McKee S. P. (1981). A local mechanism for differential velocity detection. Vision Research, 21, 491–500.
McKee S. P. Welch L. (1989). Is there a constancy for velocity? Vision Research, 29, 553–561.
Nefs H. T. Harris J. M. (2007). Vergence effects on the perception of motion-in-depth. Experimental Brain Research, 183, 313–322.
Nefs H. T. Harris J. M. (2008). Induced motion in depth and the effects of vergence eye movements. Journal of Vision, 8, (3):8, 1–16, http://www.journalofvision.org/content/8/3/8, doi:10.1167/8.3.8.
Regan D. Erkelens C. J. Collewijn H. (1986). Necessary conditions for the perception of motion in depth. Investigative Ophthalmology & Visual Science, 27, 584–597.
Regan D. Gray R. (2009). Binocular processing of motion: Some unresolved questions. Spatial Vision, 22, 1–43.
Rokers B. Cormack L. K. Huk A. C. (2009). Disparity- and velocity-based signals for three-dimensional motion perception in human MT. Nature Neuroscience, 12, 1050–1055.
Rotman G. Brenner E. Smeets J. B. J. (2004). Mislocalization of targets flashed during smooth pursuit depends on the change in gaze direction after the flash. Journal of Vision, 4, (7):4, 564–574, http://www.journalofvision.org/content/4/7/4, doi:10.1167/4.7.4.
Rotman G. Brenner E. Smeets J. B. J. (2005). Flashes are localised as if they were moving with the eyes. Vision Research, 45, 355–364.
Shioiri S. Saisho H. Yaguchi H. (2000). Motion in depth based on inter-ocular velocity differences. Vision Research, 40, 2565–2572.
Taroyan N. A. Buckley D. Porrill J. Frisby J. P. (2000). Exploring sequential stereopsis for co-planarity tasks. Vision Research, 40, 3373–3390.
Turano K. A. Massof R. W. (2001). Nonlinear contribution of eye velocity to motion perception. Vision Research, 41, 385–395.
Welchman A. E. Harris J. M. Brenner E. (2009). Extra-retinal signals support the estimation of 3D motion. Vision Research, 49, 782–789.
Figure 1
 
Schematic showing the lateral motion information available to the left eye for 3D motion judgments. At time 1, when the target (red dot) has not yet started to “move,” the target is at the center of the moving background (horizontal black line) and the eyes are fixated on F₁ (note that this is not necessarily representative but is assumed for clarity). At time 2, the background has moved by extent B (black arrow) and the target by extent T (red arrow), so the target has moved relative to the background by extent T − B (green arrow). The eyes have moved by P (blue arrow) and are now fixating on F₂, slightly behind the center of the background (B₂). This “lag” causes a retinal slip of the background, B′, and a retinal slip of the target, T′. The inset shows the interpretation of such lateral target motion in terms of motion in depth, assuming that the right eye sees a mirror symmetrical image.
Figure 2
 
An illustration of how movement of the target and background were used to separate retinal and extraretinal cues to motion estimation. (A) An illustration of the motion in depth of the target and background in each condition. The background moved back and forth sinusoidally, in opposite directions in the two eyes, throughout the entire experiment (frequency = 0.25 Hz). In terms of binocular cues, this corresponds with oscillations in depth, but due to the absence of looming, these oscillations are not perceived. Target motion was perceived when we changed the relative disparity of the target with respect to the background. We did so at four phases of the background's oscillation (orange arrows): far, near, converging, or diverging. (B) Velocity of the target with respect to the background (upper panel) and of the targets and the background (lower panel) for the left eye. The same relative velocity corresponds to different target velocities in the four conditions.
Figure 3
 
Model predictions for (A, B) a relative velocity model and (C, D) an absolute velocity model. The relative velocity could be judged by taking the difference between the retinal slip velocity of the target (T′) and the retinal slip velocity of the background (B′). The absolute velocity of the target could be judged by summing the pursuit velocity (P) and the retinal velocity of the target (T′). Note that since T′ = T − P and B′ = B − P, the predictions for the perceived velocity are T − B and T, respectively, irrespective of the pursuit velocity. Predictions are shown both in terms of the relative speed (A, C) and in terms of the absolute speed (B, D) on the screen. Whenever curves overlap, only the red curve is visible.
Figure 4
 
Measured eye positions from 200 ms before the target started moving until 200 ms after the target disappeared. The dashed lines indicate the background position. (A) Eye movements from a single trial in which the eyes were converging when the target was presented. Top row: Left and right eye traces. Bottom row: Version and vergence traces (i.e., mean of and difference between the left and right eye traces). (B) Average vergence response for each condition (expressed in terms of half the lateral distance between where the two eyes were directed on the screen).
Figure 5
 
Psychophysical results. (A) Psychometric functions for the fraction of trials in which observers responded that the target approached faster than the average approach speed as a function of the relative velocity on the screen. Separate functions for the conditions in which the eyes were in near (red) and far (blue) vergence positions and when the eyes were converging (green) and diverging (cyan). (B) Points of subjective equality with standard errors. The solid horizontal line represents the mean speed of the target with respect to the background. (C) The individual subjects' data.