To gain insight into how speeds are combined in structure-from-motion, we compared performance for estimating the mean speed with performance for detecting deviations from planarity. The stimuli showed a center dot surrounded by an annulus of dots. In one (plane) condition, the stimuli simulated a rotating plane. In a two-alternative forced-choice (2AFC) task, the subject had to choose in which of two stimuli the center dot moved in the plane. In another (cloud) condition, the same dot locations and speeds were used but now assigned to different dots. Such a stimulus resembles a translating and rotating cloud of dots. In this case, the subject had to choose the stimulus in which the center dot moved with the mean speed of the surrounding dots. Performance was measured as a function of deformation/slant. Although locations and speeds were the same in both conditions, performance was much poorer in the cloud condition. Subsequent experiments and an ideal observer model point to a plausible explanation: in detecting deviations from planarity, the visual system can focus on the most reliable pieces of information (the slower dots, closest to the test dot). Although performance could benefit from taking more dots into account, performance barely improved with an increase in the number of dots. This may reflect a limited processing capacity of the visual system.

*reference* stimulus, the center dot also moved in accordance with the affine transformation (Figure 2a and 2c). The speed of the center dot was also equal to the average speed. In the *signal* stimulus, the center dot moved with either a larger or a smaller speed (Figure 2b and 2d). The task of the subject was to indicate which of the two stimuli was the reference stimulus.

*N* number of dots (*N* = 49) in the middle frame of the stimulus sequence were chosen uniformly and randomly distributed within an annulus with inner radius *r*_{min} and outer radius *r*_{max} (*r*_{min} = 100 pixels = 1.9 deg, *r*_{max} = 200 pixels = 3.8 deg). The angles of the positions in polar coordinates (*r*_{i}, *α*_{i}) were equally distributed from 0 to 360 deg with +/− 30% scatter: the angle *α*_{i} of a point with index *i* is given by an expression in which *random* represents a random number between 0 and 1 and *α*_{0} is fixed for all dots and randomly chosen. In the second stage, the center of mass was calculated and subtracted from the positions, such that the test dot, located at the center of the screen, coincided with the center of mass. A set of horizontal displacements *S* was assigned to the dots. The dots moved linearly from *x* + *S*/2 to *x* − *S*/2. The displacement *S* assigned to the dots was a linear function of the horizontal and vertical positions *x* and *y* (Equation 1), in which *Def* is the deformation, *ϕ* the direction of deformation, and *T* the overall displacement.
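The two-stage construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_stimulus`, the exact form of the angular jitter, the area-uniform sampling of the radius, and the form chosen for Equation 1, *S* = *T* + *Def*·(*x* cos *ϕ* + *y* sin *ϕ*), are all assumptions made here. The last one is chosen only because it is linear in *x* and *y*, gives a horizontal compression for *ϕ* = 0 and a horizontal shear for *ϕ* = 90 deg, and equals *T* at the origin.

```python
import math
import random


def make_stimulus(n_dots=49, r_min=1.9, r_max=3.8, deformation=0.5,
                  phi_deg=45.0, overall=0.75, rng=None):
    """Return a list of (x, y, S) tuples for the surround dots (units: deg)."""
    rng = rng or random.Random()
    alpha0 = rng.uniform(0.0, 2.0 * math.pi)   # common random angular offset
    step = 2.0 * math.pi / n_dots
    xs, ys = [], []
    for i in range(n_dots):
        # ASSUMPTION: +/-30% scatter expressed as a fraction of the angular step.
        alpha = alpha0 + (i + 0.6 * (rng.random() - 0.5)) * step
        # ASSUMPTION: radius drawn so that dots are uniform over the annulus area.
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        xs.append(r * math.cos(alpha))
        ys.append(r * math.sin(alpha))
    # Second stage: subtract the center of mass so the test dot sits on it.
    cx, cy = sum(xs) / n_dots, sum(ys) / n_dots
    xs = [x - cx for x in xs]
    ys = [y - cy for y in ys]
    phi = math.radians(phi_deg)
    # ASSUMED form of Equation 1: S = T + Def * (x cos(phi) + y sin(phi)).
    return [(x, y, overall + deformation * (x * math.cos(phi) + y * math.sin(phi)))
            for x, y in zip(xs, ys)]
```

With any seed, the mean of the returned displacements equals `overall` up to floating-point error, because the displacement is linear in position and the center of mass has been moved to the origin.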

*T*, and in the signal stimulus, the displacement of the center dot was *T* + *δT*. Because the test dot was in the center of mass, the test dot moved with the average speed in the reference condition, on which basis the signal could be discriminated from the reference stimulus.
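The step from "center of mass" to "average speed" can be written out explicitly. The coefficients *a* and *b* below are generic placeholders (our notation) for the linear dependence of the displacement on *x* and *y*; the average is taken over the surround dots after the center of mass has been moved to the origin.

```latex
S_i = T + a\,x_i + b\,y_i ,
\qquad
\langle S \rangle = T + a\,\langle x \rangle + b\,\langle y \rangle = T
\quad\text{because}\quad \langle x \rangle = \langle y \rangle = 0 .
```

The same identity holds for any linear displacement field, which is why reassigning the speeds to other dots in the cloud condition leaves the mean unchanged.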

*ϕ* was chosen randomly. A *ϕ* of 0 deg leads to a horizontal compression (horizontally tilted plane), while a *ϕ* of 90 deg leads to a horizontal shearing motion (a vertically tilted plane). In the standard setting, the overall displacement *T* was randomly chosen between 0.47 and 1.42 deg (corresponding to mean speeds between 1.42 and 4.27 deg/s). Each stimulus sequence consisted of 25 frames and lasted 0.33 s.

*T* = 0.75, *ϕ* = 45, *Def* = 0.5, *δT* = 0.4. Note that in each trial each sequence was shown only once.

*δT* at which the subject could discriminate the signal stimuli from the reference stimuli with 81% probability. At any given trial, the absolute value of the offset displacement *δT* was set to the maximum likelihood estimation of the threshold and its sign was chosen randomly. At the start of each staircase, *δT* was set to twice the estimated threshold level. The subject was seated in a dimly lit room at 70 cm from the screen with the left eye covered and the right eye aligned with the center of the screen, with his/her head on a chin rest. At each trial, a reference stimulus and a signal stimulus were shown in random order separated by a blank interval (showing a black screen) that lasted 0.4 s. After this, the subject indicated which of the two stimuli represented the reference stimulus by pressing the left or right mouse button. In the plane condition, the reference stimulus was defined by the fact that the test dot moved with the local speed of the plane, as well as with the average speed of the surrounding dots. In the cloud condition, the reference stimulus was defined by the fact that the test dot moved with the average speed of the surrounding dots. Feedback was provided in the form of a tone that sounded after a wrong answer was given. This ensured that subjects were using a (near) optimal strategy in each condition. In each session, thresholds were determined for several conditions simultaneously in which the conditions were randomly interleaved. In each condition, a threshold was calculated after 80 trials. The thresholds presented are the geometric averages (average on a logarithmic scale) of the thresholds obtained in five or more sessions. Errors correspond to SEM of these values. We also present the geometric average of the thresholds of the three subjects (labelled as “Average”). In the latter case, the error estimates presented in the figures correspond to the square root of the sum of squared (individual) errors.
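The averaging described at the end of this paragraph can be illustrated as follows. This is a sketch, not the authors' analysis code: the function names are hypothetical, and computing the SEM on the logarithmic scale is an assumption, since the text does not say on which scale the SEM was taken.

```python
import math


def geometric_mean_threshold(thresholds):
    """Return (geometric mean, SEM of the log-thresholds) for one condition."""
    logs = [math.log(t) for t in thresholds]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the log-thresholds (n - 1 in the denominator).
    var_log = sum((v - mean_log) ** 2 for v in logs) / (n - 1)
    # ASSUMPTION: the SEM is computed on the log scale, like the average.
    sem_log = math.sqrt(var_log / n)
    return math.exp(mean_log), sem_log


def combined_error(individual_errors):
    """Square root of the sum of squared individual errors."""
    return math.sqrt(sum(e ** 2 for e in individual_errors))
```

For example, `geometric_mean_threshold([0.1, 0.4])` returns a geometric mean of 0.2, which is lower than the arithmetic mean of 0.25. Averaging on a logarithmic scale weights ratios between thresholds rather than absolute differences.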

*all* dots is optimally combined. Instead, the results of Experiment 2 and the modelling exercise will show that these results are consistent with the idea that the visual system focuses on the most relevant pieces of information in the stimulus.

*relative* rather than *absolute* speed.

*r*_{min}, *r*_{max}) differed. These were, respectively, (1, 1.5), (1, 2), (1, 3), (2, 3), and (2.5, 3), in which 1 unit corresponds to 1.9 deg, and in which (1, 2) resembled the standard setting. Note that for the first 3 conditions, the inner radius is the same, whereas for the last 3 conditions, the outer radius is the same. In all conditions, the dot density was kept constant and was the same as in the standard setting.

*N* in the annulus was 20, 49, 132, 82, and 45 for conditions 1 to 5.
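The dot counts follow from the constant-density constraint: the number of dots scales with the annulus area, π(*r*_{max}² − *r*_{min}²). The sketch below checks this against the reported counts. The helper name `expected_dot_count` is hypothetical, and no rounding rule is assumed because the text does not state one.

```python
def expected_dot_count(r_min, r_max, n_standard=49, standard=(1.0, 2.0)):
    """Scale the standard dot count by annulus area (constant dot density)."""
    area = r_max ** 2 - r_min ** 2                       # the factor pi cancels
    standard_area = standard[1] ** 2 - standard[0] ** 2
    return n_standard * area / standard_area


# (r_min, r_max) per condition, in units of 1.9 deg, and the reported counts.
conditions = [(1, 1.5), (1, 2), (1, 3), (2, 3), (2.5, 3)]
reported = [20, 49, 132, 82, 45]
predicted = [expected_dot_count(lo, hi) for lo, hi in conditions]
```

Area scaling reproduces the reported numbers to within about one dot. The largest difference is for the (1, 3) annulus, where scaling gives roughly 130.7 against the reported 132.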

*r*_{min} = 1, *r*_{max} = 2), while the speed distribution was varied in the same way as in the plane conditions used in this experiment (i.e., the lines shown in Figure 6). Data for this condition are shown as open blue squares in Figure 7 (“cloudspeed”). In this condition, the number of dots was kept constant at *N* = 49.

*ϕ* (see Equation 1) was either approximately 0 or 90 deg (i.e., it was either at a random angle between −10 and +10 deg, or between 80 and 100 deg). (A direction of 0 deg leads to a horizontal compression, while a direction of 90 deg leads to a horizontal shear transformation.) To produce the stimuli, we generated a set of locations and speeds as in Experiment 1a (standard settings). Then a subset of the dots was shown whose locations fell within a certain segment of the annulus. The segments in which the dots were shown were either around an angle *α* of 0 and 180 deg, or around 90 and 270 deg, and were 45 deg wide. A total of 4 conditions were used varying in *ϕ* and *α*: (*ϕ*, *α*) = (0, 90), (90, 0), (0, 0), and (90, 90), schematically depicted in the top of Figure 8.

*ϕ, α*) = (0, 90) and (90, 0) are about the same as those obtained in the standard settings, suggesting that these segments contain sufficient information to carry out the task as well as in the standard condition. The thresholds for the conditions with (*ϕ, α*) = (0, 0) and (90, 90), on the other hand, are significantly higher. The average data on the right show that performance is equally poor in these two conditions (green columns). The stimuli in conditions (0, 90) and (90, 0) (i.e., data shown by the red columns) contain dots that move with the lowest relative speeds, whereas the stimuli in conditions (0, 0) and (90, 90), shown by the green columns, contain dots moving with high speeds. This suggests that in the plane condition, performance is determined mainly by those dots that have the smallest relative speeds. Performance does not improve when other dots are shown as well.
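The claim that the (0, 90) and (90, 0) segments contain the slowest dots can be checked with a small calculation. The sketch below is illustrative only: the function name, the fixed dot radius, and the regular angular sampling are hypothetical choices, and the relative displacement is computed from an assumed form of Equation 1, *S* − *T* = *Def*·(*x* cos *ϕ* + *y* sin *ϕ*), which is not given explicitly in this text.

```python
import math


def mean_relative_speed(phi_deg, alpha_deg, deformation=0.5, width_deg=45.0,
                        n_angles=360, radius=1.5):
    """Mean |S - T| of dots in the two opposite segments centered on alpha."""
    phi = math.radians(phi_deg)
    total, count = 0.0, 0
    for k in range(n_angles):
        angle_deg = k * 360.0 / n_angles
        # Angular distance to the nearer segment center (alpha or alpha + 180).
        d = abs((angle_deg - alpha_deg + 90.0) % 180.0 - 90.0)
        if d > width_deg / 2.0:
            continue                      # dot falls outside both segments
        a = math.radians(angle_deg)
        x, y = radius * math.cos(a), radius * math.sin(a)
        # ASSUMED form of Equation 1; T cancels in the relative displacement.
        total += abs(deformation * (x * math.cos(phi) + y * math.sin(phi)))
        count += 1
    return total / count
```

Under these assumptions, `mean_relative_speed(0, 90)` is several times smaller than `mean_relative_speed(0, 0)`, and the (90, 0) and (90, 90) pair behaves symmetrically. This matches the ordering of thresholds described above.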

*particular* subset of dots (see “cloudpos” results in Figure 7). This might be because for estimation of the average speed all dots are equally informative.

*N* (sometimes referred to as probability summation).

*N* = 2, because then the task would amount to determining whether all 3 dots are part of the same line). A control experiment showed that the use of this constraint did not change the thresholds for a number of dots larger than 2 (note that the test dot is close to the center even when this constraint is not applied).

*N* = 3, 6, 12, and 49 dots. In the cloud condition, deformation thresholds were obtained for *N* = 2, 3, 4, 8, 16, and 49 dots. In the condition *without* deformation, thresholds were obtained for *N* = 1, 2, 3, 4, 8, 16, and 49 dots.

*N*) predicted by probability summation (∼1/√*N*). The highest thresholds were obtained in the cloud condition. Remarkably, performance did not improve when the number of dots increased from 2 to 49 dots. One might suppose that such poor performance arises from using a limited number of randomly chosen dots. However, the fact that thresholds remain constant is incompatible with such a strategy. With an increase in the number of dots, the average of the subset of dots would be more and more dissimilar from the real average, which would lead to an *increase* in the threshold.
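This argument can be made concrete with a standard sampling result, under the simplifying assumption (ours, not the paper's) that the observer averages a fixed number *n* of dots drawn at random, without replacement, from the *N* surround dots. Writing *s*² for the variance of the *N* dot speeds (with *N* − 1 in the denominator) and S̄_n for the subset mean, both symbols being our notation:

```latex
\operatorname{Var}\!\left(\bar S_n - \langle S \rangle\right)
  = \frac{s^2}{n}\left(1 - \frac{n}{N}\right)
```

At fixed *n* this variance grows toward *s*²/*n* as *N* increases, so a fixed-size random subset would predict thresholds that rise with the number of dots rather than staying constant.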

*N* when the number of dots increased from 1 to 8, but level off for higher numbers of dots. Here, there is no evidence that the slope is initially −0.5 or that the slope changes with an increase in the number of points.

*x,y,S*)_{i} of all points *i* = 1…*N*. The speeds *S*_{i} are measured with uncertainty *σ*_{i} (which may differ from dot to dot).

*S*_{m} is simply equal to the average of the speeds (i.e., the mean): *S*_{m} = Σ *S*_{i}/*N*. This leads to an uncertainty in the average speed estimate, *σ*_{m}, given by *σ*^{2}_{m} = Σ *σ*^{2}_{i}/*N*^{2}, or written differently:

$$\sigma_m = \frac{\sqrt{\langle \sigma_i^2 \rangle}}{\sqrt{N}} \qquad (2)$$

that is, the uncertainty in the average speed estimate is equal to the square root of the average squared sigma divided by the square root of the number of measurements (the brackets ⟨ ⟩ indicate the average).

*t* is obtained by fitting a plane *S* = *ax* + *by* + *t* through the points (*x,y,S*)_{i}. Given that the noise is drawn from a Gaussian distribution, the optimal way to do this is by minimizing *χ*^{2} given by

$$\chi^2 = \sum_{i=1}^{N} \frac{(S_i - a x_i - b y_i - t)^2}{\sigma_i^2} \qquad (3)$$

Setting the partial derivatives with respect to *t*, *a*, and *b* to zero leads to a set of linear equations (e.g., see Press, Flannery, Teukolsky, & Vetterling, 1996) that can easily be solved (see “Appendix”). The equation describing the uncertainty in the estimate of *t*, *σ*_{t}, is rather complex. In the model simulations, the exact equations are used. To give some intuitive idea, we also derived an approximation for the case that the test dot lies in the center of mass. In that case it is in close approximation equal to (see “Appendix”):

$$\sigma_t \approx \frac{1}{\sqrt{N \langle 1/\sigma_i^2 \rangle}} \qquad (4)$$
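A weighted least-squares plane fit of this kind can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, each dot is weighted by 1/*σ*_{i}² as is standard for Gaussian noise, and the uncertainty of *t* is read from the corresponding diagonal element of the inverse of the normal-equation matrix, which is the usual least-squares result.

```python
import math


def invert_3x3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("degenerate dot configuration: plane is not determined")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def fit_plane(points):
    """Fit S = a*x + b*y + t to (x, y, S, sigma) tuples; return (a, b, t, sigma_t)."""
    s1 = sx = sy = sxx = sxy = syy = sz = szx = szy = 0.0
    for x, y, s, sigma in points:
        w = 1.0 / sigma ** 2            # weight of this dot
        s1 += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
        syy += w * y * y
        sz += w * s
        szx += w * s * x
        szy += w * s * y
    # Normal equations for the parameter order (a, b, t).
    inv = invert_3x3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, s1]])
    rhs = [szx, szy, sz]
    a, b, t = (sum(inv[r][c] * rhs[c] for c in range(3)) for r in range(3))
    return a, b, t, math.sqrt(inv[2][2])
```

For four dots at (±1, 0) and (0, ±1), each with *σ* = 0.5, the weighted sums of *x* and *y* vanish and the fit returns *σ*_{t} = 0.25, which is 1/√(Σ 1/*σ*_{i}²), the value the centered approximation gives.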

- The predicted thresholds decrease with increasing numbers of dots (with one over the square root of *N*: ∼1/√*N*). This follows from the assumption that the speed measurements are independent and that all measurements are taken into account. This is sometimes referred to as probability summation.
- If the noise in the speeds differs from dot to dot, better performance is predicted for estimating the local planar speed than for estimating the average speed. For example, suppose that the sigma for individual dots (*σ*_{i}) spans the range *σ*_{0} − Δ/2 to *σ*_{0} + Δ/2, where *σ*_{0} is the average value and Δ the range. In that case, the uncertainty in the average speed estimate, *σ*_{m}, corresponds to a per-dot sigma larger than the average *σ*_{0}: *N* *σ*^{2}_{m} = *σ*^{2}_{0} + Δ^{2}/12 (using Equation 2), whereas the uncertainty in the local speed estimate, *σ*_{t}, corresponds to a per-dot sigma smaller than the average *σ*_{0}: *N* *σ*^{2}_{t} = *σ*^{2}_{0} − Δ^{2}/4 (using Equation 4).
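The two numerical factors in this example follow from averaging over the stated range, assuming (our reading of "spans the range") that the *σ*_{i} are spread uniformly between *σ*_{0} − Δ/2 and *σ*_{0} + Δ/2:

```latex
\langle \sigma^2 \rangle
  = \frac{1}{\Delta}\int_{\sigma_0-\Delta/2}^{\sigma_0+\Delta/2} \sigma^2 \, d\sigma
  = \sigma_0^2 + \frac{\Delta^2}{12},
\qquad
\left\langle \frac{1}{\sigma^2} \right\rangle^{-1}
  = \left[\frac{1}{\Delta}\int_{\sigma_0-\Delta/2}^{\sigma_0+\Delta/2}
      \frac{d\sigma}{\sigma^2}\right]^{-1}
  = \sigma_0^2 - \frac{\Delta^2}{4}.
```

The first average is the one that enters the mean-speed estimate (Equation 2) and the second the local planar estimate (Equation 4). The arithmetic mean of the variances is pulled up by the noisiest dots, whereas the harmonic-type mean is pulled down by the most reliable ones.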

*σ*_{i}, increases with increasing (relative) speed and has a form (Equation 5) in which *k* accounts for a proportional increase of the noise with speed, Δ*S*_{i} is the relative speed of the dot, and *c* is a plateau level of *σ*_{i} as Δ*S*_{i} becomes small. Up to high speeds (64 deg/s), thresholds for speed discrimination (de Bruyn & Orban, 1988) can be modelled by a similar expression (see Hogervorst & Eagle, 1998).

*k* and *c* for which the property *σ*_{m}√*N*, the uncertainty in the average speed times √*N*, equals the thresholds obtained in the cloud condition (using Equation 5). This leads to very high values of *k*: 31% (MH), 94% (JR), 70% (ES), and 58% (Average), with values of *c* of 0.04 (MH), 0.09 (JR), 0.13 (ES), and 0.08 (Average). The average threshold data are plotted in Figure 12 along with the fitted line (“cloudAll”). The parameter *k* can be compared directly with Weber fractions for speed discrimination, which are in the order of 5% to 8% (McKee, 1981; de Bruyn & Orban, 1988). Note that we use the factor √*N* to compare the threshold obtained in this experiment with thresholds for speed discrimination with the same number of dots (because no effect of number of dots was found here or in the experiments of de Bruyn & Orban, discussed with Experiment 3 “Results” below). Comparing the fitted *k* values with Weber fractions for speed discrimination (of uniformly translating textures) shows that performance for estimating the average speed is remarkably poor. This means that subjects are poor at judging the average speed of a cloud of dots when it is also rotating (a possible 3D interpretation of the stimulus): the rotation interferes with the estimation of the average speed. The deforming cloud data relate to the results from Watamaniuk and Duchon (1992), who performed experiments in which subjects had to discriminate the average speed of two sets of dots whose speed distributions were equal in width. They obtained thresholds for Gaussian speed distributions with moderate widths (up to 22% of the mean speed), and found that thresholds were *unaffected* by the width of the distribution.

*k* = 0.58, *c* = 0.08). This prediction is also plotted in Figure 12 (“planeAll”). This model does not predict the thresholds obtained in the plane condition very well. Although the model predicts the thresholds to be somewhat lower than in the cloud condition [consistent with prediction (c) from the previous section], the observed difference is much larger. This model takes all speed measurements into account. However, Experiments 2 and 3 indicate that in the plane condition, performance is determined primarily by the dots in the slow segments close to the test dot. We therefore calculated the predictions of the model in which only the dots in the slow segments were taken into account. The same noise model was used, but only dots in segments within +/− 45 deg from the deformation direction were taken into account (“planeSeg”, see top in Figure 8). Although the fit is not perfect, this model predicts the data fairly well (especially for large amounts of deformation). The main point is that the difference in thresholds between the plane and the cloud conditions can be accounted for by using only a subsection of the dots (“the best ones”) in the plane condition, and using all dots in the cloud condition.

*N*. The results of Experiment 3 show that this is not the case. In the plane condition with nonzero deformation, there is a small decrease in threshold. However, this decrease can be explained because with an increase in the number of dots, there is an increase in the chance of finding a slow dot close to the test dot (see Experiment 2). In the cloud condition, only a small (but statistically significant) decrease was found in the zero deformation condition. In the model described here, we have assumed that the thresholds are independent of the number of dots. This assumption is in accordance with the results of de Bruyn and Orban (1988), who found that thresholds for speed discrimination were not higher when one rather than many dots were used. Such a lack of improvement with an increase in the number of dots was also incorporated into the model used by Hogervorst and Eagle (1998, 2000) and Eagle and Hogervorst (1999) that was successful in explaining performance in structure-from-motion experiments.

*k* that accounts for a proportional increase of the noise with speed and the plateau level *c* for speed discrimination thresholds (see “Model” section), which were derived from the data in Experiment 1 and used for all the modelling.

*k* (Equation 5), equal to 58% for the average subject, can be compared directly with Weber fractions for speed discrimination, which are around 5% (for medium speeds). This means that much higher noise levels have to be assumed to account for our results than for thresholds for speed discrimination. A similar model used by Hogervorst and Eagle (1998, 2000) was successful in modelling structure-from-motion thresholds and biases in perceived depth. They used estimates of the noise that were directly derived from human velocity and acceleration thresholds for uniform moving patterns (for small viewing angles, the noise was equal to these estimates, and for large viewing angles, the noise was twice as much). In our model, the noise is more than 10 times as high (for the average subject). One difference between our model and their model is that we use relative speeds as input, whereas Hogervorst and Eagle use speeds expressed in screen coordinates. The results show that this is more appropriate in our case (see Figure 4). Still, in the studies by Hogervorst and Eagle, the hinged planes rotated around their hinge, and the hinge did not show any additional translation. Whether absolute speeds, retinal speeds, or relative speeds (and relative to which reference) are more appropriate as input in structure-from-motion algorithms needs to be determined in future studies. The fact remains that the noise levels required to explain the results here are much higher than the noise levels required to explain the results in the studies by Hogervorst and Eagle and basic motion discrimination thresholds (using uniformly moving patterns). The reason for this is unclear.

*z* values in a number of points (*x,y,z*)_{i} for *i* = 1…*N*, and given that these measurements are taken from their real values with noise added from Gaussian distributions with widths *σ*_{i}, the object is to find a plane *z* = *t* + *ax* + *by* that best represents the data. This is done by minimizing the function *χ*^{2} given by:

$$\chi^2 = \sum_{i=1}^{N} \frac{(z_i - t - a x_i - b y_i)^2}{\sigma_i^2}$$

*σ*_{i}. However, other weightings are also possible; for example, the weight could be made to vary with the distance from the origin to make it more local (as in splines). Minimization amounts to setting the partial derivatives to zero:

$$\frac{\partial \chi^2}{\partial a} = \frac{\partial \chi^2}{\partial b} = \frac{\partial \chi^2}{\partial t} = 0$$

leading to the following set of linear equations:

$$\begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} & \Sigma_{x} \\ \Sigma_{xy} & \Sigma_{yy} & \Sigma_{y} \\ \Sigma_{x} & \Sigma_{y} & \Sigma_{1} \end{pmatrix} \begin{pmatrix} a \\ b \\ t \end{pmatrix} = \begin{pmatrix} \Sigma_{zx} \\ \Sigma_{zy} \\ \Sigma_{z} \end{pmatrix}$$

in which

$$\Sigma_{1} = \sum_i \frac{1}{\sigma_i^2}, \quad \Sigma_{x} = \sum_i \frac{x_i}{\sigma_i^2}, \quad \Sigma_{xy} = \sum_i \frac{x_i y_i}{\sigma_i^2}, \quad \Sigma_{z} = \sum_i \frac{z_i}{\sigma_i^2}, \quad \Sigma_{zx} = \sum_i \frac{z_i x_i}{\sigma_i^2}, \quad \text{etc.}$$

The solution is written as

$$\begin{pmatrix} a \\ b \\ t \end{pmatrix} = \mathbf{M} \begin{pmatrix} \Sigma_{zx} \\ \Sigma_{zy} \\ \Sigma_{z} \end{pmatrix}$$

in which **M** is the inverse of the matrix displayed above. The task set in our experiment requires deduction of *t*. The best estimate of *t* follows from the solution of the matrix equation in which the measurements of *z*_{i} are used: *t* = *m*_{31}Σ_{zx} + *m*_{32}Σ_{zy} + *m*_{33}Σ_{z}. The noise in these measurements propagates into the noise on the estimate of *t*, *σ*_{t}, in the following way:

$$\sigma_t^2 = \sum_i \sigma_i^2 \left( \frac{\partial t}{\partial z_i} \right)^2$$

This is analogous to the derivation for fitting a line in 2D: *y* = *ax* + *t*, described in Numerical Recipes (Press et al., 1996). In the 2D case, the variance in *t* is given by:

$$\sigma_t^2 = \frac{\Sigma_{xx}}{\Sigma_{1}\Sigma_{xx} - \Sigma_{x}^2}$$

which reduces to

$$\sigma_t^2 = \frac{1}{\Sigma_{1}} = \frac{1}{N \langle 1/\sigma_i^2 \rangle} \qquad (A5)$$

when Σ_{x} = 0, in which the bracket ⟨ ⟩ stands for the average. Therefore, when the tilt is well defined and Σ_{x} = 0, the uncertainty in the local speed of the plane reduces to (A5). When one fits a plane *z* = *t* to the data in a similar way as described above, one obtains:

$$t = \frac{\sum_i z_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}$$

in which case the variance in *t* is described exactly (i.e., not as an approximation) by (A5). This is a weighted average of all measurements, in which the weight is inversely related to the uncertainty in the measurement. When taking a normal average *m* = Σ *z*_{i}/*N*, the variance is simply

$$\sigma_m^2 = \frac{\sum_i \sigma_i^2}{N^2} = \frac{\langle \sigma_i^2 \rangle}{N}$$

To appreciate what these equations mean, one could use the analogy of a number of resistances with magnitudes equal to *σ*_{i}^{2}. The total resistance corresponds to the variance in the speed estimate of the test dot. The variance in the local planar speed resembles a situation in which the resistances are in parallel, whereas the variance in the average speed estimation resembles a situation in which the *N* resistances are in series (actually, *N* of these series should be placed in parallel to account for the division by *N*). While the variance in the average speed estimate is determined equally by all variances, the variance in the local speed of the plane is determined largely by the smallest variance (the smallest resistance).
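The contrast between the two estimates can be seen numerically. The sketch below is illustrative only: the function names and the example sigmas are hypothetical. It uses the inverse-variance weighted average for the local planar speed in the centered case and the plain average for the mean speed.

```python
def variance_plane_offset(sigmas):
    """'Parallel resistances': 1 / sum(1 / sigma_i^2)."""
    return 1.0 / sum(1.0 / s ** 2 for s in sigmas)


def variance_plain_mean(sigmas):
    """Unweighted mean of N measurements: sum(sigma_i^2) / N^2."""
    n = len(sigmas)
    return sum(s ** 2 for s in sigmas) / n ** 2


# Hypothetical example: one reliable (slow) dot and three noisy (fast) dots.
sigmas = [0.1, 1.0, 1.0, 1.0]
var_plane = variance_plane_offset(sigmas)   # dominated by the reliable dot
var_mean = variance_plain_mean(sigmas)      # dominated by the noisy dots
```

In this example the weighted estimate has a variance just below that of the single best dot, whereas the plain average is almost twenty times worse. When all sigmas are equal, the two functions return the same value.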