It has recently been shown in our laboratory (Wexler et al., 2001a; Wexler et al., 2001b) that a new hypothesis is needed to account for structure-from-motion (SfM) performance in moving observers: the visual system makes the stationarity assumption; that is, it prefers SfM solutions that minimize motion in an allocentric, or earth-fixed, reference frame. The stationarity assumption has obvious computational and ecological advantages. Similar motion-minimization criteria have classically been invoked to account for the perception of 2D movement (Wertheimer, 1912; Wallach & O’Connell, 1953; see Weiss, Simoncelli, & Adelson [2002] for recent work).
Under the stationarity assumption, tilt reversals should be much less frequent in act than in immob: in act the simulated plane is stationary and the reversed plane is not, whereas in immob the simulated and reversed planes are equally non-stationary (Wexler et al., 2001a). We have indeed found this to be the case (see Figure 4). (The fact that reversals occur in fewer than 50% of immob trials indicates that the visual system takes second-order optic flow into account.)
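The prediction above can be made concrete with a toy bookkeeping of angular velocities. The sketch below is a simplification of our own, not the model used in the experiments: it assumes that observer and plane rotate about one common axis (so angular velocities simply add), that immob replays exactly the optic flow of the corresponding act trial, and that a tilt reversal flips the sign of the plane's egocentric rotation. The function names are hypothetical. Under these assumptions the allocentric speeds quoted later in this section (0 and 2ω in act, ω and −ω in immob) fall out directly, and a pure "minimize allocentric motion" rule favours the simulated solution in act but has no preference in immob.

```python
def allocentric_velocities(condition: str, omega: float) -> tuple[float, float]:
    """Allocentric angular velocities of the (simulated, reversed) plane solutions.

    Simplifying assumptions (for illustration only):
    - observer and plane rotate about one common axis, so angular velocities add;
    - in "act" the observer rotates at +omega; in "immob" the observer is still;
    - both conditions present the same optic flow: a plane with egocentric angular
      velocity -omega (a stationary plane as seen by the rotating act observer);
    - a tilt (depth) reversal flips the sign of the egocentric angular velocity.
    Signs depend on an arbitrary convention; only magnitudes matter below.
    """
    if condition not in ("act", "immob"):
        raise ValueError(f"unknown condition: {condition!r}")
    observer = omega if condition == "act" else 0.0
    egocentric = -omega
    simulated = egocentric + observer
    reversed_ = -egocentric + observer
    return simulated, reversed_


def stationarity_preference(condition: str, omega: float) -> str:
    """Which solution a pure 'minimize allocentric motion' rule would favour."""
    simulated, reversed_ = allocentric_velocities(condition, omega)
    if abs(simulated) < abs(reversed_):
        return "simulated"
    if abs(reversed_) < abs(simulated):
        return "reversed"
    return "tie"


for cond in ("act", "immob"):
    # act   -> (0.0, 2.0) simulated
    # immob -> (-1.0, 1.0) tie
    print(cond, allocentric_velocities(cond, 1.0), stationarity_preference(cond, 1.0))
```

The immob tie is what makes the parenthetical remark informative: a first-order stationarity rule on its own would leave reversals at chance (50%) in immob, so a lower observed rate must come from additional information such as second-order optic flow.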
The stationarity assumption also predicts that solutions that really are stationary (as always, we mean stationary in an allocentric frame) will be perceived more precisely, because the initial 3D motion estimate will need much less refinement. Some of our results bear out this prediction. When there is no tilt reversal, the solution in act is stationary, whereas the solution in immob rotates at the same speed, ω, as the subject did in the corresponding act trial (see animation).⁵ Accordingly, in trials without reversals, tilt errors are smaller in act (16.6°) than in immob (23.3°; see Figure 6). When tilt reverses, the reversed solution in immob rotates in the opposite direction at the same speed, −ω, whereas in act it rotates at 2ω (see animation). Accordingly, immob tilt errors in reversed trials (25.5°) are about the same as in unreversed trials, whereas act tilt errors in reversed trials (34.0°) are about twice as high as in unreversed trials. This could mean not only that observers prefer to perceive objects that are stationary in an allocentric frame, but also that tilt itself is computed in an allocentric reference frame, rather than from egocentric retinal data alone.
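The comparison in the preceding paragraph is simple arithmetic on the four reported mean tilt errors, paired with the allocentric speed of each perceived solution (in units of ω) as stated in the text. The sketch below only reorganizes those numbers; it uses no trial-level data, carries no variability, and is an ordinal check rather than a statistical test. The function names are hypothetical.

```python
from itertools import combinations

# Mean tilt errors in degrees, as reported in the text: (condition, tilt reversed?) -> error
TILT_ERROR_DEG = {
    ("act", False): 16.6,
    ("immob", False): 23.3,
    ("immob", True): 25.5,
    ("act", True): 34.0,
}

# Allocentric rotation speed of the perceived solution, in units of omega (from the text)
SPEED_IN_OMEGA = {
    ("act", False): 0,
    ("immob", False): 1,
    ("immob", True): 1,
    ("act", True): 2,
}


def reversal_ratio(condition: str) -> float:
    """Tilt error in reversed trials divided by tilt error in unreversed trials."""
    return TILT_ERROR_DEG[(condition, True)] / TILT_ERROR_DEG[(condition, False)]


def errors_ordered_by_speed() -> bool:
    """True if every faster-moving solution has a larger mean tilt error than every slower one."""
    for a, b in combinations(TILT_ERROR_DEG, 2):
        if SPEED_IN_OMEGA[a] == SPEED_IN_OMEGA[b]:
            continue  # equal speeds: no ordinal prediction
        slow, fast = (a, b) if SPEED_IN_OMEGA[a] < SPEED_IN_OMEGA[b] else (b, a)
        if TILT_ERROR_DEG[slow] >= TILT_ERROR_DEG[fast]:
            return False
    return True


print(f"act:   {reversal_ratio('act'):.2f}")    # act:   2.05
print(f"immob: {reversal_ratio('immob'):.2f}")  # immob: 1.09
print(errors_ordered_by_speed())                # True
```

With only four condition means, the monotonic ordering is suggestive rather than conclusive.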
On the other hand, the stationarity assumption runs into problems in predicting the effects of shear. At first, all seems well. Take a circle in space with arbitrary slant and tilt, and rotate it by an angle α about an arbitrary frontal axis. Let R₀(ρ,ϑ) and R(ρ,ϑ) be the initial and final 3D positions of a point on the circle with 2D polar coordinates (ρ,ϑ). Averaging the squared length of the 3D displacements generated by this rotation, we find the following expression:
Equation 8 shows that nonstationarity increases with the shear, η, which would seem to agree with our tilt-error results in immob (see Figure 5). However, our virtual objects were circles in the image plane, not in space; they were projected onto the simulated surface and were therefore ellipses in space. Performing the above calculation for these elliptical objects (in parallel projection), we find the following mean square displacement:
which is independent of shear. (In perspective projection, the first correction to Equation 9 is of second order and can safely be ignored for our small stimuli.)
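The contrast between circles in space and image-plane circles projected onto the surface can be checked numerically. The sketch below does not reproduce Equations 8 and 9; it evaluates the mean square displacement directly under our own assumptions: the frontal rotation axis passes through the object's centre, the average is taken uniformly over the polar angle ϑ of the contour (a filled disc would only rescale both results by the same constant), and projection is parallel. It uses the identity |R − R₀|² = 2(1 − cos α)|r⊥|², where r⊥ is the component of a point perpendicular to the rotation axis, which gives (1 − cos α)ρ²(1 + sin²σ cos²(τ − φ)) for a circle in space and (1 − cos α)ρ²/cos²σ for the projected ellipse, with slant σ, tilt τ and axis azimuth φ. Only the first depends on the angle between the rotation axis and the tilt direction; how that angle maps onto the shear η defined elsewhere in the paper is not assumed here. All function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def rotate(v: Vec, axis: Vec, angle: float) -> Vec:
    """Rotate v about the unit vector `axis` (through the origin) by `angle` radians (Rodrigues)."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    vx, vy, vz = v
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(vi * c + ci * s + ki * dot * (1 - c) for vi, ci, ki in zip(v, cross, axis))


def circle_in_space(rho, slant, tilt, n=64):
    """Contour of a circle of radius rho lying in the plane with the given slant and tilt."""
    e1 = (math.cos(slant) * math.cos(tilt), math.cos(slant) * math.sin(tilt), -math.sin(slant))
    e2 = (-math.sin(tilt), math.cos(tilt), 0.0)  # e1, e2: orthonormal basis of the plane
    pts = []
    for k in range(n):
        th = 2 * math.pi * k / n
        pts.append(tuple(rho * (math.cos(th) * a + math.sin(th) * b) for a, b in zip(e1, e2)))
    return pts


def image_circle_on_plane(rho, slant, tilt, n=64):
    """Image-plane circle of radius rho, projected in parallel onto the plane (an ellipse in space)."""
    nx, ny = math.sin(slant) * math.cos(tilt), math.sin(slant) * math.sin(tilt)
    nz = math.cos(slant)
    pts = []
    for k in range(n):
        th = 2 * math.pi * k / n
        x, y = rho * math.cos(th), rho * math.sin(th)
        pts.append((x, y, -(nx * x + ny * y) / nz))  # depth that puts the point on the plane
    return pts


def mean_square_displacement(points, axis_azimuth, alpha):
    """Average |R - R0|^2 when every point rotates by alpha about a frontal axis through the origin."""
    axis = (math.cos(axis_azimuth), math.sin(axis_azimuth), 0.0)
    total = 0.0
    for p in points:
        q = rotate(p, axis, alpha)
        total += sum((q_i - p_i) ** 2 for q_i, p_i in zip(q, p))
    return total / len(points)


def circle_closed_form(rho, slant, tilt, axis_azimuth, alpha):
    return (1 - math.cos(alpha)) * rho**2 * (1 + (math.sin(slant) * math.cos(tilt - axis_azimuth)) ** 2)


def ellipse_closed_form(rho, slant, alpha):
    return (1 - math.cos(alpha)) * rho**2 / math.cos(slant) ** 2


rho, slant, tilt, alpha = 1.0, math.radians(40), math.radians(30), math.radians(10)
for az_deg in (30, 75, 120):  # axis azimuth: 30 deg is parallel to tilt, 120 deg perpendicular
    az = math.radians(az_deg)
    c = mean_square_displacement(circle_in_space(rho, slant, tilt), az, alpha)
    e = mean_square_displacement(image_circle_on_plane(rho, slant, tilt), az, alpha)
    print(az_deg, round(c, 6), round(e, 6))
```

Because the displacement is quadratic in cos ϑ and sin ϑ, averaging over a uniform grid of 64 contour points reproduces the closed forms up to floating-point rounding.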
Therefore, the stationarity assumption seems to be in agreement with some general features of our data, but not with the dependence of tilt perception on shear.
At first glance, the stationarity assumption resembles the change-of-surface-normal explanation, but the two should not be confused. The stationarity assumption concerns how plane orientation is extracted from optic flow in moving and immobile observers; the change-of-surface-normal hypothesis concerns how perceived plane orientations are combined.