Recent studies suggest that the active observer combines optic flow information with extra-retinal signals resulting from head motion. Such a combination allows, in principle, a correct discrimination of the presence or absence of surface rotation. In Experiments 1 and 2, observers were asked to perform such a discrimination task while performing a lateral head shift. In Experiment 3, observers were shown the optic flow generated by their own movement with respect to a stationary planar slanted surface and were asked to classify perceived surface rotation as being small or large. We found that the perception of surface motion was systematically biased: in active, as well as in passive vision, perceived surface rotation was affected by the deformation component of the first-order optic flow, regardless of the actual surface rotation. We also found that the addition of a null disparity field increased the likelihood of perceiving surface rotation in active, but not in passive vision. Both results suggest that vestibular information, provided by active vision, is not sufficient for veridical 3D shape and motion recovery from the optic flow.

*rigidity assumption*. Under this assumption, the Euclidean three-dimensional (3D) structure of the distal object can in principle be recovered from the image transformations produced by an orthographic projection, if second-order temporal information is available (Longuet-Higgins, 1984; Longuet-Higgins & Prazdny, 1980; Ullman, 1979). However, a large number of psychophysical studies have shown that human observers exhibit a very limited sensitivity to the second-order temporal properties of the optic flow. It has been found, in fact, that perceived SfM depends only on the specific properties of the first-order optic flow, that is, on the gradients of the velocity field (Domini, Caudek, & Proffitt, 1997; Todd & Bressan, 1990).

*def*) is the only component that provides information about the surface orientation and motion in the 3D scene.

*def*, head motion, surface rotation, and surface slant (see Figure A1) is shown in the following equation (see Appendix A for a derivation):

*α*_{0} is the visual direction, *T*_{αx} is the horizontal translatory motion component of the observer's head (expressed in terms of angular velocity), *α*_{s} is the surface's slant about the vertical axis, and *ω* is the angular rotation velocity of the surface about the vertical axis (see Figure A2).

*def* provide sufficient information to allow veridical discrimination of the presence or absence of surface rotation? In any single moment in time, *def* does not specify the presence or absence of surface rotation even if *T*_{αx} and *α*_{0} are known.^{1} The contribution of *ω* to the total deformation, in fact, is confused with the contribution of surface slant (i.e., *α*_{s}). As a consequence, the same *def* can be produced by a stationary or by a rotating surface, depending on *α*_{s} (see Figure 1).

*T*_{αx} and *α*_{0} are specified by proprioceptive information. If the distal surface is stationary, then *ω* = 0 and an unbiased estimate of *α*_{s} (surface slant) can be found from Equation 1. This estimate of *α*_{s} remains constant in successive moments of the motion sequence.

*α*_{s} with *ω* = 0 will produce a *biased* estimate of *α*_{s}. Such a biased estimate will take on different values at different moments in time.

the estimate of *α*_{s} remains constant and those in which the estimate of *α*_{s} varies in time. These two classes of events set apart the optic flow fields produced by stationary and rotating surfaces, respectively.

*T*_{αx} and *α*_{0} are left unspecified and *def* remains ambiguous, not only in each single moment in time but also across an extended time window. Veridical discrimination of the presence or absence of surface rotation is not possible. Therefore, in active vision we should expect to find the same systematic biases reported in passive SfM: perceived rotation should be a positive function of *def*, *regardless of actual surface rotation* (Caudek & Domini, 1998; Caudek & Proffitt, 1993; Caudek & Rubin, 2001; Domini et al., 1997; Domini, Caudek, Turner, & Favretto, 1998; Todd & Bressan, 1990; Todd & Perotti, 1999).

*def* as a function of angular rotation and surface tilt is plotted in Figure 2. Note that in Experiments 1a and 1b, *def* covaried with tilt. If the perceptual analysis relies exclusively on retinal signals, then we should expect a larger number of “Rotation” responses for surfaces with 180° tilt, regardless of actual surface rotation.

*xy*-plane coplanar with the monitor screen, the *x*-axis pointing to the subject's right, the *y*-axis upward, and the *z*-axis away from the subject. The origin of the reference frame was set at the center of the monitor's screen.

*y*-axis. The dot arrangement was varied by taking into account the observer's head position and his/her orientation with respect to the simulated surface. We set the dots to the maximal electron-gun value of 82 cd/m^{2}; the black background was 3 cd/m^{2}.

*z*_{0} = tan(g1)*x*_{0} + tan(g2)*y*_{0}, with *x*_{0} and *y*_{0} randomly selected in the range between ±25 mm from the screen center, and g1 and g2 representing the amount of surface rotation around the *y*- and *x*-axes, respectively. In terms of slant *σ* and tilt *τ* of a planar surface, g1 corresponds to sin(*τ*)tan(*σ*), and g2 to cos(*τ*)tan(*σ*), so that *σ* = arctan(√(tan^{2}(g1) + tan^{2}(g2))) and *τ* = atan(tan(g2)/tan(g1)). In different trials, we set g1 = ±45° and g2 = 0, thus producing two planar surfaces with equal slant (45°) but opposite tilt angles (represented by the sign of g1; see Figure 2).
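As a minimal sketch of the geometry described above, the snippet below converts the two rotation angles g1 and g2 into the slant and tilt of the plane *z* = tan(g1)*x* + tan(g2)*y*. The function name `slant_tilt_deg` is hypothetical. The sketch uses `atan2` instead of a plain arctangent of the ratio so that the sign of g1 selects between opposite tilt directions; which of the two directions is labeled 0° and which 180° depends on the axis convention, and that choice is an assumption here.

```python
import math

def slant_tilt_deg(g1_deg, g2_deg):
    """Slant and tilt (degrees) of the plane z = tan(g1)*x + tan(g2)*y.

    Hypothetical helper: slant is the arctangent of the depth-gradient
    magnitude; tilt is the direction of the gradient in the image plane.
    """
    gx = math.tan(math.radians(g1_deg))  # horizontal depth gradient
    gy = math.tan(math.radians(g2_deg))  # vertical depth gradient
    slant = math.degrees(math.atan(math.hypot(gx, gy)))
    # atan2 keeps the quadrant, so g1 = +45 and g1 = -45 differ by 180 deg.
    tilt = math.degrees(math.atan2(gy, gx)) % 360.0
    return slant, tilt

slant_pos, tilt_pos = slant_tilt_deg(45.0, 0.0)
slant_neg, tilt_neg = slant_tilt_deg(-45.0, 0.0)
```

With g1 = ±45° and g2 = 0, both surfaces come out with the same 45° slant and with tilt directions that are 180° apart, matching the description of the two stimulus surfaces.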

*def* as a function of Rotation Velocity and Tilt is provided in Figure 2 (top panel).

*def* component of the first-order optic flow (see Figure 2, top panel), which does not allow a veridical discrimination of presence or absence of surface rotation. In fact, if we replot the proportion of “Rotation” responses as a function of average *def* (right panels of Figure 6), we see that, in active as well as in passive vision, the proportion of “Rotation” responses increased monotonically with *def* (*x*-axis), not with Rotation Velocity (circle's size), regardless of whether the simulated surface was stationary or rotating.

*def* is controlled, we computed *d*′ by following the procedure indicated by Wright and London (2009) and Wright, Horry, and Skagerberg (2009). One of the advantages of computing *d*′ by means of a linear mixed effects (*lme*) analysis is the possibility of adding a continuous variable to the model (in our case, *def*). In a first *lme* model, disregarding the effect of *def*, we asked whether observers can discriminate stationary from rotating surfaces. In this model, *d*′ took on the value of 1.123 (*z* = 8.738, *p* < 0.001), indicating veridical performance. More interesting, however, was to repeat the same analysis when the effect of *def* was statistically controlled. In a second *lme* model, with *def* added as a covariate, *d*′ became statistically indistinguishable from zero (*d*′ = −0.100, *z* = −0.576, *p* = 0.565): there was no evidence that observers could veridically discriminate stationary surfaces from rotating ones *when def was kept constant*. In this second model, the likelihood of a “Rotation” response was completely explained by *def* (*β*_{def} = 5.292, *z* = 15.566, *p* < 0.001).

*T*_{αx} and *α*_{0} of Equation 1, and we know that the vestibulo-ocular reflex is more effective than pursuit eye movements for image stabilization (Bennur & Gold, 2008; Buizza, Leger, Droulez, Berthoz, & Schmid, 1980; Ferman, Collewijn, Jansen, & van den Berg, 1987; Gu, Angelaki, & DeAngelis, 2008; Gu, DeAngelis, & Angelaki, 2007; Liu & Angelaki, 2009). Therefore, we might have expected an advantage of active over passive vision. Nevertheless, the present data suggest that in active vision, perceived surface rotation is a direct function of *def*, rather than of actual surface rotation. For our stimuli, the additional information provided by vestibular signals was not sufficient for the veridical estimation of planar surface motion from the optic flow.

^{2}For the stimuli of Experiment 1b, the velocity/disparity pairing was compatible with a viewing distance of at least 3 m (see Fantoni, 2008). Such a large viewing distance, however, was at odds with the fact that both vergence and accommodation were modulated by a much smaller viewing distance of only 480 mm (i.e., screen distance). We reasoned that this conflict between the cues to viewing distance could be resolved either by vetoing extra-retinal information (vergence and accommodation), or by disregarding the retinal information (null disparity field).

- If vergence and accommodation are disregarded, then the 3D interpretation of the optic flow must take into consideration the large viewing distance compatible with the null disparity field. A stationary surface positioned very far from the observer can generate only a negligible motion parallax, if the observer's head moves by a small amount. In our experiments, head's motion was small but motion parallax was far from negligible. This stimulus situation is only compatible with a rotation of the distal surface—see the motion-distance invariance principle (Gogel & Tietz, 1973; Hay & Sawyer, 1969; Tyler, 1974; Wallach, Yablick, & Smith, 1972). If vergence and accommodation are vetoed, therefore, we expect a larger likelihood of “Rotation” responses in Experiment 1b than in Experiment 1a.
- If in Experiment 1b the information provided by the null disparity field is disregarded, then we should find the same results as in Experiment 1a.
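The argument in the first alternative rests on how motion parallax scales with viewing distance. As a rough sketch, for a stationary point straight ahead at distance *D*, a lateral head translation at speed *T* produces an angular velocity of about *T*/*D*. The head speed below is a hypothetical value chosen only for illustration (it is not taken from the experiments), and the function name `parallax_rate` is an assumption; the two distances are the 480 mm screen distance and the 3 m lower bound mentioned above.

```python
def parallax_rate(head_speed_mm_s, distance_mm):
    """Approximate angular velocity (rad/s) of a stationary point straight
    ahead at the given distance, for a lateral head translation."""
    return head_speed_mm_s / distance_mm

HEAD_SPEED_MM_S = 100.0  # hypothetical head speed, for illustration only

near_rate = parallax_rate(HEAD_SPEED_MM_S, 480.0)   # screen distance
far_rate = parallax_rate(HEAD_SPEED_MM_S, 3000.0)   # distance implied by null disparity
ratio = near_rate / far_rate                        # independent of the head speed
```

Whatever head speed is assumed, the parallax expected from a stationary surface at 3 m is 6.25 times smaller than at 480 mm, which is why a large retinal motion paired with a far distance is more readily attributed to rotation of the surface.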

*def* is not taken into consideration, *d*′ was significant (*d*′ = 0.849, *z* = 6.768, *p* < 0.001). When *def* was controlled, *d*′ became statistically indistinguishable from zero (*d*′ = 0.102, *z* = 0.700, *p* = 0.484). Likewise, in this case, the likelihood of a “Rotation” response was only a function of *def* (*β*_{def} = 3.172, *z* = 10.103, *p* < 0.001; Figure 7, right panel).

*lme* analysis, we considered only the trials in which the simulated surface was stationary. For these trials, the proportion of “Rotation” responses was significantly larger in Experiment 1b than in Experiment 1a (0.49 vs. 0.21; *z* = 8.644, *p* < 0.001). In a second analysis, we considered only the trials simulating a surface rotation. In this case too, we found a larger proportion of “Rotation” responses in Experiment 1b than in Experiment 1a (0.78 vs. 0.59; *z* = 5.016, *p* < 0.001).
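For orientation, the pooled proportions reported above can be turned into a rough equal-variance *d*′ by treating “Rotation” responses to rotating surfaces as hits and “Rotation” responses to stationary surfaces as false alarms. This is only a back-of-the-envelope sketch: it pools over participants and ignores *def*, so it is not the mixed-effects procedure used in the analyses, and its values need not match the *lme* estimates. The function name `d_prime` is hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance signal detection sensitivity: z(H) - z(FA)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# "Rotation" responses: rotating-surface trials (hits) vs. stationary trials (false alarms)
d_exp_1a = d_prime(0.59, 0.21)  # Experiment 1a, monocular
d_exp_1b = d_prime(0.78, 0.49)  # Experiment 1b, null disparity field
```

The pooled values come out at roughly 1.03 and 0.80. In Experiment 1b both the hit and the false-alarm proportions are higher than in Experiment 1a, which is the pattern expected from a greater overall tendency to respond “Rotation” rather than from better discrimination.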

*def*. In Experiments 1a and 1b, in fact, the two variables covaried. The purpose of Experiments 2a and 2b was to disentangle the effects of these two variables. This was done by simulating two surfaces slanted around the horizontal (rather than vertical) axis with a gradient of velocity in a direction orthogonal to the direction of lateral head motion. These two surfaces differed in their tilt directions (as in Experiment 1) but generated similar *def* components. Experiments 2a and 2b thus replicated the design of Experiments 1a and 1b, with the difference that tilt did not covary with *def*. In these circumstances, we hypothesized that the tilt variation would not affect the perceptual discrimination of the presence or absence of surface rotation. In Experiment 2a, viewing was monocular; in Experiment 2b, viewing was binocular, with the same (monocular) optic flow shown to both eyes (null disparity field).

*z*-axis. With such a surface orientation, the variation of the horizontal shear induced by the lateral head shift is independent of tilt. Instantaneous *def* was then the same for both tilt conditions (90° and 270°) in each moment of the motion sequence. Across the three Rotation Velocities that had been simulated (0, 10, 20 deg/s), average *def* was equal to 0.26, 0.44, and 0.62 rad/s, respectively. The apparatus, display, and procedure were otherwise identical to those of Experiment 1.

*def* covaried perfectly with Rotation Velocity, so the relative effects of these two variables cannot be distinguished.
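The perfect covariation can be checked directly from the three average *def* values reported for Experiment 2. The sketch below computes a plain Pearson correlation by hand; the helper name `pearson_r` is hypothetical, and three condition means are of course only a description of the design, not a statistical test.

```python
# Average def reported for the three simulated Rotation Velocities.
rotation_velocity = [0.0, 10.0, 20.0]  # deg/s
average_def = [0.26, 0.44, 0.62]       # rad/s

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = pearson_r(rotation_velocity, average_def)
# Change in average def per unit of Rotation Velocity (rad/s per deg/s).
slope = (average_def[-1] - average_def[0]) / (rotation_velocity[-1] - rotation_velocity[0])
```

Because the three points lie on a straight line, the correlation is 1 up to rounding, so any model term for Rotation Velocity in this design is equally a term for *def*.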

*lme* model with response as the dependent variable (stationary vs. rotating surface), participants as random factor, and angular Rotation Velocity (0, 10, 20 deg/s) and Tilt (90° and 270°) as fixed effects revealed significant main effects for both variables (Rotation Velocity: *z* = 5.153, *p* < 0.001; Tilt: *z* = 3.089, *p* < 0.005) and a nonsignificant interaction (*χ*_{1}^{2} = 1.725, n.s.).

*def*) and was larger for surfaces with a 270° tilt.

*lme* model with response (stationary vs. rotating surface) as the dependent variable, participants as random factor, and Rotation Velocity (0, 10, 20 deg/s) and Tilt (90° and 270°) as fixed effects revealed a significant main effect of Rotation Velocity (*z* = 12.330, *p* < 0.001). Neither the effect of Tilt (*z* = 0.953, n.s.) nor the interaction between Tilt and Rotation Velocity (*χ*_{1}^{2} = 0.911, n.s.) was significant.

*lme* analysis with response (stationary vs. rotating surface) as the dependent variable, participants as random factor, and angular Rotation Velocity (0, 10, 20 deg/s), Tilt angle (90° and 270°), and Experiment (Experiment 2a/monocular versus Experiment 2b/binocular) as fixed effects showed that the likelihood of a “Rotation” response was larger for Experiment 2b (Figure 8, bottom panel) than for Experiment 2a (Figure 8, top panel). This result replicates what we found in Experiment 1 (*z* = 10.016, *p* < 0.001).

*def*, the effect of tilt on the perceptual discrimination between stationary and rotating surfaces disappeared or was greatly reduced. The research on passive SfM has shown that *def* is not the only determinant of perceived angular rotation. Domini and Caudek (1999), for example, found that surface tilt accounts for a small component of perceived surface rotation (see also Todd & Bressan, 1990; Todd & Perotti, 1999). In Experiment 2a, we replicated this finding in active SfM: perceived surface rotation was indeed affected by surface tilt, even though this effect was very small compared with the effect of *def*. For an interpretation of tilt effects in the spatial domain, see Fantoni (2008).

*stationary* planar slanted surface. As detailed in Appendix B, the optic flow produced by the active observer was recorded and subsequently shown to a stationary observer (passive-viewing condition). In both conditions, observers were asked to classify the apparent rotation of the simulated (stationary) surface as being “small” or “large”.

*def*, but the simulated surface was always stationary. The stimulus displays were viewed either monocularly or binocularly.

*def*, even if the simulated surface was always stationary; (2) the addition of a null disparity field would increase the likelihood of a “Large Rotation” response (see Rogers & Collett, 1989).

*def* component of the optic flow for the passive observer, we created two different kinds of displays (see Braunstein & Tittle, 1988; Naji & Freeman, 2004; Rogers & Collett, 1989):

- both Translational and Rotational (TR) components of the optic flow generated during active vision trials were provided to the passive observer;
- only the Rotational (Rot) component of the optic flow generated during active vision trials was provided to the passive observer (not the horizontal translational component).

*def* (with *def* equal to 0.19 and 0.34 for 0° tilt and 180° tilt, respectively). Correspondingly, a larger proportion of “Large Rotation” responses was associated with the larger *def* magnitude.

*z* = 3.350, *p* < 0.001). The same effect, but stronger, was also found for the monocular condition, as revealed by the significant interaction between Viewing Mode and Tilt (*z* = 4.282, *p* < 0.001). From Figure 10, we also see that the likelihood of a “Large Rotation” response was higher in the binocular than in the monocular condition (*z* = 8.165, *p* < 0.001). This replicates the results found in Experiments 1 and 2.

*z* = 3.427, *p* < 0.001): a higher likelihood of a “Large Rotation” response was associated with the larger *def* (180° tilt). Figure 10 indicates that a similar effect was also found in the monocular Rot condition. In both TR and Rot conditions, the effect of *def* was stronger in the monocular condition, as revealed by the significant interaction between Passive Optic Flow and Tilt (*z* = 2.195, *p* < 0.05).

*def* had similar effects on the perception of surface rotation in both active and passive SfM. This is apparent from Figure 10 (monocular) when we compare the proportions of “Large Rotation” responses for 0° and 180° tilt, across the Act, TR, and Rot conditions. The addition of a null disparity field affected the response in the active but not in the passive viewing condition.

*def* in Experiments 1a and 1b, but not in Experiment 2. In active vision as well, therefore, perceived surface rotation seems to depend on the analysis of the first-order optic flow (e.g., Domini & Caudek, 2003a, 2003b). In Experiment 3, we replayed to the passive observers the optic flow previously generated by the active observers. We found a similar response pattern in both active and passive SfM.

*ω*) is a function of *def*, regardless of the actual 3D surface rotation (Domini & Caudek, 1999, 2003a, 2003b). As indicated in Figure 1, for both the active and passive observers, *def* is ambiguous, in the sense that the same *def* can be produced by different slant (*σ*) and angular rotation (*ω*) values. For the recovery of surface rotation, Domini and Caudek (2003b) proposed that the visual system chooses, among these infinitely many *σ* and *ω* pairs, the one that maximizes the likelihood function *p*(*def* ∣ *σ*):

*p*(*def* ∣ *ω*) has a maximum: The value *ω*_{i} corresponding to the maximum of the marginal distribution *p*(*def* ∣ *ω*) is the maximum likelihood estimate

*z* = (*x* − *x*_{0})*g*_{x}, where *x*_{0} is the horizontal coordinate of the point of the surface that intersects the image plane and *g*_{x} is the horizontal depth gradient (slant). Suppose that the planar surface translates horizontally with speed *T*_{x} and rotates with angular velocity *ω* about a vertical axis passing through the point (*x*_{0}, 0, 0). In this case, *ẋ* = (*x* − *x*_{0})*g*_{x}*ω* + *T*_{x} and *ż* = −(*x* − *x*_{0})*ω*. If we (1) solve Equation A1 for *x* after substituting the equation of the plane for *z*, (2) substitute *x*, now function of *x*_{P}, in the equations for *z*, and (3) substitute *z* in Equation A2, then we obtain the equation for the image plane velocity field as a function of *x*_{P}:

*def*) calculated at *x*_{0} can be obtained by differentiating the previous equation (Equation A3) with respect to *x*_{P}:

*α* the horizontal visual direction of a generic point *P* belonging to the planar surface, then *def* is defined as

*α*

_{0}= tan

^{−1}

*α*) =

*α*

_{0}) =

*T*

_{ αx}=

*α* are very small for the range of movements relevant to the present study, tan(*α*) ≈ *α* and tan(*α*)^{2} ≪ *α*. By this approximation, *def* becomes

*def* varies if the observer translates and the surface is static (*ω* = 0). In fact, *α*_{0} increases or decreases with the rightward horizontal position of the viewing point, depending on the sign of *α*_{s} (defining the tilt of the surface). Therefore, as shown in Figure A3, the absolute value of the second term of Equation A5 increases if *α*_{s} = +45° (i.e., tilt = 180°) and decreases if *α*_{s} = −45° (i.e., tilt = 0°). Consequently, the average *def* is larger for *α*_{s} = +45° than for *α*_{s} = −45°. That is, *def* covaries with the tilt angle of a vertically slanted surface, if the observer undergoes a lateral head translation.

*z*′ corresponds to the cyclopean line of sight in the corresponding actively viewed display; *x*′ is parallel to the inter-ocular axis in the corresponding actively viewed display; *y*′ passes through the intersection of the *x*′- and *z*′-axes and is orthogonal to both of them (Figure B1). The center of the new coordinate system was fixed at the distance (*D*) of 480 mm from the cyclopean eye and the simulated planar surface was projected onto the *x*′–*y*′ plane.

*P*′ = *P* · *R*_{yz} + *T*_{xyz}, where *R*_{yz} and *T*_{xyz} identify the rotational and translational components of the following Transformation Matrix:

*R*_{z} (about the *z*-axis) and *R*_{y} (about the *y*-axis). Consistent with Listing's Law, *R*_{x} was neglected (the vertical extension of the eyes is null and any rotation around the inter-ocular axis of the head leaves the image unchanged). The *R*_{y}*R*_{z} multiplication resulted in the following Rotation Matrix:

*α*_{y} is the rotation angle of the inter-ocular axis around the *y*-axis in the actively viewed display and *α*_{z} is the rotation angle of the inter-ocular axis around the *z*-axis. The two rotation angles were calculated according to the left (*x*_{el}, *y*_{el}, *z*_{el}) and right eye (*x*_{er}, *y*_{er}, *z*_{er}) positions during active vision.

*α*

_{ y}, but not

*α*

_{ z}= arctan

- TR, in which the observers' fixation was assumed to be straight ahead, regardless of object position, and *α*_{y} = arctan((*z*_{er} − *z*_{el})/(*x*_{er} − *x*_{el}));
- Rot, in which the observers' fixation was assumed to be centered on the planar surface regardless of actual head position, and *α*_{y} = arctan(((*x*_{er} + *x*_{el})/2)/((*z*_{er} + *z*_{el})/2)).
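The two definitions of *α*_{y} can be sketched as follows. The function names and the eye coordinates are hypothetical, chosen only for illustration: the example assumes a 64 mm inter-ocular distance, a head shifted 50 mm to the right, and eyes 480 mm in front of the screen, with the *z*-axis pointing away from the observer so that the eyes have negative *z*.

```python
import math

def alpha_y_tr(left_eye, right_eye):
    """TR condition: yaw of the inter-ocular axis about the y-axis (radians)."""
    x_el, _, z_el = left_eye
    x_er, _, z_er = right_eye
    return math.atan((z_er - z_el) / (x_er - x_el))

def alpha_y_rot(left_eye, right_eye):
    """Rot condition: direction of the cyclopean eye (midpoint of the two
    eyes) relative to the origin at the screen center (radians)."""
    x_el, _, z_el = left_eye
    x_er, _, z_er = right_eye
    return math.atan(((x_er + x_el) / 2) / ((z_er + z_el) / 2))

# Hypothetical eye positions in mm: (x, y, z), origin at the screen center.
left_eye = (18.0, 0.0, -480.0)
right_eye = (82.0, 0.0, -480.0)

tr_angle = alpha_y_tr(left_eye, right_eye)
rot_angle = alpha_y_rot(left_eye, right_eye)
```

With the head translated but not turned, the TR angle is zero while the Rot angle is not, which illustrates how the Rot displays keep the surface centered by rotating the viewing frame as the head position changes.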

^{2}We are not claiming here that a null disparity field associated with a velocity field actually elicits the perception of a far surface viewed at large viewing distance from the observer. Instead, we claim that extra-retinal signals may not be used for estimating the object's 3D structure and motion (Cornilleau-Pérès & Droulez, 1994), but only for estimating the egocentric distance: the retinal cues (null disparity and velocity gradient) specify a rotating surface, whereas extra-retinal cues specify a small egocentric distance.