Humans can recover 3-D structure from the projected 2D motion field of a rotating object, a phenomenon called structure from motion (SFM). Current models of SFM perception are limited to the case in which objects rotate about a frontoparallel axis. However, as our recent psychophysical studies showed, frontoparallel axes of rotation are not representative of the general case. Here we present the first model to address the problem of SFM perception for the general case of rotations around an arbitrary axis. The SFM computation is cast as a two-stage process. The first stage computes the structure perpendicular to the axis of rotation. The second stage corrects for the slant of the axis of rotation. For cylinders, the computed object shape is invariant with respect to the observer's viewpoint (that is, perceived shape doesn't change with a change in the direction of the axis of rotation). The model uses template matching to estimate global parameters such as the angular speed of rotation, which are then used to compute the local depth structure. The model provides quantitative predictions that agree well with current psychophysical data for both frontoparallel and non-frontoparallel rotations.

*1. SFM from 1st-order velocities:* One theory is that humans recover SFM from the 1st-order velocity field. This implies that shape is recoverable only up to a scaling factor in depth (Norman & Todd, 1993; Todd, 1998; Todd & Norman, 1991; Werkhoven & van Veen, 1995). Hence, accuracy is low for judgments requiring veridical perception of Euclidean metric structure, such as judgments of lengths or angles (Braunstein, Liter, & Tittle, 1993; Cornilleau-Pérès & Droulez, 1989; Eagle & Blake, 1995; Hogervorst, Kappers, & Koenderink, 1993; Liter, Braunstein, & Hoffman, 1993; Norman & Lappin, 1992; Todd & Bressan, 1990). Conversely, accuracy is high for judgments of an object's “affine”^{1} structure (the structure up to a scaling factor in depth), such as depth order between pairs of points, parallelism between lines defined by pairs of points on a planar surface, and coplanarity among points (Braunstein et al., 1993; Eagle & Blake, 1995; Hogervorst et al., 1993; Liter et al., 1993; Tittle et al., 1995; Todd & Bressan, 1990). Koenderink and van Doorn (1991) developed one of the best-known theoretical approaches to computing affine structure from the first-order optic flow.

*2. SFM from 2nd-order optic flow and perspective effects:* There is evidence that humans can use second-order optic flow information (that is, accelerations) and perspective information in computing the SFM. Regarding perspective effects, Eagle and Hogervorst (1999) found that shape discrimination thresholds decreased with stimulus size under perspective projection, but under orthographic projection, thresholds increased. Regarding the effects of accelerations, Hogervorst and Eagle (1998, 2000) showed that errors in the estimate of speeds and accelerations could explain some pattern of misperceptions in SFM.

*3. Surface slant from def:* A more radical view is that of Domini et al. (Domini & Braunstein, 1998; Domini & Caudek, 2003; Domini, Caudek, & Richman, 1998). They propose that the perceived slant of a surface is a function of the 1st-order optic-flow property *def* and predict that the structure of a shape recovered from SFM will be internally inconsistent. The authors suggest that this prediction is in agreement with numerous experimental results (Domini & Braunstein, 1998; Domini et al., 1998).

*not affine*. Moreover, violations of affinity can result in perceived *depth-order violations* along the line of sight. We found, for cylinders rotating about their longitudinal axes, that as the axis of rotation changes its inclination, shape constancy is maintained through a trade-off: humans perceive a constant object shape relative to a changing axis of rotation, and they do this by introducing an inconsistency between the perceived speed of rotation and the 1st-order optic flow. Shape constancy demands this inconsistency because observers do not perceive the inclination of the axis of rotation veridically. The observed depth-order violations are the cost of the trade-off. Note that objects other than cylinders do not show this kind of shape constancy. As mentioned above, Loomis and Eby (1988) found that the perceived shape of rotating ellipsoids changes with changes in the slant of the axis of rotation.

*depth-scaling* parameter, that is, the ratio between perceived and simulated depth. This approach assumes that the visual system computes affine structure first. Then, in a second step, a scaling factor is “assigned.” This transforms the initially computed affine structure into a metric object. But there are other ways to think about this problem of computing SFM, and in this article we used a different approach.

*Z*_{0} is the distance to the object, and Ω is the angular speed of rotation. Equation 1, which implicitly assumes rigidity, gives the difference in depth Δ*Z* (positive towards the observer) between any two points on the object from their differences in horizontal retinal velocity Δ*v*_{x}. Equation 1 assumes, without loss of generality, a vertical axis of rotation.

^{2} As a result, the depth structure can be recovered only up to a scaling factor in depth. Current theories lack an algorithm for predicting this scaling factor, which is needed to predict the recovered depth structure in the special case of a frontoparallel axis of rotation. Computing the scaling factor is mathematically equivalent to computing the perceived angular speed of rotation, Ω_{obs}, which can be used in Equation 1 instead of Ω, the actual speed of rotation. Thus, using Ω in Equation 1 allows recovery of the veridical shape of the object, whereas using Ω_{obs} allows recovery of the perceived shape.
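The role of the perceived angular speed as a depth-scaling factor can be illustrated with a minimal sketch. It assumes a simplified stand-in for Equation 1 in which the horizontal velocity difference is just the angular speed times the depth difference (Δ*v*_{x} = Ω Δ*Z*, ignoring the viewing-distance term and sign conventions); the function name `relative_depth` and all numbers are hypothetical, and units are arbitrary.

```python
import numpy as np

def relative_depth(delta_vx, omega):
    """Relative depth from horizontal velocity differences.

    Assumed simplified relation (not the paper's Equation 1): delta_vx = omega * delta_Z.
    """
    return np.asarray(delta_vx, dtype=float) / omega

omega_sim = 20.0  # simulated angular speed (hypothetical value)
omega_obs = 10.0  # perceived angular speed (hypothetical value)

true_depth = np.array([0.0, 0.5, 1.0, 0.5, 0.0])  # arbitrary depth profile
delta_vx = omega_sim * true_depth                  # velocities produced by the simulated rotation

veridical = relative_depth(delta_vx, omega_sim)    # uses the actual speed
perceived = relative_depth(delta_vx, omega_obs)    # uses the perceived speed
depth_scaling = omega_sim / omega_obs              # ratio of perceived to simulated depth
```

Under this assumption the perceived profile is the simulated profile multiplied by Ω/Ω_{obs}, so underestimating the angular speed stretches the recovered object in depth.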

_{obs} is non-veridical, in principle the object should end up looking either non-rigid or like a texture flowing over the surface of an otherwise rigid object. Perceptually, a minority of cases do give the appearance of a texture flowing over the surface of an otherwise rigid object, but in all cases the objects appear rigid. One reason could be that the object's non-rigidity is below threshold. Discriminating rigidity from non-rigidity depends heavily on temporal factors (memory) rather than just on the magnitude of shape change. Large changes in the object can remain undetected because we do not keep a faithful 3D representation of the object in memory across a sufficiently long time, but only what amounts to an instantaneous one. In a task in which a 3D representation of the object had to be kept in memory, the non-rigidity would perhaps become “visible.” Data on rigidity-versus-non-rigidity discriminations show that non-rigid objects can be perceived as rigid and vice versa, depending on whether specific optic-flow properties change, or do not change, over time.

^{3} As an example, Hogervorst, Kappers, and Koenderink (1997) found that non-rigid displays were not discriminable from rigid ones when every set of three consecutive views in the sequence had a rigid solution. They concluded that observers combined only a few views at a time. That is, they used a temporally local description of the flow. Only very large changes in structure were detected over large changes in rotation.

*θ* is the actual angle of inclination of the axis of rotation (0° < *θ* < 90°) from the frontoparallel plane, and ∂_{x}*v*_{y} is the derivative with respect to *x* (horizontal direction) of the vertical component of the retinal velocity, *v*_{y}. Note that ∂_{x}*v*_{y} is constant across the image and thus independent of depth (see Fernandez & Farell, 2007, for a demonstration). Equation 4, which implicitly assumes rigidity, gives the difference in depth Δ*Z* (positive towards the observer) between any two points on the object from their differences in horizontal retinal velocity Δ*v*_{x} and angular vertical position Δ*y*. It is assumed here that the axis of rotation lies within the sagittal plane passing through the eye. This choice does not represent a loss of generality, but is a direct consequence of assuming that the axis of rotation projects into a vertical line in the frontoparallel plane (see Equations 1 and 4). If the projection of the axis of rotation onto the frontoparallel plane were not vertical, Equations 1 and 4 would still be similar, but *v*_{x} would be replaced by the speeds perpendicular to the frontoparallel projection of the axis of rotation, and *v*_{y} by the speeds parallel to this projection (also, Δ*y* would be measured in the direction parallel to the projection of the axis).

*θ* could not be recovered veridically (this is in agreement with psychophysical data, see for instance Fernandez & Farell, 2007). Then depth structure could be recovered only up to a scaling factor in depth by using the perceived angle of inclination, *θ*_{obs}, in Equation 4.

*λ* accounts for the departure of the perceived angular speed of rotation, Ω_{obs}, from optic-flow consistency.

^{4}

*θ,* is perceived non-veridically as *θ*_{obs}, it is still possible to use Equation 3 by substituting *θ*_{obs} for *θ*. The result is a non-veridical perceived speed of rotation that is consistent with the first-order optic flow. This is equivalent to recovering the shape from Equation 5 using *λ* = 1. Having *λ* ≠ 1, in agreement with psychophysical data, implies that Ω_{obs}, the perceived speed of rotation, is not consistent with the first-order optic flow.

*λ* is to multiply all distances relative to the axis of rotation rather than to the frontoparallel plane. The direction of this scaling is perpendicular to the frontoparallel plane, so that the retinal image will remain unchanged. In this sense, the recovered structure of the object is not affine. Let us characterize this recovered structure as *λ*-affine.

*λ* can be viewed as a correction factor whose purpose is to allow the observer to perceive the stimulus shape as constant as the axis of rotation changes. In such a case *λ* could in principle be computed from the optic flow (in Fernandez & Farell, 2007, we showed that this is possible). However, this way of thinking about *λ* has the disadvantage of imposing an additional computational burden on the visual system. Alternatively, *λ* can be viewed as a bias intrinsic to the observer rather than being computable from the input. We will follow this latter interpretation here. In addition, we propose that *λ* is a constant.

*λ* results in additional constraints. For instance, for cylinders, we show in 1 that *λ* can be a constant only if sin *θ*_{obs} ∝ sin *θ*_{sim} (the constraint comes from the psychophysical fact that, for cylinders rotating about their longitudinal axes, the perceived stimulus shape does not change with changes in the axis of rotation). Now, by definition, we have (see 1):

*λ* as the proportionality factor between perceived and simulated gradient. We will call the perceived gradient ∇_{obs} (= ∂_{x}*v*_{y}/*λ*). We can rewrite Equation 5 in a format that explicitly reduces to Equation 1 (i.e., the frontoparallel case) when *θ* = 0 (and thus, ∇_{obs} = 0). Using Equations 5 and 6 we get, after some work:

_{obs} and ∇_{obs}. These parameters can be obtained from template matching, as described below in The template-matching model.

*I*:

*I* is a measure of the difference between the input velocity pattern, f(*x, y*), and the template velocity pattern, g(*x, y*), where (*x, y*) gives the spatial retinal coordinates in the image plane. *I* takes its minimum value when both template and input patterns are the same. The response, *r,* of the model MST cell is then defined as

*β* is a constant.
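As a rough illustration of how a matching variable and a cell response of this kind could be computed on a pixel grid, here is a minimal sketch. The exact forms of Equations 8 and 9 are not reproduced here; the code assumes that *I* is the integrated squared difference between input and template and that *r* = exp(−*β I*), which matches the stated properties (minimum *I* and maximal *r* for identical patterns) but is an assumption. The shape factor, grid, and value of `beta` are hypothetical.

```python
import numpy as np

def matching_variable(f, g, dx, dy):
    """Assumed form of I: squared difference between input f and template g, summed over the image."""
    return np.sum((f - g) ** 2) * dx * dy

def cell_response(I, beta):
    """Assumed form of r: exp(-beta * I), which equals 1.0 (its maximum) when I == 0."""
    return np.exp(-beta * I)

# Retinal grid (arbitrary units)
x = np.linspace(-1.0, 1.0, 41)
y = np.linspace(-1.0, 1.0, 41)
X, Y = np.meshgrid(x, y)
dx, dy = x[1] - x[0], y[1] - y[0]

# Hypothetical scalar velocity patterns: angular speed times a shape factor
shape = np.sqrt(np.clip(1.0 - X ** 2, 0.0, None))
f = 10.0 * shape        # input pattern
g_same = 10.0 * shape   # template identical to the input
g_other = 14.0 * shape  # template tuned to a different angular speed
beta = 0.05             # hypothetical constant

r_same = cell_response(matching_variable(f, g_same, dx, dy), beta)
r_other = cell_response(matching_variable(f, g_other, dx, dy), beta)
```

With these assumed forms the matched template gives the largest possible response, and the response falls off smoothly as the template's angular speed departs from that of the input.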

*r* is a maximal response when the template and input velocity patterns are the same. Equation 8 is valid when the functions f and g are scalar quantities (which is the case for frontoparallel rotations, where all velocities are parallel). In the more general case of non-frontoparallel rotations, f and g are vectors representing the velocity at any point in the retinal image. These velocities could have any direction. To generalize Equation 8 to vector variables we modify it in the following way.

^{5} the velocity field is not independent of the axis slant.^{6} As a consequence of this, perceived shape will change if the axis slant changes. This is consistent with the psychophysical data of Loomis and Eby (1988) on rotating ellipsoids.

*x, y*) the optic-flow pattern from an object rotating about a frontoparallel axis can be written as the product of a shape factor and the angular speed of rotation, Ω (see 1). Thus f(*x, y*) = ΩF(*x, y*), where F is a function that depends only on the object's shape (as expressed either by the stimulus or the template). This is a function not only of position (*x, y*), but also of parameters that define the shape (we will keep the treatment general at this point, leaving for later the specification of the template shapes used in our simulations). For instance, a rotating elliptical cylinder will have three parameters, S, C and *φ,* where S represents the size, C represents the ellipticity, and *φ* the angle between the major axis and the line of sight. The specific set of functions used in making the templates is described later in the subsection A minimalist template set.

*v*_{x} is still proportional to Ω, but instead of a shape factor we have a more complex expression that is also a function of the axis slant (see 1).

*α* and *γ*_{0} the sets of parameters representing the shapes of the stimulus and the template, respectively, and Ω and Ω_{0} the angular speeds of stimulus and template, respectively. It is easy to show (replacing f(*x, y*) = ΩF(*x, y, α*) and g(*x, y*) = Ω_{0}G(*x, y, γ*_{0}) in Equation 8)^{7} that the matching variable *I,* defined in Equation 8, becomes (remember that this is only for frontoparallel rotations):

_{0} and *γ*_{0}), we can see from Equations 9 and 12 that, for a fixed *γ*_{0}, the population response will have a bell-shaped curve. The peak of this curve (which is always a maximum because A_{G} is always positive) is a function of the stimulus angular speed (Ω) and shape (represented by the set of parameters *α*). Because the population of cells contains cells tuned to various values of *γ*_{0}, we will have many cells that respond maximally. The tuning parameter Ω_{0} of the maximally responding cell (for a given *γ*_{0}) will be:

^{8} in parameter space (Ω_{0}, *γ*_{0}). The average value of Ω_{0} obtained from the population response will then be:

_{0} and the perceived angular speed of rotation, Ω_{obs}, is simple and direct. For instance, if Ω_{obs} were proportional to 〈Ω_{0}〉, then Equation 15 tells us that the perceived rotational speed will be proportional to the simulated rotational speed.^{9} The constant of proportionality will be different for different shapes and will depend on the particular family of templates used. This is a highly desirable property that is very difficult, if not impossible, to attain with simpler models that treat Ω_{obs} as some function of *def* or other optic-flow properties that are not shape invariant.
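The proportionality between the peak tuning parameter and the stimulus angular speed can be made explicit with a short worked derivation. It is a sketch under an assumption: the matching variable is taken to be the integrated squared difference between input and template, which is consistent with the quantities A_{G} and B named in the text but is not confirmed as the exact form of Equations 8, 12, and 14. The symbol A_{F} is introduced here only for illustration.

```latex
\begin{aligned}
I(\Omega_0,\gamma_0)
  &= \iint \bigl[\Omega F(x,y,\alpha) - \Omega_0 G(x,y,\gamma_0)\bigr]^2 \,dx\,dy \\
  &= \Omega^2 A_F(\alpha) \;-\; 2\,\Omega\,\Omega_0\,B(\alpha,\gamma_0) \;+\; \Omega_0^2 A_G(\gamma_0), \\[4pt]
A_F &= \iint F^2\,dx\,dy, \qquad
A_G = \iint G^2\,dx\,dy, \qquad
B = \iint F\,G\,dx\,dy, \\[4pt]
\frac{\partial I}{\partial \Omega_0} = 0
  \;&\Longrightarrow\;
\Omega_0 = \frac{B(\alpha,\gamma_0)}{A_G(\gamma_0)}\,\Omega .
\end{aligned}
```

Because A_{G} > 0, this stationary point is a minimum of *I* and hence a maximum of the response, and the preferred Ω_{0} scales linearly with Ω with a shape-dependent constant B/A_{G}.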

*Generalization to slanted axes of rotation*. Having started with the more intuitively accessible case of a frontoparallel axis of rotation, we are now in a position to generalize to the case where the axis of rotation is slanted. Equations 12 and 15 can be generalized to slanted axes, because the optic flow can be decomposed into a product of Ω and a “shape” factor. For slanted axes, the “shape” factor is a function of the shape and the slant of the axis of rotation (see 30). That means that the previous derivations are not restricted to frontoparallel rotations but are also valid for slanted axes of rotation. Note also that for non-frontoparallel rotations, for shapes other than cylinders, the constant of proportionality in Equation 15 will be a function of the axis slant (because B(*α, γ*_{0}) is a function of the axis slant), but at any slant Ω_{obs} will still be proportional to Ω.

_{obs} is proportional to 〈Ω_{0}〉, this ratio will be constant for a given shape and axis slant, and, in general, different for different shapes and different axis slants (although remember that for cylinders it will not change with the axis slant). This is in agreement with psychophysical data, as will be shown in Results.

_{obs} (the perceived speed of rotation) and ∇_{obs} (the perceived gradient). We showed in the previous section how Ω_{obs} could be obtained by template matching. Now we will show how to obtain ∇_{obs} from template matching.

_{x}*v*_{y} is computed by template matching. Computing a gradient for rotations using templates is actually much simpler than computing Ω_{obs}. This is because for rotations ∂_{x}*v*_{y} is constant across the image, so for any rotating object we have *v*_{y} = *x*∂_{x}*v*_{y}. We will abbreviate ∂_{x}*v*_{y} as ∇, so *v*_{y} = *x*∇.

*x, y*) = ∇_{0}G(*x*) where ∇_{0} is a constant and G(*x*) is an arbitrary function of *x*. We assume also that the family of templates includes all motion directions, which are obtained by rotating the template just defined.

*x*, *y*) = *x*∇. Note again that the gradient ∇, and thus the input, is independent of the object's shape (see Equation 3).

_{0} of the maximally responding cell will be:

_{0}) will be proportional to the simulated gradient, regardless of the stimulus shape and regardless of the particular family of templates, i.e., the specific form of G(*x*). As an example, let us assume G(*x*) = *ηx,* where *η* is a constant. Then, it follows that ∇_{0} = ∇/*η*. This implies, by the definition of *λ,* that *η* = *λ*.
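The linear-template example can be checked numerically with a minimal sketch. It assumes that the maximally responding cell is the one whose template ∇_{0}G(*x*) minimizes the squared difference from the input *v*_{y} = *x*∇ (a least-squares reading of the matching step, stated here as an assumption); the function name and the numerical values are hypothetical.

```python
import numpy as np

def best_gradient_parameter(grad, G, x):
    """Template parameter grad0 minimizing sum((x*grad - grad0*G(x))**2) over the samples x."""
    vy = x * grad          # input: vertical velocity, linear in x
    g = G(x)               # template profile
    return np.sum(vy * g) / np.sum(g * g)

x = np.linspace(-1.0, 1.0, 201)  # horizontal retinal positions (arbitrary units)
eta = 2.5                        # hypothetical template constant
grad = 0.8                       # hypothetical simulated gradient

# Linear template G(x) = eta * x
grad0 = best_gradient_parameter(grad, lambda u: eta * u, x)
```

For the linear template the recovered parameter equals ∇/*η*; for other choices of G(*x*) the constant differs, but the recovered parameter still scales in proportion to the simulated gradient.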

*x, y, α*) one could use Equation 15 to obtain B(*α, γ*_{0}) and then Equation 13 to find the corresponding template G(*x, y, γ*_{0}). It is more interesting and productive to take the opposite approach and ask: How well can we explain the available psychophysical data by keeping the number of templates and their complexity to a minimum?

_{0} or ∇_{0}) to the computation of the output parameter. The computation is a weighted average in which each filter parameter has a weight proportional to the intensity of its filter's response.
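The weighted-average readout described above can be sketched in a few lines. The bell-shaped response profile used to exercise it is an assumption made for illustration (a Gaussian in the tuning parameter), and the function name, grid of tuning values, and constants are hypothetical.

```python
import numpy as np

def population_average(tuning_values, responses):
    """Weighted average of tuning parameters, each weighted by its filter's response."""
    tuning_values = np.asarray(tuning_values, dtype=float)
    responses = np.asarray(responses, dtype=float)
    return np.sum(responses * tuning_values) / np.sum(responses)

# Hypothetical population: cells tuned to angular speeds from 0 to 40 (arbitrary units)
omega0 = np.linspace(0.0, 40.0, 401)
peak = 20.0   # tuning value of the maximally responding cell
beta = 0.05   # hypothetical width constant

# Assumed bell-shaped population response centered on the peak
responses = np.exp(-beta * (omega0 - peak) ** 2)

estimate = population_average(omega0, responses)
```

For a response profile that is symmetric about its peak, the weighted average coincides with the peak tuning value; asymmetric profiles or truncated parameter ranges would shift the estimate.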

_{0}.

**Planar Surfaces:** The first set of templates consists of the product of one specific planar surface (that is, one having a fixed slant and tilt) with various angular speeds of rotation. Thus, a specific template within this family is identified by just one parameter, the angular speed of rotation (see Methods).

**Cylinders:** The second family of templates consists of the product of an elliptical cylinder rotating about its longitudinal axis, with various angular speeds. The elliptical cylinders could have different ellipticities, but are all similarly oriented, that is, the major axis of each ellipse makes the same fixed angle *γ*_{0} with the line of sight. A specific template within this second family is identified by two parameters: the ellipticity and the angular speed of rotation (see Methods). In addition, we scaled the stimuli to a standard size that matches that of the templates. An alternative but equivalent option would have been to assume that templates come in families with size as an additional parameter, such that the cylinder templates and the stimuli match in size.

^{10}

*σ*_{0}, and tilt, *τ*_{0}, and various rotational speeds. Then Equation 14, which gives the speed tuning parameter Ω_{0} of the maximally responding cell,^{11} becomes (see 1):

*σ* and *τ* are the speed of rotation, slant and tilt of the stimulus, respectively; and *h* is a function of *τ* and *τ*_{0}. The rightmost part of Equation 18 is a consequence of the fact that *def* = Ω*σ* for planar surfaces. If the perceived speed of rotation, Ω_{obs}, is proportional to Ω_{0} then, by Equation 18, Ω_{obs} is proportional to *def*. This agrees with the psychophysical data of Domini, Caudek, and Proffitt (1997) (see their Figure 7).

_{obs} is proportional to *def*^{1/2} (see their Figure 15, reproduced here in Figure 2b). It is unclear why the result differed in the two cases, although, due to error uncertainties, either set of data could probably be fitted by a linear function, a square root function, or any function in between.

_{obs} is proportional to *def*^{1/2} are correct. If our model is going to predict this, then we need to assume that the relationship between Ω_{obs} and Ω_{0} is not linear; rather, we will assume that Ω_{obs} is proportional to Ω_{0}^{1/2}. One reason why the visual system might prefer a non-linear relationship between the cell's tuning parameter Ω_{0} and Ω_{obs} is a trade-off: for a linear relationship the perceived slant, *σ*_{obs}, would be independent of the surface slant, *σ* (see next section). A non-linearity overcomes this problem. The non-linearity Ω_{obs} ∝ Ω_{0}^{1/2} results in Ω_{obs} ∝ *def*^{1/2}, which is consistent with the results of Domini and Caudek (1999), as shown in Figure 2a. Note that assuming the above nonlinearity results in specific model predictions for the behavior of *σ*_{obs}. As shown in the next section, *σ*_{obs} is now predicted to behave as *σ*_{obs} ∝ *σ*^{1/2}, which is also consistent with psychophysical data (Domini & Caudek, 1999).

*σ*_{obs}, is a function of its perceived rotational speed, Ω_{obs}. They are related, by definition, through the relationship: *def* = Ω*σ* = Ω_{obs}*σ*_{obs} (see 1). This expression is valid for frontoparallel rotations only (as far as we know). Thus, the model predictions that follow in this section are restricted to frontoparallel rotations. This is enough for our purposes because the data from Domini and Caudek (1999) that we are going to model were restricted in the same way.

*σ*_{obs} = *def*/Ω_{obs} and Equation 18 that, if Ω_{obs} is proportional to Ω_{0} (i.e., Ω_{obs} = *k*Ω_{0}), then *σ*_{obs} = *σ*_{0}/[*k h*(*τ, τ*_{0})]. Thus, *σ*_{obs} is independent of both the simulated slant, *σ,* and *def*. This disagrees with psychophysical data (see Figure 15 of Domini & Caudek, 1999, reproduced here in Figure 2a). There are two ways to modify the model to bring it in line with the psychophysical data. One way is to assume that our choice of templates was inadequate and to fix the problem by adding more templates. The idea would be to build a set of templates that results in a relationship that parallels that in Equation 18 but with a fundamental difference: now Ω_{0} is proportional to *def*^{1/2} rather than to *def*. In this case, *σ*_{obs} will no longer be a constant. It is not difficult to introduce templates that accomplish this. A second way to modify the model, which is the approach adopted here, is to keep the templates as they are and change the function that relates Ω_{obs} to Ω_{0}. Until now we assumed that these variables were proportional to each other, but this was just a simplifying assumption that does not necessarily represent the best choice.
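The difference between a linear and a square-root readout can be illustrated with a minimal sketch. It assumes that Equation 18 has the form Ω_{0} = *def* · *h*(*τ, τ*_{0})/*σ*_{0}, which is inferred from the statements that *σ*_{obs} = *def*/Ω_{obs} and that the linear readout gives *σ*_{obs} = *σ*_{0}/[*k h*]; the function names and the values of `h` and `k` are hypothetical.

```python
import numpy as np

def omega0_peak(deformation, h, sigma0):
    """Assumed form of Equation 18: tuning parameter of the maximally responding planar template."""
    return deformation * h / sigma0

def perceived_slant(deformation, omega_obs):
    """Perceived slant from the relation def = omega_obs * sigma_obs."""
    return deformation / omega_obs

h = 1.0                             # hypothetical value of h(tau, tau0)
sigma0 = np.tan(np.radians(75.0))   # template slant (tangent of 75 degrees)
k = 2.0                             # hypothetical proportionality constant
deformation = np.array([1.0, 4.0, 9.0])

omega0 = omega0_peak(deformation, h, sigma0)
slant_linear = perceived_slant(deformation, k * omega0)          # omega_obs = k * omega0
slant_sqrt = perceived_slant(deformation, k * np.sqrt(omega0))   # omega_obs = k * omega0**0.5
```

Under these assumptions the linear readout yields the same perceived slant for every value of *def*, whereas the square-root readout yields a perceived slant that grows as *def*^{1/2}.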

_{obs} ∝ Ω_{0}^{1/2} or Ω_{obs} = *k*Ω_{0}^{1/2}, where *k* is a constant. This results in (again, using Equation 18 and *σ*_{obs} = *def*/Ω_{obs}):

_{obs} = *k*Ω_{0} + *ζ,* where both *k* and *ζ* are constants.^{12} This results in:

*σ*_{obs}, varies with the simulated tilt, *τ,* of the planar surface (see their Figure 18, shown here in Figure 3a). Their heuristic is unable to explain this dependence but our model predicts it (see Equation 19 or 20). Figure 3b shows the predicted dependence of *σ*_{obs} on *τ* for Equation 19 (Equation 20 gives similar results). Figure 3b shows two curves, one for each of the two values of *def* used in Figure 18 of Domini and Caudek (1999). All templates are planar surfaces with *τ*_{0} = 0 and *σ*_{0} = tan(75°). The match of predictions to the data is quite good, considering the simplifying assumptions used.

_{0}〉, does not have an analytical solution and has to be solved numerically (see Notes on computational methods).

*C*_{obs}/*C*_{sim} (where *C*_{sim} and *C*_{obs} are the simulated and the perceived curvature, respectively) is the depth-scaling factor, which was found to be a function of the stimulus' shape (i.e., of *C*_{sim}).

^{13}

*def*. Domini and Caudek's model works well for predicting the perceived slant of planes, but fails to explain the data for cylinders shown in Figure 4a. In such a case Domini and Caudek's model predicts that the scaling factor should decrease with *C*_{sim}; this is contrary to psychophysical results (Fernandez & Farell, 2007), which show an increase.

^{14} Also assumed is a linear relationship between Ω_{obs} and Ω_{0} (see Notes on computational methods). The quantitative match between data and model is very good, especially considering the simplifications made in choosing the family of templates.

_{obs} and Ω_{0}. For cylinders, instead, we assumed a linear relationship. The two assumptions do not contradict each other, because the relationship can be different for different sets of templates: different sets of templates are independent of each other. In the physiological implementation of the model that we will present in a following paper it will become apparent why this can be so. It depends only on how each template wires its feedback connections to the lower areas, so the only constraints here are psychophysics and simplicity.

*def* (see their Figure 8, reproduced here in Figure 5a). Equation 6 gives the perceived slant of the axis of rotation as predicted by our model. In order to compare our predictions with the data of Caudek and Domini (1998), we will define the slant as *σ*_{axis} = 90 − *θ*_{obs} (making *σ*_{axis} an angle, rather than the tangent of an angle, as it was in the previous sections). Then, Equation 6 becomes:

_{obs} = *k*Ω_{0}^{1/2} (valid for the frontoparallel rotations) to the case in which the axis of rotation is slanted. Then, it is easy to show that (see 1):

*ρ* (used as a parameter in Caudek & Domini, 1998) is the component of the speed of rotation along the line of sight (the z-component of Ω, which is assumed not to be observable), and *C* is a constant given that the stimulus tilt, *τ,* is constant (*C* = *k*[*h*(*τ, τ*_{0})/*σ*_{0}]^{1/2}, see Equation 18).

^{15}

*def* and *ρ* used in Figure 8 of Caudek and Domini (1998), shown here as Figure 5a. The three curves were fitted using the same value of *λC*. There is good agreement between the psychophysical data and model predictions.
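The dependence of the perceived axis slant on *def* and *ρ* can be sketched as follows. The sketch assumes the relation cos *σ*_{axis} = *ρ*/(*λC* *def*^{1/2}), obtained by combining the substitutions *θ*_{obs} = 90 − *σ*_{axis} and −∂_{x}*v*_{y} = *ρ* with Ω_{obs} = *C* *def*^{1/2} and the definition of *λ*; this combined form is a reconstruction rather than a quotation of the fitted equation, and the function name and input values are hypothetical.

```python
import numpy as np

def perceived_axis_slant_deg(rho, deformation, lambda_C):
    """Perceived slant of the rotation axis in degrees (assumed relation, see lead-in).

    cos(sigma_axis) = rho / (lambda_C * sqrt(def)); the ratio is clipped to [-1, 1]
    so that out-of-range inputs saturate at 0 or 180 degrees instead of producing NaN.
    """
    cos_slant = rho / (lambda_C * np.sqrt(deformation))
    return np.degrees(np.arccos(np.clip(cos_slant, -1.0, 1.0)))

# Hypothetical inputs: a single fitted product lambda*C shared across conditions
lambda_C = 1.0
slant_low_def = perceived_axis_slant_deg(rho=1.0, deformation=4.0, lambda_C=lambda_C)
slant_high_def = perceived_axis_slant_deg(rho=1.0, deformation=9.0, lambda_C=lambda_C)
```

In this form only the product *λC* enters, which is why a single fitted value can be shared across curves; for fixed *ρ*, larger *def* produces a larger predicted axis slant.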

_{obs} = *c*Ω_{0} = *k*Ω_{sim} (see Equation 15). Using Equations 3 and 6 now becomes:

*θ*_{sim} is the simulated slant of the axis of rotation measured as a deviation from vertical. Figure 6 shows the predictions of Equation 23 together with the psychophysical data for circular cylinders obtained from three observers (data replotted from Fernandez & Farell, 2007). Each observer was fitted to a different value of *λk*. Again, we find a reasonably good match between psychophysical data and model predictions.

*λ* = constant only if the condition sin *θ*_{obs} ∝ sin *θ*_{sim} is satisfied. Equation 23 satisfies this condition, showing that the template matching model is consistent with the hypothesis that *λ* = constant.
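The proportionality condition for cylinders can be checked with a minimal sketch. It assumes that Equation 23 has the form sin *θ*_{obs} = sin *θ*_{sim}/(*λk*), which follows from the definition of *λ* together with Ω_{obs} = *k*Ω_{sim} if the simulated gradient is −∂_{x}*v*_{y} = Ω_{sim} sin *θ*_{sim}; that last relation and the numerical value of `lambda_k` are assumptions made for illustration.

```python
import numpy as np

def perceived_inclination_deg(theta_sim_deg, lambda_k):
    """Perceived axis inclination in degrees under the assumed relation
    sin(theta_obs) = sin(theta_sim) / lambda_k (clipped to the valid arcsin range)."""
    s = np.sin(np.radians(theta_sim_deg)) / lambda_k
    return np.degrees(np.arcsin(np.clip(s, -1.0, 1.0)))

theta_sim = np.array([15.0, 30.0, 45.0, 60.0])  # simulated inclinations (degrees)
lambda_k = 1.5                                  # hypothetical fitted product lambda*k

theta_obs = perceived_inclination_deg(theta_sim, lambda_k)
ratio = np.sin(np.radians(theta_obs)) / np.sin(np.radians(theta_sim))
```

The ratio of sines is the same at every simulated inclination, which is the condition under which *λ* can be a constant.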

*def*. There are two ways to extend this to curved surfaces. For curved surfaces, *def* is a function of retinal position (*x, y*), and changes from point to point. One extension (the continuous or *differential* version) is to assume that both the perceived slant and the perceived angular speed of rotation, Ω_{obs}, change locally from point to point as a function of local *def*. This is extremely important because the depth-scaling factor is proportional to the inverse of this angular speed. Because Ω_{obs} is computed locally, the structure of the recovered object is internally consistent but not affine. That is, the recovered object is not related to the simulated object by a simple scaling factor in depth. Rather, different scaling factors are found at different points on the recovered object. Note also that because the rotation rate is allowed to change from position to position, the object should be perceived as nonrigid.

*non-differential* version, assumes that the angular speed of rotation is computed as an average across differentiable surfaces, rather than locally. The logic here is that some objects might be perceptually segmented, with each patch possessing a different angular speed. Regardless of how the speed of rotation of a given surface patch is estimated, it is clear that the recovered structure will be affine for each patch, and each patch will be perceived as moving with its own angular speed, making the object non-rigid in the general case.

*def*. The template-matching model that we developed overcomes this problem by providing a unified framework for both planar and curved surfaces. For planar surfaces the model predicts the dependence of perceived slant and Ω_{obs} on *def,* and for cylinders it reproduces the variation of perceived versus simulated curvature described in Fernandez and Farell (2007).

^{16} The depth structure of each segmented part is then computed independently of all others, using a single computed value of Ω_{obs} for each patch (though Ω_{obs} can differ across patches).

*def*. As a result, the perceived depth separation between two points with a fixed physical depth separation on a planar surface would be a decreasing function of *def* (see 1).

*def* results in an internally consistent perceived structure once we take the gap into account. All the psychophysical data used by Domini et al. as a support for their internal-inconsistency model are also easily reinterpreted in terms of an internally consistent perceived depth structure.

_{obs} and ∇_{obs}. The second part is a template-matching model that uses the optic flow field as the input to compute the two global parameters required by Equation 7.

*h*(*τ, τ*_{0}) is derived in the 1. For the template we used a plane with *τ*_{0} = 0 and *σ*_{0} = tan 75°. The limits of integration used were *x*_{1} = *y*_{1} = −.45 *x*_{2} = −.45 *y*_{2} (where the particular value for *x*_{1} is not important because they cancel out).

_{0} as being a continuous variable and used Equation 14 to compute Ω_{0} for each of the curvatures used. We used cylinders with curvatures *C* = .5, 1, 1.5, 2, 2.5 and 3, and with an angle of 5° between the line of sight and the cylinder's apex. We numerically integrated the second and third integrals of Equation 13 in order to compute Ω_{0} in Equation 14. Then we used the activities computed from Equation 9 to compute the population average value for Ω_{0}, with *β* = 5 × 10^{−4} sec^{2}/deg^{2}.

_{0} was computed, we obtained Ω_{obs} as Ω_{obs} = Ω_{0} (notice that *C*_{obs}/*C*_{sim} = Ω_{sim}/Ω_{obs}, see Fernandez & Farell, 2007).

*λC* (same value for the three curves) to get a reasonable match with the psychophysical data shown in Figure 5b.

*λ* is a factor that represents the deviation between the perceived angular speed of rotation, Ω_{obs}, and an optic-flow consistent angular speed of rotation, Ω_{OFC} (= −∂_{x}*v*_{y}/sin *θ*_{obs}). In general, Ω_{obs} is not consistent with the first-order optic flow. Substituting the definition of Ω_{OFC} into the definition of *λ* (= Ω_{OFC}/Ω_{obs}), we obtain Equation 6.

_{0} we use Equation 14. To do so, we need to compute first the integrals A_{G} and B, defined in Equation 13. The distance of a point on a planar surface from a frontoparallel plane in a direction towards the observer can be written as (see, e.g., Domini & Braunstein, 1998):

*σ* and *τ* are the slant and tilt of the planar surface. We will use Equation A1 to represent the stimuli, and a similar equation but with *σ* and *τ* replaced by *σ*_{0} and *τ*_{0} for the planar surface representing the template. Then we have

*h*(*τ, τ*_{0}) = g_{B}(*τ, τ*_{0})/f_{A}(*τ*_{0}).

*θ*_{obs} = 90 − *σ*_{axis} and −∂_{x}*v*_{y} = Ω_{sim} cos *σ*_{axis} = *ρ* in Equation 6, we obtain:
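As a sketch of the algebra, assume Equation 6 has the form implied by the definitions of *λ* and Ω_{OFC} given earlier, that is, *λ* = −∂_{x}*v*_{y}/(Ω_{obs} sin *θ*_{obs}). The two substitutions then give the following, which may differ in presentation from the original displayed result:

```latex
\lambda \;=\; \frac{-\,\partial_x v_y}{\Omega_{obs}\,\sin\theta_{obs}}
        \;=\; \frac{\Omega_{sim}\cos\sigma_{axis}}{\Omega_{obs}\,\sin\!\left(90^{\circ}-\sigma_{axis}\right)}
        \;=\; \frac{\Omega_{sim}}{\Omega_{obs}}
```

The last step uses sin(90° − *σ*_{axis}) = cos *σ*_{axis}, so the dependence on the slant of the axis cancels.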

_{obs} = *k*Ω_{0}^{1/2}, and from Equation 18 we have Ω_{0} ∝ *def*, so Ω_{obs} =

*C*

*λ* = const. (*cylinders*)

*z*_{axis}) does not change with the inclination of this axis (Δ*z*_{axis} = const.). As shown there also, this perceived structure is given by:

*θ*_{obs} to project it in the direction perpendicular to the axis of rotation. Replacing *v*_{y} = *v*_{z}^{axis} sin *θ*_{sim} (see Fernandez & Farell, 2007), where *v*_{z}^{axis} does not change with a change in the axis's slant, Equation A7 becomes:

*v*_{x}/∂_{x}*v*_{z}^{axis} = const. (that is, it does not change with the axis's slant, as long as the angular speed of rotation is kept constant), the condition for *λ* to be a constant becomes sin *θ*_{obs}/sin *θ*_{sim} = const. When we allow not just the axis's inclination, but also the speed of rotation to change, then the condition for *λ* to be a constant becomes more complex.

*R*, is perpendicular to the axis of rotation). The coordinates of such a point can be described as:

*φ* is the initial phase. The horizontal speed of such a point will be given by

*z* as a function of position (*x, y*) because the point under consideration is arbitrary and the description is valid for any point on the object. Equation A10 shows that the velocity field of a rotating object can be written as the product of a shape factor and the angular speed of rotation.
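The product form described here can be checked numerically. The sketch below assumes a point at distance `R` from a vertical rotation axis, with x = R sin(Ωt + φ) and z = R cos(Ωt + φ). This parameterization and the function names are illustrative assumptions, and the sign or phase convention may differ from the one used in the derivation.

```python
import math


def point_position(R, omega, phi, t):
    """Position (x, z) of a point rotating about the vertical axis.

    Assumed parameterization: x = R sin(omega t + phi), z = R cos(omega t + phi).
    """
    x = R * math.sin(omega * t + phi)
    z = R * math.cos(omega * t + phi)
    return x, z


def horizontal_speed(R, omega, phi, t, dt=1e-6):
    """Central-difference estimate of dx/dt at time t."""
    x_plus, _ = point_position(R, omega, phi, t + dt)
    x_minus, _ = point_position(R, omega, phi, t - dt)
    return (x_plus - x_minus) / (2 * dt)


# Hypothetical values, chosen only for illustration.
R, omega, phi, t = 1.5, 0.8, 0.3, 2.0
_, z = point_position(R, omega, phi, t)
v_x = horizontal_speed(R, omega, phi, t)

# Under this parameterization, v_x equals omega * z (shape factor times angular speed).
residual = abs(v_x - omega * z)
```

Under these assumptions the horizontal speed depends on the object only through the depth `z`, which plays the role of the shape factor.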

*v*_{x} is still proportional to Ω, but in this case instead of a shape factor we have a more complex expression. This is easily obtained from Equation 2, which after a little work can be rewritten as

*def* = Ω*σ* = Ω_{obs}*σ*_{obs}

*def* = *ωσ* for arbitrary rotations (where *ω* is the projection of Ω into the frontoparallel plane). Thus this reduces to *def* = Ω*σ* for frontoparallel rotations. Here we show that Ω_{obs}*σ*_{obs} = *def* for frontoparallel rotations, which is the context in which we used it.

*def*_{1} = *def*_{2} = *v*_{y} = 0, so *def* reduces to

∇*v*_{x} is the gradient of *v*_{x}. From Equation 1 we have

*v*_{x} is maximal along the same direction (in the image plane) in which the change in perceived depth is maximal; or, stated in a different way, at each location in the image the gradients of *v*_{x} and *z* always point in the same direction, although this direction could vary across the image. If we call this direction *r*, then we have (from Equation A14):

*v*_{x}∣ = *def* and *σ*_{obs}, which, when substituted into Equation A15, results in

*def* = Ω_{obs}*σ*_{obs}.
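The relation between *def*, angular speed, and slant can be illustrated numerically for a planar surface. The sketch below assumes the standard definition of deformation, the square root of (∂_{x}*v*_{x} − ∂_{y}*v*_{y})² + (∂_{x}*v*_{y} + ∂_{y}*v*_{x})², a plane z = σ(x cos τ + y sin τ), and a flow with *v*_{x} = Ωz and *v*_{y} = 0. The function names and parameter values are hypothetical.

```python
import math


def make_plane_flow(omega, sigma, tau):
    """Flow of a plane z = sigma * (x cos(tau) + y sin(tau)) rotating about the
    vertical axis, under the assumed relations v_x = omega * z and v_y = 0."""
    def v_x(x, y):
        return omega * sigma * (x * math.cos(tau) + y * math.sin(tau))

    def v_y(x, y):
        return 0.0

    return v_x, v_y


def deformation(v_x, v_y, x, y, h=1e-4):
    """Deformation of the flow at (x, y), estimated with central differences."""
    dx_vx = (v_x(x + h, y) - v_x(x - h, y)) / (2 * h)
    dy_vx = (v_x(x, y + h) - v_x(x, y - h)) / (2 * h)
    dx_vy = (v_y(x + h, y) - v_y(x - h, y)) / (2 * h)
    dy_vy = (v_y(x, y + h) - v_y(x, y - h)) / (2 * h)
    return math.hypot(dx_vx - dy_vy, dx_vy + dy_vx)


# Hypothetical stimulus: 20 deg/s rotation, slant tan(60 deg), tilt 30 deg.
omega, sigma, tau = 20.0, math.tan(math.radians(60)), math.radians(30)
flow_x, flow_y = make_plane_flow(omega, sigma, tau)
d = deformation(flow_x, flow_y, 0.2, -0.1)
```

For this flow the deformation equals the magnitude of the gradient of *v*_{x}, so `d` matches `omega * sigma` regardless of the tilt.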

*def*). Let us show this. By definition, we have:

*σ* is the surface slant, Δ*z* the depth difference between two points on the surface and Δ*y* the difference in height between the same two points (we assume here for simplicity that the surface has a tilt of zero; in the general case *y* would be measured along the tilt direction rather than along the vertical direction). The subscript *obs* refers to perceived quantities. If we eliminate Δ*y* from Equation A16 and also use *def* = Ω*σ* and *σ*_{obs} = *ασ*^{1/2}, where *α* is a constant, we get:
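A sketch of this elimination, assuming Equation A16 defines the slants as *σ* = Δ*z*/Δ*y* and *σ*_{obs} = Δ*z*_{obs}/Δ*y* (an assumption based on the symbol definitions in the text):

```latex
\frac{\Delta z_{obs}}{\Delta z}
  \;=\; \frac{\sigma_{obs}}{\sigma}
  \;=\; \alpha\,\sigma^{-1/2}
  \;=\; \alpha\left(\frac{\Omega}{def}\right)^{1/2}
```

The last equality uses *σ* = *def*/Ω, so under these assumptions the depth scaling depends on the stimulus only through *def* and Ω.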

*def*.

*def* predicts internally consistent structure

*def*. We show here that this predicts that the depth structure of the perceived object will be internally consistent.

*def* has a constant value across each planar surface and so its value is well defined for the surface. The same is valid for the planar surface's slant, whose value is constant across the surface.

*def* varies across the surface. It is unclear how Domini et al.'s heuristic generalizes to this case. One way is for the perceived slant at any point on the surface to be a function of the local *def* at that position. Let us first deal with the discrete version of the model.

*def* (see Equations 19 and 20). Our model always results in an internally consistent perceived depth structure. Why then did Domini et al. (1998) claim that depth structure is internally inconsistent? Their claim is based on the assumption that the intersection (or edge) of two planar surfaces that are simulated to rigidly rotate with the same angular speed will be perceived to intersect at the simulated intersection even if they are perceived as rotating at different angular speeds. However, this is inconsistent with the basic SFM equations (see Fernandez & Farell, 2007).

*z*(*x, y*) is an arbitrary function of retinal position (*x, y*). For instance, *z* could be a function of either *def* or *σ*, which in turn could also be arbitrary functions of retinal position (*x, y*). In general, the value of a path integral of the form ∫[f(*x, y*)d*x* + g(*x, y*)d*y*], with f(*x, y*) and g(*x, y*) arbitrary functions, is a function of the path from P_{1} to P_{2}. To have a consistent object, the value of the integral in Equation A18 must be independent of the path. Using Stokes' theorem (Kaplan, 1952), it can be shown that a path integral is independent of the path if and only if

*z*(*x, y*) is a continuous function of both *x* and *y*. Thus, we have demonstrated that, for a smooth surface, the perceived internal depth structure for the generalized Domini et al. model is always consistent.
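For a line integral of f d*x* + g d*y* on a simply connected region, the standard path-independence condition is ∂f/∂*y* = ∂g/∂*x*, which holds automatically when f and g are the partial derivatives of a single smooth function *z*(*x, y*). The sketch below illustrates this numerically. The surface `z`, the two paths, and the end points are made up for the example and are not taken from the model.

```python
import math


def z(x, y):
    """Hypothetical smooth perceived-depth surface."""
    return 0.5 * x * x + math.sin(y) + 0.3 * x * y


def grad_z(x, y):
    """Analytic partial derivatives (dz/dx, dz/dy) of the surface above."""
    return x + 0.3 * y, math.cos(y) + 0.3 * x


def path_integral(path, n=20000):
    """Midpoint-rule estimate of the integral of (dz/dx) dx + (dz/dy) dy
    along path(s), with s running from 0 to 1."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        fx, fy = grad_z((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total


def straight(s):
    """Straight line from P1 = (0, 0) to P2 = (1, 2)."""
    return s * 1.0, s * 2.0


def curved(s):
    """A curved path between the same two end points."""
    return s * s, 2.0 * s + 0.5 * math.sin(math.pi * s)


depth_difference = z(1.0, 2.0) - z(0.0, 0.0)
along_straight = path_integral(straight)
along_curved = path_integral(curved)
```

Both integrals agree with the depth difference between the end points, which is the sense in which a smooth depth function yields an internally consistent structure.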

*θ*_{obs} ∝ sin *θ*_{sim}

_{obs} = *k*Ω_{0}^{1/2}. Following a procedure similar to the one used to obtain Equation 23, we get:

*θ*_{obs} = cos *σ*_{axis}, thus Equation A21 is just a different version of Equation 22.

^{2}In principle, observers could use information about the period to estimate Ω. They could do this if they were able to see a few full rotations (which is not the case in most psychophysical tests) and the object had a salient feature to mark the periodicity. Our pilot experiments with rotating elliptical cylinders show that observers are very poor at predicting when a uniquely colored dot will reappear on the front. Even rather large shifts in the occluded dot's position go undetected when the dot reappears, making it unlikely that Ω is estimated from the period.

^{4}An alternative way to compare Equations 4 and 5 is to say that the perceived 3-D structure and motion are consistent with a different optic flow in which the observed value of (Δ*v*_{x}/∂_{x}*v*_{y}) is *λ* times the real value. This would mean that Δ*v*_{x}, ∂_{x}*v*_{y} or both are observed wrongly, whereas our approach assumes the observed ∂_{x}*v*_{y} is 1/*λ* times the real ∂_{x}*v*_{y}.

^{6}Of course, our theory is general and does not assume that the axis of rotation is fixed relative to the object as the slant of the axis changes. We only made the assumption that the axis of rotation is parallel to the longitudinal axis for cylinders because the psychophysical data from Fernandez and Farell (2007) were obtained under this condition. The model can also be applied to cylinders in which the longitudinal and the rotation axes are not linked. It will result in a different set of predictions, but at this time there are no psychophysical data to test them.

^{9}Strictly speaking, 〈Ω_{0}〉 should be computed as the standard weighted population average 〈Ω_{0}〉 =

_{0} and *γ*_{0} as zero.

^{10}An anonymous reviewer noted an interesting parallel between the template model and the model of Hogervorst and Eagle (1998). To quote the reviewer, “When the templates reflect the prior probability of shapes and motions that can occur (which is likely) the activity of the template cells resembles the posterior probability. Moreover, the model by Hogervorst and Eagle (1998) calculates for each combination of shape and motion parameters a probability that depends on the match between the optic flow (the stimulus) and the flow resulting from a 3-D object and motion (similar to a template). In the template model, each combination is covered by a template cell.”

^{11}We are using here Equation 14 instead of Equation 15 because all the templates used for planar surfaces have the same value of the parameters *σ*_{0} and *τ*_{0}, so no averaging is needed.

^{12}This linear relationship is only meant as an approximation within the range of speeds tested. It cannot be valid when Ω gets close to zero. Notice that Ω_{obs} is minimal at about 12 deg/s and *ζ* = 0.5 deg/s. In addition, there aren't any templates that represent Ω_{obs} = 0 (similarly, there probably aren't any MST cells that respond to static stimuli). And, of course, at sufficiently low speeds SFM no longer works (neither perceptually nor in the model).

^{13}*C*_{sim} in our experiments is a function of the curvature, but it is actually defined as the ratio between the two principal axes: the one that crosses the line of sight and the one that crosses the frontoparallel plane. The latter has a constant size, so with this definition the ratio *C*_{obs}/*C*_{sim} gives the scaling factor in depth.

^{14}The cylinders rotated about their longitudinal axes, which were frontoparallel (because, as explained elsewhere in the article, model results for cylinders are independent of the simulated slant of the axis of rotation). Thus, the elliptical cross-section of the cylinder (perpendicular to the longitudinal axis) is in the horizontal plane. The angle between the major axis of this ellipse and the line of sight is 12.5° regardless of the cylinder's ellipticity.

^{15}As shown in Figure 2a, either a square root function or a linear function describes well the psychophysical data; we chose the square root function for analytical reasons, for the derivations result in simpler equations.

^{16}Segmentation is based on the optic flow. However, the algorithm we used is very rudimentary. For instance, we segment into parts when there is a discontinuity in the first derivative of the optic flow, such as at the edge of an open book. But we are still far from a good understanding of how segmentation works in SFM (or how it works in general, for that matter).