A new theory of structure-from-motion perception

Julian M. Fernandez; Bart Farell

doi:10.1167/9.11.23

Abstract

Humans can recover 3-D structure from the projected 2D motion field of a rotating object, a phenomenon called structure from motion (SFM). Current models of SFM perception are limited to the case in which objects rotate about a frontoparallel axis. However, as our recent psychophysical studies showed, frontoparallel axes of rotation are not representative of the general case. Here we present the first model to address the problem of SFM perception for the general case of rotations around an arbitrary axis. The SFM computation is cast as a two-stage process. The first stage computes the structure perpendicular to the axis of rotation. The second stage corrects for the slant of the axis of rotation. For cylinders, the computed object shape is invariant with respect to the observer's viewpoint (that is, perceived shape doesn't change with a change in the direction of the axis of rotation). The model uses template matching to estimate global parameters such as the angular speed of rotation, which are then used to compute the local depth structure. The model provides quantitative predictions that agree well with current psychophysical data for both frontoparallel and non-frontoparallel rotations.

Introduction

Our current understanding of how motion triggers the perception of three-dimensional structure comes from the data of a special case: Structure from motion (SFM) has been studied almost exclusively from rotations about an axis in the frontoparallel plane. Perceived shape in this condition is usually non-veridical. This has given rise to two main theoretical approaches:

1. SFM from 1 st -order velocities: One theory is that humans recover SFM from the 1st-order velocity field. This implies that shape is recoverable only up to a scaling factor in depth (Norman & Todd, 1993; Todd, 1998; Todd & Norman, 1991; Werkhoven & van Veen, 1995). Hence, accuracy is low for judgments requiring veridical perception of Euclidean metric structure, such as judgments of lengths or angles (Braunstein, Liter, & Tittle, 1993; Cornilleau-Pérès & Droulez, 1989; Eagle & Blake, 1995; Hogervorst, Kappers, & Koenderink, 1993; Liter, Braunstein, & Hoffman, 1993; Norman & Lappin, 1992; Todd & Bressan, 1990). Conversely, accuracy is high for judgments of an object's “affine”¹ structure—the structure up to a scaling factor in depth—such as depth order between pair of points, parallelism between lines defined by pairs of points on a planar surface, and coplanarity among points (Braunstein et al., 1993; Eagle & Blake, 1995; Hogervorst et al., 1993; Liter et al., 1993; Tittle et al., 1995; Todd & Bressan, 1990). Koenderink and van Doorn (1991) developed one of the best-known theoretical approaches to computing affine structure from the first-order optic flow.

2. SFM from 2nd-order optic flow and perspective effects: There is evidence that humans can use second-order optic flow information (that is, accelerations) and perspective information in computing the SFM. Regarding perspective effects, Eagle and Hogervorst (1999) found that shape discrimination thresholds decreased with stimulus size under perspective projection, but under orthographic projection, thresholds increased. Regarding the effects of accelerations, Hogervorst and Eagle (1998, 2000) showed that errors in the estimate of speeds and accelerations could explain some pattern of misperceptions in SFM.

3. Surface slant from def: A more radical view is that of Domini et al. (Domini & Braunstein, 1998; Domini & Caudek, 2003; Domini, Caudek, & Richman, 1998). They propose that the perceived slant of a surface is a function of the 1st-order optic-flow property def and predict that the structure of a shape recovered from SFM will be internally inconsistent. The authors suggest that this prediction is in agreement with numerous experimental results (Domini & Braunstein, 1998; Domini et al., 1998).

There is an alternative that is also in agreement with experimental results and that results in internally consistent shapes. One of the aims of this article is to develop this alternative theory. The second aim is to make the theory general and independent of the axis of rotation.

Previous models

Many models for computing SFM have been proposed. Some of them attempt to describe how humans perceive SFM; others show how SFM can be computed from the available information. Among the latter, the best known are the class of models based on Ullman's theorem (and its variants) that under orthographic projection, three views of four non-coplanar points from a rigid object are sufficient for uniquely determining the 3D structure of the points (up to a mirror reflection) (Ullman, 1979). Ullman later proposed an “incremental rigidity algorithm” for recovering structure from input data over time (Ullman, 1984). While these studies are extremely interesting in their own right, they do not tell us how neurons in the brain could solve the problem. In fact, there is no evidence that visual neurons could explicitly track positions of fine image features over time, as implied by the input representation of these models.

As we previously mentioned, more recent psychophysical data suggest that humans use velocity information instead of positional locations of image features for surface interpolation in SFM tasks (Treue et al., 1995). This result led to a major modification of the incremental rigidity algorithm (Hildreth, Ando, Andersen, & Treue, 1995). One problem with most of the earlier models, however, is that they compute the veridical Euclidean metric structure of the object as the output (Hildreth et al., 1995; Ullman, 1979, 1984), which contradicts the psychophysical data previously mentioned. An exception is the model previously mentioned of Koenderink and van Doorn (1991), which computes affine structure from the first-order optic flow.

Non-frontoparallel rotations

Most of the theoretical approaches mentioned so far are based on data from frontoparallel rotations, and their scope is limited to that particular condition [the exception is the theory of Koenderink and van Doorn (1991), which applies not only to non-frontoparallel rotations, but also to semi-parallel projections and movement in depth, two topics not covered by our theory]. Until recently, few empirical studies looked at non-frontoparallel axes of rotation. A first study, by Loomis and Eby (1988), found that the perceived depth of elongated ellipsoids becomes a smaller fraction of the simulated depth the more the angle of rotation deviates from frontoparallel. A second study, by Liter et al. (1993), reached the same conclusion using 3D “objects” made with five randomly positioned dots.

There are other studies involving rotations about slanted axes, but these did not examine the perception of shape from motion per se, but instead looked at other issues, such as the perceived inclination of the axis of rotation (Domini & Caudek, 1999; Pollick, Nishida, Koike, & Kawato, 1994), or the perception of rigidity (Todd & Bressan, 1990).

More recently, we investigated in greater detail the ability of humans to recover shape from rotations about a slanted axis (Fernandez & Farell, 2007). We found that in this condition the perceived structure is not affine. Moreover, violations of affinity can result in perceived depth-order violations along the line of sight. We found, for cylinders rotating about its longitudinal axis, that as the axis of rotation changes its inclination, shape constancy is maintained through a trade-off: Humans perceive a constant object shape relative to a changing axis of rotation and they do this by introducing an inconsistency between the perceived speed of rotation and the 1st-order optic flow. Shape constancy demands this inconsistency because observers do not perceive the inclination of the axis of rotation veridically. The observed depth-order violations are the cost of the trade-off. Note that objects other than cylinders do not show this kind of shape constancy. As mentioned above, Loomis and Eby (1988) found that the perceived shape of rotating ellipsoids changes with changes in the slant of the axis of rotation.

Thus, structure recovered from rotations around an axis in the frontoparallel plane preserves affine structure. The more general case of structure recovered from rotations around an arbitrary axis does not preserve affinity. But, for cylinders rotating about its longitudinal axis, it does preserve perceived shape relative to the perceived axis of rotation.

Here we show that recovering the depth structure in the general case can be decomposed into two stages. In the first stage the depth structure relative to the axis of rotation is computed: the information about this depth structure is available from the optic flow field in the projected component of the speeds perpendicular to the axis of rotation. This computational problem is mathematically similar to computing the depth structure for frontoparallel rotations. The second stage computes a correction that takes into account the slant of the axis of rotation. This gives the depth structure for an arbitrary axis of rotation slanted with respect to the frontoparallel plane.

Thus, there are currently two major issues that need to be understood and quantified in order to develop a general model of the SFM computation. The first is how to characterize the algorithm for recovering structure from frontoparallel rotations (current theories lack an algorithm that allows one to predict the recovered depth structure even in the special case of a frontoparallel axis of rotation). The approach usually taken previously has been to describe the computation behind the depth-scaling parameter, that is, the ratio between perceived and simulated depth. This approach assumes that the visual system computes affine structure first. Then, in a second step, a scaling factor is “assigned.” This transforms the initially computed affine structure into a metric object. But there are other ways to think about this problem of computing SFM, and in this article we used a different approach.

The second requirement is to characterize the SFM computation for the general case of non-frontoparallel rotations.

Our purpose here is to develop a theory that answers these two basic issues.

The model

The model's structure is conceptually simple, and consists of two parts. One part is a template-matching mechanism that uses the optic-flow field as the input to compute two global parameters: the perceived speed of rotation and the perceived gradient of the speeds perpendicular to the frontoparallel projection of the axis of rotation. The second part consists of an operation that generalizes the structure-from-motion computation from the case of a frontoparallel axis of rotation to the general case of arbitrary non-frontoparallel axes.

Recovering depth structure

Frontoparallel rotations

The depth structure for frontoparallel rotations is linearly related to the velocity field (see, for instance, Fernandez, Watson, & Qian, 2002):

\frac{Δ Z}{Z_{0}} = - \frac{Δ v_{x}}{Ω}

(1)

where Z₀ is the distance to the object, and Ω is the angular speed of rotation. Equation 1, which implicitly assumes rigidity, gives the difference in depth ΔZ (positive towards the observer) between any two points on the object from their differences in horizontal retinal velocity Δv_x. Equation 1 assumes, without loss of generality, a vertical axis of rotation.

If SFM is recovered only from the first-order optic flow, as psychophysical data seems to show (Norman & Todd, 1993; Todd, 1998; Todd & Norman, 1991; Werkhoven & van Veen, 1995), then Ω cannot be recovered veridically.² As a result, the depth structure can be recovered only up to a scaling factor in depth. Current theories lack an algorithm for predicting this scaling factor, which is needed to predict the recovered depth structure in the special case of a frontoparallel axis of rotation. Computing the scaling factor is mathematically equivalent to computing the perceived angular speed of rotation, Ω_obs, which can be used in Equation 1 instead of Ω, the actual speed of rotation. Thus, using Ω in Equation 1 allows recovery of the veridical shape of the object, whereas using Ω_obs allows recovery of the perceived shape.

Note that if Ω _obs is non-veridical, in principle the object should end up looking either non-rigid or as a texture flowing over the surface of an otherwise rigid object. Perceptually, some cases (a minority) give the appearance of a texture flowing over the surface of an otherwise rigid object. But in all cases the objects appear rigid. One reason could be that the object's non-rigidity is below threshold. A big factor in discriminating rigidity and non-rigidity depends on temporal factors—memory—rather than just the magnitude of shape change. Large changes in the object can remain undetected because we do not keep a faithful 3D representation of the object in memory across a sufficiently long time, but only what amounts to an instantaneous one. In a task in which a 3D representation of the object had to be kept in memory, then the non-rigidity would perhaps become “visible.” Data on rigidity-versus-non-rigidity discriminations show that non-rigid objects can be perceived as rigid and vice versa, depending on whether specific optic-flow properties change, or do not change, over time. ³ As an example, Hogervorst, Kappers, and Koenderink (1997) found that non-rigid displays were not discriminable from rigid ones when every set of three consecutive views in the sequence had a rigid solution. They concluded that observers combined only a few views at a time. That is, they used a temporal local description of the flow. Only very large changes in structure were detected over large changes in rotation.

Non-frontoparallel rotations

We recently showed (Fernandez & Farell, 2007) that when the axis of rotation is not in the frontoparallel plane, the 3D structure of the object under orthographic projection must satisfy the following two relations:

\frac{Δ Z}{Z_{0}} = - \frac{Δ v_{x}}{Ω \cos θ} + Δ y \tan θ

(2)

and

Ω = - \frac{\partial_{x} v_{y}}{\sin θ},

(3)

so that

\frac{Δ Z}{Z_{0}} = (\frac{Δ v_{x}}{\partial_{x} v_{y}} + Δ y) \tan θ,

(4)

where θ is the actual of inclination of the axis of rotation (0° < θ < 90°) from the frontoparallel plane, and ∂_xv_y is the derivative with respect to x (horizontal direction) of the vertical component of the retinal velocity, v_y. Note that ∂_xv_y is constant across the image and thus independent of depth (see Fernandez & Farell, 2007, for a demonstration). Equation 4, which implicitly assumes rigidity, gives the difference in depth ΔZ (positive towards the observer) between any two points on the object from their differences in horizontal retinal velocity Δv_x and angular vertical position Δy. It is assumed here that the axis of rotation lies within the sagittal plane passing through the eye. This choice does not represent a loss of generality, but is a direct consequence of assuming that the axis of rotation projects into a vertical line in the frontoparallel plane (see Equations 1 and 4). If the projection of the axis of rotation onto the frontoparallel plane were not vertical, Equations 1 and 4 would still be similar, but v_x would be replaced by the speeds perpendicular to the frontoparallel projection of the axis of rotation, and v_y by the speeds parallel to this projection (also, Δy would be measured in the direction parallel to the projection of the axis).

If SFM for non-frontoparallel rotations were recovered only from the first-order optic flow, then θ could not be recovered veridically (this is in agreement with psychophysical data, see for instance Fernandez & Farell, 2007). Then depth structure could be recovered only up to a scaling factor in depth by using the perceived angle of inclination, θ_obs, in Equation 4.

However, we showed (Fernandez & Farell, 2007) that the perceived depth structure does not follow Equation 4, but rather satisfies the following relation:

\frac{Δ Z}{Z_{0}} = (λ \frac{Δ v_{x}}{\partial_{x} v_{y}} + Δ y) \tan θ_{o b s},

(5)

where the factor λ accounts for the departure of the perceived angular speed of rotation, Ω_obs, from optic-flow consistency.⁴

To retrace the picture we've sketched so far, there is, first of all, the speed of rotation in the stimulus, Ω, that satisfies Equation 3. If the axis slant, θ, is perceived non-veridically as θ _obs, it is still possible to use Equation 3 by substituting θ _obs for θ. The result is a non-veridical perceived speed of rotation that is consistent with the first-order optic flow. This is equivalent to recovering the shape from Equation 5 using λ = 1. Having λ ≠ 1, in agreement with psychophysical data, implies that Ω _obs, the perceived speed of rotation, is not consistent with the first-order optic flow.

As shown in Figure 1, the geometrical effect of λ is to multiply all distances relative to the axis of rotation rather than to the frontoparallel plane. The direction of this scaling is perpendicular to the frontoparallel plane, so that the retinal image will remain unchanged. In this sense, the recovered structure of the object is not affine. Let us characterize this recovered structure as λ-affine.

Figure 1

View Original Download Slide

Non-affine structure. a) Schema showing experimental stimulus. b) Effect of changing λ for a constant inclination of the axis of rotation. Side view (sagittal cut) from the object shown in (a). Changing λ result in a depth order reversal of the bottom half, as all the distances to the axis of rotation (dotted line) double. c) Same as (b), but this time showing the difference between the perceived and simulated object. The shape of the perceived object can be made independent of the inclination of the axis of rotation by using an adequate non-unity value of λ, leading also to depth-order violations, as seen here. By contrast, a value of λ = 1 gives an affine structure, preserving depth order but not the values of d ₁, d ₂ and α.

λ can be viewed as a correction factor whose purpose is to allow the observer to perceive the stimulus shape as constant as the axis of rotation changes. In such a case λ could in principle be computed from the optic flow (in Fernandez & Farell, 2007, we showed that this is possible). However, this way of thinking about λ has the disadvantage of imposing an additional computational burden on the visual system. Alternatively, λ can be viewed as a bias intrinsic to the observer rather than being computable from the input. We will follow this latter interpretation here. In addition, we propose that λ is a constant.

A constant value for λ results in additional constraints. For instance, for cylinders, we shown in 1 that λ can be a constant only if sin θ _obs ∝ sin θ _sim (the constraint comes from the psychophysical fact that, for cylinders rotating about its longitudinal axis, the perceived stimulus shape does not changes with changes in the axis of rotation). Now, by definition, we have (see 1):

\sin θ_{o b s} = \frac{- \partial_{x} v_{y}}{λ Ω_{o b s}}

(6)

The analogy between Equations 3 and 6 suggests a more precise interpretation of 1/ λ as the proportionality factor between perceived and simulated gradient. We will call the perceived gradient ∇ _obs (=∂ _x v _y/ λ). We can rewrite Equation 5 in a format that explicitly reduces to Equation 1 (i.e., the frontoparallel case) when θ = 0 (and thus, ∇ _obs = 0). Using Equations 5 and 6 we get, after some work:

\frac{Δ Z}{Z_{0}} = - \frac{Δ v_{x}}{\sqrt{Ω_{o b s}^{2} - \nabla_{o b s}^{2}}} + \frac{Δ y}{\sqrt{{(Ω_{o b s} / \nabla_{o b s})}^{2} - 1}}

(7)

Equation 7 shows the two stages mentioned in Introduction. The first term in the right side of Equation 7 represents the frontoparallel structure (up to a scaling factor in depth), or equivalently, the structure relative to the axis of rotation. The second term is a correction that takes into account the slant of the axis of rotation, becoming zero for frontoparallel rotations.

Equation 7 allows us to compute the relative depth structure of an arbitrary rotating object from the two parameters Ω _obs and ∇ _obs. These parameters can be obtained from template matching, as described below in The template-matching model.

The template-matching model

Template matching plays a fundamental role in our SFM model. Template matching implements what we hypothesize is the contribution of area MST to the SFM computation (a contribution that is made by only a subset of MST cells). Many MST cells are tuned to optic-flow patterns that are useful for heading perception, and successful template-matching models of cells with these tuning properties have been developed (Perrone & Stone, 1998). Alternative models of MST use subsets of neurons whose properties are better for modeling the perception of biological motion (Giese & Poggio, 2003). Overall, the evidence suggests that MST contains subsets of cells tuned to different global patterns of motion, each of which is best suited to perform one of various classification or recognition tasks, e.g., determining heading direction, recognizing biological motion, classifying shape from motion. Our model MST cells have tuning properties that include these cases.

For illustrative purposes, let us first consider the case where all velocities are parallel; this happens, for instance, for frontoparallel rotations. To define the tuning properties of our MST model cells, we define for each cell the matching variable I:

I = \int \int [f (x, y) - g (x, y)]^{2} d x d y

(8)

I is a measure of the difference between the input velocity pattern, f( x, y), and the template velocity pattern, g( x, y), where ( x, y) gives the spatial retinal coordinates in the image plane. I takes its minimum value when both template and input patterns are the same. The response, r, of the model MST cell is then defined as

r = e^{- β I}

(9)

where β is a constant.

Thus, r is a maximal response when the template and input velocity patterns are the same. Equation 8 is valid when the functions f and g are scalar quantities (which is the case for frontoparallel rotations, where all velocities are parallel). In the more general case of non-frontoparallel rotations, f and g are vectors representing the velocity at any point in the retinal image. These velocities could have any direction. To generalize Equation 8 to vector variables we modify it in the following way.

\begin{matrix} I & = & \int \int [&frarr; (x, y) \cdot \frac{&grarr; (x, y)}{| &grarr; (x, y) |} - | &grarr; (x, y) |]^{2} d x d y \\ = & \int \int [f (x, y) - g (x, y)]^{2} d x d y \end{matrix}

(10)

where the dot represents the scalar product, and the arrows indicate that the variables are vectors. The second equality represents a simplified notation of the first, where:

f (x, y) = &frarr; (x, y) \cdot \frac{&grarr; (x, y)}{| &grarr; (x, y) |} a n d g (x, y) = | &grarr; (x, y) |,

(11)

thus f and g are scalars, with f being the component of the input speed in the direction of the speed in the template, and g being the speed (modulus) in the template, regardless of direction. The modification has the following effect: rather than comparing the whole 2D speed input pattern to the template, it only compares, at each retinal position, the component of the speed in the input which is parallel to that in the template.

To make the problem easier to analyze and keep the number of templates low, we will only use templates that have speeds in the horizontal direction (this, of course, does not imply a loss of generality; the treatment that follow could also include templates representing slanted objects). This restriction will ensure that only the horizontal components of the retinal speeds are matched. For cylinders only, this implies that the same template will match a given rotating cylinder regardless of the slant of the axis of rotation. This works because when the axis of rotation changes its slant, only the vertical components of the retinal speed change. For objects other than cylinders, ⁵ the velocity field is not independent of the axis slant. ⁶ As a consequence of this, perceived shape will change if the axis slant changes. This is consistent with the psychophysical data of Loomis and Eby (1988) on rotating ellipsoids.

Are such templates physiologically plausible? Most models of MST tuning use a tiled input from MT-like neurons, each tuned to a particular speed and direction. The templates built in this way will care about the local input velocity, not just the component of the speed in one direction. To build our MST template cell, we need to make use of complex V1-like cells. Because they suffer from the aperture problem, these cells are not selective for stimulus velocity; rather, they are selective only for the component of velocity orthogonal to their preferred spatial orientation (Simoncelli & Heeger, 1998). Physiologically, it is plausible that some MST cells use complex-cell V1 input, either received directly or inherited via area MT. Though this remains speculative at this point, it delivers a considerable computational advantage for problems like SFM.

Template matching for the angular speed of rotation

Let us first analyze the simpler case of frontoparallel rotations. It is not difficult to show that at each location ( x, y) the optic-flow pattern from an object rotating about a frontoparallel axis can be written as the product of a shape factor and the angular speed of rotation, Ω (see 1). Thus f( x, y) = ΩF( x, y), where F is a function that depends only on the object's shape (as expressed either by the stimulus or the template). This is a function not only of position ( x, y), but also of parameters that define the shape (we will keep the treatment general at this point, leaving for later the specification of the template shapes used in our simulations). For instance, a rotating elliptical cylinder will have three parameters, S, C and φ, where S represents the size, C represents the ellipticity, and φ the angle between the major axis and the line of sight. The specific set of functions used in making the templates is described later in the subsection A minimalist template set.

If the axis of rotation is slanted, the previous paragraph still holds true for cylinders, but it is no longer valid for other shapes. In the general case, v _x is still proportional to Ω, but instead of a shape factor we have a more complex expression that is also a function of the axis slant (see 1).

Let us represent by α and γ ₀ the sets of parameters representing the shapes of the stimulus and the template, respectively, and Ω and Ω ₀ the angular speeds of stimulus and template, respectively. It is easy to show (replacing f( x, y) = ΩF( x, y, α) and g( x, y) = Ω ₀G( x, y, γ ₀) in Equation 8) ⁷ that the matching variable I, defined in Equation 8, becomes (remember that this is only for frontoparallel rotations):

I = A_{G} (γ_{0}) [Ω_{0} - Ω \frac{B (α, γ_{0})}{A_{G} (γ_{0})}]^{2} + Ω^{2} [A_{F} (α) - \frac{B^{2} (α, γ_{0})}{A_{G} (γ_{0})}]

(12)

where

\begin{matrix} A_{F} (α) & = & \int \int F^{2} (x, y, α) d x d y; \\ A_{G} (γ_{0}) & = & \int \int G^{2} (x, y, γ_{0}) d x d y; \\ B (α, γ_{0}) & = & \int \int F (x, y, α) G (x, y, γ_{0}) d x d y \end{matrix}

(13)

If we have a population of neurons representing different flow templates (that is, having different values of Ω ₀ and γ ₀), we can see from Equations 9 and 12 that, for a fixed γ ₀, the population response will have a bell-shaped curve. The peak of this curve (which is always a maximum because A _G is always positive) is a function of the stimulus angular speed (Ω) and shape (represented by the set of parameters α). Because the population of cells contains cells tuned to various values of γ ₀, we will have many cells that respond maximally. The tuning parameter Ω ₀ of the maximally responding cell (for a given γ ₀) will be:

Ω_{0} = Ω \frac{B (α, γ_{0})}{A_{G} (γ_{0})}

(14)

Thus, cells will respond maximally along a “curve” ⁸ in parameter space (Ω ₀, γ ₀). The average value of Ω ₀ obtained from the population response will then be:

〈 Ω_{0} 〉 = Ω 〈 \frac{B (α, γ_{0})}{A_{G} (γ_{0})} 〉

(15)

We hypothesize that the relationship between Ω ₀ and the perceived angular speed of rotation, Ω _obs, is simple and direct. For instance, if Ω _obs were proportional to 〈Ω ₀〉, then Equation 15 tells us that the perceived rotational speed will be proportional to the simulated rotational speed. ⁹ The constant of proportionality will be different for different shapes and will depend on the particular family of templates used. This is a highly desirably property that is very difficult, if not impossible, to attain with simpler models that treat Ω _obs as some function of def or other optic-flow properties that are not shape invariant.

Generalization to slanted axes of rotation. Having started with the more intuitively accessible case of a frontoparallel axis of rotation, we are now in a position to generalize to the case where the axis of rotation is slanted. Equations 12 and 15 can be generalized to slanted axes, because the optic flow can be decomposed into a product of Ω and a “shape” factor. For slanted axes, the “shape” factor is a function of the shape and the slant of the axis of rotation (see 30). That means that the previous derivations are not restricted to frontoparallel rotations but they are also valid for slanted axes of rotation. Note also that for non-frontoparallel rotations, for shapes other than cylinders, the constant of proportionality in Equation 15 will be a function of the axis slant—because B( α, γ ₀) is a function of the axis slant—but at any slant Ω _obs will still be proportional to Ω.

The perceived scaling factor in depth is just the ratio between the simulated and perceived Ω s. Equation 15 shows that, if Ω _obs is proportional to 〈Ω ₀〉, this ratio will be constant for a given shape and axis slant, and, in general, different for different shapes and different axis slants (although remember that for cylinders it will not change with the axis slant). This is in agreement with psychophysical data, as will be shown in Results.

Template matching for gradients

As Equation 7 shows, to compute the relative depth structure of an arbitrary rotating object we need two parameters, Ω _obs (the perceived speed of rotation) and ∇ _obs (the perceived gradient). We showed in the previous section how Ω _obs could be obtained by template matching. Now we will show how to obtain ∇ _obs from template matching.

Our hypothesis is that the gradient ∂ _x v _y is computed by template matching. Computing a gradient for rotations using templates is actually much simpler than computing Ω _obs. This is because for rotations ∂ _x v _y is constant across the image, so for any rotating object we have v _y = x∂ _x v _y. We will abbreviate ∂ _x v _y as ∇, so v _y = x∇.

As before, we assume that our templates will only match the component input velocity that is parallel to that in the template. This direction will be the same across the entire template. More explicitly, we assume that the templates are described by g( x, y) = ∇ ₀G( x) where ∇ ₀ is a constant and G( x) is an arbitrary function of x. We assume also that the family of templates includes all motion directions, which are obtained by rotating the template just defined.

Because the templates only care about the speeds in their preferred directions, we can use as the input only the speed component of the input parallel to that in the template. For an object rotating about an axis whose projection onto the frontoparallel plane is vertical, we can write f( x, y) = x∇. Note again that the gradient ∇, and thus the input, is independent of the object's shape (see Equation 3).

By analogy to Equation 14, the tuning parameter ∇ ₀ of the maximally responding cell will be:

\nabla_{0} = \nabla \frac{B}{A_{G}}

(16)

where (see Equation 13):

\begin{matrix} A_{G} & = & \int \int G (x)^{2} d x d y; a n d \\ B & = & \int \int x G (x) d x d y \end{matrix}

(17)

It is clear that the gradient “perceived” by the templates (i.e., ∇ ₀) will be proportional to the simulated gradient, regardless of the stimulus shape and regardless of the particular family of templates, i.e., the specific form of G( x). As an example, let us assume G( x) = ηx, where η is a constant. Then, it follows that ∇ ₀ = ∇/ η. This implies, by the definition of λ, that η = λ.

A minimalist template set

It is important to realize that template matching can be used to model psychophysical data to any degree of accuracy. This can be done just by using as many templates as stimuli one wants to model. For each stimulus F( x, y, α) one could use Equation 15 to obtain B( α, γ ₀) and then Equation 13 to find the corresponding template G( x, y, γ ₀). It is more interesting and productive to take the opposite approach and ask: How well can we explain the available psychophysical data by keeping the number of templates and their complexity to a minimum?

In what follows, we assume that there are only two families of templates: the first one for detecting velocity fields in which curvature is absent (e.g., planar surfaces), and the second one for detecting objects in which curvature is present (e.g., cylinders). These can be considered as a set of first- and second-order filters. We assume that a stimulus generates a population response in which each filter contributes a value for its preferred parameter (either Ω ₀ or ∇ ₀) to the computation of the output parameter. The computation is a weighted average in which each filter parameter has a weight proportional to the intensity of its filter's response.

Two sets of templates, one for planar surfaces and one for cylinders, were used to obtain the data shown in Results. Remember that a template is the product of a shape factor (or function) and an angular speed of rotation, Ω ₀.

Planar Surfaces: The first set of templates consists of the product of one specific planar surface (that is, one having a fixed slant and tilt) with various angular speeds of rotation. Thus, a specific template within this family is identified by just one parameter, the angular speed of rotation (see Methods).

Cylinders: The second family of templates consists of the product of an elliptical cylinder rotating about its longitudinal axis, with various angular speeds. The elliptical cylinders could have different ellipticities, but are all similarly oriented, that is, the major axis of each ellipse makes the same fixed angle γ ₀ with the line of sight. A specific template within this second family is identified by two parameters: the ellipticity and the angular speed of rotation (see Methods). In addition, we scaled the stimuli to a standard size that matches that of the templates. An alternative but equivalent option would have been to assume that templates come in families with size as an additional parameter, such that the cylinder templates and the stimuli match in size.

This limited set of templates is, of course, an oversimplification. If the visual system uses templates, it would likely possess a far larger variety that our two sets—for instance, planar templates having various tilts and slants rather than a single tilt and a single slant; and cylinders with the major axis of the ellipse forming various angles to the line of sight. However, our goal here is to show how well our minimalist set of templates reproduces psychophysical data in spite of its simplicity. This will demonstrate the explanatory power of the template matching approach. ¹⁰

Templates and their relationship to perceived biases

An incomplete set of templates is one source of the biases predicted by the model. We believe it is unrealistic to assume that the visual system possesses a complete set, a set than spanned all object forms, orientations, sizes, rotational speeds, tilts and slant of the axis of rotation, and so forth. The number of template cells would be problematic. Moreover, this number would assure that observers would not have previously experienced many of the corresponding stimuli, raising the question of the origins of templates representing them. In any case, our intention is not to equate the templates we employ in the model with those that plausibly exist in the human brain. Instead, we want to show that our model's approach and architecture can work, and can work without cherry-picking the templates, or assuming that all possible templates are available.

There are other ways aside from an incomplete set of templates for perceptual biases to arise. For instance, even if all possible templates were available, a bias would result from an imperfect mapping between the template's tuning parameter and the circuitry used to compute the depth structure.

Results

The model makes a number of predictions about the perception of rotating objects. We evaluate these predictions below under four headings: (1) perceived speed of rotation, (2) perceived slant of planar surfaces, (3) perceived curvature of elliptical cylinders and (4) perceived slant of the axis of rotation. The selected set of psychophysical data that we choose to model constitutes, to the best of our understanding, a representative sample of the available quantitative psychophysical data on rotating objects.

Perceived speed of rotation

We assume that the set of templates for planar surfaces specifies a planar surface with fixed slant, σ ₀, and tilt, τ ₀, and various rotational speeds. Then Equation 14, which gives the speed tuning parameter Ω ₀ of the maximally responding cell ¹¹ becomes (see 1):

Ω_{0} = Ω \frac{σ}{σ_{0}} h (τ, τ_{0}) \propto d e f

(18)

where Ω, σ and τ are the speed of rotation, slant and tilt of the stimulus, respectively; and h is a function of τ and τ ₀. The rightmost part of Equation 18 is a consequence of the fact that def = Ω σ for planar surfaces. If the perceived speed of rotation, Ω _obs, is proportional to Ω ₀ then, by Equation 18, Ω _obs is proportional to def. This agrees with the psychophysical data of Domini, Caudek, and Proffitt (1997) (see their Figure 7).

However, in subsequent experiments Domini and Caudek (1999) found that Ω_obs is proportional to def^1/2 (see their Figure 15, reproduced here in Figure 2b). It is unclear why the result differed in the two cases, although, due to error uncertainties, either set of data could probably be fitted by a linear function, a square root function, or any function in between.

Figure 2

View Original Download Slide

a) The perceived slant of a planar surface, σ_obs, as a function of def, from Figure 15 of Domini and Caudek (1999). (Reproduced with permission). Notice that the vertical axis has two labels. b) Model predictions for the conditions used in Domini and Caudek (1999). For the range of values of def tested, both Equations 19 and 20 provide a reasonable fit to the data. Same results (not shown) are obtained for perceived angular speed of rotation.

Let us now assume that the latter experiments where Ω _obs is proportional to def ^1/2 are correct. If our model is going to predict this, then we need to assume that the relationship between Ω _obs and Ω ₀ is not linear; rather, we will assume that Ω _obs is proportional to Ω ₀ ^1/2. One reason why the visual system might prefer a non-linear relationship between the cell's tuning parameter Ω ₀ and Ω _obs is a trade-off: for a linear relationship the perceived slant, σ _obs, would be independent of the surface slant, σ (see next section). A non-linearity overcomes this problem. The non-linearity Ω _obs ∝ Ω ₀ ^1/2 results in Ω _obs ∝ def ^1/2, which is consistent with the results of Domini and Caudek (1999), as shown in Figure 2a. Note that assuming the above nonlinearity results in specific model predictions for the behavior of σ_obs. As shown in the next section, σ_obs is now predicted to behave as σ_obs ∝ σ^1/2, which is also consistent with psychophysical data (Domini & Caudek, 1999).

Perceived slant of planar surfaces

The perceived slant of a planar surface, σ _obs, is a function of its perceived rotational speed, Ω _obs. They are related, by definition, through the relationship: def = Ω σ = Ω _obs σ _obs (see 1). This expression is valid for frontoparallel rotations only (as far as we know). Thus, the model predictions that follow in this section are restricted to frontoparallel rotations. This is enough for our purposes because the data from Domini and Caudek (1999) that we are going to model was also restricted in the same way.

It is not difficult to show using σ _obs = def/Ω _obs and Equation 18 that, if Ω _obs is proportional to Ω ₀ (i.e., Ω _obs = kΩ ₀), then σ _obs = σ ₀/[ k h( τ, τ ₀)]. Thus, σ _obs is independent of both the simulated slant, σ, and def. This disagrees with psychophysical data (see Figure 15 of Domini & Caudek, 1999, reproduced here in Figure 2a). There are two ways to modify the model to bring it in line with the psychophysical data. One way is to assume that our choice of templates was inadequate and to fix the problem by adding more templates. The idea would be to build a set of templates that results in a relationship that parallels that in Equation 18 but with a fundamental difference: now Ω₀ is proportional to def^1/2 rather than to def. In this case, σ_obs will not longer be a constant. It is not difficult to introduce templates that accomplish this. A second way to modify the model, which is the approach adopted here, is to keep the templates as they are and change the function that relates Ω_obs to Ω₀. Until now we assumed that these variables were proportional to each other, but this was just a simplifying assumption that doesn't necessarily represent the best choice.

Let us instead assume, as we did at the end of the last section, that for planar surfaces Ω _obs ∝ Ω ₀ ^1/2 or Ω _obs = kΩ ₀ ^1/2, where k is a constant. This results in (again, using Equation 18 and σ _obs = def/Ω _obs):

σ_{o b s} = \frac{\sqrt{σ_{0}}}{k \sqrt{h (τ, τ_{0})}} \sqrt{d e f}

(19)

which is in excellent agreement with the psychophysical data of Domini and Caudek (1999) (see their Figure 15). Another, simpler, option is to assume a linear relationship: Ω_obs = kΩ₀ + ζ, where both k and ζ are constants.¹² This results in:

σ_{o b s} = \frac{1}{\frac{k h (τ, τ_{0})}{σ_{0}} + \frac{ζ}{d e f}}

(20)

As shown in Figure 2b, within the range of values tested the two options given by Equations 19 and 20 result in a good fit to the psychophysical data of Domini and Caudek (1999), shown in Figure 2a.

Domini and Caudek (1999) also found that the perceived slant, σ_obs, varies with the simulated tilt, τ, of the planar surface (see their Figure 18, shown here in Figure 3a). Their heuristic is unable to explain this dependence but our model predicts it (see Equation 19 or 20). Figure 3b shows the predicted dependence of σ_obs on τ for Equation 19 (Equation 20 gives similar results). Figure 3b shows two curves, one for each of the two values of def used in Figure 18 of Domini and Caudek (1999). All templates are planar surfaces with τ₀ = 0 and σ₀ = tan(75°). The match of predictions to the data is quite good, considering the simplifying assumptions used.

Figure 3

View Original Download Slide

a) The perceived slant of a planar surface, σ_obs, as a function of the surface's simulated tilt, τ. Each curve corresponds to a different value of def, from Figure 18 of Domini and Caudek (1999). (Reproduced with permission). b) Model predictions of perceived slant, σ_obs, as a function of tilt, τ, for the two simulated values of def used in (a).

Perceived curvature of elliptical cylinders

Our implementation of the model includes a set of templates of elliptical cylinders of various ellipticities and angular speeds, all rotating about their longitudinal axis and oriented with the major axis of the ellipse making the same angle to the line of sight. For this case, Equation 15, which gives the averaged tuning parameter 〈Ω ₀〉, does not have an analytical solution and has to be solved numerically (see Notes on computational methods).

Figure 4a shows our recent psychophysical data (Fernandez & Farell, 2007) obtained from cylinders of different simulated ellipticities rotating about their longitudinal axes. The ratio C_obs/C_sim (where C_sim and C_obs are the simulated and the perceived curvature, respectively) is the depth-scaling factor, which was found to be a function of the stimulus' shape (i.e., of C_sim).¹³

Figure 4

View Original Download Slide

Perceived depth-scaling factor as a function of simulated curvature. a) Psychophysical data averaged across 4 subjects and across inclinations (the shape of the curve does not change with inclination). Redrawn from Fernandez and Farell (2007). b) Predicted data from our template-matching model, assuming that the perceived angular speed of rotation is proportional to the tuning parameter Ω₀.

We showed in the previous section that, for planar surfaces, our model predictions agree well with the model of Domini and Caudek (1999). This model states that the scaling factor is a decreasing function of def. Domini and Caudek's model works well for predicting the perceived slant of planes, but fails to explain the data for cylinders shown in Figure 4a. In such a case Domini and Caudek's model predicts that the scaling factor should decrease with C_sim; this is contrary to psychophysical results (Fernandez & Farell, 2007), which show an increase.

Figure 4b shows the template model predictions. The stimuli we used here consisted of cylinders of different ellipticities with their major axis making an angle of 12.5° with the line of sight, which corresponds to the average orientation of the rotating cylinders used to obtain the psychophysical data of Figure 4a. ¹⁴ Also assumed is a linear relationship between Ω _obs and Ω ₀ (see Notes on computational methods). The quantitative match between data and model is very good, especially considering the simplifications made in choosing the family of templates.

Note that for the planar templates we assumed a square root relationship between Ω _obs and Ω ₀. For cylinders, instead, we assumed a linear relationship. The two assumptions do not contradict each other, because the relationship can be different for different sets of templates: Different sets of templates are independent of each other. In the physiological implementation of the model that we will present in a following paper it will become apparent why this can be so. It just depends on how template wires its feedback connections to the lower areas, so the only constraints here are psychophysics and simplicity.

Perceived slant of the axis of rotation

Caudek and Domini (1998) measured the perceived slant of the axis of rotation. They found that, for planar surfaces, the slant of the perceived axis is a function of def (see their Figure 8, reproduced here in Figure 5a). Equation 6 gives the perceived slant of the axis of rotation as predicted by our model. In order to compare our predictions with the data of Caudek and Domini's (1998), we will define the slant as σ_axis = 90 − θ_obs (making σ_axis an angle, rather than the tangent of an angle, as it was in the previous sections). Then, Equation 6 becomes:

\cos σ_{a x i s} = \frac{- \partial_{x} v_{y}}{λ Ω_{o b s}}

(21)

Figure 5

View Original Download Slide

a) The perceived slant of the axis of rotation, σ_axis (measured from the line of sight), from Figure 8 of Caudek and Domini (1998) (reproduced with permission). The stimuli are planar surfaces, and perceived slant is plotted as a function of def, for the three values of ρ. b) Model predictions of the data shown in (a).

Caudek and Domini's (1998) used planar surfaces in which the axis of rotation is contained in the planar surface, so the normal to the surface was perpendicular to the axis of rotation. In such a case, we can extend the result Ω_obs = kΩ₀^1/2 (valid for the frontoparallel rotations) to the case in which the axis of rotation is slanted. Then, it is easy to show that (see 1):

\cos σ_{a x i s} = \frac{ρ}{λ C \sqrt{d e f}}

(22)

where ρ (used as a parameter in Caudek & Domini, 1998) is the component of the speed of rotation along the line of sight (the z-component of Ω, which is assumed not to be observable), and C is a constant given that the stimulus tilt, τ, is constant (C = k[h(τ, τ₀)/σ₀]^1/2, see Equation 18).¹⁵

Figure 5b shows the Equation 22 predictions for the values of def and ρ used in Figure 8 of Caudek and Domini (1998), shown here as Figure 5a. The three curves were fitted using the same value of λC. There is good agreement between the psychophysical data and model predictions.

The model also predicts data for rotating cylinders. We have seen that, for cylinders, a good approximation is Ω _obs = cΩ ₀ = kΩ _sim (see Equation 15). Using Equations 3 and 6 now becomes:

\sin θ_{o b s} = \frac{\sin θ_{s i m}}{λ k}

(23)

where θ _sim is the simulated slant of the axis of rotation measured as a deviation from vertical. Figure 6 shows the predictions of Equation 23 together with the psychophysical data for circular cylinders obtained from three observers (data replotted from Fernandez & Farell, 2007). Each observer was fitted to a different value of λk. Again, we find a reasonably good match between psychophysical data and model predictions.

Figure 6

View Original Download Slide

Model predictions (lines) from Equation 23, for circular cylinders. Predictions were fitted to the data of three subjects (triangles, circles and squares) by a least squares method. Psychophysical data were redrawn from Fernandez and Farell (2007).

We mentioned before that, for cylinders, the template matching model is consistent with the hypothesis that λ = constant. only if the condition sin θ _obs ∝ sin θ _sim is satisfied. Equation 23 satisfies this condition, showing that the template matching model is consistent with the hypothesis that λ = constant.

Discussion

Segmentation in SFM

In the Domini et al. (Domini & Braunstein, 1998; Domini & Caudek, 2003; Domini et al., 1998) heuristic perceived slant for planar surfaces is a function of def. There are two ways to extend this to curved surfaces. For curved surfaces, def is a function of retinal position (x, y), and changes from point to point. One extension—the continuous or differential version—is to assume that both the perceived slant and the perceived angular speed of rotation, Ω_obs, change locally from point to point as a function of local def. This is extremely important because the depth-scaling factor is proportional to the inverse of this angular speed. Because Ω_obs is computed locally, the structure of the recovered object is internally consistent but not affine. That is, the recovered object is not related to the simulated object by a simple scaling factor in depth. Rather, different scaling factors are found at different points on the recovered object. Note also that because the rotation rate is allowed to change from position to position, the object should be perceived as nonrigid.

A second extension—the discrete or non-differential version—assumes that the angular speed of rotation is computed as an average across differentiable surfaces, rather than locally. The logic here is that some objects might be perceptually segmented, with each patch possessing a different angular speed. Regardless of how the speed of rotation of a given surface patch is estimated, it is clear that the recovered structure will be affine for each patch, and each patch will be perceived as moving with its own angular speed, making the object non-rigid in the general case.

In Fernandez and Farell (2007), we showed that our psychophysical data for rotating cylinders rule out both extensions of the Domini et al. (Domini & Braunstein, 1998; Domini & Caudek, 2003; Domini et al., 1998) heuristic to curved surfaces. The data are inconsistent with perceived slant of such surfaces being computed (either locally or as an average over segmented patches) as a function of def. The template-matching model that we developed overcomes this problem by providing a unified framework for both planar and curved surfaces. For planar surfaces the model predicts the dependence of perceived slant and Ω_obs on def, and for cylinders it reproduces the variation of perceived versus simulated curvature described in Fernandez and Farell (2007).

The model developed in this article corresponds to the discrete version. Here, before being fed into the model, the image is segmented by an algorithm based on segmentation psychophysics. ¹⁶ The depth structure of each segmented part is then computed independently of all others, using a single computed value of Ω _obs for each patch (though Ω _obs can differ across patches).

Support for the discrete version comes from the work of Di Luca, Domini, and Caudek (2004). They found that linking spatially separated patches into global entities affects the perception of local surface orientation. This result supports the discrete version over the continuous version of our model, for the latter hold that perceived surface orientation depends only on the local, rather than the global, optic-flow field.

Frontoparallel rotations: Current theories

As mentioned in the Introduction, a number of studies show that, for frontoparallel rotations, there is an affine relationship between the simulated and the perceived shape in SFM. The theory developed in this article agrees with this result and adds quantitative predictions lacking from previous theoretical treatments. We also mentioned in the Introduction the alternate conclusions reached in some studies (Domini & Braunstein, 1998; Domini & Caudek, 2003; Domini et al., 1998). These studies suggested that the perceived slant of a planar surface is a monotonically increasing sublinear function of def. As a result, the perceived depth separation between two points with a fixed physical depth separation on a planar surface would be a decreasing function of def (see 1).

This claim generates a radical prediction: The recovered object's depth structure will be internally inconsistent. That is, the integral of the signed depths across a closed path will not sum to zero. Also predicted are distortions of depth-order relations and of parallelism, so that the recovered structure will be neither affine nor Euclidean. In a series of experiments, Domini and Braunstein (1998) and Domini et al. (1998) [referred together as Domini et al. in what follows] presented evidence to support these predictions. In a previous article (Fernandez & Farell, 2007) we showed that all the experiments preformed by Domini et al. could be reinterpreted under the hypothesis that in SFM the angular velocity is misperceived as varying between surfaces having a common axis of rotation. In that article we erroneously claimed that Domini et al. model assumed a single angular speed of rotation for the whole object. This mistake was motivated by the fact that Domini et al. assumed that the edge of an “open book” (two intersecting planar surfaces) will be perceived as just that, an edge. However, a difference in the perceived speed of rotation between surfaces predicts that, when the open-book edge is not contained in the same frontoparallel plane as the axis of rotation, then a gap in depth ought to be perceived between book's ‘pages’ where they are simulated to intersect. Domini et al.'s prediction of an inconsistent depth structure resulted from neglecting this gap. As we show in the 1, the assumption that perceived slant is a function of def results in an internally consistent perceived structure once we take the gap into account. All the psychophysical data used by Domini et al. as a support for their internal-inconsistency model are also easily reinterpreted in terms of an internally consistent perceived depth structure.

There is an additional, related, issue that we need to address here. Orthographic projection is inherently ambiguous in an important but often neglected respect: the optic flow of a pure rotation is indistinguishable from that of a translation plus a rotation. Any pure rotational motion can be decomposed into a translation plus a rotation. This is so because we can subtract any instantaneous speed, identified as an overall translation, from the entire velocity field, and then interpret the modified velocity field that remains as a pure rotation. Each combination corresponds to a different depth structure. Suppose, for instance, that we have an open book that rotates around a frontoparallel axis. Suppose further that the axis of rotation is parallel to the edge at which the two planar surfaces meet, but the edge and the axis of rotation are not in the same frontoparallel plane. The edge, then, moves in a circular path around the actual axis of rotation. If we subtract the instantaneous speed of the edge, the resulting optic-flow field is consistent with an open book rotating around the edge. If the visual system decomposes the velocity field in this way, then our modification of the Domini et al. heuristic predicts that the edge would be perceived as such, rather than as a gap in depth. Thus, a pure-rotation interpretation of the optic-flow field of an open book predicts that a gap must be perceived at the edge, but an alternative rotation-plus-translation interpretation can always be found such that no gap is perceived.

We performed various informal experiments (not shown) whose results are consistent with the hypothesis that the visual system actually subtracts a constant speed to the optic-flow field before processing SFM. In one of them, for instance, two rigidly rotating planar surfaces simulated as having a gap in depth are perceived as joining at an edge. This happens if the motion of the surfaces is such that the retinal speeds of the two surfaces are the same at the point where the projections of the surfaces meet. Thus, even if an actual gap was explicitly simulated in this case, we do not see it. (There is an alternative interpretation. One can assume—in principle—that both surfaces are moving with the same angular speed, even if this is not the case across time—in our simulations with gap we used two surfaces moving at different angular speeds. If the perceptual system were doing that, the stimulus would be consistent with a no-gap interpretation. But we assumed, in agreement with psychophysical data, that the visual system perceived the two planar surfaces as having different angular speeds. The visual system perceives different angular speeds for a dihedral angle even when the two surfaces are simulated as having the same angular speed—perceptually the open book either opens or closes at it moves. Thus, under that assumption, the only way to have a non-gap interpretation is to subtract the speed of the moving edge. To us this interpretation seems more consistent with the psychophysical data than to assume a no-gap interpretation based on perceiving the two surfaces as having the same angular speed. However, the later explanation, although implausible, cannot be ruled out.)

The theory of Domini et al. is consistent with a no-gap edge only for the special case in which the edge and the axis of rotation are coincident. Subtracting the speed of the edge from the optic flow field transforms the general case in which the edge and the axis of rotation are not coincident into the special case in which they are coincident. Thus, this transformation predicts that no gap should be observed and that the internal depth structure is consistent.

Domini et al. did not subtract any speed from the original optic flow field. Thus, a consistent interpretation of the flow field when the edge and the axis of rotation are not coincident requires that a gap be perceived. Ignoring the perception of a gap leads to the prediction of an inconsistent internal depth structure.

Beyond first-order optic flow

Our model was based on the analysis of the first-order optic flow under orthographic projection. However, there is evidence that humans can use second-order optic flow information (that is, accelerations) and perspective information in computing the SFM. Eagle and Hogervorst (1999) found this to be true for discriminations of the dihedral angle of a hinged plane (open book). This open book rotated about a frontoparallel axis. Discrimination thresholds decreased with size under perspective projection. Under orthographic projection, instead, thresholds increased with size until reaching random performance levels. On another study but using the same stimuli, Hogervorst and Eagle (1998) found a pattern of misperceptions that depended on stimulus size and rotation angle. They showed that errors in the estimate of speeds and accelerations could explain the pattern of misperceptions. Hogervorst and Eagle (2000) confirmed this result and extended it to orthographic projections.

Our model did not include these effects, which are certainly worthy of investigation, both psychophysically and by modeling. Another avenue for further research, suggested by a reviewer, would be to look at the consequences of ignored optic-flow parameters. Parameters that the model does not use should result in predicted metameric combinations of objects and motions. Do humans also have difficulty discriminating these combinations?

Conclusions

This article presents a quantitative model of the perception of structure from motion. Its predictions match most psychophysical data for both frontoparallel and non-frontoparallel rotations.

The model's structure is conceptually simple, and consists of two parts. The first part is an equation ( Equation 7) that describes how to compute the perceived depth structure as a function of two global parameters, Ω _obs and ∇ _obs. The second part is a template-matching model that uses the optic flow field as the input to compute the two global parameters required by Equation 7.

The model is algorithmic. Therefore, even though some of its computational modules, such as the MST templates, are inspired in physiology, the model is not intended as a physiologically plausible model of SFM computation. To fill this gap we have implemented a physiologically plausible neural model of SFM computation based on the model presented here; it will appear as a separate paper.

The model predictions presented here were derived from severely constrained assumptions about the templates. The general success of these predictions shows that the template approach is well suited for modeling the problem, even without an optimized template set. As we already mentioned, the addition of more templates should improve the fit of the model's predictions to psychophysical data.

Of course, the idea is not to have an infinite number of templates, but only a basic set. Visual objects will give rise to a population response across the set. For purposes of illustration, in this article we choose a minimal basis set, planes and cylinders, representatives of first-order and second-order templates, respectively. This is the simplest set consistent with our purposes. Any stimulus that is neither a cylinder nor a planar surface will fall into a separate parameterized class that is not represented in the templates. Nevertheless, each template in each set of templates (that is, templates for both cylinders and planar surfaces) will vote individually on which angular speed corresponds to the presented stimulus. In addition, we are currently trying a different set of templates resembling optic-flow operators, rather than specific objects. This approach would be more in line with the tuning properties of some MST cells. Beyond this, it would be rather premature to speculate about what might constitute a realistic set of basic templates.

Notes on computational methods

Computation of Figure 3b

We used Equation 19, where h( τ, τ ₀) is derived in the 1. For the template we used a plane with τ ₀ = 0 and σ ₀ = tan75°. The limits of integration used were x ₁ = y ₁ = −.45 x ₂ = −.45 y ₂ (where the particular value for x ₁ is not important because they cancel out).

Computation of Figure 4b

The stimuli were cylinders with simulated curvatures, C _sim, with a range of [0.5, 2] and an angle of 12.5° between the line of sight and cylinder's apex (equal to the average angle used for the data of Figure 4a) and Ω _sim = 135°/sec (the value used in Fernandez & Farell, 2007).

For the templates, we approximated Ω ₀ as being a continuous variable and used Equation 14 to compute Ω ₀ for each of the curvatures used. We used cylinders with curvatures C = .5, 1, 1.5, 2, 2.5 and 3; and with an angle of 5° between the line of sight and cylinder's apex. We integrated numerically the second and third integrals of Equation 13 in order to compute Ω ₀ in Equation 14. Then we used the activities computed from Equation 9 to compute the population average value for Ω ₀, with β = 5 × 10 ⁻⁴ sec ²/deg ².

Once Ω ₀ was computed, we obtained Ω _obs as Ω _obs = Ω ₀ (notice that C _obs/ C _sim = Ω _sim/Ω _obs, see Fernandez & Farell, 2007).

Computation of Figure 5a

We used Equation 22 and fitted by hand the value of 0.2 for λ C (same value for the three curves) to get a reasonable match with the psychophysical data shown in Figure 5b.

Computation of Figure 6

The data from Fernandez and Farell (2007) was replotted here and fitted by the least squares method to a straight line constrained to pass through the origin of coordinates.

Appendix A

Derivation of Equations 6, 18 and 22

Equation 6

By definition (Fernandez & Farell, 2007), 1/λ is a factor that represents the deviation between the perceived angular speed of rotation, Ω_obs, and an optic-flow consistent angular speed of rotation, Ω_OFC (=−∂_xv_y/sinθ_obs). In general, Ω_obs is not consistent with the first-order optic flow. Substituting the definition of Ω_OFC into the definition of λ (=Ω_OFC/Ω_obs), we obtain Equation 6.

Equation 18

To compute Ω ₀ we use Equation 14. To do so, we need to compute first the integrals A _G and B, defined in Equation 13. The distance of a point on a planar surface from a frontoparallel plane in a direction towards the observer can be written as (see, e.g., Domini & Braunstein, 1998):

z (x, y) = \frac{σ}{\sqrt{1 + τ^{2}}} (x + τ y) = a x + b y

(A1)

where σ and τ are the slant and tilt of the planar surface. We will use Equation A1 to represent the stimuli, and a similar equation but with σ and τ replaced by σ₀ and τ₀ for the planar surface representing the template. Then we have

\begin{matrix} A_{G} & = & \int_{x 1}^{x 2} \int_{y 1}^{y 2} (a_{0} x + b_{0} y)^{2} d x d y a n d \\ B & = & \int_{x 1}^{x 2} \int_{y 1}^{y 2} (a x + b y) (a_{0} x + b_{0} y) d x d y \end{matrix}

(A2)

After lengthy but straightforward work integrating these two expressions, we get:

A_{G} = σ_{0}^{2} f_{A} (τ_{0}) a n d B = σ σ_{0} g_{B} (τ, τ_{0})

(A3)

where

\begin{array}{l} f_{A} (τ_{0}) = \frac{α_{1}}{1 + τ_{0}^{2}} + \frac{α_{2}}{1 + \frac{1}{τ_{0}^{2}}} + \frac{τ_{0} α_{3}}{1 + τ_{0}^{2}} a n d \\ g_{B} (τ, τ_{0}) = \frac{α_{1}}{\sqrt{(1 + τ^{2}) (1 + τ_{0}^{2})}} + \frac{α_{2}}{\sqrt{(1 + \frac{1}{τ^{2}}) (1 + \frac{1}{τ_{0}^{2}})}} + \\ \frac{(τ + τ_{0}) α_{3}}{\sqrt{(1 + τ^{2}) (1 + τ_{0}^{2})}} \end{array}

(A4)

with

\begin{matrix} α_{1} = \frac{1}{3} (x_{2}^{3} - x_{1}^{3}) (y_{2} - y_{1}) \\ α_{2} = \frac{1}{3} (y_{2}^{3} - y_{1}^{3}) (x_{2} - x_{1}) \\ α_{3} = \frac{1}{4} (x_{2}^{2} - x_{1}^{2}) (y_{2}^{2} - y_{1}^{2}) \end{matrix}

(A5)

Replacing Equations A3 and A4 into Equation 14 we get Equation 18, where h( τ, τ ₀) = g _B( τ, τ ₀)/f _A( τ ₀).

Equation 22

By replacing θ _obs = 90 − σ _axis and −∂ _x v _y = Ω _simcos σ _axis = ρ in Equation 6, we obtain:

\cos σ_{a x i s} = \frac{ρ}{λ Ω_{o b s}}

(A6)

Now, Ω _obs = kΩ ₀ ^1/2, and from Equation 18 we have Ω ₀ ∝ def, so Ω _obs = C

\sqrt{d e f}

, which substituted into Equation A6 gives Equation 22.

Miscellaneous derivations

Condition required for having λ = const. ( cylinders)

As shown in Fernandez and Farell (2007), for cylinders rotating about its longitudinal axis, the shape of the perceived object relative to the axis of rotation (Δz_axis) does not change with the inclination of this axis (Δz_axis = const.). As shown there also, this perceived structure is given by:

\frac{Δ z_{a x i s}}{Z_{0}} = - λ \frac{Δ v_{x}}{\partial_{x} v_{y}} \tan θ_{o b s} \cos θ_{o b s},

(A7)

which is basically the first term in Equation 5 multiplied by cosθ_obs to project it in the direction perpendicular to the axis of rotation. Replacing v_y = v_z^axissinθ_sim (see Fernandez & Farell, 2007), where v_z^axis does not change with a change in the axis's slant, Equation A7 becomes:

λ \frac{Δ v_{x}}{\partial_{x} v_{z}^{a x i s}} \frac{\sin θ_{o b s}}{\sin θ_{s i m}} = c o n s t .

(A8)

Because Δ v _x/∂ _x v _z ^axis = const. (that is, it does not change with the axis's slant, as long as the angular speed of rotation is kept constant), the condition for λ to be a constant becomes sin θ _obs/sin θ _sim = const. When we allow not just the axis's inclination, but also the speed of rotation to change, then the condition for λ to be a constant becomes more complex.

The optic-flow field of a rotating object can be written as the product of a shape factor and the angular speed of rotation

This decomposition of optic flow is valid only for frontoparallel rotations. Let us assume, without loss of generality that the axis of rotation is vertical. Then, the trajectory of an arbitrary point on the object will follow a circular path (the circle, of radius R, is perpendicular to the axis of rotation). The coordinates of such a point can be described as:

\begin{matrix} x & = & R \sin (Ω t + φ) \\ y & = & c o n s t . \\ z & = & R \cos (Ω t + φ) \end{matrix}

(A9)

where Ω is the angular speed of rotation and φ is the initial phase. The horizontal speed of such a point will be given by

v_{x} = \frac{d x}{d t} = Ω R \cos (Ω t + φ) = Ω z (x, y)

(A10)

where we give z as a function of position ( x, y) because the point under consideration is arbitrary and the description is valid for any point on the object. Equation A10 shows that the velocity field of a rotating object can be written as the product of a shape factor and the angular speed of rotation.

When the axis of rotation is slanted, v _x is still proportional to Ω, but in this case instead of a shape factor we have a more complex expression. This is easily obtained from Equation 2, which after a little work can be rewritten as

v_{x} = Ω [\frac{z (x, y)}{Z_{0}} \cos θ - y \sin θ] = Ω Γ (x, y, θ)

(A11)

def = Ω σ = Ω _obs σ _obs

Domini and Braunstein (1998) showed that def = ωσ for arbitrary rotations (where ω is the projection of Ω into the frontoparallel plane). Thus this reduces to def = Ωσ for frontoparallel rotations. Here we show that Ω_obsσ_obs = def for frontoparallel rotations, which is the context in which we used it.

By definition,

d e f = \sqrt{d e f_{1}^{2} + d e f_{2}^{2}},

(A12)

where def ₁ =

\frac{\partial v_{x}}{\partial x} - \frac{\partial v_{y}}{\partial y}

and def ₂ =

\frac{\partial v_{x}}{\partial y} + \frac{\partial v_{y}}{\partial x}

. For frontoparallel rotations v _y = 0, so def reduces to

d e f = \sqrt{{(\frac{\partial v_{x}}{\partial x})}^{2} + {(\frac{\partial v_{x}}{\partial y})}^{2}} = | \nabla \to v_{x} |

(A13)

where

\nabla \to

v _x is the gradient of v _x. From Equation 1 we have

Δ v_{x} = - Ω_{o b s} \frac{Δ Z_{o b s}}{Z_{o b s}},

(A14)

which shows that, at each location on the image, a change in v _x is maximal along the same direction (in the image plane) in which the change in perceived depth is maximal—or, stated in a different way, at each location in the image the gradients of v _x and z always point in the same direction, although this direction could vary across the image. If we call this direction r, then we have (from Equation A14):

\frac{Δ v_{x}}{Δ r} = Ω_{o b s} \frac{Δ Z_{o b s}}{Δ r Z_{o b s}} .

(A15)

By definition

\frac{Δ v_{x}}{Δ r}

= ∣

\nabla \to

v _x∣ = def and

\frac{Δ Z_{o b s}}{Δ r Z_{o b s}}

= σ _obs, which, when substituted into Equation A15, results in def = Ω _obs σ _obs.

Perceived depth in planar surfaces is a decreasing function of def

Increasing slant increases perceived depth, but this happens if you keep a constant retinal separation between the points, which means that the simulated depth also increases. Thus, in this case perceived depth increases with simulated depth, as well. However, for fixed simulated depth but variable simulated slant (and thus variable simulated retinal size), the perceived depth is actually a decreasing function of simulated slant (and thus of def). Let us show this. By definition, we have:

\begin{matrix} Δ z = σ Δ y a n d \\ Δ z_{o b s} = σ_{o b s} Δ y \end{matrix} .

(A16)

where σ is the surface slant, Δ z the depth difference between two points on the surface and Δ y the difference in height between the same two points (we assume here for simplicity that the surface has a tilt of zero; in the general case y would be measured along the tilt direction rather than along the vertical direction). The subscript obs refers to perceived quantities. If we eliminate Δ y from Equation A16 and also use def = Ω σ and σ _obs = ασ ^1/2, where α is a constant, we get:

Δ z_{o b s} = \frac{α Ω^{1 / 2} Δ z}{d e f^{1 / 2}}

(A17)

which is a decreasing function of def.

Slant from def predicts internally consistent structure

The heuristic proposed by Domini et al. (1998) states that the perceived slant of a planar surface is a function of def. We show here that this predicts that the depth structure of the perceived object will be internally consistent.

Domini et al. (1998) heuristic is well defined for planar surfaces, or combinations of these, such as dihedral angles (open books). In this case, def has a constant value across each planar surface and so its value is well defined for the surface. The same is valid for the planar surface's slant, whose value is constant across the surface.

We will call this the discrete version of Domini et al. (1998), as opposed to the continuous version that can be used on curved surfaces. In the continuous version the value of def varies across the surface. It is unclear how the Domini et al.'s heuristic generalizes to this case. One way is for the perceived slant at any point on the surface to be a function of the local def at that position. Let us first deal with the discrete version of the model.

The discrete version of Domini et al.'s (1998) heuristic is actually similar to the one presented in this article: our model also predicts, in certain circumstances, that the slant is also a function of def (see Equations 19 and 20). Our model always results in an internally consistent perceived depth structure. Why then did Domini et al. (1998) claim that depth structure is internally inconsistent? Their claim is based in the assumption that the intersection (or edge) of two planar surfaces that are simulated to rigidly rotate with the same angular speed will be perceived to intersect at the simulated intersection even if they are perceived as rotating at different angular speeds. However, this is inconsistent with the basic SFM equations (see Fernandez & Farell, 2007).

We now show that the continuous version of the model also predicts an internally consistent depth structure. The easiest way to show this is the following. We need to show that the depth difference between any two points on the object is independent of the path that we use to compute this distance. The depth between two points can calculated as the path integral:

Δ z_{12} = \int_{P 1}^{P 2} d z = \int_{P 1}^{P 2} (\frac{\partial z (x, y)}{\partial x} d x + \frac{\partial z (x, y)}{\partial y} d y),

(A18)

where z( x, y) is an arbitrary function of retinal position ( x, y). For instance, z could be a function of either def or σ, which in turn could also be arbitrary functions of retinal position ( x, y). In general, the value of a path integral of the form

\int_{P 1}^{P 2}

[f( x, y)d x + g( x, y)d y], with f( x, y) and g( x, y) arbitrary functions, is a function of the path from P ₁ to P ₂. To have a consistent object, the value of the integral in Equation A18 must be independent of the path. Using Stokes' theorem (Kaplan, 1952), it can be shown that a path integral is independent of the path if and only if

\frac{\partial f}{\partial y} = \frac{\partial g}{\partial x}

(A19)

and for the case of Equation A18 this reduces to

\frac{\partial}{\partial y} (\frac{\partial z}{\partial x}) = \frac{\partial}{\partial x} (\frac{\partial z}{\partial y})

(A20)

Equation A20 is always true if z(x, y) is a continuous function of both x and y. Thus, we have demonstrated that, for a smooth surface, the perceived internal depth structure for the generalized Domini et al. model is always consistent.

For planar surfaces sin θ _obs ∝ sin θ _sim

This is valid only for planar surfaces in which the axis of rotation is contained in the planar surface. For such a case, we found that a good match to psychophysical data is obtained if Ω _obs = kΩ ₀ ^1/2. Following a procedure similar to the one used to obtain Equation 23, we get:

\sin θ_{o b s} = \frac{Ω_{s i m}^{1 / 2}}{λ k} \sin θ_{s i m} .

(A21)

Note that sin θ _obs = cos σ _axis, thus Equation A21 is just a different version of Equation 22.

Acknowledgments

This research was supported by NEI Grant EY12286 (B.F.).

Commercial relationships: none.

Corresponding author: Julian M. Fernandez.

Email: julian_fernandez@isr.syr.edu.

Address: Institute for Sensory Research, Syracuse University, 621 Skytop Rd, Syracuse, NY 13224, USA.

Footnotes

¹ We use the term “affine” here in the more restricted way often used in SFM studies, which means a scaling factor in depth. An actual affine transformation can include a shearing transformation in depth, which does not preserve depth ordering.

Footnotes

² In principle, observers could use information about the period to estimate Ω. They could do this if they were able to see a few full rotations (which is not the case in most psychophysical tests) and the object had a salient feature to mark the periodicity. Our pilot experiments with rotating elliptical cylinders show that observers are very poor at predicting when a uniquely colored dot will reappear on the front. Even rather large shifts in the occluded dot's position go undetected when the dot reappears, making it unlikely that Ω is estimated from the period.

Footnotes

³ This argument could explain why Hogervorst et al. (1997) found that stimuli in which the slant of the rotation axis was changing continuously did not lead to a nonrigid percept, in opposition to the results of Loomis and Eby (1988) and to the predictions of our model.

Footnotes

⁴ An alternative way to compare Equations 4 and 5 is to say that the perceived 3-D structure and motion are consistent with a different optic flow in which the observed value of (Δ v _x/∂ _x v _y) is λ times the real value. This would mean that Δ v _x, ∂ _x v _y or both are observed wrongly, whereas our approach assumes the observed ∂ _x v _y is 1/ λ times the real ∂ _x v _y.

Footnotes

⁵ By “cylinder” we mean any shape that doesn't change along the axis of rotation. This includes planar surfaces in which the axis of rotation is contained in the planar surface.

Footnotes

⁶ Of course, our theory is general and does not assume that the axis of rotation is fixed relative to the object as the slant of the axis changes. We only made the assumption that the axis of rotation is parallel to the longitudinal axis for cylinders because the psychophysical data from Fernandez and Farell (2007) were obtained under this condition. The model can also be applied to cylinders in which the longitudinal and the rotation axes are not linked. It will result in a different set of predictions, but at this time there are no psychophysical data to test them.

Footnotes

⁷ Note that, in general, F and G are not identical functions and thus the set of parameters α and γ ₀ need not be the same.

Footnotes

⁸ The “curve” is actually an N-dimensional hypersurface in a N + 1 dimensional space, where N in the number of parameters γ ₀.

Footnotes

⁹ Strictly speaking, 〈Ω ₀〉 should be computed as the standard weighted population average 〈Ω ₀〉 =

\frac{\int Ω_{0} r d Ω_{0} d γ_{0}}{\int r d Ω_{0} d γ_{0}}

; Equation 15 corresponds to an approximation in which only the peaks of the population response are considered, which is equivalent to treating the width of the population response along the two parameters Ω ₀ and γ ₀ as zero.

Footnotes

¹⁰ An anonymous reviewer noted an interesting parallel between the template model and the model of Hogervorst and Eagle (1998). To quote the reviewer, “When the templates reflect the prior probability of shapes and motions that can occur (which is likely) the activity of the template cells resembles the posterior probability. Moreover, the model by Hogervorst and Eagle (1998) calculates for each combination of shape and motion parameters a probability that depends on the match between the optic flow (the stimulus) and the flow resulting from a 3-D object and motion (similar to a template). In the template model, each combination is covered by a template cell.”

Footnotes

¹¹ We are using here Equation 14 instead of Equation 15 because all the templates used for planar surfaces have the same value of the parameters σ ₀, and τ ₀, so no averaging is needed.

Footnotes

¹² This linear relationship is only meant as an approximation within the range of speeds tested. It cannot be valid when Ω gets close to zero. Notice that Ω _obs is minimal at about 12 deg/s and ζ = 0.5 deg/s. In addition, there aren't any templates that represent Ω _obs = 0 (similarly, there probably aren't any MST cells that respond to static stimuli). And, of course, at sufficiently low speeds SFM no longer works (neither perceptually nor in the model).

Footnotes

¹³ C _sim in our experiments is a function of the curvature, but it is actually defined as the ratio between the two principal axes: the one that crosses the line of sight and the one that crosses the frontoparallel plane. The later has a constant size, so with this definition the ratio C _obs/ C _sim gives the scaling factor in depth.

Footnotes

¹⁴ The cylinders rotated about their longitudinal axes, which were frontoparallel (because, as explained elsewhere in the article, model results for cylinders are independent of the simulated slant of the axis of rotation). Thus, the elliptical cross-section of the cylinder (perpendicular to the longitudinal axis) is in the horizontal plane. The angle between the major axis of this ellipse and the line of sight is 12.5° regardless of the cylinder's ellipticity.

Footnotes

¹⁵ As shown in Figure 2a, either a square root function or a linear function describes well the psychophysical data; we chose the square root function for analytical reasons, for the derivations result in simpler equations.

Footnotes

¹⁶ Segmentation is based on the optic flow. However, the algorithm we used is very rudimentary. For instance, we segment into parts when there is a discontinuity in the first derivative of the optic flow, such as at the edge of an open book. But we are still far from a good understanding on how segmentation works in SFM (or how it works in general, for that matter).

References

Braunstein, M. L. Liter, J. C. Tittle, J. S. (1993). Recovering three-dimensional shape from perspective translations and orthographic rotations. Journal of Experimental Psychology: Human Perception and Performance, 19, 598–614. [PubMed] [CrossRef] [PubMed]

Caudek, C. Domini, F. (1998). Perceived orientation of axis of rotation in structure-from-motion. Journal of Experimental Psychology: Human Perception and Performance, 24, 609–621. [PubMed] [CrossRef] [PubMed]

Cornilleau-Pérès, V. Droulez, J. (1989). Visual perception of curvature: Psychophysics of curvature detection induced by motion parallax. Perception & Psychophysics, 46, 351–364. [PubMed] [CrossRef] [PubMed]

Di Luca, M. Domini, F. Caudek, C. (2004). Spatial integration in structure from motion. Vision Research, 44, 3001–3013. [PubMed] [CrossRef] [PubMed]

Domini, F. Braunstein, M. L. (1998). Recovery of 3-D structure from motion is neither Euclidean nor affine. Journal of Experimental Psychology: Human Perception and Performance, 24, 1273–1295. [PubMed] [CrossRef]

Domini, F. Caudek, C. (1999). Misperceptions of angular velocities influence the perception of rigidity in the kinetic depth effect. Journal of Experimental Psychology: Human Perception and Performance, 25, 426–444. [PubMed] [CrossRef] [PubMed]

Domini, F. Caudek, C. (2003). 3-D structure perceived from dynamic information: A new theory. Trends Cognitive Science, 7, 444–449. [PubMed] [CrossRef]

Domini, F. Caudek, C. Proffitt, D. R. (1997). Misperceptions of angular velocities influence the perception of rigidity in the kinetic depth effect. Journal of Experimental Psychology: Human Perception and Performance, 23, 1111–1129. [PubMed] [CrossRef] [PubMed]

Domini, F. Caudek, C. Richman, S. (1998). Distortions of depth-order relations and parallelism in structure from motion. Perception & Psychophysics, 60, 1164–1174. [PubMed] [CrossRef] [PubMed]

Eagle, R. A. Blake, A. (1995). Two-dimensional constraints on three-dimensional structure from motion tasks. Vision Research, 35, 2927–2941. [PubMed] [CrossRef] [PubMed]

Eagle, R. A. Hogervorst, M. A. (1999). The role of perspective information in the recovery of 3D structure-from-motion. Vision Research, 39, 1713–1722. [PubMed] [CrossRef] [PubMed]

Fernandez, J. M. Farell, B. (2007). Shape constancy and depth-order violations in structure from motion: A look at non-frontoparallel axes of rotation. Journal of Vision, 7, (7):3, 1–18, http://journalofvision.org/7/7/3/, doi:10.1167/7.7.3. [PubMed] [Article] [CrossRef] [PubMed]

Fernandez, J. M. Watson, B. Qian, N. (2002). Computing relief structure from motion with a distributed velocity and disparity representation. Vision Research, 42, 883–898. [PubMed] [CrossRef] [PubMed]

Giese, M. A. Poggio, T. (2003). Neural mechanisms for the recognition of biological movements and actions. Nature Reviews Neuroscience, 4, 179–192. [PubMed] [CrossRef] [PubMed]

Hildreth, E. C. Ando, H. Andersen, R. A. Treue, S. (1995). Recovering three-dimensional structure from motion with surface reconstruction. Vision Research, 35, 117–137. [PubMed] [CrossRef] [PubMed]

Hogervorst, M. Kappers, A. M. L. Koenderink, J. J. (1993). Perception of metric depth from motion parallax. Perception, 22, 101.

Hogervorst, M. A. Eagle, R. A. (1998). Biases in three-dimensional structure-from-motion arise from noise in the early visual system. Proceedings: Biological Science, 265, 1587–1593. [PubMed] [Article] [CrossRef]

Hogervorst, M. A. Eagle, R. A. (2000). The role of perspective effects and accelerations in perceived three-dimensional structure-from-motion. Journal of Experimental Psychology: Human Perception and Performance, 26, 934–955. [PubMed] [CrossRef] [PubMed]

Hogervorst, M. A. Kappers, A. M. Koenderink, J. J. (1997). Monocular discrimination of rigidly and nonrigidly moving objects. Perception & Psychophysics, 59, 1266–79. [PubMed] [CrossRef] [PubMed]

Kaplan, W. (1952). Advanced calculus. Reading, Massachusetts: Addison-Wesley Publishing Company, Inc.

Koenderink, J. J. van Doorn, A. J. (1991). Affine structure from motion. Journal of the Optical Society of America A, Optics and Image Science, 8, 377–385. [PubMed] [CrossRef] [PubMed]

Liter, J. C. Braunstein, M. L. Hoffman, D. D. (1993). Inferring structure from motion in two-view and multi-view displays. Perception, 22, 1441–1465. [PubMed] [CrossRef] [PubMed]

Loomis, J. M. Eby, D. W. (1988). Perceiving structure from motion: Failure of shape constancyn Proceedings of Second International Conference on Computer Vision (pp. 383–391). Washington, D.C.: Computer Society of the IEEE.

Norman, J. F. Lappin, J. S. (1992). The detection of surfaces defined by optical motion. Perception & Psychophysics, 51, 386–396. [PubMed] [CrossRef] [PubMed]

Norman, J. F. Todd, J. T. (1993). The perceptual analysis of structure from motion for rotating objects undergoing affine stretching transformations. Perception & Psychophysics, 53, 279–291. [PubMed] [CrossRef] [PubMed]

Perrone, J. A. Stone, L. S. (1998). Emulating the visual receptive-field properties of MST neurons with a template model of heading estimation. Journal of Neuroscience, 18, 5958–5975. [PubMed] [Article] [PubMed]

Pollick, F. E. Nishida, S. Koike, Y. Kawato, M. (1994). Perceived motion in structure from motion: Pointing responses to the axis of rotation. Perception & Psychophysics, 56, 91–109. [PubMed] [CrossRef] [PubMed]

Simoncelli, E. P. Heeger, D. J. (1998). A model of neuronal responses in visual area MT. Vision Research, 38, 743–761. [PubMed] [CrossRef] [PubMed]

Tittle, J. S. Todd, J. T. Perotti, V. J. Norman, F. N. (1995). Systematic distortion of perceived three-dimensional structure from motion and binocular stereopsis. Journal of Experimental Psychology: Human Perception and Performance, 21, 663–678. [PubMed] [CrossRef] [PubMed]

Todd, J. T. Watanabe, T. (1998). Theoretical and biological limitations on the visual perception of 3D structure from motion. High-level motion processing—Computational, neurophysiological and psychophysical perspectives. (pp. 359–380). Cambridge, MA: MIT Press.

Todd, J. T. Bressan, P. (1990). The perception of 3-dimensional affine structure from minimal apparent motion sequences. Perception & Psychophysics, 48, 419–430. [PubMed] [CrossRef] [PubMed]

Todd, J. T. Norman, J. F. (1991). The visual perception of smoothly curved surfaces from minimal apparent motion sequences. Perception & Psychophysics, 50, 509–523. [PubMed] [CrossRef] [PubMed]

Treue, S. Andersen, R. A. Ando, H. Hildreth, E. C. (1995). Structure-from-motion: Perceptual evidence for surface interpolation. Vision Research, 35, 139–148. [PubMed] [CrossRef] [PubMed]

Ullman, S. (1979). The interpretation of visual motion. Cambridge, MA: MIT Press.

Ullman, S. (1984). Maximizing rigidity: The incremental recovery of 3-d structure from rigid and nonrigid motion. Perception, 13, 255–274. [PubMed] [CrossRef] [PubMed]

Werkhoven, P. van Veen, H. A. (1995). Extraction of relief from visual motion. Perception & Psychophysics, 57, 645–656. [PubMed] [CrossRef] [PubMed]

Jump To...

Related Articles

From Other Journals

Related Topics

Jump To...

This feature is available to authenticated users only.

Related Articles

From Other Journals

Related Topics

To View More...

You must be signed into an individual account to use this feature.