Our visual system is highly sensitive to the movement patterns of other living creatures. This ability is so well developed that we obtain an immediate, vivid percept of a walking human from seeing just a few points of light attached to the joints of a moving body (Johansson,
1973). Point-light displays contain both form and motion information. Each point at each time provides position information about a single spot on the body. Integration of the positions of multiple points, either per frame or over time, yields form information about the configuration of the body. At the same time, the temporal evolution of the positions of each single point provides local motion, acceleration, and trajectory information for that point.
The limited-lifetime technique can be used to investigate the contributions of motion, acceleration, and trajectory of individual points while leaving global form intact. In limited-lifetime stimuli, each single point is shown only for a limited number of successive image frames, after which it is extinguished. The number of frames for which a point lives determines whether this point offers motion, acceleration, or trajectory information to the viewer. If the lifetime is limited to a single frame, the point cannot offer motion information because it does not move with the limb between frames. The minimum lifetime for motion is two frames, because then apparent-motion sensors can be activated. A longer lifetime may improve local motion sensing by spatio-temporal integration. If the point moves in a straight line, the motion measurement becomes more robust. If the point moves along a curved trajectory, on the other hand, simple spatio-temporal integration would introduce errors, since the motion direction changes between each pair of frames. Lifetimes longer than two frames also offer acceleration information, i.e., how the local motion changes over time. Lastly, the longer the lifetime, the more information about the trajectory of the point is available. The trajectory is the curve in space that the point traverses over time and is independent of the direction or speed of the point's motion. The trajectory cannot be computed at any single moment in time but is a shape that must be estimated from observing the positions of a point over time.
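The frame-by-frame reallocation at the heart of the limited-lifetime technique can be sketched as follows. This is a toy illustration of our own, not the actual stimulus code of the studies cited below: candidate body locations are simplified to a fixed set of trajectories, and each display point tracks one of them for `lifetime` frames before being extinguished and reborn elsewhere.

```python
import numpy as np

def limited_lifetime_frames(locations, n_points, lifetime, n_frames, rng):
    """Toy limited-lifetime stimulus generator (illustrative sketch).

    locations: array (n_frames, n_locs, 2) giving the trajectory of each
        candidate body location over time.
    Each display point tracks one body location for `lifetime` frames,
    then is extinguished and reborn at a new, randomly chosen location.
    With lifetime 1, a point never moves with the body between frames.
    """
    n_locs = locations.shape[1]
    frames = np.empty((n_frames, n_points, 2))
    age = rng.integers(0, lifetime, size=n_points)   # stagger rebirth times
    loc = rng.integers(0, n_locs, size=n_points)
    for t in range(n_frames):
        expired = age >= lifetime
        loc[expired] = rng.integers(0, n_locs, size=int(expired.sum()))
        age[expired] = 0
        frames[t] = locations[t, loc]    # each point sits on its location
        age += 1
    return frames
```

With `lifetime=1` every point is reallocated on every frame, so only per-frame position (form) information survives; with `lifetime=2` or more, points additionally carry apparent-motion signals.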
The limited-lifetime technique was first applied to biological motion by Neri, Morrone, and Burr (
1998) who used a lifetime of two frames in a walker with only six points placed randomly on the main joints of the body. Beintema and Lappe (
2002) examined the role of local motion and global form with limited-lifetime walkers in which the individual points appear at random locations on the limbs of the body. Local motion of these points was manipulated by limiting the lifetime of the points, i.e., the number of frames that a point moves along with a single spot on the body. When lifetime was reduced to one frame only, no local motion information was present because the points did not follow the movement of the body. Nevertheless, naive observers spontaneously recognized these animations as human walkers (Beintema & Lappe,
2002) and could reliably judge the facing direction and the coherency of a walker, as well as discriminate between forward and backward walking (Beintema, Georg, & Lappe,
2006). Thus, local motion was not necessary for these tasks.
Lange et al. (Lange, Georg, & Lappe,
2006; Lange & Lappe,
2006) have suggested that a template matching analysis of the body configuration may underlie biological motion recognition. In this model, the positions of points in each stimulus frame are matched to templates of the human body in different postures. Local image motion from individual points is not used. The motion of the body is derived from analyzing the evolution of the best-matching body postures over time.
Thus, from experimental observations and computational considerations local image motion does not appear necessary for biological motion analysis. However, experiments that used the limited-lifetime technique have so far only used profile views of walking in orthographic projection (
Figure 1A). It is important to test the use of local image motion in other view orientations and in perspective projection, because the combination of profile view and orthographic projection is a special case for two reasons.
The first reason is the difference between orthographic and perspective projection. In orthographic projection, a point P = (X, Y, Z) on the body is projected onto a point p_orth in the image so that

\[ p_{\mathrm{orth}} = (x, y) = (X, Y). \]

Here, the projection is without loss of generality assumed to be along the Z-axis. Image coordinates (x, y) directly correspond to world coordinates (X, Y), and the depth coordinate Z is lost in the projection. The image motion v_orth of image point p_orth is

\[ v_{\mathrm{orth}} = \dot{p}_{\mathrm{orth}} = (\dot{X}, \dot{Y}). \]

Therefore, the image motion is independent of the motion-in-depth, Ż, of point P along the Z-axis. Any information about the motion-in-depth component of the point on the body therefore has to be gleaned from the motion along the X- and Y-axes. This requires knowledge of the structure of the human body, as, for instance, provided by a template of the body. The visual information in image point positions and image point motions is mathematically insufficient to estimate body posture and movement (Ullman,
1984) and perceptual recognition can only be achieved when additional assumptions about the structure or movement of the body are introduced. This can be done either by assuming explicit body models (Aggarwal & Cai,
1999; Chen & Lee,
1992; Marr,
1982; Rashid,
1980) or biomechanical constraints on the body motions (Hoffman & Flinchbaugh,
1982; Webb & Aggarwal,
1982).
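The orthographic case described above can be checked numerically with a short sketch (illustrative code of our own, not from the study): a point moving purely in depth produces no image motion at all.

```python
import numpy as np

def project_orthographic(P):
    """Orthographic projection along the Z-axis: (X, Y, Z) -> (X, Y)."""
    return np.asarray(P, dtype=float)[:2]

def image_motion_orthographic(P_dot):
    """Image velocity equals (X_dot, Y_dot); Z_dot drops out entirely."""
    return np.asarray(P_dot, dtype=float)[:2]

P = [1.0, 2.0, 5.0]          # point on the body
P_dot = [0.0, 0.0, -3.0]     # pure motion-in-depth
p = project_orthographic(P)            # [1. 2.]
v = image_motion_orthographic(P_dot)   # [0. 0.]: motion-in-depth is invisible
```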
The mathematical insufficiency of the visual position and local image motion signals for biological motion recognition also holds for perspective projection. However, unlike in orthographic projection, the position and image motion signals in perspective projection contain information about the Z (depth) component of the walker. In perspective projection, point P is projected onto p_persp so that

\[ p_{\mathrm{persp}} = (x, y) = \left( f \frac{X}{Z},\; f \frac{Y}{Z} \right), \]

where f is the focal length of the projection. The image motion v_persp of image point p_persp is

\[ v_{\mathrm{persp}} = \dot{p}_{\mathrm{persp}} = \left( f \frac{\dot{X}}{Z} - f \frac{X \dot{Z}}{Z^2},\; f \frac{\dot{Y}}{Z} - f \frac{Y \dot{Z}}{Z^2} \right). \]

Therefore, the image motion in perspective projection consists of a part that is specified by the motion of P in the X and Y directions and a part that is specified by the motion-in-depth, Ż.
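The perspective projection and its image motion can likewise be checked numerically (illustrative code of our own; the focal length f = 1 is an arbitrary choice). A point moving purely in depth, which generates no image motion under orthographic projection, now produces nonzero image motion:

```python
import numpy as np

def project_perspective(P, f=1.0):
    """Perspective projection: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def image_motion_perspective(P, P_dot, f=1.0):
    """Time derivative of the perspective projection:
    v = (f*Xd/Z - f*X*Zd/Z**2, f*Yd/Z - f*Y*Zd/Z**2)."""
    X, Y, Z = P
    Xd, Yd, Zd = P_dot
    return np.array([f * Xd / Z - f * X * Zd / Z**2,
                     f * Yd / Z - f * Y * Zd / Z**2])

P = [1.0, 2.0, 5.0]
P_dot = [0.0, 0.0, -3.0]      # pure motion-in-depth
v = image_motion_perspective(P, P_dot)   # [0.12, 0.24]: nonzero image motion
```

The second term of each component carries the Ż dependence, which is exactly the part that is absent under orthographic projection.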
The comparison of the two projections shows that in orthographic projection all information in both the positions p_orth of image points and the motions v_orth of image points is related only to the X and Y coordinates of the body. Information about the Z component of the body structure and its motion is missing from the stimulus and can only be reconstructed by using external knowledge of the body structure. In perspective projection, on the other hand, both the positions p_persp of image points and the motions v_persp of image points carry information about the depth Z. Most importantly for our investigation, p_persp and v_persp carry independent information about depth, because p_persp depends on Z, i.e., the position in depth of point P, whereas v_persp depends on Z and also on Ż, i.e., the motion of P in depth. Therefore, in perspective projection the local image motion of a point may convey information over and above the information conveyed by the point positions. Hence we must ask whether local image motion, which has previously been shown not to contribute to perception under orthographic projection, will contribute in the case of perspective projection.
The second reason why the combination of profile view and orthographic projection is a special case has to do with the shape and limb movement of the walker. In profile view, the movement of the limbs is almost exclusively parallel to the image plane. Since there is little motion along the depth axis, the lack of information about Z-axis motion in the orthographic projection has no influence. In fact, in orthographic projection in the profile view, the depth distribution of the light points of the stimulus is entirely ambiguous and the stimulus is mathematically indistinguishable from a flat arrangement of light points in a single depth plane. For a template-matching recognition procedure it would be sufficient to match the stimulus frames to two-dimensional templates. The true three-dimensional structure of the body becomes visually more apparent when the walker is shown in other view orientations and in perspective projection. For instance, in the half-profile view (
Figure 1B), the movement of the limbs is directed in depth, and, because of the perspective projection, the visual speed of the limb movement is smaller when the limb is farther away than when it is closer to the observer. Thus, in these stimuli, visual speed is an independent cue to distance and hence to the three-dimensional structure of the stimulus.
In perspective projection, visual speed is also informative about the depth structure of the walker in profile view. Consider, for example, the movement of the shoulders. The shoulder nearer to the observer will move faster than the shoulder further from the observer. Thus, in perspective projection the visual motion of points on the body provides a cue to the 3D structure of the walker. In orthographic projection, the speed of point movement is independent of the distance to the observer.
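Put in numbers (illustrative values of our own choosing), the shoulder example works out as follows:

```python
# Two shoulders with identical lateral world speed X_dot, at different depths.
f = 1.0                   # arbitrary focal length
X_dot = 0.5               # lateral speed of both shoulders in the world
Z_near, Z_far = 4.0, 6.0  # depths of the near and far shoulder

# Perspective image speed of a laterally moving point is f * X_dot / Z,
# so the nearer shoulder moves faster in the image:
v_near = f * X_dot / Z_near   # 0.125
v_far = f * X_dot / Z_far     # ~0.083

# Under orthographic projection both image speeds would equal X_dot,
# independent of depth, and this cue to 3D structure disappears.
```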
Limited-lifetime experiments with walkers in profile view in orthographic projection showed no influence of local point motion on biological motion perception. However, in perspective projection, and in view orientations other than the profile view, local point motion carries information about the 3D structure of the walker. Thus, local point motion that is irrelevant in orthographic projection may become important for biological motion perception in perspective projection. We wanted to test whether this is the case.
From Johansson's demonstrations and a number of further studies (Mather & Murdoch,
1994; Troje, Westhoff, & Lavrov,
2005; Verfaillie,
1993) it is known that observers not only readily recognize profile views of point-light walkers, but also point-light walkers seen in other view orientations. In this case, point-light actions convey a strong impression of depth even if static low-level depth cues are missing (Vanrie, Dekeyser, & Verfaillie,
2004). The depth percept conveyed by a point-light walker even dominates over conflicting disparity depth cues (Bülthoff, Bülthoff, & Sinha,
1998). It is possible that local motion information, which is not necessary in the profile view, aids the depth perception process in other views by exploiting the relationship between speed and depth in the light point motion (Ullman,
1984). On the other hand, depth perception of 3D walkers could also be achieved by template-matching without exploiting local motion signals. Such template matching could either use 2D templates for particular viewpoints or full 3D representations of the walker.
In the present study, we used 3D limited-lifetime walkers to investigate the role of local motion in the perception of biological motion for the case of differently oriented 3D walkers. We asked observers to discriminate between a display of a forward walking figure and the same display in reversed order (similar to backward walking). In profile view this task is easy even with lifetime 1, so that it does not require local image motion (Beintema et al.,
2006). We were interested in whether this also holds true for other viewing angles. Specifically, as described above, image motion signals might convey information about the motion-in-depth of a point. If this is indeed the case, we would expect a difference in performance for non-profile views between orthographic and perspective projection. Moreover, if local point motion is important for biological motion perception in non-orthographic views, we would expect to find an advantage for lifetime 2 over lifetime 1 in perspective projection.