Point-light biological motion is possibly the most abstracted and impoverished visual display of actions, yet humans are remarkably good at recognizing it (Johansson, 1973). Typical studies have used profile views of walking, but human observers have a strong impression of depth when an oblique or frontal view is displayed (Bülthoff, Bülthoff, & Sinha, 1998). In everyday life, walking may indeed be one of the most common whole-body activities that we see, but a pure side view is rather uncommon. Instead, what matters is perceiving in which direction a person is heading, or how the person is oriented relative to the observer.
Perception of 3D orientation has been systematically studied for objects and faces. For example, rigid 3D objects and faces are best recognized from canonical views (Blanz, Tarr, & Bülthoff, 1999), and humans tend to choose preferred views when learning 3D objects and faces (Harries, Perrett, & Lavender, 1991; Peissig & Tarr, 2006; Perrett & Harries, 1988). It has also been shown that local and global strategies are combined flexibly when judging the facing in depth of 3D objects (Foster & Gilson, 2002; Troje & Bülthoff, 1996; Watson, Johnston, Hill, & Troje, 2005). Different views of objects and faces are represented by different populations of cells in the human ventral temporal cortex (Grill-Spector et al., 1999). In monkeys, these populations have been found to be typically organized in orientation maps (Wang, Tanifuji, & Tanaka, 1998).
For point-light biological motion, some neurons in the monkey inferotemporal cortex have been found to be selective for 3D orientation (facing in depth) (Vangeneugden et al., 2011). In humans, too, there is evidence for such orientation maps (Michels, Kleiser, de Lussanet, Seitz, & Lappe, 2009). Psychophysical experiments have shown that humans can detect changes in orientation in depth even when they occur during a saccade (Verfaillie & de Graef, 2000). Humans can also easily discriminate between forward and reversed walking from almost every viewpoint, even with point lights that have a lifetime of a single frame (Kuhlmann, de Lussanet, & Lappe, 2009). The exception is the frontal view, in which performance was poor and depended entirely on the presence of the foot point lights in the display.
Two kinds of depth information are available for seeing depth in a point-light display, which we refer to as explicit and implicit depth cues. By explicit depth cues we mean cues that can be present in any kind of visual stimulus, such as occlusion, binocular disparity, and perspective deformations. Explicit depth cues do contribute to the perception of depth (Vanrie, Dekeyser, & Verfaillie, 2004). By implicit depth cues we mean cues that require implicit knowledge of the human body and its movements. Current theories of biological motion perception are based on such implicit knowledge (Giese, 2004; Lange & Lappe, 2006; Lee & Wong, 2004). An interesting feature of implicit depth cues in point-light displays is that they are ambiguous for symmetric actions. This means that, on the basis of implicit cues alone, a frontal view of walking is the same as a back view (see also the more detailed explanation in the Methods section, Figure 2C). In the present study, we address only implicit depth cues.
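Under the simplifying assumptions of orthographic projection and no explicit depth cues, this ambiguity can be illustrated with a few lines of Python. The sketch below uses a hypothetical single frame of joint positions (not data from this study) and checks that a body rotated by an angle θ about the vertical axis produces the same image as its depth-mirrored copy (z → −z) rotated by −θ, while the two bodies' facing directions have depth components of opposite sign. For an idealized, bilaterally symmetric walker, the depth-mirrored copy is itself a plausible walker (with left and right limbs exchanged, which amounts to a half-cycle phase shift), so implicit knowledge of walking cannot by itself resolve the ambiguity. All names and coordinates are illustrative assumptions.

```python
import math

# Hypothetical joint positions (x, y, z) in metres: x = horizontal in the
# image, y = vertical, z = depth along the line of sight. Illustrative only,
# not recorded motion-capture data.
FRAME = [
    (0.00, 1.60, 0.00),    # head
    (-0.20, 1.40, 0.05),   # left shoulder
    (0.20, 1.40, -0.05),   # right shoulder
    (-0.15, 0.95, -0.10),  # left hip
    (0.15, 0.95, 0.10),    # right hip
    (-0.10, 0.05, 0.30),   # left ankle
    (0.10, 0.05, -0.25),   # right ankle
]


def rotate_y(p, theta):
    """Rotate point p by theta radians about the vertical (y) axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def mirror_depth(p):
    """Reflect point p through the frontoparallel plane (z -> -z)."""
    x, y, z = p
    return (x, y, -z)


def orthographic_image(points, theta):
    """Rotate the points by theta and keep only (x, y): depth is discarded."""
    return [rotate_y(p, theta)[:2] for p in points]


theta = math.radians(30)

# Image of the original body at +theta and of its depth mirror at -theta.
original = orthographic_image(FRAME, theta)
mirrored = orthographic_image([mirror_depth(p) for p in FRAME], -theta)

# Facing vectors: the body faces +z before rotation; its mirror faces -z.
orig_facing = rotate_y((0.0, 0.0, 1.0), theta)
mirr_facing = rotate_y(mirror_depth((0.0, 0.0, 1.0)), -theta)

for a, b in zip(original, mirrored):
    print(f"({a[0]:+.3f}, {a[1]:.3f})  vs  ({b[0]:+.3f}, {b[1]:.3f})")
print("facing depth components:", round(orig_facing[2], 3), round(mirr_facing[2], 3))
```

Algebraically, the image x-coordinate of the original point is x cos θ + z sin θ, and that of the mirrored point is x cos(−θ) + (−z) sin(−θ), which is the same expression. The two facing directions share their lateral component and differ only in the sign of their depth component, which is the facing-toward versus facing-away ambiguity.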
Humans have a clear impression of whether a point-light display faces toward or away from the observer, even if explicit depth cues are absent (Brooks et al., 2008; Jackson, Cummins, & Brady, 2008; Schouten, Troje, Brooks, van der Zwan, & Verfaillie, 2010; Schouten, Troje, & Verfaillie, 2011; Schouten, Troje, Vroomen, & Verfaillie, 2011; Sweeny, Haroz, & Whitney, 2012; Vanrie et al., 2004; Vanrie & Verfaillie, 2006). Explicit depth cues can disambiguate the point-light display and thus determine which of the two ambiguous facing directions an observer sees (Jackson & Blake, 2010; Vanrie et al., 2004).
However, it is unknown how well humans can actually judge the facing in depth of point-light biological motion, nor is it known what kind of implicit information they base this judgment on. The answers to these questions are relevant because they may help us understand the mechanisms that the brain uses to recognize biological motion. Such information may include dynamic and static cues. A number of dynamic cues have been proposed for recognizing a side view of point-light walking, and these could in principle also help to recognize the facing in depth.
First, we have the knowledge that human movements are typically pendular: the limb segments are rigid, so each major joint typically moves along an arc around the adjacent joint (Chang & Troje, 2009; Hoffman & Flinchbaugh, 1982; Jokisch & Troje, 2003; Webb & Aggarwal, 1982). Second, we have implicit knowledge of the trajectories of individual body parts, such as wrists and ankles (Troje & Westhoff, 2006). Third, we have implicit knowledge of the entire movement pattern (Giese, 2004). Fourth, it has been proposed that humans use their implicit knowledge of the body form at each stage of a movement, which is a motion-independent cue (Beintema & Lappe, 2002; Lange, Georg, & Lappe, 2006). Finally, local dynamic cues (Neri, 2009) or local static cues, such as the ratio of the distance between the hips or shoulders to the length of the limb segments, could be informative.
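To illustrate why a local static cue of this kind could carry information about orientation but not about facing toward or away, the following sketch assumes orthographic projection, an upright thigh (as in a standing posture), and hypothetical body dimensions. The image-plane hip width then scales with |cos θ|, where θ is the rotation about the vertical axis (θ = 0 for a frontal view), while the vertical thigh keeps its full length. The ratio therefore falls from its maximum in the frontal or back view to zero in the side view, and because |cos θ| takes the same value for θ, −θ, 180° − θ, and 180° + θ, the ratio alone cannot distinguish those four facing directions. Dividing by a limb length makes the cue independent of the overall image size of the figure. The constants and the function name are illustrative assumptions.

```python
import math

HIP_HALF_WIDTH = 0.15  # hypothetical half-distance between hip markers (m)
THIGH_LENGTH = 0.45    # hypothetical hip-to-knee segment length (m)
HIP_HEIGHT = 0.95      # hypothetical hip height above the floor (m)


def rotate_y(p, theta):
    """Rotate point p by theta radians about the vertical (y) axis."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def projected_ratio(theta):
    """Image-plane hip width divided by image-plane thigh length for a body
    rotated by theta about the vertical axis (theta = 0: frontal view).
    Assumes orthographic projection and an upright (vertical) thigh."""
    left_hip = (-HIP_HALF_WIDTH, HIP_HEIGHT, 0.0)
    right_hip = (HIP_HALF_WIDTH, HIP_HEIGHT, 0.0)
    left_knee = (-HIP_HALF_WIDTH, HIP_HEIGHT - THIGH_LENGTH, 0.0)
    lh, rh, lk = (rotate_y(p, theta) for p in (left_hip, right_hip, left_knee))
    # Orthographic projection: measure distances using (x, y) only.
    hip_width = math.dist(lh[:2], rh[:2])
    thigh = math.dist(lh[:2], lk[:2])
    return hip_width / thigh


for deg in (0, 30, 90, 150, 210, 330):
    print(f"{deg:3d} deg: ratio = {projected_ratio(math.radians(deg)):.3f}")
```

The printed ratios for 30°, 150°, 210°, and 330° coincide, which shows that under these assumptions the cue reports how far the body is turned from the side view but not in which of the four mirror-related directions it faces.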
In the present study, we aimed to measure how well human observers can judge the facing in depth in the absence of explicit depth cues, and to determine the relative importance of the different implicit cues mentioned above. Five manipulations were applied to a recorded pattern of human walking. These systematically manipulated the naturalness of the motion, the rigid structure of the body segments, the body structure, the movement of the dots, and the underlying coherence of the points. The stimuli faced in random directions in the horizontal plane, and the subjects' task was to report the facing direction as precisely as possible. In the Discussion, we address which mechanisms might underlie depth perception from implicit depth cues, on the basis of the kinds of errors that the observers make.