Imagine meeting another person on the street; the person walks towards you while you move towards them. The retinal motion you experience is a combination of two motion components: your own self-motion and the biological motion of the other person. The purpose of the current study is to investigate how the visual system differentiates these two components.
Locomotion through the world generates a pattern of expanding visual motion on the retina known as optic flow (Gibson, 1950). If the world is entirely stable and does not contain any additional motion, this pattern can be decomposed to provide information about the observer's movement, such as the direction in which one is heading (Longuet-Higgins & Prazdny, 1980). Optic flow is thus an important source of information for visually guided navigation.
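As a brief illustration of why a rigid, stationary scene makes heading recoverable, the translational part of the flow field can be written out under some simplifying assumptions that are ours rather than part of the cited work: a pinhole projection with unit focal length, an observer translating with velocity $T = (T_x, T_y, T_z)$, no eye or head rotation, and a static scene point at depth $Z$ that projects to image position $(x, y)$.

```latex
% Sketch under the stated assumptions (pinhole camera, unit focal length,
% pure observer translation, static scene); the symbols are illustrative.
\[
\begin{aligned}
u &= \frac{-T_x + x\,T_z}{Z}, &
v &= \frac{-T_y + y\,T_z}{Z}, \\[4pt]
(x_0,\, y_0) &= \left(\frac{T_x}{T_z},\; \frac{T_y}{T_z}\right), &
(u,\, v) &= \frac{T_z}{Z}\,\bigl(x - x_0,\; y - y_0\bigr).
\end{aligned}
\]
```

Under these assumptions every image velocity $(u, v)$ points directly away from the single location $(x_0, y_0)$, the focus of expansion, which corresponds to the heading direction. Depth $Z$ scales the speed of each vector but not its direction.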
Humans can use optic flow to detect their heading in a number of situations and with a degree of accuracy that allows for safe locomotion (Cutting, Springer, Braren, & Johnson, 1992; Lappe, Bremmer, & van den Berg, 1999). This level of accuracy is maintained even when the visual stimulus contains perturbations induced by eye movements (Li & Warren, 2000; Warren & Hannon, 1990). Warren, Morris, and Kalish (1988) showed that heading estimation was fairly reliable in scenes containing as few as 10 points, but dropped when only two points were visible. Heading estimation also deteriorates when the direction of dots in the flow field is randomized, but remains stable if the speed of individual dots is randomized while keeping the configuration of the flow pattern intact (Warren, Blackwell, Kurtz, Hatsopoulos, & Kalish, 1991). This suggests that the critical information for heading detection lies in the global structure of the optic flow field.
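The geometric reason why speed randomization is harmless while direction randomization is not can be sketched numerically. The following is a minimal illustration, not a model of human performance and not a reconstruction of the stimuli used in the cited experiments. It assumes a pinhole observer with unit focal length translating through a static cloud of points, and it swaps in a simple least-squares estimate of the focus of expansion in place of the heading models cited in this paper. The function names `translational_flow` and `estimate_foe`, and all parameter values, are hypothetical.

```python
import numpy as np

def translational_flow(points, translation):
    """Image positions and velocities of static 3D points for a translating
    pinhole observer (unit focal length, no rotation; an assumed setup)."""
    X, Y, Z = points.T
    tx, ty, tz = translation
    x, y = X / Z, Y / Z
    flow = np.column_stack([(-tx + x * tz) / Z, (-ty + y * tz) / Z])
    return np.column_stack([x, y]), flow

def estimate_foe(positions, flow):
    """Least-squares focus of expansion (x0, y0): each flow vector (u, v) at
    (x, y) should be parallel to (x - x0, y - y0), i.e. v*x0 - u*y0 = x*v - y*u."""
    x, y = positions.T
    u, v = flow.T
    A = np.column_stack([v, -u])
    b = x * v - y * u
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

rng = np.random.default_rng(0)
n = 200
points = np.column_stack([
    rng.uniform(-2, 2, n),   # X
    rng.uniform(-2, 2, n),   # Y
    rng.uniform(4, 12, n),   # Z (depth in front of the observer)
])
translation = np.array([0.1, 0.0, 1.0])
true_foe = translation[:2] / translation[2]
positions, flow = translational_flow(points, translation)

# Speed randomization: rescale each vector by a random positive factor,
# leaving its direction (and hence the radial pattern) intact.
speed_scrambled = flow * rng.uniform(0.2, 5.0, size=(n, 1))

# Direction randomization: rotate each vector by a random angle, keeping its speed.
angles = rng.uniform(0.0, 2.0 * np.pi, n)
cos_a, sin_a = np.cos(angles), np.sin(angles)
direction_scrambled = np.column_stack([
    cos_a * flow[:, 0] - sin_a * flow[:, 1],
    sin_a * flow[:, 0] + cos_a * flow[:, 1],
])

speed_error = np.linalg.norm(estimate_foe(positions, speed_scrambled) - true_foe)
direction_error = np.linalg.norm(estimate_foe(positions, direction_scrambled) - true_foe)
print(f"FOE error, speeds randomized:     {speed_error:.2e}")
print(f"FOE error, directions randomized: {direction_error:.2e}")
```

Because the constraint used by `estimate_foe` depends only on the direction of each vector, rescaling speeds leaves the estimate unchanged up to numerical precision, whereas rotating the vectors destroys the radial pattern on which the estimate depends.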
The importance of the global structure of the flow field for heading estimation is further supported by neurophysiological evidence showing that neurons in middle temporal (MT) and medial superior temporal (MST) areas, which are likely responsible for heading detection (Peuskens, Sunaert, Dupont, Van Hecke, & Orban, 2001), have large receptive fields that are responsive to motion in sizable portions of the visual field (Duffy, 1998; Duffy & Wurtz, 1995; Lappe, Bremmer, Pekel, Thiele, & Hoffmann, 1996; Peuskens et al., 2001; Smith, Wall, Williams, & Singh, 2006; Tanaka & Saito, 1989; Yu, Hou, Spillmann, & Gu, 2017). It is also mirrored by models of optic flow processing, which generally account for heading estimation by pooling motion vectors over large portions of the scene (Bruss & Horn, 1983; Calow, Krüger, Wörgötter, & Lappe, 2005; Lappe & Rauschecker, 1993; Perrone & Stone, 1994).
The estimation of self-motion from optic flow relies on the stability and rigidity of the global flow pattern (Bruss & Horn, 1983; Longuet-Higgins & Prazdny, 1980). In ecological situations, however, humans often move in conjunction with, or towards, other people. In such cases the scene is not rigid, as the movement of another person in the world also produces a characteristic pattern of visual motion. This pattern is referred to as biological motion and consists of both the translation of the other person and the articulation of their limbs (Johansson, 1973).
From the point of view of heading perception, the nonrigidity produced by the locomotion of an oncoming walker provides a potential source of confusion. Any movement of a point in the environment that is in addition to the self-motion of the observer impairs the usefulness of that point for heading estimation, as the visual motion of this point is an ambiguous combination of two sources of movement. Without knowledge about the movement of the point in the environment, it is not possible to ascertain how much of its retinal motion is due to self-movement. Several studies have shown that even single objects moving independently of the observer affect heading estimation (Layton & Fajen, 2015, 2016; Royden & Hildreth, 1996; Warren & Saunders, 1995), and that the addition of multiple translating objects (Andersen & Saidpour, 2002) or random motion components (van den Berg, 1992; Warren et al., 1991) to a scene is also deleterious to heading estimation.
In the studies cited above, moving objects were simple shapes, such as squares, points or polyhedrons. Biological motion has the additional level of complexity induced by limb motion, which adds further spurious point motion to the walker's movement in the environment. On the one hand, this limb motion complicates the optic flow pattern. On the other hand, the articulated movement of the body in biological motion conveys information about the source of motion, its direction, and its speed.
Biological motion carries an abundance of information about the movement of actors in the environment. Even when biological motion is reduced to several points attached to the main joints of an actor, a moving person can still be readily recognized (Johansson, 1973). The visual system is highly sensitive to these so-called point light (PL) stimuli, and they can be used to depict a wide range of complex actions (Dittrich, 1993). In addition, the properties of an action such as its speed, direction, or intention can be deduced (Blakemore & Decety, 2001; Lange & Lappe, 2006; Troje & Westhoff, 2006) and future actions can be predicted based on the immediately preceding movements (Diaz, Fajen, & Phillips, 2012).
These attributes provide information about a person's movement in the environment. For example, Jackson, Warren, and Abernethy (2006) showed that rugby players can predict direction changes of other players based on the pattern of their body kinematics and that deceptive body kinematics can adversely affect novice players' judgments. Similarly, studies have shown that there is an intrinsic link between a walker's articulation and its translational motion. Translation biases the perception of a walker's facing and walking direction (Masselink & Lappe, 2015), as well as the perceived action (Thurman & Lu, 2016). Translation also causes PL walkers to appear as animate actors (Thurman & Lu, 2013). Because biological motion cues provide information about a walker's movement through a scene, we suggest that they could potentially be used to facilitate the estimation of heading during locomotion towards other walkers.
Prior research on both optic flow and biological motion has used PL stimuli as a way to study the purely motion-based processes involved in perceiving self-motion and the motion of other people (Gibson, 1950; Johansson, 1973; Warren & Hannon, 1988). By removing all other features of the stimuli, the signals available to the visual system are clearly defined and constrained. While this does not address the natural, full-cue situation, it allows a precise investigation of the particular mechanisms that contribute to the perception of natural scenes. This research has shown that both self-motion perception (reviewed in Lappe, Bremmer, & van den Berg, 1999) and biological motion perception (reviewed in Blake & Shiffrar, 2007) are supported by the information in point-light stimuli. These stimuli have helped to explain the perceptual mechanisms underpinning biological motion and optic flow processing. Importantly, research has shown that the mechanisms supporting these two motion percepts are quite different. Self-motion perception relies on an analysis of the invariant pattern of motion vectors of points in the environment that is produced by the moving observer (Lappe & Rauschecker, 1993; Longuet-Higgins & Prazdny, 1980; Perrone & Stone, 1994). Conversely, biological motion perception relies on prior knowledge about the structure and movement possibilities of the human body (Beintema & Lappe, 2002; Giese & Poggio, 2000; Lange & Lappe, 2006) and can be supported by the characteristic movement trajectory of even a single foot point (Chang & Troje, 2009a; Mather, Radford, & West, 1992; Troje & Westhoff, 2006).
How the visual system processes the combination of motion components produced by concurrent biological motion and optic flow can also be studied using PL stimuli. Locomotion towards a PL walker in an otherwise dark environment produces a stimulus that is an ambiguous combination of self-motion and walker motion. That is to say that all motion vectors in the scene correspond to some combination of walker movement and observer translation. In this case, the visual system is faced with the task of disentangling the visual motion produced by the observer, the motion produced by the other person, and the motion produced by that person's appendages. This cannot be achieved without some access to biological motion perception. The question we ask is whether the biological motion perception system provides such information.
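The ambiguity can be made explicit with a short worked expression. As a sketch under assumptions of our own (pinhole projection with unit focal length, no eye or head rotation), let the observer translate with velocity $T = (T_x, T_y, T_z)$ and let a point on the walker at depth $Z$, projecting to image position $(x, y)$, move through the world with velocity $W = (W_x, W_y, W_z)$. These symbols are illustrative and do not appear in the cited work.

```latex
% Sketch under the stated assumptions; W is the world velocity of one walker point.
\[
u = \frac{-(T_x - W_x) + x\,(T_z - W_z)}{Z},
\qquad
v = \frac{-(T_y - W_y) + y\,(T_z - W_z)}{Z}.
\]
% The image velocity depends on T and W only through their difference,
% so it is unchanged by any common shift \Delta:
\[
(T,\; W) \;\longmapsto\; (T + \Delta,\; W + \Delta)
\quad\Longrightarrow\quad
(u,\, v) \text{ unchanged.}
\]
```

In this formulation the motion vectors alone specify only the relative velocity $T - W$, so a given retinal pattern is consistent with a fast observer approaching a slow walker, a slow observer approaching a fast walker, or any mixture in between. Limb articulation adds a further complication, because $W$ then differs from joint to joint.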
Though one might argue that heading estimation in the presence of biological motion represents an overly specific situation, biological motion is one of the more frequently encountered sources of external motion in natural environments. Consequently, the brain has evolved a specialized visual network for biological motion processing, which is distinct from the network responsible for heading detection. The superior temporal sulcus (STS) is often cited as the key region involved in biological motion processing (Grossman & Blake, 2002; Grossman et al., 2000). In addition, there is evidence that the biological motion network also recruits areas involved in both form processing, such as the extrastriate body area (Downing, Jiang, Shuman, & Kanwisher, 2001; Grossman et al., 2000), the fusiform body area (Michels, Lappe, & Vaina, 2005; Schwarzlose, Baker, & Kanwisher, 2005) and the occipital face area (Grossman & Blake, 2002; Michels et al., 2005), and motion perception, for example MT and the kinetic occipital area (Grossman et al., 2000).
While optic flow and biological motion stimuli share some common features, the mechanisms responsible for their processing are largely independent. As mentioned previously, heading is derived from the pattern of image motion on the retina, and is invariant to the particular objects in the visual field (Geesaman & Andersen, 1996; Logan & Duffy, 2006), while biological motion relies heavily on the specific form of the human body (Beintema & Lappe, 2002; Giese & Poggio, 2000; Hoffman & Flinchbaugh, 1982; Lange & Lappe, 2006). Given that biological motion and optic flow processing employ largely separate neural networks and operate using distinctly different mechanisms, it is reasonable to suppose that they do not interfere with one another during locomotion.
In theory, it is possible for biological motion to be processed in parallel with the optic flow field and combined to facilitate optic flow decomposition. We suggest that biological motion could aid in the computation of optic flow in a number of ways. For example, biological motion cues could signal the presence of nonrigid motion in the environment that is independent from the observer's self-motion. This is plausible, given that previous research has shown biological motion is more likely to be perceived as animate than nonbiological motion (Chang & Troje, 2008; Thurman & Lu, 2013), and that the detection of human motion is more efficient than the detection of mechanical motion in natural scenes (Mayer, Vuong, & Thornton, 2015, 2017). Another possibility is that knowledge about biological motion in the scene could be used to estimate the translation of the oncoming person and reduce its impact on the estimation of self-motion, consequently improving estimates of heading.
The current study investigates the perception of self-motion in the presence of oncoming biological motion. In four experiments, observers were presented with a stimulus that displayed a PL walker moving towards the observer, while at the same time self-motion of the observer was simulated towards the walker. Other than the PL figure, the scene was empty. Because the scene lacked additional rigid environmental information, the available visual input represented an ambiguous composite of self-motion and biological motion. As such, heading and self-motion could only be accurately estimated if biological motion cues were used to detect the walker's motion as separate from the observer's translation. Thus, these stimuli provide a means for studying how the visual system decomposes scenes of complex visual motion.
In the first two experiments, observers were required to determine whether the stimulus contained self-motion. In the third experiment, observers were required to report whether they perceived walker motion, self-motion, or a combination of both. The purpose of these experiments was to ascertain whether self-motion and walker motion can be separately identified in ambiguous situations based on biological motion cues. In the fourth experiment, observers were required to estimate their heading direction. The aim of this final experiment was to assess the accuracy with which observers were able to determine heading based on the combined biological and self-motion stimulus.