Many objects become unfamiliar or even unrecognizable when they are seen in other than their veridical orientation (e.g., Sumi, 1984; Thompson, 1980). As noted by Rock (1973), this observation contradicts the early Gestalt theorists, who considered that the essential information for the perception of form was contained primarily in the geometrical relationship of the features; that is, perceived form would remain unchanged as long as these internal relationships were maintained. Orientation specificity demonstrates that stimuli are encoded not just in terms of internal relations but also in relation to external reference frames. Such reference frames can be either egocentric or allocentric (see Howard, 1982, for a review). Egocentric reference frames include the retina, head, and body. Allocentric reference frames include gravity and the visual environment. The reference system that is most important in determining the way in which we perceive an object seems to depend on the stimulus class involved. Here, we consider the perception of faces and biological motion.
The recognition of a face is impaired if it is inverted (see Valentine, 1988, for a review). The inversion effect observed for face perception is due primarily to disruptions of configural rather than featural processing (Freire, Lee, & Symons, 2000). Inversion effects have also been reported for the perception of biological motion (e.g., Sumi, 1984). Biological motion appears to carry at least two distinct inversion effects (Troje & Westhoff, 2006). While there appears to be an inversion effect that is due to impaired processing of the global configural shape of the walker as conveyed by the display's spatiotemporal organization (e.g., Bertenthal & Pinto, 1994), there is also a second inversion effect that is associated with local motion signals of the distal limbs (Chang & Troje, 2009a; Shipley, 2003; Troje & Westhoff, 2006). To demonstrate this, Troje and Westhoff (2006) presented observers with intact walker displays and with scrambled walker displays in which the dots' spatial organization was perturbed. Significantly, observers could retrieve the facing direction of the walker not only for the coherent displays but also for the scrambled displays, which retained solely local motion information. Moreover, an inversion effect was observed for both intact and scrambled displays. The authors later showed that the cues to direction in the scrambled displays and the associated inversion effect were carried entirely by the local motion of the feet. Subsequent work has shown that the local inversion effect depends on vertical acceleration contained in the foot motion (Chang & Troje, 2009a). Moreover, the mechanisms underlying the perception of global motion-mediated structure and local motion signals are dissociable according to a variety of behavioral characteristics such as sensitivity to masking or susceptibility to learning (Chang & Troje, 2009b).
Few studies have investigated the reference frames in which animate motions and faces are coded. Observers use implicit knowledge about the direction and effects of gravity when interpreting biological and inanimate events (e.g., Jokisch & Troje, 2003; Pittenger, 1985; Runeson & Frykholm, 1981; Shipley, 2003; Stappers & Waller, 1993). Making assumptions with regard to the direction of gravity, however, does not necessarily implicate direct measurements of gravitational acceleration (e.g., via input from the vestibular system). The visual system may simply take advantage of the fact that gravity is typically aligned with egocentric coordinates. Still, there is reason to believe that an allocentric system may be involved for the perception of dynamic events (Bingham, Schmidt, & Rosenblum, 1995; Indovina et al., 2005; Lopez, Bachofner, Mercier, & Blanke, 2009). Bingham et al. (1995) found that the recognition of point-light-defined events was stronger for displays that were upright rather than inverted with respect to gravity, regardless of the observer's orientation in space, and concluded that point-light events are perceived in relation to a gravitational rather than an egocentric frame of reference. Later findings, however, appear to be inconsistent with this conclusion. Troje (2003) found that performance on a biological motion task depended only on whether the display was aligned with the observer regardless of the observer's orientation in space, suggesting egocentric coding. A study with infants also suggested that the egocentric reference frame dominates in the coding of animate motions (Kushiro, Taga, & Watanabe, 2007).
In the domain of face perception, early evidence suggested that the egocentric system is also the dominant frame of reference. Kohler (1940) and Rock (1988) reported that the recognition of faces presented upright with respect to gravity and the visual environment was impaired for observers with their heads held upside down. The finding of Troje (2003) that performance on a face recognition task depended only on stimulus alignment with the observer corroborates these early reports. However, recent findings by Lobmaier and Mast (2007) seem to suggest a role for gravity as a reference frame for the coding of faces.
To our knowledge, no studies thus far have provided a clear experimental distinction between the roles of gravitational and visual environmental reference frames for the perception of biological motion and faces. Moreover, whether the global and local aspects of biological motion are coded in the same reference systems is unknown. In the present study, we teased apart the contributions of three reference frames (egocentric, visual environment, and gravity) by placing observers inside the York University “tumbling room,” a room furnished with strong directional visual cues (e.g., table, chair, drapery) that can be rotated about a horizontal axis by 360 degrees (Figure 1A).
Inside the room (Figure 1B), an observer can also be rotated (rolled) independently around the same axis. Using this facility, we investigated the perception of biological motion and faces by creating configurations in which two reference frames were put into conflict (aligned with or opposed to the stimulus) while the third was rendered uninformative by arranging it to be orthogonal to the stimulus. An entirely balanced design resulted in 12 experimental configurations illustrated in Figure 2. If biological motion and faces are largely coded by egocentric (we do not distinguish between the retina, head, or rest of the body here), gravitational, or visual environmental coordinates, performance should be best when the stimulus is aligned with the respective reference frame.
We investigated the perception of both global motion-mediated structure in biological motion and local motion by manipulating the organization of the walker and the type of mask during a biological motion direction discrimination task. Specifically, the perception of global motion-mediated structure was addressed by placing veridical walkers inside a mask of additional walker dots moving in the opposing direction. This manipulation equated the local motion of the display. Consequently, the task could only be solved by retrieving the global form of the walker. The local aspect of biological motion was addressed by placing walkers that had their individual motion trajectories spatially perturbed (thereby destroying global structure) inside a mask of stationary flickering dots. These displays could thus only be solved based upon local motion cues. Additionally, we investigated reference frames for face perception via a same–different face recognition task previously employed by Troje (2003).
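The spatial perturbation used for the local-motion displays can be illustrated with a minimal sketch. It assumes that scrambling means relocating each dot's trajectory to a random position within the display area while leaving the trajectory itself untouched; the function name `scramble_walker`, the `width` and `height` parameters, and the data layout are hypothetical and are not taken from the study's stimulus code.

```python
import random


def scramble_walker(trajectories, width, height, seed=None):
    """Relocate each dot's trajectory to a random position, keeping its shape.

    trajectories: list of dots, each a list of (x, y) positions, one per frame.
    width, height: assumed extent of the area in which dots are re-placed,
                   centered on the origin (illustrative parameters).
    """
    rng = random.Random(seed)
    scrambled = []
    for traj in trajectories:
        # Mean position of this dot across all frames.
        mean_x = sum(x for x, _ in traj) / len(traj)
        mean_y = sum(y for _, y in traj) / len(traj)
        # Draw a new mean position uniformly inside the display area.
        new_x = rng.uniform(-width / 2, width / 2)
        new_y = rng.uniform(-height / 2, height / 2)
        # Apply one constant offset to every frame: the frame-to-frame
        # (local) motion is preserved, but the dots no longer form a body.
        scrambled.append(
            [(x - mean_x + new_x, y - mean_y + new_y) for x, y in traj]
        )
    return scrambled
```

Because every frame of a given dot is shifted by the same offset, its velocity and acceleration profile are unchanged, whereas the spatial relations between dots, and hence the global structure, are lost.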