Research on visual perception generally focuses on the representation of overtly visible surface properties. In addition to estimating such properties—i.e., representing what objects in view are and where they are located (e.g., Marr, 1982)—an important goal of the visual system is to predict how visible objects are likely to behave in the near future. Predicting the physical behavior of objects is, among other things, crucial for the perceptual guidance of motor actions. Consider, for example, the visual guidance of motor actions aimed at intercepting an object in motion, or at catching a precariously balanced object that is about to fall. Predicting the physical behavior of objects in these and other situations requires observers to infer hidden forces acting on objects—e.g., gravity, support, friction—and often to do so from vision alone. Visually estimating the physical stability of objects involves an inference of unseen forces that requires integrating shape information across the entire 3-D object to accurately estimate the object's physical parameters, such as its center of mass. An open question has been how humans make these types of judgments and how their estimates relate to the physical dynamics of the world.
Traditional research on intuitive physics has shown that people often hold erroneous physical intuitions (McCloskey, 1983a, 1983b; McCloskey, Caramazza, & Green, 1980). For example, many people expect that a ball being swung at the end of a string will, if the string breaks, continue moving in a curved trajectory; or that an object dropped from a flying airplane will fall vertically straight down (McCloskey et al., 1980). On the other hand, our visuomotor interactions with objects in everyday life suggest that we have a good comprehension of physical attributes such as gravity, friction, and support relations. Indeed, subsequent research has shown that people are much more sensitive to violations of physical laws when they view real-time dynamic displays than when they are explicitly asked about their intuitions (Kaiser, Proffitt, & Anderson, 1985; Proffitt & Gilden, 1989). In visuomotor interactions involving catching a falling ball, for instance, McIntyre, Zago, Berthoz, and Lacquaniti (2001) have shown that, in timing their hand movements, observers take into account acceleration due to gravity in a manner that is consistent with Newton's laws of motion.
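The difference between a gravity-blind and a gravity-aware estimate of when a falling ball will arrive can be made concrete with a small calculation. The sketch below only illustrates the physics and is not McIntyre et al.'s model. The function names and the example distance and speed are hypothetical, and it assumes a ball falling straight down with no air resistance.

```python
import math

G = 9.81  # gravitational acceleration near Earth's surface, m/s^2


def ttc_constant_velocity(distance_m: float, speed_m_s: float) -> float:
    """Time to contact if the ball kept its current downward speed (no acceleration)."""
    return distance_m / speed_m_s


def ttc_with_gravity(distance_m: float, speed_m_s: float, g: float = G) -> float:
    """Time to contact for a ball moving down at speed v0 and accelerating at g.

    Solves d = v0*t + g*t**2 / 2 for its positive root (air resistance ignored).
    """
    v0 = speed_m_s
    return (-v0 + math.sqrt(v0 * v0 + 2.0 * g * distance_m)) / g


if __name__ == "__main__":
    # Hypothetical scene: ball 1.5 m above the hand, currently falling at 2 m/s.
    d, v0 = 1.5, 2.0
    print(f"constant-velocity estimate: {ttc_constant_velocity(d, v0):.3f} s")
    print(f"gravity-aware estimate:     {ttc_with_gravity(d, v0):.3f} s")
```

The ball keeps speeding up, so the constant-velocity estimate is always later than the true arrival time. A hand movement timed by that estimate would close too late.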
Perhaps even more impressive than perceptual predictions involving moving objects are cases where observers can infer the action of underlying forces from a static scene or image. Infants as young as 6.5 months implicitly understand the influence of gravity and expect that objects that are not supported will fall down (Baillargeon, Needham, & DeVos, 1992). By 8 months they can also judge to some extent whether or not a cuboidal object is adequately supported (Baillargeon & Hanko-Summers, 1990; Baillargeon et al., 1992). Such judgments about support have important implications for predicting an object's future behavior. Freyd, Pantzer, and Cheng (1988) found that adult subjects who were shown a static image of an unsupported object that was previously shown to be supported had a systematic memory distortion when tasked with a same/different judgment—consistent with how the depicted object would behave if its support were in fact physically removed (e.g., in the case of gravity, they misremembered it as being lower than it actually was in the image). Based on this and other evidence, the authors argue that the representation of static scenes includes not just a kinematic component but a dynamic one as well—in other words, a representation of underlying forces.
An ecologically important judgment that relies on the implicit inference of underlying forces is the perceptual estimation of an object's physical stability. Consider the two bottles depicted in Figure 1: We can readily judge from vision alone that the bottle in Figure 1a is more physically stable than the one in Figure 1b. In other words, we naturally expect that the bottle in Figure 1a would be more resistant to the action of perturbing forces. Similarly, we can judge in a quick glance that in Figure 1c, the cup is likely to return to its vertically upright position, whereas the same cup in Figure 1e will fall over. These expectations make physical sense. In a uniform gravitational field, the gravitational forces acting on all parts of an object can be summarized by a single net force that acts at its center of mass (COM). Hence, when the gravity-projected COM lies within the supporting “base” surface (as in Figure 1c), the net torque acting on the object causes it to return to its upright position. However, with a large enough tilt—once the gravity-projected COM lies outside the support area, as in Figure 1e—the object topples over. A natural way to quantify the difference in stability is in terms of the critical angle, i.e., the angle through which an object in a given state of stable equilibrium can be tilted before it will topple over (Cholewiak, Fleming, & Singh, 2013). The critical angle corresponds to a state of unstable equilibrium in which the object is equally likely to fall over or to return to its upright position. As can be seen in Figure 1d, the critical angle is a function of both the width of the object's base and the height of its COM. The bottle in Figure 1a is more physically stable because it has a larger critical angle than the bottle in Figure 1b.
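The geometry in this paragraph can be written down directly. Take a rigid object in a uniform gravitational field whose COM lies on the vertical axis through the center of a circular base of radius r, at height h. Tilting the object pivots it on a rim point, and its critical angle is θc = atan(r / h). The sketch below assumes exactly that idealized geometry, and its function names are hypothetical. The second function tracks the horizontal position of the COM relative to the pivot as the tilt grows. That position changes sign precisely at θc, where the torque from gravity flips from restoring to toppling.

```python
import math


def critical_angle_deg(base_radius: float, com_height: float) -> float:
    """Critical angle theta_c = atan(r / h) for a COM on the symmetry axis of a circular base."""
    return math.degrees(math.atan2(base_radius, com_height))


def com_offset_from_pivot(base_radius: float, com_height: float, tilt_deg: float) -> float:
    """Horizontal position of the COM relative to the rim point the object pivots on,
    after tilting it by tilt_deg toward that rim point.

    Negative: the COM is still on the base side of the pivot, so gravity's torque restores
    the object. Zero: unstable equilibrium (tilt equals the critical angle).
    Positive: the COM has passed the pivot and the object topples.
    """
    t = math.radians(tilt_deg)
    return -base_radius * math.cos(t) + com_height * math.sin(t)


if __name__ == "__main__":
    # Two hypothetical objects sharing a base of radius 1: low COM vs. high COM.
    for h in (0.5, 2.0):
        theta_c = critical_angle_deg(1.0, h)
        print(f"h = {h}: critical angle {theta_c:.1f} deg")
```

Widening the base (larger r) or lowering the COM (smaller h) both raise θc. This is the sense in which the critical angle depends on both quantities.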
Samuel and Kerzel (2011) examined the perceptual estimation of the balance of 2-D polygonal objects. In one experiment, the shapes consisted of two polygonal parts and rested on a vertex. Subjects adjusted the orientation of the object until it was perceived to be equally likely to fall to the right or to the left—which, physically speaking, requires that its COM be vertically aligned with its supporting vertex. The results showed that although subjects could perform this task, they were overly influenced by the eccentricity of the top part of the object, which led to errors in their judgments. In a second experiment, Samuel and Kerzel used polygonal planar objects sitting on a supporting edge, in a range of equilibrium states, and showed that observers' stability judgments exhibited a conservative/anticipatory bias—namely, a bias in the direction of perceiving an object to be unstable, even though physically it would maintain its upright posture. This bias remained even after taking into account subjects' own perceptual estimates of the objects' COMs. The authors propose that subjects' responses may have been guided by a tendency to stay “on the safe side.”
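The physical criterion behind Samuel and Kerzel's first task is easy to compute for a uniform-density polygon: the COM must sit directly above the supporting vertex, and the COM of such a polygon is its area centroid. The sketch below is a hypothetical illustration, not the authors' procedure. It computes the centroid with the standard shoelace-based formula and returns the rotation about a chosen vertex that would bring the object into balance. The function names and the L-shaped example are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area centroid of a simple (non-self-intersecting) polygon, i.e. the COM of a
    uniform-density lamina. Vertices are listed in order, either orientation."""
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def balancing_rotation_deg(vertices: List[Point], support: Point) -> float:
    """Counterclockwise rotation (degrees, in [-180, 180)) about the support vertex
    that places the centroid directly above it: the physical balance orientation."""
    cx, cy = polygon_centroid(vertices)
    alpha = math.degrees(math.atan2(cy - support[1], cx - support[0]))
    return (90.0 - alpha + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Hypothetical two-part L-shape: a 3x1 bar with a 1x2 bar on its left end.
    shape = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (1.0, 1.0), (1.0, 3.0), (0.0, 3.0)]
    print("centroid:", polygon_centroid(shape))
    print("rotation to balance on (0, 0):", balancing_rotation_deg(shape, (0.0, 0.0)))
```

Because the centroid is area-weighted, enlarging or displacing one part of a two-part shape moves the balance orientation by an amount that can be computed exactly. The setting that observers produce can then be compared against that physical value.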
Recent work by Battaglia, Hamrick, and Tenenbaum (2013) presented a framework for evaluating physical stability in more complex scenes using a rigid-body dynamics simulation; they compared their model's performance to observers' judgments of stability. In their experiments, observers viewed scenes that contained towers of stacked cuboidal objects and were asked to evaluate whether the objects would fall (in scenes that were either physically stable or unstable) or in which direction they would fall (in scenes where the towers would always fall). The authors then modeled the scenes with a computational physics simulation—built upon the Open Dynamics Engine and described by them as the Intuitive Physics Engine (IPE)—that incorporated the effect of gravity on the towers. Three parameters of the IPE model were manipulated: state uncertainty (e.g., uncertainty about the object positions), the mass densities of the objects, and latent forces (e.g., bumps and vibrations that may have been applied to the scene). Using their simulation, the authors could query the final state of the scene to determine how many of the blocks had fallen, the direction in which they fell, and the distances they fell from their starting point. The predictions from their model corresponded well with observers' judgments of whether the towers were stable and in which direction unstable towers would fall. However, their model is based upon a rigid-body dynamics simulation engine and, by extension, assumes that the brain has a representation of the full physical state of a scene and is “inverting the physics of the scene” when judging stability. Therefore, although it provides a framework to describe human performance for scenes with multiple interacting objects, the physics simulation by itself does not directly address the question of how humans make stability judgments based upon vision alone (e.g., based on the shape of a 3-D object).
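Battaglia et al.'s IPE runs full rigid-body simulations, which is beyond a short example. The sketch below swaps in a much simpler technique to show how uncertainty about object positions can turn a deterministic stability rule into a graded probability of falling. It applies a static-equilibrium check to a single 2-D column of rectangular blocks, repeated over many copies of the scene whose block positions are jittered with Gaussian noise. It assumes equal block heights and densities (so mass is proportional to width) and enough friction that blocks cannot slide. It says nothing about fall direction or distance, and all names and example values are hypothetical.

```python
import random
from typing import List, Tuple

Block = Tuple[float, float]  # (center_x, width); equal heights and densities assumed


def is_statically_stable(blocks: List[Block]) -> bool:
    """Static-equilibrium check for a single column of rigid rectangular blocks.

    blocks[0] rests on the ground and blocks[k] rests on blocks[k - 1]. The column
    stands iff, at every interface, the combined COM of all blocks above it lies
    strictly inside the contact interval shared by the two touching blocks.
    Friction is assumed large enough that nothing slides.
    """
    for k in range(1, len(blocks)):
        above = blocks[k:]
        total_mass = sum(w for _, w in above)  # mass proportional to width (assumption)
        com_x = sum(x * w for x, w in above) / total_mass
        (xb, wb), (xa, wa) = blocks[k - 1], blocks[k]
        lo = max(xb - wb / 2, xa - wa / 2)
        hi = min(xb + wb / 2, xa + wa / 2)
        if not (lo < com_x < hi):  # also fails when the blocks do not overlap at all
            return False
    return True


def prob_fall(blocks: List[Block], sigma: float, n_samples: int = 2000, seed: int = 0) -> float:
    """Fraction of noisy copies of the scene that fail the static check.

    Each copy jitters every block's x position with Gaussian noise of SD sigma,
    a crude stand-in for perceptual uncertainty about object positions.
    """
    rng = random.Random(seed)
    falls = 0
    for _ in range(n_samples):
        noisy = [(x + rng.gauss(0.0, sigma), w) for x, w in blocks]
        if not is_statically_stable(noisy):
            falls += 1
    return falls / n_samples


if __name__ == "__main__":
    tower = [(0.0, 1.0), (0.2, 1.0), (0.35, 1.0)]  # hypothetical, slightly staggered tower
    for sigma in (0.0, 0.05, 0.2):
        print(f"sigma = {sigma}: P(fall) = {prob_fall(tower, sigma):.2f}")
```

With zero noise the check returns a hard yes/no answer. As the noise grows, a tower that is physically stable but close to its limits acquires a nonzero probability of being judged to fall.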
Barnett-Cowan, Fleming, Singh, and Bülthoff (2011) investigated whether visual judgments of physical stability incorporate multisensory information and how the perceived stability of a single object can be affected by changes in the observer's gravitational frame of reference. They had observers either sit upright or lie on their left or right side when judging the critical angles of a series of objects. The objects were surfaces of revolution with a protrusion that was shifted up or down to alter the center of mass. The authors found a small but statistically significant effect of the shape manipulation. More interestingly, observers made stability judgments that were biased towards the subjective visual vertical, which was itself biased by their body posture. This finding suggests that the visual system's estimates of physical stability incorporate multisensory information and may utilize biased internal representations of the gravitational frame.
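One simple way to see how a biased internal gravitational frame would show up in critical-angle settings is a toy model. In it, the observer applies the correct physical rule, but relative to an internal "up" that is rotated by a bias β from true vertical. This is a hypothetical illustration, not Barnett-Cowan et al.'s analysis. Under it, tilts in the direction of the bias are overestimated by β and tilts in the opposite direction are underestimated by β. The bias therefore appears as a left/right asymmetry of 2β, while the average of the two settings still equals the true critical angle.

```python
from typing import Tuple


def settings_under_biased_vertical(critical_deg: float, bias_deg: float) -> Tuple[float, float]:
    """Toy model (hypothetical): the observer stops tilting when the object's tilt,
    measured from an internal vertical rotated by bias_deg (positive = rightward),
    reaches critical_deg. Returns world-frame tilt magnitudes for (right, left) tilts."""
    right = critical_deg + bias_deg  # world tilt t satisfies t - bias = +critical
    left = critical_deg - bias_deg   # t - bias = -critical, so |t| = critical - bias
    return right, left


if __name__ == "__main__":
    for bias in (0.0, 5.0, -5.0):  # hypothetical internal-vertical biases in degrees
        r, l = settings_under_biased_vertical(30.0, bias)
        print(f"bias {bias:+.0f} deg: right {r:.1f}, left {l:.1f}, mean {(r + l) / 2:.1f}")
```

In this model, a constant rotation of the internal frame shifts the two tilt directions in opposite ways. Comparing left and right settings therefore isolates the frame bias from any error in the perceived object geometry.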
Cholewiak et al. (2013) further investigated the influence of 3-D shape on visual estimates of stability by examining how well observers could track the critical angle of an asymmetric 3-D object as a function of the direction in which it is tilted. In addition to a task involving the visual estimation of critical angle in different directions, they used asymmetric matching of object stability to determine what attributes of the asymmetric objects were used when judging overall physical stability: the average critical angle around the circumference or the minimum. They found that observers could track the critical angle as a function of tilt quite accurately and that their stability judgments in the asymmetric matching task were better explained by the asymmetric object's minimum critical angle (i.e., in the direction in which the object was least stable). The results suggested that physical stability is likely represented along a unitary dimension, where objects can be visually judged as more or less stable, despite variations along various shape dimensions.
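The distinction between the average and the minimum critical angle can be illustrated with a simplified, hypothetical object. It has a flat circular base of radius r, with the COM at height h but displaced horizontally by d from the base's center. Tilting in direction φ pivots the object about the tangent to the rim in that direction. That tangent lies a horizontal distance r - d·cos φ from the projected COM, so the critical angle in that direction is atan((r - d·cos φ) / h). The sketch below assumes that geometry and is not the stimulus set of Cholewiak et al. It summarizes the resulting directional profile by its mean and its minimum, and the minimum falls in the direction of the COM displacement. The function names are hypothetical.

```python
import math
from typing import Tuple


def directional_critical_deg(r: float, h: float, d: float, phi_deg: float) -> float:
    """Critical angle when tilting toward direction phi (degrees).

    Hypothetical object: flat circular base of radius r, COM at height h, displaced
    horizontally by d (0 <= d < r) toward phi = 0. The object pivots about the tangent
    to the rim at direction phi, which lies r - d*cos(phi) from the projected COM.
    """
    s = r - d * math.cos(math.radians(phi_deg))
    return math.degrees(math.atan2(s, h))


def mean_and_min_critical_deg(r: float, h: float, d: float, n: int = 360) -> Tuple[float, float]:
    """Mean and minimum critical angle over n equally spaced tilt directions."""
    angles = [directional_critical_deg(r, h, d, 360.0 * i / n) for i in range(n)]
    return sum(angles) / n, min(angles)


if __name__ == "__main__":
    for d in (0.0, 0.3, 0.6):  # hypothetical COM displacements for r = h = 1
        mean_c, min_c = mean_and_min_critical_deg(1.0, 1.0, d)
        print(f"d = {d}: mean {mean_c:.1f} deg, minimum {min_c:.1f} deg")
```

As d grows, the minimum drops much faster than the mean. The two candidate summaries therefore make different predictions for how stable an asymmetric object should look.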
The current study investigates the visual estimation of physical stability of rotationally symmetric 3-D objects, and its dependence on shape attributes. In the first experiment, we measured observers' perceived critical angles and compared these against the corresponding physical critical angles. As summarized earlier, this method has been successfully employed in recent work to measure perceived stability (Barnett-Cowan et al., 2011; Cholewiak et al., 2013).
Experiment 1 examined the perception of physical stability by manipulating the objects' aspect ratios and overall volumes. Changing the aspect ratio provides a simple manipulation of the object's shape, whereas changing the volume allows us to test for size invariance—whether, consistent with physical predictions, visual estimates of critical angle are a function of intrinsic shape.¹
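The size-invariance prediction follows from the geometry. The COM height of a uniform solid of revolution scales with the object, so scaling every dimension by the same factor leaves the ratio of base radius to COM height, and hence the critical angle, unchanged. Changing the aspect ratio does not preserve that ratio. The sketch below checks this numerically for a hypothetical cone profile, since the actual stimulus profiles are not specified here. It computes the COM height as z_com = ∫ z r(z)² dz / ∫ r(z)² dz with a midpoint rule, assuming uniform density and a flat base of radius r(0). The function names are hypothetical.

```python
import math
from typing import Callable


def com_height(radius_at: Callable[[float], float], height: float, n: int = 10_000) -> float:
    """COM height of a uniform solid of revolution with radius profile r(z), 0 <= z <= height.

    z_com = integral(z * A(z) dz) / integral(A(z) dz), with A(z) = pi * r(z)**2,
    approximated by the midpoint rule on n horizontal slices.
    """
    dz = height / n
    num = den = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        area = math.pi * radius_at(z) ** 2  # cross-sectional area of this slice
        num += z * area
        den += area
    return num / den  # the slice thickness dz cancels in the ratio


def critical_angle_deg(radius_at: Callable[[float], float], height: float) -> float:
    """Critical angle atan(r_base / z_com) for a flat circular base of radius r(0)."""
    return math.degrees(math.atan2(radius_at(0.0), com_height(radius_at, height)))


def cone(base_radius: float, height: float) -> Callable[[float], float]:
    """Hypothetical profile: a solid cone standing on its base."""
    return lambda z: base_radius * (1.0 - z / height)


if __name__ == "__main__":
    for k in (1.0, 2.0, 3.0):  # uniform scale factors: same shape, different volume
        print(f"scale {k}: critical angle {critical_angle_deg(cone(k, 4 * k), 4 * k):.3f} deg")
    # Changing the aspect ratio (a taller cone on the same base) does change it.
    print(f"taller cone: {critical_angle_deg(cone(1.0, 8.0), 8.0):.3f} deg")
```

A solid cone's COM sits at one quarter of its height, so the three scaled cones all give 45 degrees. The taller cone gives atan(1/2), about 26.6 degrees.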
As is clear from the definition of the critical angle, the physical notion of stability involves an object's COM in a fundamental way. Previous research has shown that observers use the COM to visually localize a shape—both in perceptual tasks, such as estimating the separation between two dot clusters (Morgan, Hole, & Glennerster, 1990), and in saccadic localization (Vishwanath & Kowler, 2003, 2004). Some systematic errors in perceptual estimation of the COM have also been reported in specific contexts, however (Baud-Bovy & Gentaz, 2004; Baud-Bovy & Soechting, 2001; Bingham & Muchisky, 1993).² Thus, if perceived critical angles are found to deviate from the corresponding physical predictions, one must consider the possibility that a perceptual mislocalization of the COM is at least partly responsible. In Experiment 2, we therefore used a task involving the perceptual localization of COM for the same set of shapes. We examine whether any misperception in the physical stability of an object (Experiment 1) can be explained in terms of a corresponding misperception of its COM (Experiment 2).
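The logic of comparing the two experiments can be made explicit. Suppose observers applied the physical rule θc = atan(r / h) to their own perceived COM height. Then the critical angle predicted from their COM settings should match their critical-angle settings. Conversely, any critical-angle setting implies the COM height that would justify it. The helper functions below are a hypothetical sketch of that comparison for a rotationally symmetric object with base radius r, not the analysis actually reported. A residual near zero would mean that the stability error is accounted for by COM mislocalization.

```python
import math


def predicted_critical_deg(base_radius: float, perceived_com_height: float) -> float:
    """Critical angle implied by the observer's own perceived COM height."""
    return math.degrees(math.atan2(base_radius, perceived_com_height))


def implied_com_height(base_radius: float, critical_setting_deg: float) -> float:
    """COM height that would make a critical-angle setting physically correct: h = r / tan(theta)."""
    return base_radius / math.tan(math.radians(critical_setting_deg))


def residual_deg(base_radius: float, critical_setting_deg: float, perceived_com_height: float) -> float:
    """Part of a critical-angle setting NOT explained by the perceived COM.

    Near zero: the stability error is accounted for by COM mislocalization.
    Clearly nonzero: some other factor (e.g., a response bias) is also at work.
    """
    return critical_setting_deg - predicted_critical_deg(base_radius, perceived_com_height)
```

Working in angle space via `residual_deg` keeps the comparison in the units observers actually set in a critical-angle task. `implied_com_height` allows the same comparison to be made in units of height instead.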