To provide useful information about the physical environment, the visual system must generate a reasonably accurate three-dimensional (3D) percept from optical information in two 2D retinal images. The actual 3D scene that gives rise to the images is geometrically underdetermined by this optical information, but the resulting ambiguity can be reduced by combining information from different cues relevant to the same environmental property. Depth cue combination has been the subject of considerable recent research. An important assumption of this research has been that different cues must be in the same units for meaningful combination to take place (Landy, Maloney, Johnston, & Young, 1995). This study explores this assumption empirically by investigating whether ordinal information can influence depth perception when unambiguous metric information is present. The ordinal information comes from the configural cues of convexity and familiarity, important factors in determining figure-ground organization, and the metric information comes from binocular disparity, a potent factor in determining perceived depth.
Figure-ground organization occurs when two adjacent regions in the visual field are perceived as if one region (the “figure”) is nearer to the viewer and shaped by the common edge, whereas the other region (the “ground”) is farther from the viewer and not bounded by the common edge, appearing instead to extend behind the figure. Research on figure-ground organization has focused primarily on identifying “configural” cues: stimulus properties that bias one region of a 2D display to be seen as nearer than the other and as shaped by the common edge (Palmer, 2002; Peterson & Skow-Grant, 2003). It is well known that the region that is more surrounded, smaller, more vertically oriented, higher in contrast, more symmetrical, bordered by more parallel contours, lower in the display, more convex, and more familiar is more likely to be seen as the nearer, figural region (Kanizsa & Gerbino, 1976; Peterson & Gibson, 1994a; Peterson, Harvey, & Weidenbacher, 1991; Rubin, 1915/1958; Vecera, Vogel, & Woodman, 2002). However, geometrical analyses of such factors indicate that metric information cannot be recovered from them. The shape of an occluding contour, for example, cannot specify the distance of either the occluding or the occluded surface; occlusion can only specify which one is closer. Geometrically, configural cues can therefore provide only ordinal information.
Perhaps because of the ordinal nature of configural information, figure-ground perception has been modeled by competitive interactions across an edge (e.g., Peterson, de Gelder, Rapcsak, Gerhardstein, & Bachoud-Lévi, 2000; Sejnowski & Hinton, 1987; Vecera & O’Reilly, 1998). The outcome of this activity is binary: one side (the figure) “wins” and appears shaped by the common edge, whereas the other side (the ground) “loses” and is not shaped by it. In some models (e.g., Sejnowski & Hinton, 1987; Vecera & O’Reilly, 1998), but not all (Peterson, 2003), perceived depth ordering (the figure appearing closer than the ground) is also an outcome of the competition. This reflects the binary nature of standard phenomenological observations about figure-ground perception and is consistent with the geometrically ordinal nature of configural cues.
A very different picture of depth perception emerges from the literature on binocular disparity (Howard, 2002; Howard & Rogers, 2002). Horizontal disparity is a relative depth cue, but it can be interpreted metrically once distance and azimuth have been estimated, and empirical research has shown that metric information is indeed recovered (Backus, Banks, van Ee, & Crowell, 1999). In fact, it has been shown that geometrically available scaling parameters can metrically calibrate many different depth cues (e.g., disparity, motion parallax, and texture).
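The dependence of metric depth on viewing distance can be illustrated with the standard textbook geometry for two points on the median plane. The sketch below is not taken from the studies cited above: it ignores azimuth, assumes a hypothetical interocular separation of 6.5 cm, and uses illustrative function names. It computes the relative disparity produced by a given depth interval and then inverts it with the small-angle approximation, depth ≈ disparity × distance² / interocular separation.

```python
import math


def disparity_from_depth(interocular_m, distance_m, depth_m):
    """Relative horizontal disparity (radians) between a point at distance_m
    and a second point depth_m farther away, both on the median plane."""
    vergence_near = 2.0 * math.atan(interocular_m / (2.0 * distance_m))
    vergence_far = 2.0 * math.atan(interocular_m / (2.0 * (distance_m + depth_m)))
    return vergence_near - vergence_far


def depth_from_disparity(interocular_m, distance_m, disparity_rad):
    """Small-angle approximation: depth ~= disparity * distance^2 / interocular.
    The distance estimate is the scaling parameter that makes disparity metric."""
    return disparity_rad * distance_m ** 2 / interocular_m


# Hypothetical values chosen only for illustration.
INTEROCULAR_M = 0.065
disparity = disparity_from_depth(INTEROCULAR_M, 0.5, 0.01)

# The same disparity implies four times the depth at twice the distance.
depth_at_half_metre = depth_from_disparity(INTEROCULAR_M, 0.5, disparity)
depth_at_one_metre = depth_from_disparity(INTEROCULAR_M, 1.0, disparity)
```

Because the recovered depth grows with the square of the assumed distance, the disparity signal alone fixes only relative depth; a distance estimate is what converts it into a metric quantity.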
Recent work on depth cue combination has conceptualized the generation of a depth percept as a problem of statistical inference, specifying how the visual system should infer depth from noisy measurements and prior information. In this view, the visual system’s estimates of depth implied by the various cues (likelihood functions) and by prior information (prior probabilities) are both modeled as probability distributions over metric space. Bayesian models allow optimal combination of such information to predict small, graded changes in depth perception that have been verified experimentally (e.g., Hillis, Ernst, Banks, & Landy, 2002; Hillis, Watt, Landy, & Banks, 2004; Knill, 1998). However, it is unclear how information from configural cues (indeed, from any geometrically ordinal cue) can be incorporated within this framework.
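The kind of combination these models describe can be sketched for the simplest case. The example below is a minimal illustration, not the implementation used in any of the cited studies: it assumes independent cues with Gaussian noise and a flat prior, in which case the optimal estimate is a reliability-weighted average with weights proportional to inverse variance. The function name and the numerical values are hypothetical.

```python
def combine_gaussian_cues(estimates, sigmas):
    """Reliability-weighted combination of independent Gaussian cue estimates.

    estimates: depth implied by each cue, all in the same metric units
    sigmas:    standard deviation of the noise on each estimate
    Returns the combined estimate and its standard deviation.
    """
    # Reliability of a cue is the inverse of its noise variance.
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    combined = sum(r * e for r, e in zip(reliabilities, estimates)) / total
    # The combined estimate is at least as reliable as the best single cue.
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma


# Hypothetical example: one cue signals 10 cm (sigma 1 cm),
# another signals 14 cm (sigma 2 cm).
depth_cm, sigma_cm = combine_gaussian_cues([10.0, 14.0], [1.0, 2.0])
```

With these values the combined estimate is about 10.8 cm, pulled toward the more reliable cue. Every input must be a depth value with an associated variance in common units, whereas an ordinal cue supplies only a statement that one surface is nearer than another, which is why it has no obvious place in such a scheme.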
The empirical question we address in this work is whether geometrically ordinal depth information from the configural cues of familiarity and convexity combines with metric information from binocular disparity to influence depth perception. Surprisingly little research has examined this issue. Peterson and Gibson (1993) reported the most convincing evidence that configural cues affect perceived depth of stereoscopic displays, but their study did not settle the issue. They used stereograms in which adjacent black and white regions shared an edge whose shape suggested a familiar object (e.g., a face or seahorse in profile) on one side and whose binocular disparity suggested that the familiar region was either nearer to or farther from the observer than the unfamiliar region. When disparity suggested that the familiar region was nearer, observers usually reported perceiving two parallel planes separated in depth, with the familiar region in front (Figure 1a). When disparity suggested that the familiar region was farther, observers frequently reported that the familiar region appeared to be slanted in depth such that it was nearer at the central edge and farther at the outside edge (Figure 1b). Thus, two strikingly different depth interpretations resulted from the same disparity information, depending on how configural cues were aligned with it. This result therefore supports the conclusion that configural cues can influence perceived depth when a metric depth cue is present.
Unfortunately, disparity information in Peterson and Gibson’s displays was present only at the luminance edges and was ambiguous because many surfaces in depth were geometrically consistent with the displays (Peterson, 2003). The two regions had different widths in the two eyes, but because they lacked texture, local determination of a disparity signal was impossible except at the edges. Disparity unambiguously specified the position in depth of the central contour and of the two outside edges, but not the ownership of the central contour or the slant of the regions. Such displays are geometrically consistent with either a flat surface extending behind a near surface (see Figure 1a), or a farther surface slanting forward in depth to the central edge (see Figure 1b). Peterson and Gibson’s results thus show that configural cues can influence the interpretation of ambiguous disparity information, but do not indicate what would happen if disparity information were unambiguous. The present experiments were designed to answer this question.