Identifying the visual cues that determine relative depth across an image contour (i.e., figure-ground organization) is a central problem of vision science. In this paper, we compare flat cues to figure-ground organization with the recently discovered cue of extremal edges (EEs), which arise when opaque convex surfaces smoothly curve to partly occlude themselves. The present results show that EEs are very powerful pictorial cues to relative depth across an edge, almost entirely dominating the well-known figure-ground cues of relative size, convexity, shape familiarity, and surroundedness. These results demonstrate that natural shading and texture gradients in an image provide important information about figure-ground organization that has largely been overlooked in the past 75 years of research on this topic.

*should* be perceived as figural and closer is based on a general viewpoint analysis showing that, if one side of a shared edge is extremal, that side is more likely to be closer to the observer than the opposite side (see Figure 2C–2F).

*ceteris paribus* rules: the given factor has the stated effect if all other factors are eliminated or neutralized (Palmer, 1999). The joint effect of two or more cues can be predicted from such rules only when they all predict the same bias. If they conflict, it is unclear how the percept is determined. An important goal of the first experiment was to study how the cues of smaller size and 2D edge convexity combine with EEs to determine a single, unified perception of figural depth. This question is of particular interest because EEs seem to dominate traditional figural cues completely, at least when they are correctly perceived as EEs (rather than as cast shadows; see below). Previous research on cue combination in the grouping of artificially generated dot lattices showed additive effects for various cues (e.g., Kubovy & van den Berg, 2008). Recent research in computer vision has examined the nature of cue integration for FGO in natural images within local apertures (Fowlkes, Martin, & Malik, 2007), finding that size, lower region, and convexity combined linearly in a logistic regression model. For optimal aperture sizes and their sample of images, small size had the best predictive value for perceived figure, followed by lower region and convexity.

$n_1$, $n_2$) of superellipsoids. The equation for a superellipsoid is:

$n_1 = 5.0$, $n_2 = 0.25$) for concave edges, ($n_1 = 0.25$, $n_2 = 0.25$) for straight edges, and ($n_1 = 1.0$, $n_2 = 0.5$) for convex edges. The shared contour was an EE on the shaded side for the matte and specular displays. Adobe Photoshop was used to make 5.0 × 5.0 cm images by putting these convex surfaces next to flat, homogeneous, 50% gray regions. A 6.5 × 6.5 cm square annulus of random noise surrounded this image to differentiate it from the black screen on which the displays were presented during the experimental trials. The bigger, equal, and smaller sized displays occupied 75%, 50%, and 25%, respectively, of the display.
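As a hedged illustration of how the exponent pairs listed above shape a superellipsoid, the sketch below uses the standard superquadric inside-outside function, in which $n_1$ shapes the profile along the z axis and $n_2$ shapes the cross-section in the x-y plane. Whether the displays were generated with exactly this parameterization is an assumption, and the names `inside_outside` and `EDGE_EXPONENTS`, as well as the unit semi-axes, are hypothetical.

```python
def inside_outside(x, y, z, n1, n2, a=(1.0, 1.0, 1.0)):
    """Standard superquadric inside-outside function (assumed parameterization).

    Returns a value < 1 inside the surface, == 1 on it, and > 1 outside.
    n1 shapes the profile along z; n2 shapes the x-y cross-section.
    """
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2.0 / n2) + abs(y / a2) ** (2.0 / n2)) ** (n2 / n1)
    return xy + abs(z / a3) ** (2.0 / n1)


# Exponent pairs (n1, n2) listed in the text for each edge type.
EDGE_EXPONENTS = {
    "concave": (5.0, 0.25),
    "straight": (0.25, 0.25),
    "convex": (1.0, 0.5),
}

for name, (n1, n2) in EDGE_EXPONENTS.items():
    # A point at unit distance along a semi-axis lies on the surface.
    print(name, inside_outside(1.0, 0.0, 0.0, n1, n2))
```

Rendering a shaded image would additionally require sampling this surface and applying a reflectance model, which the sketch does not attempt.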

*p* > 0.08 in all cases). Main effects were present for EE conditions (*F*(2, 14) = 75.0, *p* < 0.001), edge convexity (*F*(2, 14) = 11.7, *p* < 0.01), and size (*F*(2, 14) = 8.3, *p* < 0.01). Although size and edge convexity did not interact reliably (*F*(4, 28) = 0.53, *p* > 0.70), EE gradient conditions interacted with both edge convexity (*F*(4, 28) = 11.9, *p* < 0.001) and size (*F*(4, 28) = 4.5, *p* < 0.01). Because the choice probabilities were so close to 1.0 in several conditions, we performed another ANOVA on the same data following an arcsine transformation (Keppel & Wickens, 2004), but the results were essentially the same as for the raw data.
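The arcsine transformation is commonly applied to proportions so that differences near 0 or 1 are stretched relative to differences near 0.5, which stabilizes their variance before an ANOVA. A minimal sketch, assuming the common form $2\arcsin\sqrt{p}$ (some texts omit the factor of 2, which only rescales the values); the function name is hypothetical.

```python
import math


def arcsine_transform(p):
    """Variance-stabilizing arcsine transform of a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return 2.0 * math.asin(math.sqrt(p))


# Differences near the ceiling are stretched relative to those near 0.5.
for p in (0.50, 0.51, 0.98, 0.99, 1.00):
    print(f"{p:.2f} -> {arcsine_transform(p):.3f}")
```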

*c*(*p*)) after applying a sigmoidal nonlinearity. The output is “0” if the estimated probability of the left side being figural is less than 0.5 and “1” if it is greater than or equal to 0.5. The model parameters (*β*) were fit to the collected data by maximizing the log likelihood function. The correct classification rate of each cue alone, and of all cues combined, was used as a measure of their predictive power.
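The classifier described above can be sketched as a logistic regression fit by gradient ascent on the log likelihood, with the predicted class set to 1 when the sigmoid output is at least 0.5. In this minimal sketch the ±1 cue coding (+1 favors the left side), the helper names, the learning rate, and the toy responses are all assumptions for illustration, not the authors' code or data.

```python
import math


def sigmoid(z):
    """Logistic (sigmoidal) nonlinearity mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, x):
    """Intercept w[0] plus the weighted sum of cue values."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, steps=3000):
    """Fit weights (intercept first) by batch gradient ascent on the log likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(logit(w, xi))  # derivative of log L w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def correct_rate(w, X, y):
    """Fraction of trials where the 0.5-threshold prediction matches the response."""
    hits = sum((1 if sigmoid(logit(w, xi)) >= 0.5 else 0) == yi for xi, yi in zip(X, y))
    return hits / len(y)


# Toy data: two cues coded +1/-1; 1 means the left side was reported as figure.
X = [[1, 1]] * 4 + [[1, -1]] * 3 + [[-1, 1]] * 3 + [[-1, -1]] * 4
y = [1, 1, 1, 0] + [1, 1, 0] + [0, 0, 1] + [0, 0, 0, 1]
X1 = [[x[0]] for x in X]  # cue 1 alone
X2 = [[x[1]] for x in X]  # cue 2 alone

w_both, w_cue1, w_cue2 = fit_logistic(X, y), fit_logistic(X1, y), fit_logistic(X2, y)
print(correct_rate(w_both, X, y), correct_rate(w_cue1, X1, y), correct_rate(w_cue2, X2, y))
```

Comparing the single-cue and combined classification rates mirrors, in miniature, the use of correct classification as a measure of predictive power.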

*β* values therefore estimate the relative weights of each cue in predicting perception of the closer figure. This model is generally consistent with a Bayesian formulation of cue integration (e.g., Ernst & Banks, 2002).
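For comparison, the maximum-likelihood cue-integration scheme associated with Ernst and Banks (2002) combines independent, Gaussian-distributed cue estimates by weighting each in proportion to its reliability (inverse variance). The sketch below implements only that weighting rule; the estimates and variances are hypothetical, and mapping figure-ground cues onto Gaussian estimates of this kind is an assumption.

```python
def integrate_cues(estimates, variances):
    """Reliability-weighted average of independent Gaussian cue estimates.

    Each weight is (1 / variance_i) / sum_j(1 / variance_j); the combined
    variance is 1 / sum_j(1 / variance_j), never larger than the best cue's.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, 1.0 / total, weights


# Hypothetical example: a reliable cue (variance 1) and a noisy cue (variance 4).
combined, var, weights = integrate_cues([10.0, 20.0], [1.0, 4.0])
print(combined, var, weights)
```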

*β* values were 0.35 for size, 1.27 for convexity, 2.52 for EE shading, and 3.76 for EE highlights. The same model was fit to the data for subjects individually, and their *β* values were analyzed by difference-score *t*-tests to determine the relative strength of the cue weights. The *β* estimates for EE highlights were reliably higher than those for EE shading (*t*(7) = 2.78, *p* < 0.05), and those for EE shading were reliably higher than for either size (*t*(7) = 10.43, *p* < 0.001) or convexity (*t*(7) = 2.90, *p* < 0.05), which differed only marginally from each other (*t*(7) = 2.05, *p* < 0.08).
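A difference-score (paired) *t*-test of this kind takes, for each subject, the difference between two fitted *β* weights and tests whether the mean difference departs from zero, with df = n − 1 (7 for eight subjects). A minimal standard-library sketch; the `paired_t` helper is hypothetical and the per-subject weights in the usage lines are invented for illustration, not the study's data.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t, df) for a paired (difference-score) t-test of a versus b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return statistics.mean(diffs) / se, n - 1


# Invented per-subject weights for two cues (eight subjects).
beta_cue_a = [3.1, 2.8, 3.5, 3.0, 2.9, 3.3, 3.2, 2.7]
beta_cue_b = [2.2, 2.5, 2.6, 2.1, 2.4, 2.0, 2.7, 2.3]
t, df = paired_t(beta_cue_a, beta_cue_b)
print(f"t({df}) = {t:.2f}")
```

Converting *t* to a *p* value requires the *t* distribution's CDF (for example from a statistics library), which is omitted here to keep the sketch dependency-free.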

*opposite* the shaded side as figure. Gradient cuts arise when the shared edge cuts through the equiluminance contours of a shading pattern instead of being roughly parallel to them (as unoccluded EEs are), thus making it appear as if the 2D flat surface is occluding the shaded surface and hence lies in front of it. We therefore expected that the misaligned displays would not only produce the lowest bias toward seeing the shaded side as closer and figural but that this bias might actually be negative in some cases (i.e., that perception in the misaligned conditions might sometimes be biased toward seeing the shaded side as the farther ground).
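The geometric distinction drawn in the paragraph above can be made concrete by comparing the direction of the local luminance gradient with the direction of the shared edge: along an unoccluded EE the equiluminance contours run roughly parallel to the edge, so the luminance gradient is roughly perpendicular to it, whereas at a gradient cut the gradient has a substantial component along the edge. The sketch computes the absolute cosine between the edge tangent and the gradient; the `alignment_index` name and the idea of summarizing alignment with this single number are illustrative assumptions, not the authors' procedure.

```python
import math


def alignment_index(edge_tangent, luminance_gradient):
    """|cos| of the angle between an edge tangent and a luminance gradient.

    Near 0: equiluminance contours run parallel to the edge (EE-like).
    Near 1: the edge cuts across the contours (gradient-cut-like).
    """
    tx, ty = edge_tangent
    gx, gy = luminance_gradient
    norm = math.hypot(tx, ty) * math.hypot(gx, gy)
    if norm == 0.0:
        raise ValueError("tangent and gradient must be nonzero")
    return abs(tx * gx + ty * gy) / norm


vertical_edge = (0.0, 1.0)  # shared edge running vertically
print(alignment_index(vertical_edge, (1.0, 0.0)))  # luminance varies across the edge
print(alignment_index(vertical_edge, (1.0, 1.0)))  # edge cuts contours obliquely
print(alignment_index(vertical_edge, (0.0, 1.0)))  # luminance varies along the edge
```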

*t*(10) = 1.1, 2.3, 3.3, and 2.2, for size, convexity, familiarity, and surroundedness, *p* < 0.05, one-tailed.

*t*(10) = 12.5, 20.1, 21.1, 17.5, *p* < 0.001, for size, convexity, familiarity, and surroundedness, respectively, as listed in this order from here on). There are also reliable increases in the inconsistent conditions for each cue relative to the corresponding side of the flat displays (*t*(10) = 14.6, 60.5, 13.0, 6.2, *p* < 0.001) that are even larger than for the consistent conditions (*t*(10) = 5.8, 14.6, 8.5, 3.2, *p* < 0.01). Indeed, inspection of the EE-visible data in Figure 4 shows that, for every cue, the probability of choosing the shaded side when it was combined consistently with the flat cue was not significantly different from the probability of choosing it when it was combined inconsistently (*t*(10) = 0.8, −0.8, 1.1, 1.4, *p* > 0.05, one-tailed). Fully visible EEs along the shared contour of surfaces of revolution are thus so powerful as a cue to depth and figural status that the consistency of flat configural cues did not produce significant effects.

*t*(10) = 8.2, 14.0, 7.4, 3.9, *p* < 0.05) and all cues except size in the consistent conditions (*t*(10) = 4.6, 4.7, 8.6, *p* < 0.01). The difference between the consistent and inconsistent conditions was significant for all cues (*t*(10) = 5.6, 4.5, 6.9, 1.9, *p* < 0.05, one-tailed), indicating that when the steepest part of the EE shading gradient was removed, the flat cues had a measurable influence. The effects of the shading gradients in the EE-occluded conditions were reduced relative to the EE-visible conditions in 7 of 8 conditions (all except convexity; see Figure 4, rows 2 vs. 3). This reduction was greatest for the size-consistent condition, probably because occluding 25 pixels of the small SOR eliminated a larger percentage of its shading gradient than in the other conditions. All of these effects indicate that partly occluding the shading gradient along an EE reduces, but seldom eliminates, the strong bias toward seeing the shaded side as closer and figural.

*t*(10) = 3.6, 3.9, 3.4, 3.7, *p* < 0.01), and for all consistent conditions except familiarity (*t*(10) = 4.8, 3.2, 3.1, *p* < 0.01). These effects may not be due entirely to eliminating EE gradients, however, because gradient cuts (Ghose & Palmer, in preparation; Huggins & Zucker, 2001a, 2001b) are present along the shared edge in the flattened conditions. The flat surfaces may therefore have tended to be seen as closer because the horizontal shading gradients ended abruptly at the edge, as though occluded by the flat region on the other side.

*F*(1, 29) = 22.5, *p* < 0.001), EE-occluded for an additional 24% (*F*(1, 28) = 20.2, *p* < 0.001), consistency for an additional 7% (*F*(1, 27) = 7.0, *p* < 0.05), flattened for a further 13% (*F*(1, 26) = 27.6, *p* < 0.001), misaligned for 1% more (*F*(1, 25) = 3.0, *p* > 0.05), and convexity for a final 3% (*F*(1, 24) = 11.6, *p* < 0.05). The only flat cue that accounted for a significant percentage of variance in the delta-probability data was convexity. It seems that the data obtained for the flat displays with convexity somehow underestimated the strength of convexity as a figural cue. The coefficients of the regression analysis were as follows: EE-visible: 0.55 (*t*(24) = 12.4, *p* < 0.001); EE-occluded: 0.43 (*t*(24) = 9.8, *p* < 0.001); consistency: −0.28 (*t*(24) = −7.2, *p* < 0.001); flattened: 0.25 (*t*(24) = 5.7, *p* < 0.001); misaligned: 0.05 (*t*(24) = 0.97, *p* > 0.05); and convexity: 0.15 (*t*(24) = 3.4, *p* < 0.01). The regression model reconfirmed that the presence of a shading gradient corresponding to an EE changed the figural bias of the corresponding side of a bipartite display and that the influence of the shading gradient declines as shading information corresponding to an EE is removed from the displays.
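Each step of the stepwise analysis above reports an *F* test on the increase in explained variance when one predictor is added, with the denominator df shrinking by one at each step. A minimal sketch of that *F*-change statistic, assuming the standard formula $F = (\Delta R^2 / q) \,/\, ((1 - R^2_{\text{full}}) / df_{\text{resid}})$ for adding $q$ predictors; the function name and the *R*² values in the usage line are hypothetical.

```python
def f_change(r2_reduced, r2_full, df_resid, q=1):
    """F statistic for the increase in R^2 when q predictors are added.

    df_resid is the residual df of the fuller model, i.e. the denominator
    df reported as F(q, df_resid).
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)


# Hypothetical step: R^2 rises from 0.50 to 0.60 with 20 residual df.
print(round(f_change(0.50, 0.60, 20), 3))
```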

*F*(1, 13) = 8.0, *p* < 0.05), gradient reduction type, with compressed gradients showing stronger figural biases than occluded gradients (*F*(1, 13) = 33.9, *p* < 0.001), and amount of reduction, with smaller reductions producing larger figural biases than larger reductions (*F*(7, 91) = 7.2, *p* < 0.001). Although gradient cue type and gradient reduction type did not interact reliably (*F*(1, 13) = 3.6, *p* > 0.07), amount of reduction did interact with gradient cue type (*F*(7, 91) = 5.3, *p* < 0.001) and with gradient reduction type (*F*(7, 91) = 8.6, *p* < 0.001). There was also a small but significant three-way interaction (*F*(7, 91) = 3.1, *p* < 0.05).

*F*(1, 13) = 4.48, *p* = 0.05). As predicted, however, there were no significant differences in the magnitude of the figural bias as a function of the amount of compression for either the shading-only condition (*F*(7, 91) = 2.0, *p* > 0.07) or the shading + texture condition (*F*(7, 91) = 1.3, *p* > 0.27). Compression presumably had no systematic effect because it did not eliminate any of the EE gradient; it merely increased its slope. (More extreme amounts of compression would eventually have an effect, of course, once the gradient itself could no longer be detected.) For the occluded conditions (Figures 6D and 7D), however, the magnitude of the figural bias decreased significantly for both the shading-only condition (*F*(7, 91) = 12.5, *p* < 0.001) and the shading + texture condition (*F*(7, 91) = 3.0, *p* < 0.01). The significant three-way interaction among the type of reduction (occlusion vs. compression), gradient type (shading-only vs. shading + texture), and amount of reduction (in pixels) arises because the effect of the amount of reduction in the occlusion conditions is greater for the shading-only gradients than for the shading + texture gradients. In both cases, the occlusion curves contained a linear and a quadratic component. For the shading-only conditions, these two components together accounted for 94% of the variance, and for the shading + texture conditions they accounted for 93% of the variance. It is worth noting that the “neutral” or “chance” level of response for the shading-only condition is probably lower than 50% because the flat side contained a high-contrast checkerboard texture, which is more likely to be seen as figural than an untextured gray region with a flat gradient. This means that we cannot claim that the figural bias is eliminated when more than half of the gradient is occluded, because chance performance would probably be lower than 50%.
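One way to obtain percentages like the 94% and 93% figures for the linear and quadratic components is to fit a second-order polynomial to the mean figural bias as a function of the amount of occlusion and report its *R*². The standard-library sketch below does this by ordinary least squares via the normal equations; treating the reported trend analysis as such a fit to condition means is an assumption, and the occlusion levels and bias values in the usage lines are invented.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns ([c0, c1, c2], R^2)."""
    # Normal equations A c = b, with A[i][j] = sum(x**(i+j)) and b[i] = sum(y * x**i).
    A = [[sum(xk ** (i + j) for xk in x) for j in range(3)] for i in range(3)]
    b = [sum(yk * xk ** i for xk, yk in zip(x, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * 3
    for i in range(2, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))) / A[i][i]
    # R^2 = 1 - SS_residual / SS_total.
    mean_y = sum(y) / len(y)
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    ss_res = sum((yk - (coef[0] + coef[1] * xk + coef[2] * xk ** 2)) ** 2
                 for xk, yk in zip(x, y))
    return coef, 1.0 - ss_res / ss_tot


# Invented example: eight occlusion levels (pixels) and mean figural biases.
levels = [0, 5, 10, 15, 20, 25, 30, 35]
bias = [0.40, 0.36, 0.30, 0.27, 0.20, 0.18, 0.15, 0.14]
coef, r2 = fit_quadratic(levels, bias)
print([round(c, 5) for c in coef], round(r2, 3))
```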

*per se* did not influence the perception of relative depth across the adjacent regional contour. (It is possible that there was an effect of gradient slope that was exactly offset by the reduction in the area covered by the gradient, but this seems unlikely.) Second, when the steepest part of the gradient was eliminated, as in the occlusion conditions, its figural bias decreased monotonically with the amount of gradient occluded. Third, adding the monocular depth cue of a correlated texture gradient to a shading gradient increased the net figural bias, presumably by diminishing alternative interpretations of the shading gradient as resulting from a shadow cast by the adjacent surface. And fourth, the effects of occluding the gradient were more pronounced for the shading-only condition than for the shading + texture condition. This interaction may also be due to the texture gradient ruling out the interpretation of the shading gradient as a shadow cast by a closer flat region.