Figure–ground organization refers to the visual perception that a contour separating two regions belongs to one of the regions. Recent studies have found neural correlates of figure–ground assignment in V2 as early as 10–25 ms after response onset, providing strong support for the role of local bottom–up processing. How much information about figure–ground assignment is available from locally computed cues? Using a large collection of natural images, in which neighboring regions were assigned a figure–ground relation by human observers, we quantified the extent to which figural regions locally tend to be smaller, more convex, and to lie below ground regions. Our results suggest that these Gestalt cues are ecologically valid, and we quantify their relative power. We have also developed a simple bottom–up computational model of figure–ground assignment that takes image contours as input. Using parameters fit to natural image statistics, the model is capable of matching human-level performance when scene context is limited.

Size(*p*) is defined as the log ratio of the areas of the two regions. LowerRegion(*p*) is defined as the cosine of the angle between the line connecting the centers of mass of the two regions and the vertical direction given by the camera orientation. In contrast to measuring the angle of the boundary tangent at the center point, LowerRegion(*p*) incorporates information over the entire analysis window. The convexity of a single region is computed as the fraction of point pairs in the region for which the straight line connecting them lies completely within the region. Convexity(*p*) is then given by the log ratio of the two region convexities.
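As an illustration only (not the authors' code), the three cues can be sketched as follows. The region masks, the center-of-mass convention, and the Monte Carlo sampling of point pairs for convexity are our own implementation choices:

```python
import numpy as np

def size_cue(area1, area2):
    """Size(p): log ratio of the two region areas (region order is a convention)."""
    return np.log(area1 / area2)

def lower_region_cue(com1, com2):
    """LowerRegion(p): cosine of the angle between the line joining the two
    regions' centers of mass and the vertical direction.
    Centers of mass are (x, y) with y increasing downward (image convention)."""
    d = np.asarray(com2, dtype=float) - np.asarray(com1, dtype=float)
    vertical = np.array([0.0, 1.0])
    return float(d @ vertical) / float(np.linalg.norm(d))

def region_convexity(mask, n_pairs=2000, rng=None):
    """Convexity of one region, approximated by sampling point pairs and
    checking whether the connecting segment stays inside the boolean mask."""
    rng = np.random.default_rng(rng)
    ys, xs = np.nonzero(mask)
    idx = rng.integers(len(xs), size=(n_pairs, 2))
    inside = 0
    for i, j in idx:
        t = np.linspace(0.0, 1.0, 32)          # samples along the segment
        px = np.round(xs[i] + t * (xs[j] - xs[i])).astype(int)
        py = np.round(ys[i] + t * (ys[j] - ys[i])).astype(int)
        inside += mask[py, px].all()
    return inside / n_pairs

def convexity_cue(mask1, mask2):
    """Convexity(p): log ratio of the two region convexities."""
    return np.log(region_convexity(mask1) / region_convexity(mask2))
```

Sampling point pairs approximates the fraction over all pairs, which is intractable to enumerate exactly for large regions.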

Distributions of each cue (window radius *r* = 5% of contour length) for 50,000 points sampled from the human-labeled boundaries. We plot only the distributions for positive values of each cue: because every boundary point contributes two values of equal magnitude and opposite sign, the distributions of negative values are identical with the roles of figure and ground reversed. Note that the marginal distribution of contour orientations is not uniform. The greater prevalence of horizontal (LowerRegion = 1) and vertical (LowerRegion = 0) boundaries is consistent with previous results on the statistics of brightness edges in natural images (Switkes, Mayer, & Sloan, 1978).

When the two regions have equal area, Size(*p*) = log(Area₁/Area₂) = 0, and they are equally likely to be figure. When one region is larger, Size(*p*) > 0, it is more common that the larger region is ground. All three cues uniformly differentiate figure and ground on average, in agreement with psychophysical demonstrations of the corresponding Gestalt cues (Kanizsa & Gerbino, 1976; Metzger, 1953; Rubin, 1921; Vecera et al., 2002). At 5% contour length, we estimate the mutual information (Cover & Thomas, 1991) between each cue and the true label to be 0.047, 0.075, and 0.018 bits for Size, LowerRegion, and Convexity, respectively.
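Mutual-information estimates of this kind can be computed by discretizing a cue and comparing the joint histogram against the product of marginals. A minimal sketch (the quantile binning and bin count here are our own choices, not from the paper):

```python
import numpy as np

def mutual_information(cue, label, n_bins=32):
    """Estimate I(cue; label) in bits by histogram discretization.
    `cue` is a 1-D array of cue values; `label` is a binary array."""
    # Quantile bin edges spread the samples evenly across bins.
    bins = np.quantile(cue, np.linspace(0.0, 1.0, n_bins + 1))
    c = np.clip(np.digitize(cue, bins[1:-1]), 0, n_bins - 1)
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (c, np.asarray(label).astype(int)), 1)
    joint /= joint.sum()
    pc = joint.sum(axis=1, keepdims=True)      # marginal over cue bins
    pl = joint.sum(axis=0, keepdims=True)      # marginal over labels
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (pc * pl))
    return float(np.nansum(terms))             # 0·log 0 terms contribute 0
```

Note that histogram estimators carry a small positive bias that shrinks with sample size, so values near zero indicate an uninformative cue.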

The classifier computes a linear combination of the cues at point *p*, arranged into the vector **c**(*p*), along with a constant offset, and applies a sigmoidal nonlinearity. The classifier outputs a value in [0, 1] that is an estimate of the likelihood that a segment is figural. In the classification setting, we declare a segment to be figure if this likelihood is greater than 0.5. The model parameters *β* were fit using iteratively reweighted least squares to maximize the training-data likelihood (Hastie, Tibshirani, & Friedman, 2001). We also considered models that attempted to exploit nonlinear interactions between the cues, such as logistic regression with quadratic terms and nonparametric density estimation, but found no significant gains in performance over the simple linear model.
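The model described is ordinary logistic regression: a weighted sum of the cue vector plus an offset, passed through a sigmoid, with *β* fit by iteratively reweighted least squares. A minimal self-contained sketch, with our own variable names and convergence settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_irls(C, y, n_iter=25):
    """Fit beta for P(figure | c) = sigmoid(beta . [1, c]) by IRLS
    (Newton's method on the log-likelihood)."""
    X = np.column_stack([np.ones(len(C)), C])   # constant offset + cues
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        W = p * (1.0 - p) + 1e-9                # IRLS weights; jitter for stability
        z = X @ beta + (y - p) / W              # working response
        XW = X * W[:, None]
        beta = np.linalg.solve(XW.T @ X, XW.T @ z)
    return beta

def predict_figure(beta, C):
    """Declare a segment figural when the estimated likelihood exceeds 0.5."""
    X = np.column_stack([np.ones(len(C)), C])
    return sigmoid(X @ beta) > 0.5
```

On synthetic cue data generated from a noisy linear rule, a fit of this form recovers the decision boundary well; on real cue vectors it would play the role of the paper's linear model.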

Subjects judged figure–ground at window radii *r* = 2.5%, 5%, 10%, and 20% of contour length. Of the eight subjects, four were presented with image luminance patches and four with segment-only displays. None of the subjects presented with luminance patches had previously seen the images used. For the segment-only display, subjects indicated which side was figural (black or white). In the luminance display, the image patch was overlaid with red and blue tints to unambiguously specify the contour location.

Performance improved significantly (*p* < 2 × 10⁻⁴ for all subjects) as the window radius increased from 5% to 20% contour length.
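The text does not specify here which test produced these *p* values; as one illustrative possibility, a one-sided two-proportion z-test on a subject's counts of correct responses at two radii can be sketched as:

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """One-sided z-test that accuracy k2/n2 exceeds k1/n1.
    k = correct responses, n = total trials at each radius."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # One-sided p-value from the standard normal upper tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With paired trials (the same contour points shown at both radii), a paired test such as McNemar's would be the more appropriate choice; the sketch above assumes independent samples.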

| *r* | 1 | 2 | 3 | 4 | M |
|---|---|---|---|---|---|
| 2.5% | 65 | 65 | 63 | 60 | 63 |
| 5% | 64 | 64 | 67 | 64 | 67 |
| 10% | 63 | 64 | 72 | 71 | 62 |
| 20% | 72 | 67 | 68 | 72 | 68 |

| *r* | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| 2.5% | 57 | 70 | 67 | 70 |
| 5% | 60 | 75 | 72 | 68 |
| 10% | 81 | 84 | 82 | 82 |
| 20% | 83 | 89 | 87 | 88 |

| Subject | 2 | 3 | 4 | M |
|---|---|---|---|---|
| 1 | 75 | 79 | 74 | 80 |
| 2 | | 80 | 69 | 77 |
| 3 | | | 74 | 85 |
| 4 | | | | 74 |

| Subject | 6 | 7 | 8 | M |
|---|---|---|---|---|
| 5 | 74 | 72 | 71 | 67 |
| 6 | | 77 | 78 | 65 |
| 7 | | | 76 | 67 |
| 8 | | | | 65 |