Abstract
We present a model of perceptual grouping which uses the type of low-level information available to simple and complex cells in V1 to identify likely meaningful image segments. The model is applicable to natural images as well as to artificial displays. It can deliver a hierarchical segmentation of an image, and allows for quite general segmentations: segments may overlap, for instance, or be separated in space. The algorithm neither needs to know the number of groups in advance nor needs to learn it. While many segmentation algorithms work directly in the two-dimensional image space, our model first identifies segments in a higher-dimensional space and then projects them back to the original image space.
The model begins by extracting, at each image location, a vector of image features, f, such as luminance, chrominance, and/or local orientation. The image is then represented in a multi-dimensional (x, y, f) space. Blurring in this space with an appropriate kernel creates a function on the higher-dimensional space in which connected regions of higher value contain pixels clustered by a combination of proximity and similarity. Finally, these regions are projected back down to the original (x, y) image space to yield groups in the image. The corresponding 2-D image regions often mimic Gestalt groupings, and in natural images often correspond to connected segments of the same object.
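The lift-blur-project pipeline described above can be sketched in a few lines of NumPy/SciPy. The sketch below is an illustrative assumption, not the paper's implementation: it uses a single luminance feature, an isotropic Gaussian as the blurring kernel, and a simple fraction-of-maximum threshold to define the high-valued regions (the feature binning, kernel widths, and threshold are all hypothetical parameters).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def group_by_blur(img, n_bins=16, sigma=(2.0, 2.0, 1.0), thresh_frac=0.5):
    """Sketch of proximity-plus-similarity grouping for a single
    luminance feature: lift the image into (x, y, f) space, blur,
    and project connected high-valued regions back to the image.

    img: 2-D float array with values in [0, 1].
    Returns a list of boolean 2-D masks, one per group.
    """
    h, w = img.shape
    # Lift: one-hot occupancy in (row, col, luminance-bin) space.
    bins = np.clip((img * n_bins).astype(int), 0, n_bins - 1)
    space = np.zeros((h, w, n_bins))
    space[np.arange(h)[:, None], np.arange(w)[None, :], bins] = 1.0
    # Blur jointly over position and feature: nearby AND similar
    # pixels reinforce each other in the lifted space.
    density = gaussian_filter(space, sigma=sigma)
    # Connected regions above threshold in the lifted space.
    mask = density > thresh_frac * density.max()
    labels3d, n = label(mask)
    # Project each 3-D region down to a 2-D image segment.
    return [(labels3d == k).any(axis=2) for k in range(1, n + 1)]
```

Because grouping happens in the lifted space, two regions of similar luminance separated in (x, y) can still form distinct groups, and the 2-D projections of different 3-D regions may overlap, matching the generality of segmentation the abstract claims.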
The model is computationally and mathematically simple, particularly compared to current alternative segmentation algorithms. In addition, the major operation necessary for the algorithm, blurring, can easily be modeled by a neural network, and closely parallels the spreading activation behavior found throughout the brain. Finally, the convolution kernel itself can naturally be defined in terms of statistics gathered from segmentations of natural images.