Abstract
In Rosenholtz et al. (2009), we presented a model framework for Gestalt grouping principles such as grouping by proximity, similarity, and good continuation. Our approach maps image pixels to points in a higher-dimensional space containing the spatial dimensions of the original image plus additional dimensions corresponding to relevant feature values. Thus, using luminance as the relevant feature, pixels that are near one another in the image and have similar luminance map to points that are near one another in the higher-dimensional space. One can then group these points by blurring and thresholding this higher-dimensional space; changing the scale of this blur groups pixels across varied spatial scales and feature magnitudes. This approach has the advantage that it operates directly on images, unlike classical approaches, which presuppose foreknowledge of discrete visual elements with well-defined parameters. Unfortunately, this framework lacks any mechanism for generating a single coherent analysis of an image. Human perceptual grouping operates across different scales; to develop the framework into a working model, we must determine how the numerous possible groupings of a scene are culled and merged into a single organization. To address this, we ran a psychophysical experiment to determine how people perform grouping across a range of scales and luminance differences. In the experiment, subjects were shown two random fields of gray dots, one of which contained a subregion that differed from its background in proximity, luminance, or both. Subjects reported which of the two images they believed contained a subregion; accuracy reflects the strength of perceptual grouping. Comparing the results of this experiment with the results of our model framework reveals that the set of parameter settings which contribute to the final image organization is complex and highly image-dependent, with parameter settings rejected for one image providing the predominant grouping in another.
Meeting abstract presented at VSS 2012
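To make the blur-and-threshold grouping concrete, here is a minimal sketch, assuming a grayscale image with luminance in [0, 1] and a single luminance feature dimension. The function name, the binning scheme, and the parameters sigma_space, sigma_feature, and threshold are hypothetical choices for exposition, not the implementation of Rosenholtz et al. (2009).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def group_by_blur_threshold(image, n_feature_bins=16,
                            sigma_space=2.0, sigma_feature=1.0,
                            threshold=0.5):
    """Sketch of grouping by blurring and thresholding a higher-dimensional
    space: two spatial dimensions (y, x) plus one luminance dimension.

    `image` is assumed to be a 2D float array with values in [0, 1].
    Parameter names and defaults are illustrative assumptions.
    """
    h, w = image.shape

    # Lift each pixel into a 3D volume indexed by (y, x, luminance bin),
    # so pixels nearby in space AND similar in luminance land near one
    # another in the volume.
    volume = np.zeros((h, w, n_feature_bins))
    bins = np.clip((image * (n_feature_bins - 1)).astype(int),
                   0, n_feature_bins - 1)
    ys, xs = np.indices(image.shape)
    volume[ys, xs, bins] = 1.0

    # Blur the volume: a larger sigma_space groups over greater spatial
    # distances; a larger sigma_feature groups over greater luminance
    # differences.
    blurred = gaussian_filter(volume,
                              sigma=(sigma_space, sigma_space, sigma_feature))

    # Threshold, then label connected components; pixels falling in the
    # same component belong to the same group.
    mask = blurred > threshold * blurred.max()
    labels3d, n_groups = label(mask)

    # Project back to the image plane: each pixel takes the label found at
    # its own (y, x, luminance-bin) location.
    pixel_labels = labels3d[ys, xs, bins]
    return pixel_labels, n_groups
```

Sweeping sigma_space and sigma_feature over a range of values yields the family of candidate groupings that, as the abstract notes, must then be culled into a single organization.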