Abstract
When presented with visual scenes containing multiple items, humans can rapidly organize the elements into groups and estimate the number of items within each group. Here we systematically examine how humans define groups of items and how grouping influences numerical estimation of items. Through behavioral experiments and modeling, we find that human estimates of groups and items are well described by a k-means clustering algorithm (Fig. 1), which is widely used for image segmentation in computer vision. In Experiment 1, we asked subjects to estimate the number of clusters in images of randomly located dots presented for 50-300 ms. Estimates of the number of clusters were stable from as early as 50 ms and highly consistent across individuals. Next, the model estimated the number of clusters for the same images using a single free parameter: the center-to-center distance among items (i.e., the clustering threshold). The best-fit clustering threshold was 4° (Fig. 2), a distance that has also been reported as critical for optimal spatial separation in object tracking tasks [1,2]. In Experiment 2, we asked a different set of subjects to estimate the number of individual dots (not clusters) in the same images. Subjects tended to underestimate the number of dots, especially when the image contained many clusters. Comparisons with the model estimates suggested that human subjects discount dots that fall within clusters, and linear regression analyses revealed that clusters containing more items yielded greater underestimation. Based on these findings, we propose a hierarchical model in which inputs from two interacting levels of representation, one for items and one for clusters, together predict human performance on various numerosity tasks. Our work uses behavior and computer vision to begin to reveal how number can be rapidly estimated from briefly presented visual scenes.
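As a rough illustration of how a single center-to-center distance threshold can turn a dot display into a cluster count, consider the minimal sketch below. It is not the model described above: the abstract does not specify how the threshold enters the k-means procedure, so the sketch substitutes a simpler threshold-linkage grouping (dots closer than the threshold are joined, and connected groups are counted). The function name `count_clusters`, the parameter `threshold_deg`, and the example coordinates are hypothetical; only the 4° value comes from the text.

```python
import math


def count_clusters(points, threshold_deg=4.0):
    """Count groups of dots linked by center-to-center distances below a threshold.

    ASSUMPTION: this is threshold-linkage grouping (connected components),
    used here as a stand-in. It is not the k-means model from the abstract.
    `points` is a list of (x, y) positions in degrees of visual angle.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of dots that lies closer together than the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < threshold_deg:
                parent[find(i)] = find(j)

    # The number of distinct representatives is the number of clusters.
    return len({find(i) for i in range(len(points))})


# Hypothetical display: two tight groups and one isolated dot.
dots = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (10.0, 10.0), (11.0, 9.5), (20.0, 0.0)]
n_clusters = count_clusters(dots)
```

With the hypothetical display above and a 4° threshold, the three nearby dots form one group, the pair forms a second, and the isolated dot forms a third. Raising `threshold_deg` merges groups and lowering it splits them, which is the sense in which one free parameter can be fit to human cluster estimates.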
Meeting abstract presented at VSS 2013