Abstract
Human visual attention is drawn to salient stimuli in the environment. Previous models for predicting saliency in natural images have usually focused on scenes of regular density. What drives attention in a crowd, however, may differ significantly from what drives it in regular settings, and it remains unclear how the crowd density of a scene influences attentional selection in natural complex scenes. To study saliency in crowds, we first constructed a database. We collected 500 natural images spanning a diverse range of densities and conducted eye-tracking experiments in which 16 subjects free-viewed all 500 images (5 s per image). Because faces have been shown to attract attention strongly and rapidly, independent of task, we focused our study on human faces, and the selected images cover a wide range of face densities (from 3 to 268 faces per image). The dataset also provides labels for face regions and their attributes, including pose and partial occlusion. We investigated the influence of crowd density on a number of variables, including low-level features and face-related features (e.g., size, local density, pose, and occlusion). Statistical analyses showed that faces attracted attention strongly at all crowd levels, yet the importance of faces in saliency decreased as crowd level increased. We also observed that the number of fixations did not change significantly with crowd density, suggesting that only a subset of faces attracts attention in a crowd. What, then, determines which faces (or non-face regions) are looked at? Analyses showed that larger faces and faces with lower local density (i.e., fewer surrounding faces) attracted attention more strongly. Furthermore, frontal and unoccluded faces were found to be more salient. Finally, beyond these general trends, we found evidence that crowd density modulated the correlation between saliency and features.
Meeting abstract presented at VSS 2014