Abstract
Understanding how the human visual system selects and sequences image regions for scrutiny is important not only for explaining biological vision but also as a fundamental component of any foveated, active artificial vision system. Analysis of the statistics of visual stimuli at observers' point of gaze can provide insights into the mechanisms of fixation selection. We recorded the eye movements of 29 observers as they viewed 101 calibrated natural images for 5 s each, and quantified the differences between the feature statistics of image patches centered on human fixations and those of patches centered on randomly selected fixations. We studied three low-level image features: local patch RMS contrast, the center-surround output of patch luminance, and the center-surround output of patch contrast. Image patches around human fixations had, on average, higher values of each of these features than image patches selected at random. Center-surround contrast showed the greatest difference between human and random fixations, followed by center-surround luminance and then RMS contrast. A foveated analysis, which accounted for the falloff in visual resolution with increasing eccentricity, yielded even greater differences in contrast statistics between human and random fixations than previously reported results that did not incorporate foveation. An eccentricity-based analysis of the patches revealed that the influence of these image features was not uniform across saccade magnitudes. A simple algorithm that selected image regions as likely candidates for fixation, based on a linear combination of these features, produced high correlations with the fixations recorded from observers.
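The comparison between human and random fixations described above can be illustrated with a minimal sketch. It assumes that RMS contrast is defined as the standard deviation of patch luminance divided by its mean, which may differ from the exact definition used in the study. The function names, the patch size, the synthetic image, and the stand-in "human-like" fixation list are all hypothetical and serve only to show the shape of the computation, not the recorded data or the authors' implementation.

```python
import numpy as np

def extract_patch(image, row, col, half):
    """Return the square patch of side 2*half + 1 centered on (row, col)."""
    return image[row - half: row + half + 1, col - half: col + half + 1]

def rms_contrast(patch):
    """RMS contrast, assumed here to be std(luminance) / mean(luminance)."""
    mean = patch.mean()
    return float(patch.std() / mean) if mean > 0 else 0.0

def mean_feature(image, fixations, half, feature=rms_contrast):
    """Average a patch feature over a list of (row, col) fixations."""
    values = [feature(extract_patch(image, r, c, half)) for r, c in fixations]
    return float(np.mean(values))

def random_fixations(shape, n, half, rng):
    """Draw n uniformly distributed fixations whose patches fit in the image."""
    rows = rng.integers(half, shape[0] - half, size=n)
    cols = rng.integers(half, shape[1] - half, size=n)
    return list(zip(rows, cols))

# Synthetic luminance image (hypothetical): flat left half, textured right half.
rng = np.random.default_rng(0)
image = np.full((128, 128), 0.5)
image[:, 64:] += rng.uniform(-0.3, 0.3, size=(128, 64))

half = 8  # hypothetical patch half-width in pixels
# Stand-in for recorded fixations: points placed on the textured half.
human_like = [(r, 96) for r in range(16, 112, 8)]
baseline = random_fixations(image.shape, 200, half, rng)

# A ratio above 1 means the feature is higher at the "human" locations.
ratio = mean_feature(image, human_like, half) / mean_feature(image, baseline, half)
print(f"human/random RMS contrast ratio: {ratio:.2f}")
```

The same `mean_feature` helper accepts any patch feature through its `feature` argument, so a center-surround measure could be compared in the same way.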
This material is based upon work supported by the National Science Foundation under Grant Nos. 0225451 and ITR-0427372.