Abstract
Humans move their eyes approximately three times per second while viewing natural images, fixating on features within the image between movements. What humans choose to fixate can be driven by features extracted in the early stages of visual processing (salient features, e.g. colour and luminance), by top-down control (e.g. task, scene schemas), or by a combination of both. Recent models based on bottom-up salience have shown that it is possible to predict some of the locations that humans choose to fixate. However, none have considered the information contained in the second-order features (e.g. texture) that are present in natural scenes. Here we tested the hypothesis that a salience map incorporating second-order features can predict where humans fixate when viewing natural images. We recorded the eye movements of 20 human observers while they viewed 80 high-resolution calibrated photographs of natural textures and scenes. To maintain natural viewing behaviour while keeping observers attentive, they were asked to study each scene in order to recognize sections from it in a follow-up forced-choice test. Interestingly, eye movement patterns for natural textures did not show the same central bias as those for natural scenes. Salience maps were constructed for each image using a Gabor-based filter-rectify-filter model that detects second-order features. For natural textures, the fixation locations predicted by the model did not differ from those of human observers. For natural scenes, however, the model's ability to predict human eye movements decreased because it failed to capture the central bias. A further improvement would be to combine bottom-up salience with top-down input in the form of a central bias, which may increase the model's performance in predicting human eye movements.
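
For readers unfamiliar with the filter-rectify-filter idea, the following is a minimal sketch of a generic Gabor-based pipeline of that kind: a bank of fine-scale Gabor filters, full-wave rectification, then a bank of coarse-scale Gabor filters whose pooled energy gives a salience map. It is not the model used in this study. The function names, kernel sizes, wavelengths, number of orientations, energy pooling, and circular (FFT) boundary handling are all assumptions chosen for illustration.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Zero-mean 2-D Gabor kernel of shape (size, size); size should be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)  # axis along which the carrier varies
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength + phase)
    return kernel - kernel.mean()  # remove DC so uniform regions give no response


def convolve_same(image, kernel):
    """Circular convolution via the FFT; the kernel must fit inside the image."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape, dtype=float)
    padded[:kh, :kw] = kernel
    # Shift the kernel centre to the origin so the output is not displaced.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def frf_salience(image, orientations=4, fine_wavelength=4.0, coarse_wavelength=16.0):
    """Illustrative filter-rectify-filter salience map, scaled to a peak of 1.

    All parameter values are assumptions; the image must be at least 33 x 33.
    """
    image = np.asarray(image, dtype=float)
    thetas = np.arange(orientations) * np.pi / orientations
    salience = np.zeros(image.shape, dtype=float)
    for t1 in thetas:
        # Stage 1: fine-scale Gabor filter picks up the texture carrier.
        first = convolve_same(image, gabor_kernel(9, fine_wavelength, t1, 2.0))
        # Rectify: the full-wave nonlinearity turns local contrast into a positive signal.
        rectified = np.abs(first)
        for t2 in thetas:
            # Stage 2: coarse-scale quadrature pair measures changes in that signal.
            even = convolve_same(rectified, gabor_kernel(33, coarse_wavelength, t2, 8.0))
            odd = convolve_same(
                rectified, gabor_kernel(33, coarse_wavelength, t2, 8.0, phase=np.pi / 2)
            )
            salience += np.hypot(even, odd)
    peak = salience.max()
    return salience / peak if peak > 0 else salience


# Example: left half uniform, right half a fine grating with the same mean luminance.
image = np.zeros((64, 64))
image[:, 32:] = np.sin(2.0 * np.pi * np.arange(32, 64) / 4.0)
salience = frf_salience(image)
```

In this toy image the two halves have equal mean luminance, so the boundary between them is defined only by texture contrast. A pipeline of this kind would be expected to respond most strongly near that boundary. Because the convolution here is circular, the image edges wrap around and act as a second boundary.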