Abstract
The eyes do not sample visual scenes uniformly. Behaviourally relevant regions in images are fixated preferentially, both during picture viewing and during everyday activities. Central to understanding active visual sampling is the question of how locations in scenes are selected for fixation. To determine the visual characteristics selected for fixation, we recorded saccades while participants viewed natural scenes. We used these data to construct a model that predicts the likelihood that a location in an image is selected for fixation. First, we derived descriptions of images in terms of raw luminosity, difference of luminance from the mean, local contrast, chromaticity, and boundary information based on the outputs of odd-symmetric Gabors. These descriptions were used as inputs to construct an optimal linear filter that operates on the images to describe the observed eye movements. An estimate of the optimal linear filter can be derived from simple reverse correlation of fixated and non-fixated image patches. However, this estimate is only valid if the inputs are uncorrelated, and in this case they were not: there were correlations both between and within the derived image feature descriptions. Calculating the optimal filter with correlated inputs requires inverting the input covariance matrix, which can be numerically unstable. We therefore developed a technique we call Bayesian ridge logistic reverse correlation to identify these filters stably. Given a patch of a complex natural image, this model can predict the likelihood that a human observer would fixate that location. Our model shows that the best predictor of saccade target locations is the output of high-frequency Gabor patches. Behaviourally, this corresponds to looking at sharply defined edges or surface property boundaries. A mechanism of saccade targeting based on the selection of edges is plausible because, of the features investigated, edges are the least susceptible to variations in global illumination.
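
The contrast between simple reverse correlation and a ridge-regularised logistic estimate can be illustrated with a minimal sketch. This is not the implementation used in the study: the data are synthetic, the feature dimensions, the "true" filter `w_true`, the penalty `lam`, and all function names are assumptions chosen for illustration, and the ridge penalty is treated simply as a zero-mean Gaussian prior on the filter weights fitted with Newton's method.

```python
import numpy as np


def reverse_correlation(X, y):
    """Naive filter estimate: mean fixated patch minus mean non-fixated patch."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)


def ridge_logistic(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """MAP logistic regression under a zero-mean Gaussian prior (ridge penalty).

    Fitted with Newton's method; the lam * I term keeps the Hessian well
    conditioned even when the input features are strongly correlated.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # predicted fixation probability
        grad = X.T @ (p - y) + lam * w             # gradient of penalised neg. log-likelihood
        s = p * (1.0 - p)                          # per-sample curvature weights
        H = (X * s[:, None]).T @ X + lam * np.eye(d)
        step = np.linalg.solve(H, grad)
        w = w - step
        if np.max(np.abs(step)) < tol:
            break
    return w


def cosine(a, b):
    """Cosine similarity between two filter estimates."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for image features (hypothetical, not real eye-movement data).
rng = np.random.default_rng(0)
n = 4000
z = rng.standard_normal((n, 3))
X = np.empty((n, 3))
X[:, 0] = z[:, 0]                                  # feature that truly drives "fixation"
X[:, 1] = 0.9 * z[:, 0] + np.sqrt(1 - 0.9**2) * z[:, 1]  # correlated, but irrelevant
X[:, 2] = z[:, 2]                                  # independent and irrelevant

w_true = np.array([2.0, 0.0, 0.0])                 # assumed ground-truth filter
p_fix = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = (rng.random(n) < p_fix).astype(float)          # 1 = fixated, 0 = non-fixated

w_rc = reverse_correlation(X, y)
w_ridge = ridge_logistic(X, y, lam=1.0)

print("reverse correlation:", w_rc, "cosine with truth:", cosine(w_rc, w_true))
print("ridge logistic     :", w_ridge, "cosine with truth:", cosine(w_ridge, w_true))
```

In this toy setting the naive estimate should assign substantial weight to the second feature purely because it is correlated with the first, whereas the regularised logistic fit should concentrate its weight on the feature that actually drives the simulated fixations.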