Abstract
PURPOSE Models have been proposed to predict "salient" points in an image (e.g., Itti & Koch, 2000), but little progress has been made in relating these points to human fixation patterns. Part of the difficulty is that eye movements depend not only on bottom-up stimulus properties but also on top-down cues such as object knowledge and task. We had subjects learn to recognize novel object silhouettes, for which the information in the stimulus can be defined in terms of edge orientations. We hypothesize that subjects employ a strategy of sequential information maximization, and we propose a dynamic model that computes the most informative fixation sequence.

METHODS Novel objects were constructed by superimposing the silhouettes of two inverted objects from the Snodgrass and Vanderwart (1980) set. The experiment comprised 4 learning phases and 4 recognition phases, conducted over 5 consecutive days. We tracked the position of the dominant eye while subjects viewed the stimulus monocularly.

MODEL We use entropy to characterize the distribution of possible orientations at each edge point in the stimulus. A polar grid, scaled according to the human cortical magnification factor, is centered at a candidate fixation. The entropy in each bin is computed from ground-truth edge orientations and used to update the orientation information at each edge point in the bin. The fixation that maximizes information gain about the stimulus is chosen as the next fixation.

RESULTS Cumulative information-gain curves show that subjects maintain a consistent strategy across both learning and recognition trials. The model outperforms the human observers in how quickly it collects information, but it produces fixation sequences that resemble human behavior.

CONCLUSION A simple model of sequential information maximization shows promise in describing human eye movements driven by low-level stimulus properties. Extensions of the model can be used to probe the limits of human sensitivity in a learning and recognition task.
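The greedy loop described in the MODEL section can be sketched as follows. This is a minimal illustration, not the authors' implementation: orientations are discretized into a fixed number of bins, each edge point carries a belief distribution that starts uniform (maximum entropy), and the polar-grid pooling under the cortical magnification factor is approximated by a reliability term that falls off with eccentricity. All function names, the falloff constant `k`, and the toy circular stimulus are assumptions for the sketch.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def next_fixation(points, beliefs, true_bins, candidates, n_bins, k=0.4):
    """Greedily pick the candidate fixation that maximizes the total
    entropy reduction over all edge points (sequential information
    maximization), using ground-truth orientations as in the abstract."""
    best_gain, best_f, best_beliefs = -1.0, None, None
    for f in candidates:
        ecc = np.linalg.norm(points - f, axis=1)
        # Acuity falls off with eccentricity -- a crude stand-in for
        # pooling over a polar grid scaled by cortical magnification.
        rel = 1.0 / (1.0 + k * ecc)
        gain, new = 0.0, beliefs.copy()
        for i, prior in enumerate(beliefs):
            # Likelihood: a mix of the true orientation and uniform noise,
            # sharper when the point is close to the fixation.
            like = np.full(n_bins, (1.0 - rel[i]) / n_bins)
            like[true_bins[i]] += rel[i]
            post = prior * like
            post /= post.sum()
            gain += entropy(prior) - entropy(post)
            new[i] = post
        if gain > best_gain:
            best_gain, best_f, best_beliefs = gain, f, new
    return best_f, best_gain, best_beliefs

# Toy stimulus: edge points on a circle with tangent orientations.
n_bins = 8
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
points = np.c_[np.cos(theta), np.sin(theta)] * 5.0
true_bins = ((((theta + np.pi / 2) % np.pi) / np.pi) * n_bins).astype(int) % n_bins
beliefs = np.full((len(points), n_bins), 1.0 / n_bins)  # maximally uncertain

fixations = []
for _ in range(4):
    f, gain, beliefs = next_fixation(points, beliefs, true_bins, points, n_bins)
    fixations.append((f, gain))
```

A cumulative information-gain curve for the model, like those compared against observers in the RESULTS section, is simply the running sum of `gain` over successive fixations.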