Abstract
Because objects of interest may appear at many possible locations in the environment, the ability to search the environment efficiently is an essential visual capacity. The foveated nature of the human visual system requires that humans make fixations in rapid succession to locate targets. In this work, we develop a theory of foveated search for natural images and test the theory on simpler white noise backgrounds. The core of the theory takes advantage of recent work showing that suitably normalized template-matching observers, detecting targets whose position is known exactly, are approximately optimal for detection in a wide variety of natural background conditions and are consistent with human detection performance. In particular, detectability of targets has been shown to be inversely proportional to a separable product of four factors: background luminance, background contrast, background similarity to the target, and partial masking of the target by local contrast. These background masking factors are combined with a foveation factor (detectability is inversely proportional to ganglion cell spacing) to produce detectability maps that are accurate for natural images. For search tasks, the detectability maps can be combined with incoming sensory information and a prior distribution over potential target positions to compute a posterior distribution over target presence at each location in the image. The posterior distribution is then used to select an action, such as reporting the target location, reporting that the target is absent, or selecting a new fixation location. We tested the theory in white noise backgrounds using a relatively simple covert search task. Human error rates averaged over spatial location are consistent with the theoretical predictions. However, humans make a higher proportion of errors near the fovea, possibly because they underestimate the prior probability of targets near the fovea or allocate more decoding resources to the periphery.
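As an illustration of the posterior computation described above, the following is a minimal sketch and not the authors' implementation. It assumes a standard signal-detection response model that the abstract does not specify: the normalized template response at each location is Gaussian with unit variance, with mean equal to the local detectability d' when the target is present there and mean zero otherwise. The function names, the explicit target-absent prior, and the maximum a posteriori response rule are all hypothetical choices made for the example.

```python
import math


def posterior_over_locations(responses, dprimes, priors, prior_absent):
    """Posterior over target locations plus a final 'target absent' entry.

    Assumed model (not taken from the abstract): the response at location i
    is N(dprime_i, 1) if the target is at i and N(0, 1) otherwise, so the
    likelihood ratio of 'target at i' versus 'absent' is
    exp(dprime_i * r_i - dprime_i**2 / 2).
    """
    def safe_log(p):
        # A zero prior rules the hypothesis out entirely.
        return math.log(p) if p > 0 else float("-inf")

    # Log of (prior * likelihood ratio) for each candidate location.
    log_w = [safe_log(p) + d * r - 0.5 * d * d
             for r, d, p in zip(responses, dprimes, priors)]
    # The 'absent' hypothesis has likelihood ratio 1 by construction.
    log_w.append(safe_log(prior_absent))

    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def choose_response(posterior):
    """MAP rule (an assumed decision rule): location index, or None if absent."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return None if best == len(posterior) - 1 else best


# Hypothetical example: detectability falls with distance from fixation.
dprimes = [3.0, 1.5, 0.5]
responses = [0.2, 2.0, -0.3]
priors = [0.5 / 3] * 3  # target present with probability 0.5, uniform over locations
post = posterior_over_locations(responses, dprimes, priors, prior_absent=0.5)
decision = choose_response(post)
```

In this sketch a weak response at a high-detectability location counts as evidence against the target being there, which is how a detectability map shapes the posterior. Selecting a new fixation location would require an additional rule that the sketch does not attempt.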