Abstract
Given a model of how target detectability varies with eccentricity across the visual field, the choice of the next fixation is well predicted by a Bayesian ideal observer when the background is 1/f noise (Najemnik & Geisler, 2005). We have recently extended this work to real-world object targets embedded in natural image backgrounds (Rashidi, Ehinger, Turpin, & Kulik, 2020). The present work asks whether limiting the ideal observer model to a short-term memory improves its predictions of fixation patterns. We recorded eye movements from 5 observers searching for a person in 18 natural backgrounds. The target subtended 0.96 degrees of visual angle and could appear at any of 84 possible locations (0-6.75 degrees eccentricity). We model target detectability against each background using features extracted from deep convolutional neural networks, pooled over spatial regions that increase in size with eccentricity. We feed the resulting detectability maps to the ideal observer model to predict the fixation locations of the human observers. The original model assumes that observers integrate information about the target location over all previous fixations; we represent a limited memory span by restricting this integration to the most recent m fixations. We test m = 2, 4, 6, 8, and 10, as well as perfect memory. We find that the model with a memory span of 4 fixations best predicts human visual search performance (RMSE = 3.476). This improves on the original model, which assumes no memory limitation (RMSE = 4.057; t(17) = 2.921, p = 0.0509). When only the backgrounds with a mean number of search fixations greater than 4 are considered, RMSE decreases from 4.879 to 4.07 (t(11) = 2.214, p = 0.0488). These results suggest that, contrary to the assumption of the Najemnik & Geisler model, the visual system does not take the entire search history into account when selecting the next fixation location. Instead, the choice of the next fixation is based primarily on the previous few fixations.
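
To make the memory-limited integration concrete, the sketch below shows one way a posterior over candidate target locations could be updated from only the most recent m fixations, assuming the unit-variance Gaussian template-response model of Najemnik & Geisler (2005). It is a minimal illustration: the function names, the toy usage values, and the MAP rule for choosing the next fixation are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def posterior_over_targets(detectability, fixations, observations, prior, m=None):
    """Posterior over candidate target locations from a window of recent fixations.

    detectability : (n_fixation_sites, n_target_locations) array of d' values
    fixations     : list of fixation-site indices made so far
    observations  : list of (n_target_locations,) noisy template responses, one per fixation
    prior         : (n_target_locations,) prior over target locations
    m             : memory span; None means perfect memory (the original model)
    """
    if m is not None:
        fixations, observations = fixations[-m:], observations[-m:]

    log_post = np.log(prior)
    for fix, obs in zip(fixations, observations):
        d = detectability[fix]              # d' at every candidate location for this fixation
        # Log-likelihood ratio of "target here" vs. "target absent" under a
        # unit-variance Gaussian response model with means d and 0.
        log_post += d * obs - 0.5 * d ** 2
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

def next_fixation(posterior):
    # Illustrative MAP rule: fixate the currently most probable location.
    return int(np.argmax(posterior))

# Toy usage with made-up d' values, a memory span of 4, and a simulated target.
rng = np.random.default_rng(0)
n_sites = 84
detect = rng.uniform(0.5, 3.0, size=(n_sites, n_sites))
prior = np.full(n_sites, 1.0 / n_sites)
fixations, observations, true_target = [], [], 17
for _ in range(6):
    fix = next_fixation(posterior_over_targets(detect, fixations, observations, prior, m=4))
    obs = rng.normal(0.0, 1.0, n_sites)
    obs[true_target] += detect[fix, true_target]   # signal added at the true location
    fixations.append(fix)
    observations.append(obs)
```

Setting m=None recovers the perfect-memory observer; the full ideal observer would instead choose the fixation that maximizes the expected probability of localizing the target, which the simple MAP rule above only approximates.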