However, while saliency models only predict the spatial distribution of fixations when viewing still images, the fixations of a scanpath are known to be highly dependent on each other. Oculomotor biases influence saccade amplitudes and directions, and task and memory can additionally affect the order in which image regions are scanned and whether a certain image region is explored at all.
Scanpath models try to take these effects into account. By predicting not only spatial fixation locations (e.g., by means of a saliency map) but also whole scanpaths of fixations, they can model the effect of earlier fixations in a scanpath on later fixations and therefore capture exploration behavior. While the field of scanpath modeling has not received as much attention as the field of saliency modeling, recent years have seen a substantial number of models of scanpath prediction, mostly focused on free-viewing scanpaths (see
Kümmerer & Bethge, 2021, for an extensive overview of models of scanpath prediction). The model of
Itti et al. (1998) modeled sequences of fixations via a winner-takes-all (WTA) module that was inhibited after each fixation to encourage the selection of a new fixation location.
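As an illustration of this selection principle, the following minimal sketch implements greedy winner-takes-all selection with inhibition of return on a precomputed saliency map. This is a toy simplification, not the original model, which implements the WTA and its inhibition as a continuous-time dynamical neural network; the inhibition radius and number of fixations below are arbitrary choices.

```python
import numpy as np

def wta_scanpath(saliency, n_fixations=5, inhibition_radius=3):
    """Greedily pick the most salient location, then suppress a disk
    around it (inhibition of return) before picking the next fixation."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    ys, xs = np.mgrid[0:h, 0:w]
    scanpath = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        scanpath.append((int(y), int(x)))
        # Inhibition of return: zero out a disk around the selected location
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0.0
    return scanpath

# Toy usage with a random "saliency map"
print(wta_scanpath(np.random.rand(32, 48), n_fixations=4))
```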
Boccignone and Ferraro (2004) proposed to model scanpaths as a constrained Lévy flight, that is, a random walk in which the step length follows a Cauchy–Lévy distribution and is therefore heavy tailed.
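A heavily simplified sketch of such a saliency-constrained Lévy flight is given below. Note that Boccignone and Ferraro formulate the walk as a Langevin-type stochastic process; this toy version instead combines Cauchy-distributed jump lengths with a Metropolis-style acceptance step, and all parameters are placeholders.

```python
import numpy as np

def levy_scanpath(saliency, n_steps=10, scale=20.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = saliency.shape
    y, x = h // 2, w // 2                      # start at the image center
    path = [(y, x)]
    for _ in range(n_steps):
        # Heavy-tailed step length (folded Cauchy) and uniform random direction
        step = abs(rng.standard_cauchy()) * scale
        angle = rng.uniform(0.0, 2.0 * np.pi)
        ny = int(np.clip(y + step * np.sin(angle), 0, h - 1))
        nx = int(np.clip(x + step * np.cos(angle), 0, w - 1))
        # Constrain the flight: jumps toward higher saliency are accepted more often
        if rng.uniform() < min(1.0, saliency[ny, nx] / max(saliency[y, x], 1e-9)):
            y, x = ny, nx
        path.append((y, x))
    return path

print(levy_scanpath(np.random.rand(240, 320), n_steps=6))
```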
Engbert et al. (2015) and
Schütt et al. (2017) proposed a mechanistic model of scanpaths that implemented an attention mechanism and an inhibition mechanism, each with its own characteristic decay time, to predict a sequence of fixations.
Le Meur and Coutrot (2016) combined a saliency map with saccade direction and amplitude biases.
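The general idea of modulating a saliency map with oculomotor biases can be sketched as follows. The Gaussian amplitude prior and the horizontal-direction bias used here are placeholder parametric forms chosen for illustration, not the distributions fitted by Le Meur and Coutrot.

```python
import numpy as np

def next_fixation(saliency, fix_y, fix_x, pref_amplitude=40.0, amp_sigma=20.0, seed=0):
    """Sample the next fixation from the saliency map multiplied with
    amplitude and direction priors centered on the current fixation."""
    rng = np.random.default_rng(seed)
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - fix_y, xs - fix_x
    amplitude = np.sqrt(dy ** 2 + dx ** 2)
    direction = np.arctan2(dy, dx)
    # Placeholder priors: Gaussian preference for a typical saccade amplitude
    # and a bias toward horizontal (leftward/rightward) saccade directions
    amp_prior = np.exp(-0.5 * ((amplitude - pref_amplitude) / amp_sigma) ** 2)
    dir_prior = 1.0 + 0.5 * np.cos(2.0 * direction)
    prob = saliency * amp_prior * dir_prior
    prob /= prob.sum()
    idx = rng.choice(h * w, p=prob.ravel())
    return np.unravel_index(idx, (h, w))

print(next_fixation(np.random.rand(240, 320), fix_y=120, fix_x=160))
```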
Adeli et al. (2017) took inspiration from neuroscience and transformed a retinotopic saliency map into superior colliculus space, where fixation selection was implemented.
Clarke et al. (2017) proposed the saccadic flow baseline for capturing oculomotor biases independent of image content.
Assens et al. (2017) used deep neural networks to predict different spatial fixation distributions depending on the length of the previous scanpath history and combined this with a bias toward short saccades to generate a scanpath.
Xia et al. (2019) built a variational autoencoder model of the image statistics over the previous fixations and selected as the next fixation the location where the internal model had the largest reconstruction error.
Sun et al. (2019) used recurrent neural networks to model attention to and suppression of certain spatial and semantic features over a sequence of fixations.
Yang et al. (2020) used inverse reinforcement learning to train a deep-network policy that mimics human scanpaths in visual search.
Schwetlick et al. (2020) extended the attention and inhibition mechanism in the model of
Engbert et al. (2015) to include perisaccadic attentional dynamics.