Abstract
Humans gather high-resolution visual information only in the fovea, so they must make eye movements to explore the visual world. The spatio-temporal fixation patterns (scanpaths) of observers therefore carry information about which aspects of the environment are currently relevant.
Most recent progress in predicting the spatial and spatio-temporal patterns of human scanpaths has focused on free-viewing conditions. However, fixations and scanpaths are known to be strongly influenced by the task the observer is performing. The purpose of this work is to analyze these influences quantitatively.
The DeepGaze III model for scanpath prediction (Kümmerer et al., VSS 2017) has been shown to achieve high performance in predicting free-viewing scanpaths. DeepGaze III extracts features from the VGG deep neural network and feeds them to a readout network that predicts a saliency map; a second readout network then combines this saliency map with information on the scanpath history to predict the landing positions of upcoming saccades. Here, we train task-specific versions of DeepGaze III on scanpath data from subjects performing different tasks on the same images (free viewing, object search, and saliency search; Koehler et al., JoV 2014). Prediction performance shows that the models successfully adapt to the task-specific scanpaths. We find and visualize cases where the model predictions differ substantially between tasks. The task-specific models can also be used to infer the task behind a given scanpath via maximum-likelihood classification. While purely spatial task-specific models (fine-tuned versions of DeepGaze II) perform above chance at task recognition (43% accuracy), switching to the scanpath-aware DeepGaze III models improves accuracy further, to 45%.
These results quantify the spatial and temporal contributions to task-specific differences in human scanpaths.
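The maximum-likelihood task classification described above can be illustrated with a small sketch. It assumes that each task-specific model returns, for a given fixation history, a normalized log-density over the next fixation on a coarse grid, and that a scanpath's log-likelihood is the sum of these conditional log-probabilities (chain rule) with equal task priors. All names (`spatial_model`, `scanpath_model`, `classify_task`, the `proximity` parameter) are hypothetical, and the history term in `scanpath_model` is a toy stand-in for the second DeepGaze III readout network, not the actual architecture. The contrast it shows is the one behind the 43% vs. 45% comparison: a spatial (DeepGaze II-style) model ignores the history, whereas a scanpath-aware model conditions on it.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Fixation = Tuple[int, int]       # (row, col) on a coarse grid of cells
LogDensity = List[List[float]]   # log p(next fixation lands in each cell)
# Hypothetical interface: fixation history -> log-density over the next fixation.
Model = Callable[[Sequence[Fixation]], LogDensity]


def log_normalize(scores: List[List[float]]) -> LogDensity:
    """Log-softmax over all cells so that the probabilities sum to one."""
    m = max(max(row) for row in scores)
    log_z = m + math.log(sum(math.exp(s - m) for row in scores for s in row))
    return [[s - log_z for s in row] for row in scores]


def spatial_model(saliency: List[List[float]]) -> Model:
    """History-independent model built from an (assumed) log saliency map."""
    density = log_normalize(saliency)
    return lambda history: density


def scanpath_model(saliency: List[List[float]], proximity: float) -> Model:
    """Toy history-aware model: saliency plus a penalty on distance from the
    previous fixation. Illustrative only, not the DeepGaze III readout."""
    def predict(history: Sequence[Fixation]) -> LogDensity:
        if not history:
            return log_normalize(saliency)
        pr, pc = history[-1]
        scores = [[s - proximity * math.hypot(r - pr, c - pc)
                   for c, s in enumerate(row)]
                  for r, row in enumerate(saliency)]
        return log_normalize(scores)
    return predict


def scanpath_log_likelihood(model: Model, scanpath: Sequence[Fixation]) -> float:
    """Sum over t of log p(x_t | x_1, ..., x_{t-1})."""
    return sum(model(scanpath[:t])[r][c] for t, (r, c) in enumerate(scanpath))


def classify_task(models: Dict[str, Model], scanpath: Sequence[Fixation]) -> str:
    """Maximum-likelihood task label for a single scanpath (equal priors)."""
    return max(models, key=lambda task: scanpath_log_likelihood(models[task], scanpath))


def accuracy(models: Dict[str, Model],
             labelled: Sequence[Tuple[str, Sequence[Fixation]]]) -> float:
    """Fraction of labelled scanpaths whose task is recovered correctly."""
    hits = sum(classify_task(models, path) == task for task, path in labelled)
    return hits / len(labelled)


# Usage with made-up 2x2 log saliency maps (not data from the study).
flat = [[0.0, 0.0], [0.0, 0.0]]
corner = [[2.0, 0.0], [0.0, 0.0]]
spatial = {"freeviewing": spatial_model(flat), "objectsearch": spatial_model(corner)}
print(classify_task(spatial, [(0, 0), (0, 0), (1, 1)]))  # objectsearch
```

Summing conditional log-probabilities rather than scoring fixations independently is what lets a scanpath-aware model contribute information beyond the spatial map: swapping `spatial_model` for `scanpath_model` changes only how each fixation's probability depends on the ones before it.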
In the future, we plan to extend this analysis to quantify how scene content and scanpath history interact in fixation selection across different tasks.