Abstract
Eye movements are guided by both low-level (image salience) and high-level (semantic salience) factors. To investigate the contribution of low- and high-level factors in task-based natural scene viewing, we tracked gaze while subjects performed imaginary tasks and while they searched for target objects. Image salience (GBVS) was used to estimate the low-level salience of each pixel; semantic salience was estimated from the linguistic similarity (LSA) between labeled objects and the search term. In the imaginary task, participants (N = 40) observed 10 scenes as if they were performing one of two tasks assigned to each scene. In the search task, participants (N = 40) searched 100 scenes for a target object and either clicked on it if they located it or clicked “no object” if the target was absent. We hypothesized that the influence of low- and high-level determinants of gaze selection would vary within each trial as subjects grasped the context of the scene. In the imaginary task, we compared ROC values for gaze overlaid on the LSA heatmap of the presented task (matched) against ROC values for the same gaze overlaid on the LSA heatmap of the alternate task (opposite). Semantic salience at fixated locations was significantly higher for matched gaze and task than for opposite gaze and task (t(39) = 10.861, p < .001). In the search task, linear mixed model analysis showed a significant interaction between salience type and fixation number (χ²(1, N = 40) = 47.501, p < .001): image salience decreased with increasing fixation number, whereas semantic salience remained constant throughout trials. This suggests that fixations are decreasingly guided by image salience over time but steadily guided by semantic salience. Together, these results suggest that semantic salience is a useful predictor of gaze during task-related scene viewing.
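The matched-versus-opposite ROC comparison can be illustrated with a minimal sketch. The abstract does not specify how the ROC values were computed, so the following assumes one common convention: fixated pixels are treated as positives, all other pixels as negatives, and the heatmap value is the classifier score. The function name `fixation_auc` and the toy 4×4 heatmaps are hypothetical and stand in for the LSA-based heatmaps of the two tasks.

```python
import numpy as np

def fixation_auc(salience_map, fixations):
    """Area under the ROC curve for separating fixated from non-fixated pixels.

    salience_map: 2-D array of salience values (e.g. a heatmap for one task).
    fixations: iterable of (row, col) pixel coordinates; repeated fixations
               on the same pixel are counted once (an assumption of this sketch).
    """
    salience_map = np.asarray(salience_map, dtype=float)
    fixated = np.zeros(salience_map.shape, dtype=bool)
    for row, col in fixations:
        fixated[row, col] = True
    pos = salience_map[fixated]    # salience at fixated pixels
    neg = salience_map[~fixated]   # salience at all other pixels
    # AUC equals the probability that a fixated pixel has higher salience
    # than a non-fixated one, with ties counted as one half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Toy heatmaps (hypothetical): each task highlights a different corner.
matched_map = np.zeros((4, 4))
matched_map[0, 0], matched_map[0, 1] = 1.0, 0.8
opposite_map = np.zeros((4, 4))
opposite_map[3, 3], opposite_map[3, 2] = 1.0, 0.8

# Gaze recorded under the "matched" task lands on its highlighted corner.
gaze = [(0, 0), (0, 1)]
auc_matched = fixation_auc(matched_map, gaze)
auc_opposite = fixation_auc(opposite_map, gaze)
print(auc_matched, auc_opposite)
```

In this toy case the same gaze scores a higher AUC against the matched heatmap than against the opposite one, which is the direction of the effect reported for the imaginary task. The pairwise comparison is quadratic in the number of pixels, so a rank-based AUC would be preferable for full-resolution images.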