Interestingly, we found that saliency correlated significantly better with human behavior than with monkey behavior over all clips combined (permutation test, p ≤ 0.0001). Such differences in the likelihood of deploying attention to salient items should be minimized when monkeys are used as a model for human attention during free viewing. The saliency differences were, however, small in magnitude compared to the difference in interobserver agreement (Figure 5). Comparing saliency scores with interobserver agreement may suggest a way to reconcile such differences. Although saliency was a strong predictor of human visually guided behavior, the stimulus-driven nature of the model limited its predictive power. The interobserver agreement metric captured aspects of both stimulus-driven (saliency) and top-down (context-specific) attentional allocation, the latter of which has also been shown to be a significant factor in guiding human gaze shifts in natural scenes (De Graef, De Troy, & d'Ydewalle,
1992; Neider & Zelinski, 2006; Noton & Stark, 1971; Oliva, Torralba, Castelhano, & Henderson, 2003; Yarbus, 1967). The interobserver agreement metric was the best predictor of human saccadic targets (permutation test,
p ≤ 0.0001). Interestingly, this trend did not hold for monkeys: the interobserver agreement metric was significantly less correlated with monkey gaze shifts than the saliency model was (permutation test, p = 0.0027). That is, the computational saliency model predicted where one monkey might look better than did the gaze patterns of two to four other monkeys. Any top-down information present in the monkey interobserver agreement metric was insufficient to increase the predictability of gaze patterns over a purely stimulus-driven model. Monkey top-down attentional allocation may be completely inconsistent among observers (e.g., Figure 1G), leaving saliency as the best predictor of visually guided attentive behavior.
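
The permutation tests reported above can be illustrated with a minimal sketch. The test statistic and resampling scheme are not described in this section, so the code below assumes a two-sided test on the difference in mean score between two groups (for example, per-saccade saliency scores for humans and for monkeys). The function name `permutation_test` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Assumption: this statistic is chosen for illustration only and may
    differ from the procedure actually used in the study.
    """
    rng = random.Random(seed)
    n_a = len(scores_a)
    n_b = len(scores_b)

    # Observed absolute difference between the two group means.
    observed = abs(sum(scores_a) / n_a - sum(scores_b) / n_b)

    # Under the null hypothesis the group labels are exchangeable,
    # so we pool the scores and reassign labels at random.
    pooled = list(scores_a) + list(scores_b)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / n_b
        if abs(mean_a - mean_b) >= observed:
            n_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (n_extreme + 1) / (n_perm + 1)
```

With 10,000 permutations and the add-one correction, the smallest p-value this sketch can return is 1/10001, which is why results from such tests are commonly reported as a bound (such as p ≤ 0.0001) and not as an exact value.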