The goal of the present study was not optimal fixation prediction but rather a comparison of movement saliencies in dynamic and static viewing conditions. Still, we would like to comment on our effect size in relation to previous studies (Carmi & Itti,
2006; Itti & Baldi,
2009; Le Meur et al.,
2007; Marat et al.,
2009; Vig et al.,
2009; Vig et al.,
2012). This is not straightforward because the metrics quantifying saliency differ across studies (Wilming et al.,
2011). Here, we employed the AUC measure, which has become a standard since its introduction into the field (Tatler et al.,
2005). A recent review (Wilming et al.,
2011) combining theoretical and empirical evaluations of several saliency metrics concludes that, for the type of data recorded here, AUC is the best choice. Furthermore, methodological preferences, such as the choice of which non-fixated regions are compared with fixated ones (Carmi & Itti,
2006; D. Parkhurst et al.,
2002; Tatler,
2007), have a large influence on the value of the saliency metric. Most importantly, the central bias (D. Parkhurst et al., 2002; Tatler, 2007), the tendency of observers to select relatively more saccade endpoints around the center of the screen, appears to be a strong predictor of fixations in the absence of any information about image content (Tatler, 2007). Accordingly, Wilming and colleagues (2011) suggest that the fixation predictability of the central bias is a lower bound that any saliency measure has to surpass. For instance, the dynamic saliency model that Le Meur et al. (2007) proposed, although better than the other saliency models they tested, performs worse than the central bias. One way to control for the central bias, as is done in the current study, is to select actual and control fixations with identical spatial (Açık et al.,
2009; Einhäuser & König,
2003; Tatler et al.,
2005) and temporal (Vig et al.,
2009; Vig et al.,
2012) distributions. The present bias-controlled dynamic feature AUC of 0.60 is comparable with the saliency of individual static features previously reported (Açık et al.,
2010; Mital et al.,
2011; Tatler et al.,
2005). Notably, dynamic feature AUCs reach higher values, around 0.70, for those portions of video viewing in which the fixated locations of different participants tend to form tight clusters (Mital et al.,
2011). Carmi and Itti (2006) use low-resolution movie clips and a percentile metric that, like AUC, is bounded between 0.50 and 1, and report values around 0.70 for their dynamic saliency map model. However, they take as the actual fixation point the maximally salient location within a circle of radius 3.15° around the measured saccade endpoint (Carmi & Itti, 2006); accordingly, the saliency at the measured endpoint itself is expected to be lower. The “surprise” measure (Itti & Baldi,
2009) quantifies the unexpectedness of spatiotemporal events in movies using Bayes' theorem. Even though no direct comparisons are made, the surprise measure appears to outperform the previous models of Itti and colleagues (Carmi & Itti,
2006; Peters et al.,
2005, cf. Itti & Baldi,
2009). The most recent spatiotemporal intrinsic dimensionality analysis of Vig and colleagues (2012) reaches a bias-free AUC of 0.70, which is, to the best of our knowledge, the highest value reported in studies that employ free viewing and similar stimuli. What is common to these dynamic saliency studies is their integration of dynamic and static information. While the Itti lab (Carmi & Itti,
2006; Itti & Baldi,
2009) computes static and dynamic features separately and then pools them, Vig et al. (2012) measure the saliency of spatiotemporal features. Moreover, in these studies, the features of interest are computed at multiple temporal and spatial scales, and fixation predictability is measured after the information from all scales is combined. In contrast, we computed each feature at a single spatial and temporal scale, and extracted the dynamic features from differences between two frames only. In summary, for studies aimed at optimal fixation prediction, combining dynamic and static information at several temporal and spatial scales appears to be the most fruitful methodology, an idea that our statistical dependence analysis supports.
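
To make the bias-controlled AUC discussed above concrete, the following is a minimal sketch and not the analysis code of the present study. It assumes that feature maps are stored as nested lists indexed by row and column, and that control values are obtained by reading each fixated location on the other stimuli, so that actual and control samples share an identical spatial distribution. The names `auc`, `actual_and_control`, `feature_maps`, and `fixations` are hypothetical, and matching the temporal distribution (e.g., by restricting controls to the same frame index) is omitted for brevity.

```python
from typing import Dict, List, Sequence, Tuple

FeatureMap = List[List[float]]          # hypothetical layout: map[row][col]
Fixation = Tuple[str, int, int]         # (stimulus id, row, col)


def auc(actual: Sequence[float], control: Sequence[float]) -> float:
    """Area under the ROC curve separating actual from control feature values.

    Computed as the probability that a randomly drawn actual value exceeds a
    randomly drawn control value, with ties counted as one half.
    """
    if not actual or not control:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for a in actual:
        for c in control:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(actual) * len(control))


def actual_and_control(
    feature_maps: Dict[str, FeatureMap],
    fixations: Sequence[Fixation],
) -> Tuple[List[float], List[float]]:
    """Collect feature values at actual and at control fixations.

    Assumption: a control fixation is the same (row, col) location read on
    every *other* stimulus, which keeps the spatial distribution of control
    fixations identical to that of actual fixations.
    """
    actual: List[float] = []
    control: List[float] = []
    for stim, row, col in fixations:
        actual.append(feature_maps[stim][row][col])
        for other, fmap in feature_maps.items():
            if other != stim:
                control.append(fmap[row][col])
    return actual, control
```

Under this scheme, an AUC of 0.50 means that the feature does not discriminate fixated from control locations, whereas an AUC of 1 means perfect discrimination; because both samples come from the same screen locations, a pure central bias cannot raise the value above 0.50.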