Humans make several eye movements per second, and where they look ultimately determines what they perceive. Consequently, much research over several decades has been devoted to the study of eye movements, but for technical reasons, this research has mostly been limited to static images as stimuli. More recently, however, a growing body of research on eye movements on dynamic content has emerged. Blackmon, Ho, Chernyak, Azzariti, and Stark (1999) reported some evidence for certain aspects of the “scanpath theory” (Noton & Stark, 1971) on very simple, synthetic dynamic scenes. Several studies were concerned with modeling saliency, i.e., the contribution of low-level features to gaze control (e.g., Itti, 2005; Le Meur, Le Callet, & Barba, 2007), and found, not surprisingly, that motion and temporal change are strong predictors of eye movements. Tseng, Carmi, Cameron, Munoz, and Itti (2009) quantified the bias of gaze toward the center of the screen and linked this center bias to the photographer's bias to place structured and interesting objects in the center of the stimulus. Carmi and Itti (2006) investigated the role of scene cuts in “MTV-style” video clips and showed that perceptual memory has an effect on eye movements across scene cuts. Cristino and Baddeley (2009) recorded videos with a head-mounted camera while walking down the street; they then compared gaze on these original videos with that on a set of filtered movies to assess the impact of image features vs. “world salience,” i.e., behavioral relevance. Stimuli obtained with a similar setup, a head-mounted and gaze-controlled camera (Schneider et al., 2009), were used by 't Hart et al. (2009). These authors presented the natural video material to subjects either as the original, continuous movie or in a shuffled, random sequence of 1-s still shots. The distribution of gaze on the continuous stimuli was wider than for the static sequence and also a better predictor of gaze during the original natural behavior. In other studies, the variability of eye movements of different observers was analyzed with an emphasis on how large the most-attended region must be to encompass the majority of fixations, in the context of video compression (Stelmach & Tam, 1994; Stelmach, Tam, & Hearty, 1991) and enhancement for the visually impaired (Goldstein, Woods, & Peli, 2007). Marat et al. (2009) evaluated eye movement variability on short TV clips using the Normalized Scanpath Saliency (Peters, Iyer, Itti, & Koch, 2005). Comparing the viewing behavior of humans and monkeys, Berg, Boehnke, Marino, Munoz, and Itti (2009) found that monkeys' eye movements were less consistent with each other than those of humans. Hasson, Landesman et al. (2008) presented clips from Hollywood movies and everyday street scenes to observers while simultaneously recording brain activation and eye movements; both measures showed more similarity across observers on the Hollywood movies (particularly those by Alfred Hitchcock) than on the street scenes. However, when the movies were played backward, eye movements remained coherent whereas brain activation did not.