In this study, we used highly engaging videos of people in real-world settings and monitored eye movements in response to dynamic events, such as making eye contact, talking, or performing head movements. Previous findings give rise to two competing hypotheses. First, if viewers generally prioritize the eyes, as suggested by much of the static face-viewing literature, then there should be no significant modulation of gaze behavior as a function of the depicted dynamic events, such as a person talking or making eye contact. Alternatively, if ongoing dynamic events require sampling different visual features depending on the viewer's goals, we should observe a strong modulation of gaze depending on the depicted dynamic event. If the latter hypothesis is supported, we can make several supplementary predictions: (a) eye contact of an actor with the camera (the viewer) should evoke increased attention toward the face (e.g., Stein et al., 2011) and possibly the eyes; (b) a talking face should draw gaze toward the mouth, partly because of increased movement of visual features known to attract attention (see Itti, 2005; Mital, Smith, Hill, & Henderson, 2011), but especially because of sampling of mouth movements to support speech comprehension (Buchan et al., 2007; Lansing & McConkie, 2003); and (c) the informational content of face regions, rather than the mere saliency of, for example, eyes or moving mouths, should predict gaze behavior (e.g., Birmingham et al., 2009b; Lansing & McConkie, 2003). If this last prediction holds, then removing the speech signal from the videos should change preferred fixation locations within a face, even though the visual information remains unchanged. Lansing and McConkie (2003) found that fixations toward the mouth increased when participants watched movies in silence compared with movies containing low-intensity speech. This was likely due to the specific task instructions in their study, where participants' main aim was to reproduce the words they had seen or heard being uttered. Participants in our study, however, were merely asked to rate each video on a likeability scale. We therefore expected a decrease in mouth fixations when the acoustic speech signal was completely removed, since accurate speech perception was not necessary to perform the task.