Abstract
Research on visual attention has been limited mainly to static images, but in everyday life we often rely on a narrative to guide us through dynamically changing scenes. Sometimes the narrative is presented via audio and, in special cases, as a caption. Narratives can help guide attention, but they may also require additional processing that increases task demands or, when presented as a caption, distract attention from relevant locations. This study examined how narrative and video interact to drive attention during the viewing of documentary clips. Video clips (∼120 s each) were cut from four documentaries such that no talking heads were present in any frame. Videos were accompanied by narration in the form of audio, captions, both, or neither. To motivate viewing, multiple-choice tests on content were given after each clip. Captions were strong attractors of gaze: 56% of saccades were devoted to reading captions when no audio was present, and 41% when audio was present (a surprisingly large proportion, given that audio made reading the captions unnecessary). Fixation durations were shorter when reading captions (∼260 ms) than when inspecting the video (∼420 ms), regardless of the presence of audio. Fixations made to inspect the video clustered near the scene center when narration was present. In the absence of narration, eye movements were more exploratory; the 2D scatter of saccadic endpoint locations increased by up to 70%. These changes in the spatial distribution of saccades, together with the adoption of the time-consuming strategy of reading captions even when they were redundant with audio narration, show that eye movements while inspecting videos are motivated mainly by a cognitive strategy of searching for clues that facilitate the interpretation of viewed events.