Visual processing of a static image is characterized by two stages: ambient and focal processing (Trevarthen, 1968). Ambient visual processing typically occurs during the first few seconds of viewing and involves short fixations and long saccades as people extract information from peripheral vision. After a few seconds, viewers shift to focal processing, which involves longer fixations and shorter saccades as viewers extract more detailed information from central vision (Pannasch, Helmert, Roth, Herbold, & Walter, 2008; Unema, Pannasch, Joos, & Velichkovsky, 2005). Although this shift in processing mode has been studied extensively with static images, much less research has examined whether people shift back and forth between ambient and focal processing while watching dynamic visual stimuli such as movies. We therefore conducted two studies to determine whether the processes observed during viewing of static images also operate during viewing of dynamic stimuli.
Evidence that people shift from ambient to focal visual processing over the course of viewing a static picture has accumulated over the past century. In 1935, Buswell (as cited in Unema et al., 2005) found that fixation duration increased over the time course of viewing a picture, and Antes (1974) found that fixation duration increased and saccade amplitude decreased over time as participants viewed a static image. Later, Irwin and Zelinsky (2002) found that fixation duration increased over 15 s of viewing a static image. Unema et al. (2005) found that during the first few seconds of viewing a static image of a room, participants displayed larger saccade amplitudes and shorter fixation durations, whereas after a few seconds of viewing they displayed smaller saccade amplitudes and longer fixation durations. These authors suggested that the larger saccade amplitudes and shorter fixation durations of the early viewing period reflected ambient processing, and that the smaller saccade amplitudes and longer fixation durations of the later viewing period reflected focal processing. Providing further support for the distinction between ambient and focal visual processing, Velichkovsky, Joos, Helmert, and Pannasch (2005) found that viewers were more likely to remember objects when fixation duration was long, or when fixation duration was short but the subsequent saccade amplitude was also short, than when fixation duration was short and the subsequent saccade amplitude was long. They suggested that long fixations and small saccade amplitudes reflect focal processing of detailed information about objects, which supports later recognition of those objects. Furthermore, in a series of four studies using naturalistic scenes, participants showed clear evidence of a shift from ambient to focal processing, with longer saccades and shorter fixations during the early viewing period and shorter saccades and longer fixations during the later viewing period (Pannasch et al., 2008).
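The three fixation categories compared by Velichkovsky et al. (2005) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in any of the cited studies: the cutoffs `FIXATION_THRESHOLD_MS` and `SACCADE_THRESHOLD_DEG`, and the names `Fixation`, `classify`, and `processing_mode`, are hypothetical and chosen only for this example. Following the interpretation above, only a short fixation followed by a long saccade is treated as ambient.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; the text above does not
# specify the values used by the cited studies.
FIXATION_THRESHOLD_MS = 180.0
SACCADE_THRESHOLD_DEG = 5.0


@dataclass
class Fixation:
    duration_ms: float       # duration of this fixation
    next_saccade_deg: float  # amplitude of the saccade that follows it


def classify(fix: Fixation) -> str:
    """Assign one of the three fixation categories compared in the text."""
    if fix.duration_ms >= FIXATION_THRESHOLD_MS:
        return "long"
    if fix.next_saccade_deg < SACCADE_THRESHOLD_DEG:
        return "short_then_short_saccade"
    return "short_then_long_saccade"


def processing_mode(fix: Fixation) -> str:
    """Map categories onto the ambient/focal distinction: only a short
    fixation followed by a long saccade counts as ambient here."""
    if classify(fix) == "short_then_long_saccade":
        return "ambient"
    return "focal"


if __name__ == "__main__":
    samples = [Fixation(320, 2.1), Fixation(150, 1.5), Fixation(140, 9.0)]
    for f in samples:
        print(f, classify(f), processing_mode(f))
```

Because the cutoffs are module-level constants, a reader could rerun the labeling under different assumed thresholds without touching the classification logic.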
The limited evidence regarding ambient and focal processing of dynamic stimuli suggests that viewing patterns display a similar shift during the first few seconds of viewing. Velichkovsky, Rothert, Kopf, Dornhöfer, and Joos (2002) studied eye movements while participants performed a driving simulation task. They found that fixation duration increased when participants detected a hazard, suggesting that participants engaged in more focal processing at moments when detailed information was needed to avoid an accident. Providing further evidence that such shifts occur during viewing of dynamic stimuli, Smith and Mital (2013) compared viewing of dynamic and static scenes and found that participants displayed shorter fixations and larger saccade amplitudes during the first second of viewing both types of scenes, and then shifted to longer fixations and shorter saccade amplitudes for the remainder of viewing.
In contrast, Krejtz, Szarkowska, Krejtz, Walczak, and Duchowski (2012) found the opposite viewing pattern in children watching animated educational videos. During a 4-s period in which a specific concept was presented, children displayed longer fixations and shorter saccades during the first 2 s and then shifted to shorter fixations and longer saccades during the third second. The authors interpreted this as a shift from a focal to an ambient processing mode while viewing the material related to the concept. One possible explanation for this reversal is that, because the segment was taken from the middle of the educational video, the children did not need to engage in ambient exploratory processing when the 4-s period began. In addition, the authors examined only one time period of the video rather than investigating whether the children shifted back and forth between ambient and focal processing throughout it.
When studying shifts in processing mode during viewing of static images, the onset of processing is easy to identify: It is the time when the stimulus is first presented. Given this reference point, it is possible to demarcate early and late viewing periods in which ambient and focal processing are likely to occur. However, it is more challenging to identify early and late viewing periods in dynamic stimuli such as movies or live action. Although the beginning of a movie provides one starting point, there are not necessarily other clear points at which an ambient mode of processing would be expected. Several studies have examined viewing patterns around edits, the points at which one continuous run of the camera ends and another begins, because shifts in processing mode were hypothesized to occur there. Multiple studies have found that within the first few hundred milliseconds after an edit, gaze shifts toward the center of the screen and then quickly moves on to examine the periphery (Carmi & Itti, 2006a, 2006b; Mital, Smith, Hill, & Henderson, 2011), suggesting a shift to ambient processing after the edit. In addition, Pannasch (2014) found that edits during movies of a single actor walking through indoor locations shifted processing from an ambient to a focal mode. However, edits do not occur during naturalistic viewing of live action. Therefore, a different approach is needed to determine whether shifts between ambient and focal processing occur during continuous dynamic scenes.
One way to identify points in movies at which viewers might shift between focal and ambient processing is to assess event segmentation. Event segmentation is the psychological process of parsing an ongoing stream of behavior into meaningful chunks, or events. It can be measured with a unitization task, in which participants are instructed to press a button every time they believe one meaningful unit of activity has ended and another has begun (Newtson, 1973). Previous research has found that the locations of these button presses are reliable within and across participants (Zacks, Speer, Vettel, & Jacoby, 2006). According to Event Segmentation Theory (Zacks, Speer, Swallow, Braver, & Reynolds, 2007), people experience event boundaries as a side effect of predictive processing during ongoing perception. This account proposes that perception and comprehension systems continuously make predictions about the course of events on a time scale of a few seconds and monitor the accuracy of those predictions. Prediction is supported by a working memory representation of the current event, called an event model. When features of the environment change, prediction error spikes and the event model is updated. For example, when watching someone make breakfast, people often identify an event boundary when the actor switches from making toast to making eggs, because this change in activity requires them to update their working memory representation of the actor's goal. Because people must update their working memory representation at event boundaries, they are likely to make exploratory eye movements at these points; in other words, they are likely to switch to an ambient processing mode in order to capture as much information about the new event as possible. After this rapid visual exploration, people likely switch to focal processing of the details of the environment in order to make valid predictions about future activity. Focal processing then likely continues until the next event boundary, at which point the updating process begins anew.
Smith, Whitwell, and Lee (2006) found preliminary evidence for shifts from focal to ambient processing around event boundaries: Saccade frequency decreased a few hundred milliseconds before event boundaries and then increased about 100 ms after them. However, the authors did not report the time course of this shift over the multiple seconds preceding and following event boundaries.
The two studies reported below therefore used the event segmentation task to investigate the multisecond time course of shifts in processing mode while participants watched movies of actors engaged in neutral, everyday activities. We hypothesized that participants would shift from focal to ambient processing for the few seconds following event boundaries and then return to focal processing. Specifically, we hypothesized that during viewing of naturalistic movie stimuli, saccade amplitude would increase and fixation duration would decrease for the few seconds following event boundaries and then return to baseline. In addition, because pupil size has been found to increase during exploratory processing (Jepma & Nieuwenhuis, 2011) and because a preliminary study found effects of event boundaries on pupil diameter (Swallow & Zacks, 2006), we hypothesized that pupil size would increase at event boundaries.
It is important to note that because the event segmentation task requires participants to press a button to identify event boundaries, the processes involved in button pressing would likely contaminate any measures of saccade amplitude and fixation duration obtained during this task. However, there is evidence that people segment events whenever they view dynamic stimuli, even when they are completely naïve to the event segmentation task. For example, Zacks et al. (2001) found brain activity during passive viewing of naturalistic movies that was time-locked to the event boundaries the same participants identified during a second viewing. Therefore, in the studies reported here, we measured saccade amplitude, fixation duration, and pupil size during an initial passive viewing phase. We then asked participants to segment the movies into meaningful events and time-locked the oculomotor measures to the event boundaries each participant identified. We hypothesized that, during passive viewing, shifts from focal to ambient processing would be time-locked to event boundaries.
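To make the time-locking step concrete, the following is a minimal Python sketch of boundary-locked averaging. It is not the analysis reported in these studies: the function name `boundary_locked_average`, the 1-s bins, and the 5-s window are illustrative assumptions. The sketch takes fixation-level measures recorded during passive viewing, given as (onset, value) pairs, together with the boundary times the same participant marked during the later segmentation viewing, and averages the measure in time bins relative to each boundary.

```python
from collections import defaultdict
from math import floor
from statistics import mean


def boundary_locked_average(events, boundaries, window_s=5.0, bin_s=1.0):
    """Average an oculomotor measure in time bins relative to event boundaries.

    events: list of (onset_s, value) pairs from passive viewing, e.g.
        fixation onset time and fixation duration (hypothetical format).
    boundaries: event boundary times (s) marked by the same participant
        during the later segmentation viewing.
    Returns {bin_start_s: mean value}, ordered by bin start, for lags in
    [-window_s, window_s) relative to each boundary.
    """
    bins = defaultdict(list)
    for b in boundaries:
        for onset, value in events:
            lag = onset - b  # negative lags precede the boundary
            if -window_s <= lag < window_s:
                # Index of the bin that contains this lag.
                k = floor(lag / bin_s)
                bins[k * bin_s].append(value)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


if __name__ == "__main__":
    # Hypothetical fixations: (onset in s, duration in ms).
    fixations = [(9.2, 300), (9.8, 280), (10.3, 150), (10.7, 170), (11.5, 260)]
    print(boundary_locked_average(fixations, [10.0], window_s=2.0, bin_s=1.0))
```

In this toy input, fixations in the bin starting at the boundary average 160 ms, compared with 290 ms in the preceding bin, which is the kind of post-boundary drop in fixation duration the ambient-processing hypothesis predicts. A fuller analysis would also need to handle overlapping windows from closely spaced boundaries and compare each bin against a baseline.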