Article  |   July 2013
Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes
Journal of Vision July 2013, Vol.13, 16. doi:10.1167/13.8.16
      Tim J. Smith, Parag K. Mital; Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes. Journal of Vision 2013;13(8):16. doi: 10.1167/13.8.16.

      © 2017 Association for Research in Vision and Ophthalmology.

Abstract

Does viewing task influence gaze during dynamic scene viewing? Research into the factors influencing gaze allocation during free viewing of dynamic scenes has reported that the gaze of multiple viewers clusters around points of high motion (attentional synchrony), suggesting that gaze may be primarily under exogenous control. However, the influence of viewing task on gaze behavior in static scenes and during real-world interaction has been widely demonstrated. To dissociate exogenous from endogenous factors during dynamic scene viewing we tracked participants' eye movements while they (a) freely watched unedited videos of real-world scenes (free viewing) or (b) quickly identified where the video was filmed (spot-the-location). Static scenes were also presented as controls for scene dynamics. Free viewing of dynamic scenes showed greater attentional synchrony, longer fixations, and more gaze to people and areas of high flicker compared with static scenes. These differences were minimized by the viewing task. In comparison with the free viewing of dynamic scenes, during the spot-the-location task fixation durations were shorter, saccade amplitudes were longer, and gaze exhibited less attentional synchrony and was biased away from areas of flicker and people. These results suggest that the viewing task can have a significant influence on gaze during a dynamic scene but that endogenous control is slow to kick in as initial saccades default toward the screen center, areas of high motion and people before shifting to task-relevant features. This default-like viewing behavior returns after the viewing task is completed, confirming that gaze behavior is more predictable during free viewing of dynamic than static scenes but that this may be due to natural correlation between regions of interest (e.g., people) and motion.

Introduction
Real-world visual scenes change dynamically. Objects, such as people, animals, vehicles, and environmental elements (e.g., leaves in the breeze), move relative to a static background, and our viewpoint on a scene changes through our own motion. This simple fact may not seem controversial but has been neglected by the majority of previous research into how we process visual scenes. Investigations into scene perception using static photographs have resulted in a substantial understanding of how we attend to static scenes and how this relates to subsequent perception and memory for scene details (for review, see Henderson, 2003; Tatler, Hayhoe, Land, & Ballard, 2011). For example, the magnitude of low-level visual features such as edges is significantly greater at fixated than nonfixated locations during static scene free viewing (Baddeley & Tatler, 2006; Foulsham & Underwood, 2008; Itti & Koch, 2001; Mannan, Ruddock, & Wooding, 1995; Tatler, Baddeley, & Gilchrist, 2005), but viewing task can significantly alter which semantic features of a scene we fixate (Buswell, 1935; Castelhano, Mack, & Henderson, 2009; Mills, Van der Stigchel, Hollingworth, Hoffman, & Dodd, 2011; Yarbus, 1967). However, comparable research into the same questions for more naturalistic dynamic scenes is only in its infancy (Carmi & Itti, 2006a, 2006b; Dorr, Martinetz, Gegenfurtner, & Barth, 2010; Itti, 2005; Le Meur, Le Callet, & Barba, 2007; Ross & Kowler, 2013; see Smith, 2013, for review). The work presented here endeavors to extend our understanding of gaze behavior in dynamic scenes by (a) examining the differences between gaze behavior in dynamic and static versions of the same scene and (b) identifying the influence of viewing task on gaze behavior in both types of scene. 
Initial research into gaze behavior during dynamic scene viewing has revealed that the way we attend to a scene in motion may differ from how we attend to static scenes. In static scenes, fixations from multiple viewers have exhibited prioritization of some features of a scene over others (e.g., faces and foreground objects), although not at the same time (Mannan et al., 1997). The lack of temporal demands on when scene features are relevant to the viewer results in a large degree of disagreement in where multiple viewers attend. This variability conflicts with computational models of visual saliency that attempt to predict the gaze location of an ideal viewer from low-level visual features by assuming exogenous (i.e., stimulus-driven) involuntary control of attention (e.g., Itti, 2000; Itti & Koch, 2001). Endogenous (e.g., internal, cognitively driven) factors such as viewing task (Buswell, 1935; Castelhano et al., 2009; Land & Hayhoe, 2001; Yarbus, 1967), knowledge about layout and scene semantics (Henderson, Brockmole, Castelhano, & Mack, 2007; Henderson, Malcolm, & Schandl, 2009; Torralba, Oliva, Castelhano, & Henderson, 2006), and a preference for social features such as people, faces, and the subject of their eye gaze (Birmingham, Bischof, & Kingstone, 2008; Castelhano, Wieth, & Henderson, 2007) have a greater influence on gaze allocation in static scenes, and the inherent individual variability in endogenous control results in greater variability of gaze location over time. 
Such variability is eradicated with the inclusion of motion. When viewing dynamic scenes, the gaze of multiple viewers exhibits a high degree of clustering in space and time, which we refer to as attentional synchrony (Smith & Henderson, 2008). For example, Goldstein, Woods, and Peli (2007) found that for more than half of the viewing time, the distribution of fixations from all viewers while watching movie clips occupied less than 12% of the screen area. Attentional synchrony has been observed for feature films (Carmi & Itti, 2006a, 2006b; Goldstein et al., 2007; Mital, Smith, Hill, & Henderson, 2011; Smith & Henderson, 2008; t'Hart et al., 2009; Tosi, Mecacci, & Pasquali, 1997), television (Mital et al., 2011; Sawahata et al., 2008), video clips with audio narration and subtitling (Ross & Kowler, 2013), and videos of real-world scenes (Cristino & Baddeley, 2009; Smith & Henderson, 2008; t'Hart et al., 2009). Although the degree of gaze clustering is always higher in dynamic than static scenes (Smith & Henderson, 2008), it varies with scene content and compositional factors such as the inclusion of editing (Dorr et al., 2010; Hasson et al., 2008; Mital et al., 2011; see Smith, 2013, for review). In natural real-world scenes, the absence of such compositional factors means that only naturally occurring features such as motion, change, and sudden onsets can capture attention (Wolfe & Horowitz, 2004). 
Recent computational models of visual saliency have significantly increased their ability to predict human gaze location in dynamic scenes by including dynamic visual features such as motion and flicker, i.e., difference over time (Berg, Boehnke, Marino, Munoz, & Itti, 2009; Carmi & Itti, 2006a, 2006b; Le Meur et al., 2007; Mital et al., 2011; t'Hart et al., 2009; Vig, Dorr, & Barth, 2009) and accounting for the large bias of gaze toward the screen center observed during dynamic scene free viewing (see Smith, 2013, for discussion; Tseng, Carmi, Cameron, Munoz, & Itti, 2009). Motion and flicker are some of the strongest independent predictors of gaze location in dynamic scenes, and their independent contributions are as high, if not higher, than the weighted combination of all features in a model of visual salience (Carmi & Itti, 2006b; Mital et al., 2011). However, it would be false to conclude that such correlations between low-level features and gaze are evidence of a causal relationship. In a dynamic scene filmed from a static viewpoint, areas of high motion will coincide with areas of cognitive relevance such as people, animals, vehicles, and the objects acted upon by these agents (as has been shown in static scenes; Cerf, Frady, & Koch, 2009; Einhauser, Spain, & Perona, 2008; Elazary & Itti, 2008). It may be these scene semantics that are being prioritized (i.e., endogenously) by selective attention rather than attention being automatically (i.e., exogenously) captured by motion. 
Such consistency in how multiple viewers attend to a scene has also been observed during real-world interactive tasks such as fixating the inside curve while driving (Land & Lee, 1994) and the point of a knife when cutting a sandwich (Hayhoe, Shrivastava, Mruczek, & Pelz, 2003). Gaze during these naturalistic tasks is closely coupled to the information needed for the immediate goal (Hayhoe & Land, 1999; Land, Mennie & Rusted, 1999). For example, gaze falls only on task-relevant objects during a task but is equally distributed among all objects within the field of view prior to beginning the task (Hayhoe et al., 2003). If an agent were subject to involuntary, bottom-up control by visual salience while interacting with the real world, he or she would never be able to complete the task as areas of the scene with greatest relevance to the behavior are often the least salient (e.g., the bend of a road). Instead, the perceived utility of an object, such as whether or not a pedestrian has collided with you in the past, appears to be more important in deciding whether you fixate that object rather than low-level features such as its trajectory (Jovancevic-Misic & Hayhoe, 2009; Jovancevic-Misic, Sullivan, & Hayhoe, 2006). This mismatch between existing theories of visual salience and the pragmatics of everyday vision has recently received a wealth of discussion (for volumes of collected articles on the topic, see Henderson, 2005; Tatler, 2009). 
However, it is false to assume that the highly volitional nature of gaze control during real-world tasks indicates that a similar override of visual salience will be observed during noninteractive dynamic scene viewing. Real-world behaviors differ from prerecorded dynamic scenes in that the viewer is also an agent within the environment and can partly control the dynamics of the scene through his or her own ego motion, viewpoint changes, and manipulation of objects within the environment. When viewing a prerecorded dynamic scene without a specified viewing task, the optimal viewing strategy may be to couple gaze to points of high motion as these are likely to be sources of the most information. To examine whether this coupling is volitional (endogenous) or due to involuntary capture (exogenous), the traditional method is to dissociate the two factors by altering the viewing task. Most previous demonstrations of attentional synchrony during dynamic scene viewing have used a free viewing or “follow the main character” viewing task (see Smith, 2013, for review). Such instructions may inadvertently bias attention toward the dynamic features of a scene. Indirect evidence of endogenous influences on gaze allocation has come from studies using prolonged or repeated presentations of dynamic scenes. Attentional synchrony decreases over repeated presentation of the same videos (Dorr et al., 2010) and over prolonged presentation of the same dynamic scene (Carmi & Itti, 2006a; Dorr et al., 2010; Mital et al., 2011), suggesting that increasing familiarity and memory for scene content leads to divergent gaze behavior. Recent evidence of optimal anticipation of moments of high salience in dynamic scenes suggests that viewers formulate and act on predictions about the future of visual events (Vig, Dorr, Martinetz, & Barth, 2011). 
Only one study (that we are aware of) has attempted to directly manipulate viewing task during noninteractive dynamic scene viewing. Taya, Windridge, and Osman (2012) presented participants with videos of tennis matches filmed from a static camera position. Participants were initially instructed to free view the clips and then watch a mixture of new and old clips while trying to decide who won the point. After each clip, they also ranked a list of objects (such as ball boy/girl, net, and ball) in terms of how much they thought they looked at each object. Although subjective ratings of attended objects differed significantly across the two viewing tasks, objective measures of intersaccadic intervals (approximately fixation durations) and gaze coherence showed no significant differences across tasks. Repetition of the same clips also failed to show the decrease in attentional synchrony previously observed (Dorr et al., 2010). Taya and colleagues (2012) suggested that the lack of task influence on gaze could be due to the highly constrained nature of a tennis match and the correspondence between points of high salience (e.g., movement) and the areas of interest during both free viewing and identifying who won the point (e.g., the players and the ball). They also presented videos with their original soundtrack introducing audiovisual correspondences that may have cued attention to the same screen locations irrespective of viewing task. 
The present study endeavors to overcome the limitations of Taya et al. (2012) by using less-constrained stimuli and a viewing task that directly dissociates endogenous from exogenous control. Naturalistic videos of indoor and outdoor scenes were presented to two sets of participants either under instruction to free view the clips or to identify the location depicted in the clips (referred to as spot-the-location). Free viewing was used as the comparison for the more specific searchlike spot-the-location task because in static scenes, gaze differences between free view and all other specific tasks (e.g., search, memory, or pleasantness rating) have been shown to be significantly larger than between the specific tasks (Mills et al., 2011). The majority of the clips used in this study were filmed in Edinburgh where the experiment was conducted, and participants were able to tell in a few seconds whether or not they knew the location. To “spot the location,” participants had to ignore dynamic, changeable features of the scene such as people and traffic and focus their attention on the static, permanent features such as buildings and landmarks. Therefore, if participants are able to employ endogenous control, gaze behavior during the spot-the-location task should be less predictable by exogenous factors such as flicker (amount of change), be less allocated to moving objects such as people, and exhibit less attentional synchrony than during free viewing as viewer gaze is distributed to more diffuse and nontemporally defined elements of the scene. To identify the specific contribution of scene dynamics to gaze location, scenes were presented to participants either as videos or as static images. 
Method
Participants
Fifty-six members of the Edinburgh University community participated for payment (17 men; mean age = 22.74 years, minimum = 18 years, maximum = 38 years). All participants had normal or corrected-to-normal vision. Participation was voluntary. The experiment was conducted according to the British Psychological Society's ethical guidelines. 
Apparatus
Eye movements were monitored by an SR Research Eyelink 1000 eye tracker, and a nine-point calibration grid was used. Gaze position was sampled at 1000 Hz. Viewing was binocular, but only the right eye was tracked. The images were presented on a 21-inch cathode ray tube (CRT) monitor at a viewing distance of 90 cm with a refresh rate of 140 Hz and a resolution of 800 × 600 pixels (subtending a visual angle of 25.7° × 19.4°). A chin rest was used throughout the experiment. The experiment was controlled with SR Research Experiment Builder software. 
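As a rough illustration of the geometry behind the reported visual angles, the sketch below converts between physical extent and visual angle at the 90 cm viewing distance. The function names are hypothetical, and the visible screen size is derived here from the reported angles rather than taken from the text, so it should be read as an approximation.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def extent_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse mapping: physical extent that subtends a given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 90.0        # from the text
SCREEN_ANGLE_DEG = (25.7, 19.4)   # from the text, for 800 x 600 pixels

# Visible screen size implied by the reported angles (derived, not reported).
screen_w_cm = extent_cm(SCREEN_ANGLE_DEG[0], VIEWING_DISTANCE_CM)
screen_h_cm = extent_cm(SCREEN_ANGLE_DEG[1], VIEWING_DISTANCE_CM)

# Approximate horizontal pixels per degree near the screen centre.
px_per_deg = 800 / SCREEN_ANGLE_DEG[0]
```

A pixels-per-degree figure of this kind is only a linear approximation near the screen center; it is included because later measures (e.g., regions defined in degrees) need some such conversion.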
Stimuli
Participants were presented 26 unique full-color real-world scenes on a black background aligned with the center of the screen. Thirteen of the scenes were static photographs (Windows Bitmap format; 720 × 576 pixels × 24 bit), and 13 were 20-s, silent videos (XviD MPEG-4 format; 720 × 576 pixels and 25 frames per second). All scenes were presented with black borders around the image/video with the scene subtending a visual angle of 23.3° × 18.62°. Scenes were originally video clips, and representative static versions of these scenes were made by randomly choosing a frame 5 to 15 s into the video clips and exporting the frame as a Bitmap. Twenty-one of the scenes were filmed by the authors around Edinburgh using the movie function on a digital still camera. These scenes depicted a variety of contexts including indoor and outdoor locations with varying degrees of activity, number of people, and depth of field (see Figure 1). Scenes were filmed according to the following constraints: The camera was mounted on a tripod and not moved during the filming period, the tripod was randomly located in a scene and not framed in relation to a particular object, all scenes had to contain moving objects (typically people, animals, or vehicles) but not contain any text (e.g., signs) or critical sounds (e.g., dialogue), all scenes were lit with the original lighting found in the location (e.g., natural light [outdoors] or fluorescent lighting [indoors]), and all scenes were focused using the camera's autofocus. 
Figure 1
 
Stills taken from the 26 scenes used in this experiment. For each participant, half of the scenes were presented as static images and half as videos.
The remaining five scenes were 20-s clips extracted from videos depicting individual people engaged in a task (e.g., washing a car). These scenes adhered to the filming constraints outlined above and were originally used to investigate the perception of events during human action (Zacks et al., 2001). 
Procedure
Participants were randomly allocated to one of two conditions: free viewing or spot-the-location. In the free-viewing condition, participants were told they would be presented 26 short everyday scenes either as still photographs or videos and were instructed to simply look at each scene. A blocked design was used with either a block of 13 still images or a block of 13 videos presented first followed by a block containing 13 scenes of the opposite type. Participants saw only one version of each scene. Balancing the order of blocks and scene types across subjects created four subject groups. An experimental trial took place as follows. First, calibration was checked using a central fixation point presented on the CRT. If gaze position was more than 0.5° away from the fixation point, a nine-point recalibration was performed. Otherwise, the trial began with a randomly positioned fixation cross. A random fixation cross was used in an attempt to minimize the central bias observed in previous studies (Le Meur et al., 2007; Tatler, 2007). Participants were instructed to fixate the cross. The cross was presented for 1000 ms before being replaced with the scene. In the static block, the still photograph was presented for 20 s before being removed and the trial terminated. In the dynamic block, the video began immediately and ended after 20 s. Eye movements were recorded throughout the viewing time. After 13 scenes were presented, the second block began with a recalibration followed by the presentation of the remaining 13 scenes of the other type. 
In the spot-the-location condition, participants were presented the still images and videos using the same trial and block design as in the free-view condition but instructed to try to identify the location depicted in each image/video. As soon as they recognized the location or decided that the location was not familiar, they pressed a button on a controller (Microsoft Sidewinder joypad). They were instructed to respond “as quickly as possible” to encourage them to complete the location task before engaging with the scene in any other way. The scene would then remain on the screen for the remainder of the 20-s viewing time. After each scene was presented, the experimenter asked participants to verbally state if they recognized the location and, if so, where it was. Verbal answers were recorded and scored as correct or incorrect. 
Data analysis
Gaze preprocessing
Eye movement–parsing algorithms such as that provided with the Eyelink eye tracker (SR Research) are incapable of dealing with eye movements during dynamic scenes as they assume a sequence of fixations, saccades, and blinks. During dynamic scenes, the movement of objects relative to the camera and the participant's static viewpoint means that smooth pursuit movements may also be observed. Smooth pursuits are distinct from fixations as they involve displacement over time while a target object is foveated. They are distinguished from saccades by their relatively low acceleration and velocity. Thresholds of 8000°/s² and 30°/s are typically used for saccade detection (Stampe, 1993), and although pursuit of objects has been shown to be accurate up to 100°/s, the majority of pursuit movements are thought to occur at considerably slower speeds (Meyer, Lasker, & Robinson, 1985). Traditionally, studies intending to measure smooth pursuit eye movements use targets moving at known velocities and trajectories. As such, the existence of appropriate smooth pursuit movements can be identified by checking for a match between the velocity of the gaze and the target. The task of identifying smooth pursuits during spontaneous dynamic scene viewing is more difficult as information is not known about the target location or velocity. Instead, the velocity and continuity of motion of the eyes irrespective of the stimuli must be used. The MarkEye algorithm provides such functionality and has been used across both human adults and primates (Berg et al., 2009). 
The MarkEye algorithm processes raw X/Y monocular gaze coordinates using a combination of velocity thresholds and a windowed principal component analysis (PCA). The algorithm progresses through several stages. The raw X/Y data are first smoothed using a low-pass filter (63 Hz Butterworth), and gaze velocities greater than 30°/s are marked as possible saccades. A sliding window PCA is then used to compute the ratio of explained variances (minimum over maximum) for X and Y position. A ratio near zero indicates a straight line movement, and if the velocity is greater than the saccade threshold, a saccade is identified. Samples with ratios close to zero but velocities below the saccade threshold are identified as possible smooth pursuits or other low-velocity ocular artifacts (e.g., drift, eyelid obscuring the pupil prior to a blink). The remaining data are categorized as fixations. Saccades separated by fixations less than 60 ms in duration are assumed to belong to the same movement and are collapsed together. Finally, saccades with amplitude less than 1° or duration less than 15 ms are ignored and collapsed into adjacent movements. 
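The combination of a velocity threshold with a windowed PCA can be sketched as follows. This is not the MarkEye implementation: it omits the Butterworth smoothing and the later merging stages, and the window length and the cut-off for a "near zero" ratio are assumed values chosen for illustration only. Gaze coordinates are assumed to be in degrees of visual angle.

```python
import numpy as np

SACCADE_VELOCITY_DEG_S = 30.0   # from the text
RATIO_NEAR_ZERO = 0.1           # assumed cut-off for a "near zero" ratio
WINDOW_SAMPLES = 21             # assumed PCA window length (odd)


def variance_ratio(window_xy: np.ndarray) -> float:
    """Minimum over maximum eigenvalue of the 2x2 covariance of gaze positions."""
    eigvals = np.linalg.eigvalsh(np.cov(window_xy.T))  # ascending order
    if eigvals[1] < 1e-12:      # no movement at all in this window
        return 1.0
    return float(max(eigvals[0], 0.0) / eigvals[1])


def classify_samples(xy_deg: np.ndarray, sample_rate_hz: float) -> list[str]:
    """Label each gaze sample as saccade, pursuit candidate, or fixation."""
    # Sample-to-sample speed in degrees per second.
    velocity = np.linalg.norm(np.gradient(xy_deg, axis=0), axis=1) * sample_rate_hz
    half = WINDOW_SAMPLES // 2
    labels = []
    for i in range(len(xy_deg)):
        lo, hi = max(0, i - half), min(len(xy_deg), i + half + 1)
        straight = variance_ratio(xy_deg[lo:hi]) < RATIO_NEAR_ZERO
        if straight and velocity[i] > SACCADE_VELOCITY_DEG_S:
            labels.append("saccade")
        elif straight:
            labels.append("pursuit_candidate")
        else:
            labels.append("fixation")
    return labels
```

With real, noisy data the window length and ratio cut-off would need tuning, since noise during a steady fixation can occasionally produce an elongated position cloud.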
The Matlab implementation of the MarkEye algorithm was used to parse raw gaze data generated by the Eyelink 1000 eye tracker. Raw monocular gaze data were extracted for each participant and passed through the MarkEye algorithm. The same parameters were used for parsing the gaze data irrespective of whether participants were viewing static or dynamic stimuli. Any periods identified as tentative smooth pursuits (2.82% of all data), blinks, or classification error (due to tracker error/noise; 14.86% of all data) were removed from subsequent analyses. The properties of spontaneous smooth pursuit movements during naturalistic dynamic scene perception are a topic of great interest, but given that they were not the main focus of the present research, the decision was made to exclude these periods from further analyses. 
All analyses of oculomotor measures presented here are based on only fixations with durations greater than 90 ms to exclude false fixations created by the saccadic hook often observed in infrared eye trackers (Nyström & Holmqvist, 2010). Fixations greater than 2000 ms are also excluded as outliers, as these probably indicate lapses of concentration by participants (Castelhano et al., 2009). These criteria excluded 5.2% of fixations. A total of 70,875 fixations remained for analysis. 
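The duration criteria above amount to a simple filter, sketched below. The names are hypothetical, and the treatment of durations exactly at a boundary is an assumption, since the text states only "greater than 90 ms" and "greater than 2000 ms".

```python
MIN_FIXATION_MS = 90     # fixations must exceed this duration (from the text)
MAX_FIXATION_MS = 2000   # longer fixations are treated as outliers (from the text)


def keep_fixation(duration_ms: float) -> bool:
    """True if a fixation duration falls inside the analysed range."""
    return MIN_FIXATION_MS < duration_ms <= MAX_FIXATION_MS


def filter_fixations(durations_ms: list[float]) -> list[float]:
    """Drop fixations that are too short or too long to be analysed."""
    return [d for d in durations_ms if keep_fixation(d)]
```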
To validate the oculomotor measures identified using the MarkEye algorithm, the raw gaze data were also parsed using the normal Eyelink algorithm (Stampe, 1993). The correlation between mean fixation durations per participant identified by the Eyelink algorithm and MarkEye was highly significant (Pearson correlations; static: r = 0.853, p < 0.001; dynamic: r = 0.864, p < 0.001). However, the mean fixation durations identified by the Eyelink algorithm were on average 35.2 ms longer than those identified by the MarkEye algorithm (static = +28.49 ms, dynamic = +41.91 ms), possibly due to the inclusion of drift and pursuit within Eyelink fixations. This suggests that our use of the MarkEye algorithm has not distorted the data in any way and provides a more conservative estimate of the oculomotor behaviors than the Eyelink algorithm. 
Quantifying attentional synchrony
The most striking difference in viewing behavior between static and dynamic scenes previously reported is the clustering of gaze across individuals (i.e., attentional synchrony; Mital et al., 2011; Smith & Henderson, 2008). Various methods have been proposed for expressing the degree of attentional synchrony during a dynamic scene, including entropy (Sawahata et al., 2008), Kullback-Leibler divergence (Rajashekar, Cormack, & Bovik, 2004), normalized scan path salience (Dorr et al., 2010; Peters, Iyer, Itti, & Koch, 2005; Taya et al., 2012), bivariate contour ellipse area (BCEA; Goldstein et al., 2007; Kosnik, Fikre, & Sekuler, 1986; Ross & Kowler, 2013), and Gaussian mixture modeling (GMM; Mital et al., 2011; Sawahata et al., 2008). Each method expresses slightly different properties of the gaze distribution such as assuming all gaze is best expressed as a single diagonal cluster (BCEA), multiple spherical clusters (GMM), or describing the overall distribution (entropy). However, most methods have been shown to express the variation in attentional synchrony and have also been shown to correlate (Dorr et al., 2010). Here, our interest is in expressing two properties of attentional synchrony: (a) the variance of gaze around a single point and (b) the number of focal points around which gaze is clustered in a particular frame. We therefore decided to use GMM as this represents a collection of unlabeled data points as a mixture of Gaussians each with a separate mean, covariance, and weight parameter. However, this approach requires knowing how many clusters are in the data a priori. Following Sawahata and colleagues (2008), we can discover the optimal number of clusters that explain a distribution of eye movements using model selection (model selection operates by minimizing the Bayesian information criterion; see Bishop, 2007, for further explanation of the algorithm). Alternatively, the number of clusters can be set a priori. 
If a single cluster is used, the algorithm will model all gaze points for a particular frame using a single Gaussian kernel approximated by a spherical covariance matrix. The closer the covariance of this Gaussian is to zero, the tighter the gaze clustering and therefore the greater the attentional synchrony. For ease of interpretation, the cluster covariance is expressed as the visual angle enclosing 68% of gaze points (i.e., 1 standard deviation of the full Gaussian spherical covariance matrix). When using a single Gaussian cluster, this measure is very similar to BCEA (Goldstein et al., 2007), although it does not assume independent horizontal and vertical variances and instead uses a spherical cluster. As such, our measure can be considered a more conservative estimate of attentional synchrony than BCEA. However, analysis of our data with both BCEA and single Gaussian modeling revealed strong significant correlations between the two measures (static + free viewing: r = 0.863, p < 0.001; static + spot-the-location: r = 0.888, p < 0.001; dynamic + free viewing: r = 0.823, p < 0.001; dynamic + spot-the-location: r = 0.874, p < 0.001). GMM was used to allow the second-stage analysis of fitting the minimal number of clusters per frame. 
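To make the single-cluster measure concrete, the sketch below fits one spherical Gaussian to the gaze points of a single frame by maximum likelihood and returns its standard deviation. It is an illustration under the assumption that gaze is already expressed in degrees of visual angle; it is not the CARPE code, and the function name is hypothetical.

```python
import numpy as np


def spherical_gaze_spread(gaze_deg: np.ndarray) -> float:
    """Standard deviation of a single spherical Gaussian fitted to gaze points.

    gaze_deg has shape (n_viewers, 2), one x/y gaze position per viewer.
    Smaller values mean tighter clustering, i.e., more attentional synchrony.
    """
    centre = gaze_deg.mean(axis=0)
    squared_distance = np.sum((gaze_deg - centre) ** 2, axis=1)
    # One shared variance for both dimensions (spherical covariance).
    variance = squared_distance.mean() / gaze_deg.shape[1]
    return float(np.sqrt(variance))
```

Applied frame by frame (or to each 40-ms bin of a static presentation), a function of this kind would give one spread value per frame for each condition.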
Cluster covariance was calculated for every frame of every scene under the two viewing instructions using CARPE, open-source visualization and analysis software for eye-tracking experiments using dynamic scenes (Mital et al., 2011). Static scene presentations were divided into 40-ms “frames.” 
Flicker
To identify the contribution of low-level visual features to gaze allocation in dynamic scenes, flicker was identified. Flicker (change in luminance over time) has been shown to correlate with other measures of motion such as optic flow and be highly predictive of early gaze locations during dynamic scene free viewing (Carmi & Itti, 2006b; Mital et al., 2011). Flicker in a video filmed from a static viewpoint can be indicative of object motion, changes in environmental lighting, and color or video compression artifacts. For this analysis, flicker was calculated across luminance within a moving window of five frames to minimize the influence of compression artifacts and increase the likelihood of detecting an actual change in the scene. Equation 1 describes the Flicker computation at pixel x,y for time t, using the absolute difference between the current and previous frames' luminance values (I) computed from the CIELAB color space and averaged over the last five frames (n = 5).

$$\mathrm{Flicker}(x, y, t) = \frac{1}{n} \sum_{i=0}^{n-1} \left| I(x, y, t-i) - I(x, y, t-i-1) \right| \qquad (1)$$

Raw flicker around fixation on its own is insufficient to identify if gaze is specifically biased toward areas of change in the scene as the entire scene could contain the same amount of flicker. To create a baseline estimate of how much flicker would be around a control sample of gaze points, fixation locations were sampled with replacement from other frames of the same scene and projected onto the current frame. The average flicker within a 2° region around these control locations was calculated for each participant using Equation 1 and then averaged across all scenes and viewing tasks. Using the participant's own fixations from other time points in the same scene controls for individual gaze biases, the central tendency, and compositional biases in each scene (Mital et al., 2011). 
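The flicker measure and its baseline can be sketched as follows. This is one reading of the verbal description of Equation 1 (mean absolute frame-to-frame luminance difference over the last n = 5 frame pairs), not the original analysis code. The function names, the circular region, and its radius in pixels are assumptions, and the luminance array is assumed to have been extracted from CIELAB beforehand.

```python
import numpy as np


def flicker_map(luminance: np.ndarray, t: int, n: int = 5) -> np.ndarray:
    """Mean absolute luminance change per pixel over the n frame pairs ending at t.

    luminance has shape (frames, height, width).
    """
    if t < n:
        raise ValueError("need at least n frames before frame t")
    diffs = np.abs(np.diff(luminance[t - n : t + 1], axis=0))  # n difference images
    return diffs.mean(axis=0)


def mean_flicker_at(flicker: np.ndarray, x: int, y: int, radius_px: int) -> float:
    """Average flicker inside a circular region centred on pixel (x, y)."""
    height, width = flicker.shape
    yy, xx = np.ogrid[:height, :width]
    mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius_px ** 2
    return float(flicker[mask].mean())


def baseline_flicker(flicker, other_fixations, n_samples, radius_px, rng) -> float:
    """Flicker around fixations resampled (with replacement) from other frames."""
    idx = rng.integers(0, len(other_fixations), size=n_samples)
    values = [mean_flicker_at(flicker, *other_fixations[i], radius_px) for i in idx]
    return float(np.mean(values))
```

Comparing `mean_flicker_at` for the actual fixation with `baseline_flicker` for the same frame indicates whether gaze fell on more change than the viewer's own spatial biases would predict.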
Dynamic regions of interest
To identify the objects fixated during the dynamic scenes, dynamic regions of interest (dROI) were used. As the main objects of interest in this study were people, rectangular regions were created around each person in the scene. Most eye-tracking analysis software provides tools for creating static regions of interest. Given the dynamic nature of our objects, a dedicated software tool, Gazeatron (Smith, 2006), was used. Gazeatron allows dROI to be semi-automatically “puppetted” on top of the moving people, translating and scaling the regions as appropriate to encompass each person. For some of our scenes, this resulted in a large number of small regions (e.g., Scene 12 depicting the lobby of the National Museum of Scotland; Figure 1), whereas for others, only one person was present (e.g., Scene 23 depicting a man doing laundry; Figure 1). These regions were created for both static and dynamic versions of each scene. For the frame depicted in the static version, the regions were identical to the dynamic version of each scene. The coincidence of each fixation with these dynamic regions was then identified resulting in either a “hit” (1) or a “miss” (0) on people for each fixation. To control for the probability of randomly fixating people given the area of the screen they occupy, a control set of dROI equal in number and size to the regions occupied by people was generated for each frame and randomly located on the screen. The intersections of each fixation within these control regions were also identified. Across all scenes, the number of dROI per frame and over time varied widely, but for simplicity, the mean probability of fixating the randomly located regions during static scene viewing will be presented as the control baseline: M = 0.235, SD = 0.027. 
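The hit/miss coding and the randomly located control regions can be illustrated with the sketch below. It is not Gazeatron: the `Rect` type and function names are hypothetical, regions are assumed to be axis-aligned rectangles in pixel coordinates, and control regions are assumed to be placed uniformly so that they stay fully on screen.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def hits_any(fixation: tuple[float, float], regions: list[Rect]) -> int:
    """1 if the fixation lands inside any region ("hit"), otherwise 0 ("miss")."""
    return int(any(r.contains(*fixation) for r in regions))


def random_controls(regions: list[Rect], screen_w: float, screen_h: float,
                    rng: random.Random) -> list[Rect]:
    """Control regions equal in number and size to the originals, randomly placed."""
    return [
        Rect(rng.uniform(0, screen_w - r.w), rng.uniform(0, screen_h - r.h), r.w, r.h)
        for r in regions
    ]
```

Averaging `hits_any` over fixations for the person regions and for the control regions of the same frame gives the two probabilities being compared.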
Results
Spot-the-location results
For participants in the free-viewing condition, no response was required. In the spot-the-location condition, participants had to respond as quickly as possible via a button press when they identified the location depicted in the image/video or when they realized that they did not know the location. The mean probability of accurately identifying a location was 0.57 (SD = 0.154) for static scenes and 0.53 (SD = 0.16) for dynamic scenes. The difference was not significant (t < 1). Reaction times (RTs) for recognition/nonrecognition across the two scene types exhibited a significant main effect of whether the scene was recognized (repeated-measures analysis of variance [ANOVA]), F(1, 27) = 9.58, p < 0.01, ηp² = 0.262, but no effect of stimulus type (F < 1) or interaction between stimulus type and recognition, F(1, 27) = 2.582, p = 0.120, n.s. This trend toward an interaction can be attributed to mean RTs for nonrecognized scenes (5499 ms, SD = 2453 ms) being significantly longer than for recognized scenes (3990 ms, SD = 1740 ms), t(27) = 3.214, p < 0.01, in the dynamic condition but not in the static condition (nonrecognized = 4836 ms, SD = 2175 ms; recognized = 4229 ms, SD = 1890 ms), t(27) = 1.47, p = 0.153, n.s. This delay in recording a failure to recognize dynamic scenes might suggest that participants were distracted by the moving elements of the scene when encountering hard-to-recognize scenes. This difference between static and dynamic scenes will be explored in more detail during analysis of participant eye movements. 
Oculomotor measures
Oculomotor measures (fixation durations and saccade amplitudes) were calculated using the MarkEye algorithm (Berg et al., 2009) and will be presented across both stimulus types (static and dynamic) and viewing task (free view vs. spot-the-location). Unless otherwise stated, gaze data during the spot-the-location task are analyzed only up to the response. 
Fixation durations
Fixation durations have been shown to vary with different viewing tasks, such as reading and scene viewing (Rayner, 1998). There is also some evidence that fixations during scene memorization are significantly longer than during search of the same scenes (Henderson, 2007; Henderson & Smith, 2009; Henderson, Weeks, & Hollingworth, 1999; Mills et al., 2011), although some studies have failed to replicate this difference (Castelhano et al., 2009). Fixation durations have also been shown to increase over the course of static scene presentation (Antes, 1974; Buswell, 1935; Mills et al., 2011; Unema, Pannasch, Joos, & Velichkovsky, 2005). To look for stimulus and task influences on fixation durations in the present study, fixations were parsed from the raw data, and the mean (Figure 2) and change over time (Figure 3) of fixation durations were calculated for each condition. 
Figure 2
 
Mean fixation durations across the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location before the response. Error bars represent ±1 SE.
Figure 3
 
Mean fixation duration (ms) over time (1000-ms time bins) for each scene type (static = green lines, dynamic = blue) and viewing task (free view = solid lines, spot-the-location = dashed lines). Fixations are analyzed for the full 20-s viewing duration in both tasks and not just up to the response in spot-the-location (as in Figure 2). Error bars represent ±1 SE.
As can be seen from Table 1 and Figure 2, mean fixation durations during dynamic scene viewing are longer (339 ms) than during static scene viewing (290 ms) in free viewing, and the same holds during the spot-the-location task (253 ms vs. 231 ms, respectively). These differences are present even though smooth pursuit movements have been separated from fixation durations. A mixed ANOVA comparing scene type (static vs. dynamic) to viewing task (free view vs. spot-the-location) reveals a significant within-subjects effect of scene type, F(1, 54) = 102.67, p < 0.001, ηp² = 0.655; a between-subjects effect of task, F(1, 54) = 40.18, p < 0.001, ηp² = 0.427; and a significant interaction, F(1, 54) = 13.48, p < 0.001, ηp² = 0.20. The interaction is due to a differing effect of stimulus type within the two tasks. During free viewing, the mean difference between fixation durations in dynamic scenes and static scenes is large (48.61 ms), t(27) = 8.07, p < 0.001, compared with a smaller, but still significant, difference during spot-the-location (22.75 ms), t(27) = 6.22, p < 0.001. This suggests that spot-the-location exhibits similar eye movements to previously described scene search tasks (Henderson, 2007; Henderson et al., 1999; Henderson & Smith, 2009; Mills et al., 2011) and exhibits shorter fixations irrespective of whether the scene is static or dynamic. 
Table 1
 
Mean oculomotor measures for participants across two viewing tasks (free view vs. spot-the-location) and static versus dynamic stimulus. Notes: Significance level of paired t tests between static and dynamic are reported as asterisks (* = <.05, *** = <.001).
| Measure | Free view: Static | Free view: Dynamic | Free view: t test (p value) | Spot-the-location: Static | Spot-the-location: Dynamic | Spot-the-location: t test (p value) |
| --- | --- | --- | --- | --- | --- | --- |
| Fixation duration (ms) | 290.68 (46.10) | 339.29 (58.02) | *** | 231.18 (30.16) | 253.93 (40.06) | *** |
| Percentage of trial in fixation (%) | 72.89 (16.05) | 73.32 (17.41) | n.s. | 61.82 (16.29) | 65.62 (14.53) | .107 |
| Saccade amplitude (°) | 4.07 (0.63) | 4.21 (0.64) | * | 4.91 (0.62) | 5.07 (0.57) | * |
| Percentage of trial in saccade (%) | 8.02 (2.14) | 6.48 (1.92) | *** | 10.36 (2.93) | 9.98 (2.28) | n.s. |
It is also worth noting that because of the fixed presentation time (20 s) during free viewing, longer average fixations during dynamic scenes will result in fewer fixations than during static scenes. The same is not necessarily true in the spot-the-location task, as the duration used for analysis is dependent on participant response. 
To ensure that the task influence on fixation durations cannot be attributed to differences in when fixations were sampled (fixations during the full 20-s presentation are used for free viewing but only up until the response for spot-the-location), a time-course analysis was performed across the full trial duration. Dividing mean fixation durations into time bins relative to scene onset reveals a clear increase in fixation durations throughout the trial (Figure 3). Fixations during the first second after scene onset are significantly shorter (mean across all conditions = 213.63 ms, SD = 28.14) than all other time periods (all ps < 0.001). These initial short fixations are probably due to initial orienting to points of high interest within the scene and away from the random fixation cross prior to scene onset. What these areas of interest are will be investigated in subsequent analysis. 
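The time-course analysis relies on grouping fixations into bins relative to scene onset. A minimal sketch of that binning step is given below. It assumes that each fixation is represented by its onset time and duration in milliseconds and that a fixation is assigned to the bin containing its onset. The text does not state the assignment rule, so that choice and the function name `mean_duration_by_bin` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_duration_by_bin(fixations: Iterable[Tuple[float, float]],
                         bin_ms: float = 1000.0) -> Dict[int, float]:
    """Mean fixation duration per time bin.

    Each fixation is (onset_ms, duration_ms), with onset measured from scene
    onset. A fixation is counted in the bin that contains its onset
    (assumption; other rules, such as using the midpoint, are possible).
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for onset_ms, duration_ms in fixations:
        bins[int(onset_ms // bin_ms)].append(duration_ms)
    return {index: sum(values) / len(values)
            for index, values in sorted(bins.items())}
```

For instance, fixations `(100, 200)`, `(600, 220)`, and `(1500, 300)` produce a mean of 210 ms in bin 0 and 300 ms in bin 1. Per-participant bin means computed this way would then feed the Time factor of the mixed ANOVA.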
To examine the change in fixation durations over time, a mixed ANOVA was performed. Main effects of Time, F(19, 969) = 36.894, p < 0.001, ηp² = 0.413; Scene Type, F(1, 51) = 183.06, p < 0.001, ηp² = 0.782; and Task, F(1, 51) = 10.863, p < 0.01, ηp² = 0.176, were observed. A significant interaction between Scene Type and Task was also observed, F(1, 51) = 7.080, p < 0.01, ηp² = 0.122, mirroring the interaction previously reported for the means. An interaction between Time and Scene Type, F(19, 969) = 2.172, p < 0.01, ηp² = 0.041, was also observed. This interaction is due to participants' exhibiting a greater increase in fixation durations over time in dynamic scenes (mean increase = 126.7 ms) than static scenes (mean increase = 95.28 ms; see Figure 3). The absence of a return to free-viewing durations by the end of the trial for the spot-the-location fixations is probably due to the fact that all trials are included irrespective of whether participants have made a location response and that those participants still searching are making shorter fixations on average. 
To examine if the increase in fixation durations over time remains significant even when the extremely short fixations of the first second are removed, follow-up ANOVAs were performed within each condition without the first time bin. All main effects of Time remained significant: static + free view, F(18, 468) = 4.276, p < 0.001; static + spot-the-location, F(18, 468) = 3.992, p < 0.001; dynamic + free view, F(18, 468) = 3.497, p < 0.001; dynamic + spot-the-location: F(18, 468) = 5.485, p < 0.001. This increase in fixation durations during dynamic and static scene presentation replicates previous observations in static scenes (Antes, 1974; Buswell, 1935; Mills et al., 2011; Unema et al., 2005), extends it to dynamic scene viewing, and demonstrates that the task and type differences in mean fixation durations are evident within 2000 ms of scene onset and remain throughout scene duration. 
Saccade amplitudes
The other standard oculomotor measure known to show influences of viewing task during scene viewing is saccade amplitude. The average amplitude of saccades typically varies according to the distribution of objects within the visual array (Rayner, 1998). Saccades are generally largest immediately following scene onset and gradually decrease in amplitude over time (Antes, 1974; Buswell, 1935; Mills et al., 2011; Unema et al., 2005) but have been shown to be significantly shorter during search of a scene compared with memorization or free viewing of the same scene (Mills et al., 2011). To look for stimulus and task influences on saccade amplitudes in the present study, saccades were parsed from the raw data, and the mean (Figure 4) and change over time (Figure 5) of saccade amplitudes were calculated for each condition. 
Figure 4
 
Mean saccade amplitudes (degrees of visual angle) across the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location. Error bars represent ±1 SE.
Figure 5
 
Mean saccade amplitudes (degrees) over time (1000-ms time bins) for each scene type (static = green lines, dynamic = blue) and viewing task (free view = solid lines, spot-the-location = dashed lines). Saccades are analyzed for the full 20-s viewing duration in both tasks and not just up to the response in spot-the-location (as in Figure 4). Error bars represent ±1 SE.
As can be seen from Table 1 and Figure 4, mean saccade amplitudes during dynamic scene viewing are slightly larger (4.21°) than static (4.07°) during free viewing and during the spot-the-location task (5.07° vs. 4.91°, respectively). A mixed ANOVA reveals that these slight differences are significant, F(1, 54) = 6.076, p < 0.05, ηp² = 0.101, as is the effect of task, F(1, 54) = 31.036, p < 0.001, ηp² = 0.365, but there is no interaction (F < 1). Changing the task from free viewing to spot-the-location results in significantly larger saccades (mean difference = 0.847°), t(55) = −5.264, p < 0.001, indicating that the previously observed influence of searchlike tasks on saccade amplitudes (Mills et al., 2011) is independent of whether the stimulus is static or dynamic. 
To ensure the task influence on saccade amplitudes cannot be attributed to differences in when saccades were sampled, a time-course analysis was performed across the full trial duration (Figure 5). Dividing mean saccade amplitudes into time bins relative to scene onset reveals a clear decrease in saccade amplitude over the first few seconds of scene presentation for all scene types and viewing tasks. The relative difference between tasks is also preserved, with larger-amplitude saccades at the start of spot-the-location trials and throughout most of the trial duration. This pattern is confirmed by a mixed ANOVA on mean saccade amplitude that reveals a significant main effect of time, F(19, 950) = 14.114, p < 0.001, ηp² = 0.22; a main effect of scene type, F(1, 50) = 7.949, p < 0.01, ηp² = 0.137; and a main effect of viewing task, F(1, 50) = 8.492, p < 0.01, ηp² = 0.145, but no interactions. The lack of any interactions indicates that the main effects observed in Figure 4 are not due to sampling different time windows in the two tasks. 
Collapsing across scene type and task, post hoc comparisons (Bonferroni corrected) between mean saccade amplitudes at different time points reveal that saccades within the first two time bins (0–1000 ms and 1000–2000 ms) are significantly larger than all others, probably due to the random fixation cross and initial orienting to points of high interest within the scene. What these points are will be explored in subsequent analyses. By the fourth time bin (3000–4000 ms), amplitudes have reached asymptote, and although a slight numerical decrease continues throughout the trial for most conditions, the differences are no longer significant. 
Attentional synchrony
To identify if the degree of attentional synchrony differs across scene type and viewing task, we fit a single spherical Gaussian model around the gaze locations of all viewers for each frame of every scene and viewing task (static scene presentations were divided into 40-ms frames). The cluster covariance expresses the smallest circle enclosing 68% of all gaze locations for that frame (i.e., a radius of 1 standard deviation). Lower clustering values represent a higher degree of attentional synchrony. 
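One way to obtain a single spherical Gaussian for a frame's gaze points is sketched below. The estimator is not specified in the text, so this sketch assumes the maximum-likelihood fit of one isotropic variance shared by the horizontal and vertical axes and reports its square root as the 1-standard-deviation radius. The function name `spherical_gaussian_fit` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def spherical_gaussian_fit(points: Sequence[Point]) -> Tuple[Point, float]:
    """Return (mean, sigma) of a 2-D Gaussian with one shared variance.

    sigma is the maximum-likelihood estimate: the square root of the mean
    squared distance from the centroid divided by the two dimensions.
    Units follow the input (e.g., degrees of visual angle).
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one gaze point is required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    squared = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points)
    sigma = math.sqrt(squared / (2 * n))
    return (mean_x, mean_y), sigma
```

Applied to every frame, the returned `sigma` gives a per-frame clustering value that can be averaged within scenes and conditions. Smaller values correspond to tighter clustering, in line with the interpretation given above.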
Mean clustering across all scenes is presented in Figure 6 for the entire free-viewing duration and up to the response during spot-the-location. A mixed ANOVA of mean cluster covariance across the two stimulus types and tasks reveals an effect of scene type, F(1, 25) = 54.96, p < 0.001, ηp² = 0.687; an effect of task, F(1, 25) = 18.43, p < 0.001, ηp² = 0.424; and a significant interaction, F(1, 25) = 26.72, p < 0.001, ηp² = 0.517. Examining the means (Figure 6), the interaction can be attributed to a decrease in the difference between static and dynamic scenes during the spot-the-location task compared with free viewing. Clusters are smaller during free viewing of dynamic scenes (6.40°, SD = 3.99°) than static scenes (7.83°, SD = 3.68°), t(25) = 8.802, p < 0.001, and this difference decreases during the spot-the-location task, dynamic (7.40°, SD = 4.06°) compared with static (7.81°, SD = 3.73°), t(25) = 2.812, p < 0.01. There is no impact of task on gaze clustering during static scenes (t < 1) but a highly significant increase in cluster covariance for dynamic scenes, t(25) = 7.762, p < 0.001. The viewing task appears to have a significant impact on gaze clustering during dynamic scenes but does not completely eradicate the difference between static and dynamic scenes. 
Figure 6
 
Mean gaze clustering (degrees of visual angle as described by a spherical Gaussian kernel with a radius of 1 standard deviation) across all scenes for the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location. Lower values indicate tighter clustering (i.e., greater attentional synchrony). Error bars represent ±1 SE.
To examine attentional synchrony over the duration of scene presentation, the cluster covariance for each scene in each condition was averaged into 500-ms bins (Figure 7). The resulting time course of attentional synchrony shows the now well-known tendency for greater attentional synchrony (i.e., low cluster covariance) immediately following scene onset (Dorr et al., 2010; Mital et al., 2011; Tseng et al., 2009; Wang, Freeman, Merriam, Hasson, & Heeger, 2012). In our data, the initial large cluster covariance (average across all conditions = 10.48°) is due to the randomly located fixation cross prior to scene onset used to remove the center bias caused by a central fixation cross. However, rather than ameliorate the central tendency, this appears to have caused all viewers to direct their initial saccade toward the screen center irrespective of viewing task or scene type, creating a dip in cluster size 1000 ms following scene onset (mean across all conditions = 6.72°). This central bias can clearly be seen when gaze across the two viewing conditions and scene types is visualized as dynamic heat maps (Videos 1, 2, and 3). Analysis of distance of gaze from screen center indicates a decrease from an initial distance of 206 pixels (SD = 88.67) at scene onset to 106 pixels (SD = 65.2) by 1000 ms. By 2000 ms, the distance to screen center reaches asymptote for each condition: static + free-view = 169.0 pixels (SD = 81.0), dynamic + free-view = 153.1 pixels (SD = 81.1), static + spot-the-location = 173.6 pixels (SD = 77.8), and dynamic + spot-the-location = 171.4 pixels (SD = 75.4). 
Figure 7
 
Gaze cluster covariance over time (divided into 500-ms bins). Scene type is represented by line color (green = static, blue = dynamic) and viewing task by line type (solid = free view, dashed = spot-the-location). Error bars represent ±1 SE for mean across all scenes.
 
Video 1.
 
Dynamic heat maps representing the distribution of gaze over time for Scene 20 (Edinburgh Princes Street; see Figure 1). The four videos depict the viewing conditions (free view = left column; spot-the-location = right column) and scene types (static = top row; dynamic = bottom row). During free viewing, hotter colors indicate greater synchrony between gaze of multiple viewers. During spot-the-location, attentional synchrony is initially represented as a cold-hot heat map (i.e., blue to greenish yellow), then as participants identify the location (i.e., press the button), they switch to a red-yellow heat map.
 
Video 2.
 
Dynamic heat maps representing the distribution of gaze over time for Scene 19 (Meadows playground). The four videos depict the viewing conditions (free view = left column; spot-the-location = right column) and scene types (static = top row; dynamic = bottom row). During free viewing, hotter colors indicate greater attentional synchrony between the gaze of multiple viewers. During spot-the-location, attentional synchrony is initially represented as a cold-hot heat map (i.e., blue to greenish yellow), then as participants identify the location (i.e., press the button), they switch to a red-yellow heat map.
 
Video 3.
 
Dynamic heat maps representing the distribution of gaze over time for Scene 12 (National Museum of Scotland). The four videos depict the viewing conditions (free view = left column; spot-the-location = right column) and scene types (static = top row; dynamic = bottom row). During free viewing, hotter colors indicate greater attentional synchrony between the gaze of multiple viewers. During spot-the-location, attentional synchrony is initially represented as a cold-hot heat map (i.e., blue to greenish yellow), then as participants identify the location (i.e., press the button), they switch to a red-yellow heat map.
The asymptote in the distance to screen center mirrors the asymptote in cluster covariance (Figure 7). Differences between conditions in cluster covariance emerge by 1500 ms. In free viewing, the clustering for static scenes by 1500 ms is significantly larger (M = 6.92°, SD = 4.46°) than dynamic scenes (M = 6.08°, SD = 4.31°), t(25) = 2.05, p < 0.05, and remains at that level or higher throughout the presentation time. During spot-the-location, there is initially no difference in cluster covariance between static and dynamic scenes (at 1500 ms, static = 7.44°, SD = 4.16°; dynamic = 7.20°, SD = 4.3°; t < 1), but the difference reemerges by 6000 ms (static = 8.17°, SD = 4.45°; dynamic = 7.08°, SD = 4.43°), t(23) = 2.917, p < 0.01, suggesting a return to normal free-viewing dynamic scene behavior. Although only frames from the spot-the-location task occurring before the button press were entered into this analysis, this return to free viewing–like behavior suggests that some participants fail to identify the location but do not indicate this failure with a button press as instructed. Some evidence of this return to free-viewing gaze behavior can be seen in the similarity between the dynamic heat maps for the spot-the-location and free-view conditions prior to the button response (e.g., Videos 1 and 2 before gaze points turn from blue to red). This return to free viewing may explain the remaining difference in overall cluster covariance between dynamic and static stimuli during the spot-the-location task (Figure 6). Irrespective of this failure to follow the task instructions, the initial eradication of the attentional synchrony suggests that instructing participants to identify the location has enabled the viewers to change the way they view the dynamic scene. 
Given the large cluster covariances identified in all conditions other than dynamic free viewing (Figures 6 and 7), the variance in gaze position may be best fit by using a model explained by multiple focal points. Thus, instead of using a single Gaussian cluster to model the distribution of gaze, we use a GMM with a Bayesian information criterion model selection to discover the number of Gaussian clusters required to explain the gaze locations in each frame. This approach removes the assumption that all gaze will be directed toward a single part of the image that is intrinsic to other measures such as BCEA (Kosnik et al., 1986). 
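The model-selection step can be illustrated with a small sketch of the Bayesian information criterion. The mixture fitting itself is omitted: the sketch assumes that a maximized log-likelihood is already available for each candidate number of clusters, and it assumes spherical components when counting free parameters, which the text does not confirm for the mixture model. The function names and the example values in the usage note are hypothetical.

```python
import math
from typing import Dict


def spherical_gmm_param_count(k: int, dims: int = 2) -> int:
    """Free parameters of a k-component spherical Gaussian mixture.

    k means with `dims` coordinates each, k variances, and k - 1 mixing
    weights (the weights sum to one). Spherical components are an assumption.
    """
    return k * dims + k + (k - 1)


def bic(log_likelihood: float, n_params: int, n_points: int) -> float:
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def select_k(log_likelihoods: Dict[int, float], n_points: int) -> int:
    """Pick the cluster count whose fitted mixture has the lowest BIC.

    `log_likelihoods` maps each candidate k to the maximized log-likelihood
    of a mixture with k components fitted to the same n_points gaze samples.
    """
    return min(
        log_likelihoods,
        key=lambda k: bic(log_likelihoods[k],
                          spherical_gmm_param_count(k), n_points),
    )
```

As a usage example with made-up numbers, `select_k({1: -300.0, 2: -250.0, 3: -249.0}, n_points=42)` selects two clusters: the third component improves the likelihood too little to offset the penalty for its four extra parameters.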
Examining the number of gaze clusters detected by model selection across scene type and viewing task (Figure 8) reveals a shift from a predominance of a large number of clusters in both static viewing tasks to a smaller number of clusters during dynamic spot-the-location and free viewing. Neither scene type nor task influences the proportion of each trial in which gaze is optimally described by a single cluster (1 cluster: static + free viewing = 15.22%, SD = 7.67; static + spot-the-location = 13.83%, SD = 7.40; dynamic + free viewing = 11.5%, SD = 8.52; dynamic + spot-the-location = 11.54%, SD = 7.82), F(3) = 1.34, p = 0.266, n.s., although they do when the number of clusters is greater than 1. In most viewing conditions, the percentage of each trial spent with eight clusters is significantly greater than any other number of clusters (all ps < 0.001) except for dynamic + free viewing, in which the time spent in eight clusters (dynamic + spot-the-location = 24.42%, SD = 9.22) does not differ significantly from two clusters (M = 15.27%, SD = 9.05), t(24) = 0.840, p = 0.409, n.s. The greater predominance of a small number of gaze clusters during dynamic free viewing (fewer than six) confirms the greater attentional synchrony previously reported in this condition (Figures 6 and 7) without assuming that such instances occur only when all gaze is directed to a single part of the scene. 
Figure 8
 
The percentage of each trial in which the spatial distribution of gaze was optimally described as a specific number of clusters (x-axis = number of clusters). Scene type is represented by line color (green = static, blue = dynamic) and viewing task by line type (solid = free view, dashed = spot-the-location). Error bars represent ±1 SE for mean percentage across all scenes.
Instructing participants to spot the location does not change the distribution of gaze clusters in static scenes but does during dynamic scenes (compare the difference between the green lines to the blue lines; Figure 8). The number of clusters increases in dynamic spot-the-location compared with free viewing: the frequency of six to eight clusters increases and that of two to four clusters decreases (all ps < 0.05), resulting in a higher mean number of clusters per scene (dynamic spot-the-location: mean number of clusters = 5.40, SD = 0.51; dynamic free viewing: mean = 4.68, SD = 0.89), t(48) = 3.498, p < 0.001. However, the average number of clusters per scene within the dynamic spot-the-location condition remains lower than static spot-the-location (mean = 5.88, SD = 0.37), t(48) = 3.824, p < 0.001. Instructing participants to spot the location in dynamic scenes significantly decreases attentional synchrony, represented as a larger number of gaze clusters, but does not fully eradicate the differences between static and dynamic scenes. 
Flicker
So far, we have demonstrated significant influences of viewing task on fixation durations, saccade amplitudes, and attentional synchrony in dynamic scenes. This last difference suggests that gaze is much more synchronized during free viewing of dynamic scenes compared with during spot-the-location. What might be causing this? Several dynamic scene-viewing studies have reported a bias toward dynamic elements of the scene, for example, changing pixels (i.e., flicker), motion, and optic flow (Berg et al., 2009; Carmi & Itti, 2006a, 2006b; Le Meur et al., 2007; Mital et al., 2011; t'Hart et al., 2009; Vig et al., 2009). If, as suggested by the results so far, participants are successfully directing their attention away from the dynamic features during the spot-the-location task, there should be a significant decrease in the amount of flicker around fixation compared with free viewing. 
The amount of flicker around fixation during free viewing of dynamic scenes is significantly greater (M = 0.074, SD = 0.008) than would be predicted by chance (control = 0.037, SD = 0.011), t(54) = 11.196, p < 0.001. During the spot-the-location task (Figure 9; spot-the-location + before), this difference disappears (M = 0.034, SD = 0.022), t(54) = −0.625, p = 0.534, n.s., suggesting that gaze is not directed toward changing elements of the scene. 
Figure 9
 
Mean flicker in a 2° region around fixation during free viewing of dynamic scenes (“Free view” = green bar) and during spot-the-location task before a response was made (“STL + Before” = middle blue bar) and after (“STL + After” = right blue bar). A baseline rate of flicker around control locations is presented (solid red line = mean flicker, dashed lines = ±1 SE). The control locations control for gaze biases (such as screen center) by being randomly sampled from different frames in the same scene. Error bars represent ±1 SE.
Given that previous analyses have revealed what appeared to be a return to free viewing–like behavior during spot-the-location trials in which participants did not respond (Figure 7), it was hypothesized that similar behavior might be observed after the spot-the-location response. To examine this behavior, the flicker around fixation after the spot-the-location response was identified. As can be seen from Figure 9 (right-most bar), the amount of flicker around fixation increases significantly (M = 0.056, SD = 0.024) relative to before the response, t(13) = 2.749, p < 0.05, and is significantly greater than the flicker around control locations, t(54) = 4.417, p < 0.001. This suggests that participants are successfully directing their gaze away from the dynamic elements of the scene while attempting to recognize the location but returning to a default, free viewing–like behavior that involves attending to changing elements of the scene after they have made a response. 
Probability of fixating people
But what are the dynamic objects that participants are attending to? Given that the scenes are filmed from a stationary viewpoint, the only elements that will change are people moving past the camera, traffic, animals, and objects acted upon by these agents or the weather (e.g., leaves blown by the wind). People have been shown to be the target of a significant amount of gaze in static scenes irrespective of viewing task (Birmingham et al., 2008; Buswell, 1935; Castelhano et al., 2007; Yarbus, 1967). Some evidence suggests this is also the case when viewing close-up dynamic scenes of people (Kuhn, Tatler, Findlay, & Cole, 2008; Laidlaw, Foulsham, Kuhn, & Kingstone, 2011; Risko, Laidlaw, Freeth, Foulsham, & Kingstone, 2012; Võ, Smith, Mital, & Henderson, 2012). The attentional synchrony observed during free viewing of our dynamic scenes may be due to all participants' attending to the same people at the same time. However, during the spot-the-location task, attending to people would be detrimental to recognizing the location. 
The probability of fixating people was calculated for each condition by identifying whether each fixation fell within a person dROI and scoring a hit (1) or a miss (0). Averaging this probability across all fixations produced an overall probability of fixating people in each condition. The overall mean probability that a person is fixated in each scene type (static vs. dynamic) and task (free view, spot-the-location before response, and spot-the-location after response) is presented in Figure 10. A mixed ANOVA of fixation probability comparing static/dynamic and free viewing to spot-the-location before the response reveals a significant effect of scene type, F(1, 54) = 32.709, p < 0.001, ηp² = 0.377; task, F(1, 54) = 130.7, p < 0.001, ηp² = 0.708; and a significant interaction, F(1, 54) = 19.026, p < 0.001, ηp² = 0.261. The interaction can be attributed to the absence of a difference between static and dynamic scenes in the spot-the-location task before the response compared with free viewing (Figure 10). During free viewing of dynamic scenes, there is a more than 50% chance of gaze being allocated to people (M = 0.544, SD = 0.072). Gaze also demonstrates a large bias toward people in static scene free viewing (M = 0.420, SD = 0.08), although the probability is significantly less than dynamic scenes, t(27) = 7.394, p < 0.001. This difference is absent before the spot-the-location response (static = 0.289, SD = 0.073; dynamic = 0.306, SD = 0.078; t < 1) even though the probabilities still remain significantly greater than chance (chance = 0.235, SD = 0.027; all ps < 0.001; Figure 10, solid red line). 
Figure 10
 
Mean probability of each fixation falling within a dynamic person region across all scenes for the two stimulus types (static = green vs. dynamic = blue; bars) and viewing tasks (x-axis): free view (left), spot-the-location before response (middle), spot-the-location after response (right). Solid red line indicates chance fixation probability given the size of the people regions. Error bars and dashed red lines represent ±1 SE.
The difference between static and dynamic scenes returns following the spot-the-location response (Figure 10; right bars): main effect of before versus after response, F(1, 27) = 58.03, p < 0.001, partial η² = 0.682; static versus dynamic, F(1, 27) = 8.617, p < 0.01, partial η² = 0.242; and a significant interaction, F(1, 27) = 10.532, p < 0.01, partial η² = 0.281. The re-emergence of the greater bias toward people in dynamic scenes compared with static after the response can be seen clearly in the mean fixation probabilities: static = 0.34 (SD = 0.066), dynamic = 0.409 (SD = 0.070), t(27) = 4.521, p < 0.001. However, fixation probabilities remain significantly lower than during free viewing: static free viewing versus static spot-the-location after response, t(54) = 4.081, p < 0.001; dynamic free viewing versus dynamic spot-the-location after response, t(54) = 7.122, p < 0.001. 
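As a consistency check, the effect sizes reported alongside the F ratios above appear to match partial eta squared computed from F and its degrees of freedom, F × df_effect / (F × df_effect + df_error). The sketch below applies this standard identity to the reported values; the function name is hypothetical and the check is an inference from the reported numbers, not a description of the authors' analysis software.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) triples as reported in the text.
reported = {
    "scene type (free view vs. before response)": (32.709, 1, 54),
    "task (free view vs. before response)": (130.7, 1, 54),
    "interaction (free view vs. before response)": (19.026, 1, 54),
    "before vs. after response": (58.03, 1, 27),
    "static vs. dynamic (before vs. after)": (8.617, 1, 27),
    "interaction (before vs. after)": (10.532, 1, 27),
}

for label, (f_value, df1, df2) in reported.items():
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

The recomputed values round to 0.377, 0.708, 0.261, 0.682, 0.242, and 0.281, agreeing with the effect sizes in the text.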
These results confirm that gaze is predominantly allocated to people during free viewing of dynamic and static scenes, but removing the utility of these fixations via task instruction can result in less gaze to people, even though the scenes remain the same. The influence of viewing task on allocation of gaze to people becomes very clear once time is taken into account (Figure 11). During free viewing, more than half of all first saccades after scene onset are directed to people (1000-ms time bin; Figure 11, top). This is irrespective of whether the scene is dynamic or static. By 2500 ms after scene onset, the probability of fixating people has decreased for static scenes (0.5, SD = 0.178) but remains high for dynamic scenes (0.58, SD = 0.15), t(27) = 2.147, p < 0.05. This difference increases over the duration of each scene as the fixation probability continues to decrease for static scenes but remains high for dynamic. By comparison, gaze during spot-the-location task (before response) exhibits the same initial tendency toward people in both static and dynamic (1000-ms time bin; Figure 11, middle), but this bias is immediately eradicated as the fixation probability decreases to the chance baseline in the following time bin (1500-ms time bin; Figure 11, middle). This initial tendency toward people cannot be attributed to involuntary capture by the motion associated with people as the same tendency is observed in static and dynamic scenes (e.g., compare the dynamic gaze heat maps for frame 19 of Video 2 in both static and dynamic free viewing). It is possible that this initial peak in the probability of fixating people may be due to the collocation of people with screen center and the tendency for first saccades to be directed toward the screen center after the randomly located fixation cross. 
This central bias and its impact on the probability of fixating people may be responsible for the greater-than-chance mean probability previously reported before the spot-the-location response (Figure 10). This possibility will be explored in subsequent studies without a random fixation cross. 
Figure 11
 
Mean probability of fixating people over time (averaged into 500-ms bins; x-axis) across all scenes for the two stimulus types (static = green dashed lines, dynamic scenes = blue solid line) and viewing tasks (free view = top, before the spot-the-location response = middle, after the spot-the-location response = bottom). Solid red line indicates chance given the size of the people regions. Error bars and dashed red lines represent ±1 SE.
As the duration of scene presentation increases, the number of participants who have not responded to the spot-the-location task decreases, introducing a high degree of variance in the fixation probabilities (Figure 11, middle). This has the opposite effect on the number of participants contributing to the after-the-location response group (Figure 11, bottom) and allows the return to “default” viewing behavior to be observed. After the spot-the-location response, the probability of fixating people increases and the difference between static and dynamic scenes returns (Figure 11, bottom), for example, 7000-ms time bin, t(27) = 2.014, p < 0.05. Thus, participants appear to modulate their allocation of gaze to people as a function of its utility to their viewing task. 
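The time-course analysis plotted in Figure 11 averages the probability of fixating people within 500-ms bins. A minimal sketch of that binning step follows, under stated assumptions: each sample is a hypothetical pair of fixation onset time (ms since scene onset) and a hit/miss flag, and bins are labeled here by their start time, which may differ from the labeling convention used in the figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BIN_MS = 500  # bin width, matching the 500-ms bins described for Figure 11


def binned_probability(samples: List[Tuple[float, bool]]) -> Dict[int, float]:
    """Average hit/miss scores within fixed-width time bins.

    samples: (fixation onset in ms since scene onset, fixation on a person?)
    Returns a mapping from bin start time (ms) to the proportion of hits.
    """
    hits: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for t_ms, on_person in samples:
        bin_start = int(t_ms // BIN_MS) * BIN_MS
        counts[bin_start] += 1
        hits[bin_start] += int(on_person)
    return {b: hits[b] / counts[b] for b in sorted(counts)}


# Toy data (not from the study).
toy_samples = [
    (520.0, True), (780.0, True), (990.0, False),
    (1010.0, False), (1400.0, False),
    (1600.0, True),
]
result = binned_probability(toy_samples)
print(result)
```

Because each bin's estimate is a proportion of however many fixations fall in it, bins with few contributing participants (as late in the before-response condition) will show the higher variance noted above.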
Discussion
Attending to a dynamic naturalistic scene entails reconciling the temporal demands of scene features with the intentions of the viewer. Previous investigations into the impact of viewing task on gaze behavior in dynamic scenes have failed to demonstrate any differences in gaze behavior (Taya et al., 2012), lending support to the view that low-level visual features such as motion and flicker are dominant in influencing gaze allocation in dynamic scenes (Carmi & Itti, 2006a, 2006b; Itti, 2005). However, the problem with prior studies investigating the influences on gaze in dynamic scenes is that the semantic features that may be significant to viewers by default during free viewing, such as people, are also sources of low-level salience (e.g., motion). In the present study, we aimed to decorrelate areas of high salience (i.e., motion) from areas of optimal relevance to the viewing task (i.e., static indices of scene location) and compare gaze behavior in this extreme situation. The role of scene dynamics was also identified by comparing gaze behavior between static and dynamic versions of the same scenes. 
The results of the present study revealed a significant impact of viewing task and scene type (static vs. dynamic) across a series of measures of gaze behavior. Basic oculomotor measures showed significant differences between the two viewing tasks for dynamic and static scenes. Across both scene types, fixation durations were longer and saccade amplitudes shorter during free viewing compared with spot-the-location. Both measures showed the characteristic change over time previously reported in static scenes (fixation durations increase and saccade amplitudes decrease; Antes, 1974; Buswell, 1935; Mills et al., 2011; Unema et al., 2005) irrespective of viewing task and scene type. In dynamic scenes, viewer gaze during the spot-the-location task visited more peripheral areas of the scene than during free viewing, resulting in less attentional synchrony and a higher number of gaze clusters. Task did not change attentional synchrony for static scenes. Analysis of where viewers fixated also revealed a significant influence of task. During spot-the-location, less gaze was allocated to people in both static and dynamic scenes, and gaze did not coincide with areas of high flicker in dynamic scenes. Overall, these results suggest a large impact of the spot-the-location task on gaze behavior, replicating prior evidence of task influences on static scene viewing (Buswell, 1935; Henderson, 2007; Henderson et al., 1999; Henderson & Smith, 2009; Mills et al., 2011; Yarbus, 1967) and extending this effect into naturalistic dynamic scenes. To the authors' knowledge, this is the first direct evidence of top-down, endogenous control over attention in naturalistic noninteractive dynamic scenes. 
The findings of this study do not suggest that all viewing tasks will result in a significant deviation from free viewing–like gaze behavior. As has been shown in static scenes (e.g., Yarbus, 1967), demonstrating task influences on which parts of a scene are fixated requires that the tasks prioritize different parts of the image. Failure to spatially dissociate the informative areas of an image across tasks may result in no differences between task (Castelhano et al., 2009; Mills et al., 2011), as has been shown by Taya and colleagues (2012) in dynamic scenes. Comparing free viewing (i.e., no task) to a specified task (such as object search, memorization, or, in this case, spot-the-location) has been shown to maximize the gaze differences between tasks in static scene viewing (Mills et al., 2011). Future studies will have to investigate a broader range of viewing tasks with dynamic scenes to see which tasks are specific enough to allow the default viewing strategy of prioritizing dynamic/social features to be overcome. 
The influence of scene dynamics
Our results provide further evidence of interesting differences between gaze behavior in dynamic and static scenes (see Dorr et al., 2010, for preference comparisons). During free viewing, saccade amplitudes were significantly longer in dynamic compared with static scenes, but gaze was more distributed in static scenes. This suggests that gaze spent longer in each location in dynamic scenes after making large saccades to get there. Longer mean fixation durations in dynamic compared with static scenes confirm this, as do the greater attentional synchrony, lower number of gaze clusters, and higher probability of fixating people. The pattern of behavior during free viewing of dynamic scenes can be described as a tendency to identify quickly a small number of areas of interest in a scene and devote a large proportion of viewing time to fixating and pursuing objects at these locations. In the naturalistic interior and exterior scenes used in this study, more than half of all fixations coincide with people (54.4%), a significantly greater proportion than would be predicted by chance. A similar bias toward people is observed during free viewing of static scenes (42%), but the shorter fixations and greater screen exploration suggest attention is held less by the people or that it takes less time for interesting information to be encoded for each static person compared with dynamic people. 
At first glance, finding longer fixations in scenes containing motion may seem counterintuitive as dynamic scenes should contain more centers of interest, which change over time, resulting in more competition for gaze. Dorr et al. (2010) also reported similar longer fixations in dynamic versions of naturalistic scenes compared with static versions. However, the influence of scene content was not controlled for as the presentation of scenes across stimulus type was not counterbalanced across participants, meaning that differences in fixation durations for particular scene content and individual viewer differences in mean fixation durations (Castelhano & Henderson, 2008) cannot be eliminated as the reason for differences across stimulus types. Our replication of longer fixation durations when such individual scene and viewer differences are controlled suggests that the simple competition hypothesis of gaze control in dynamic scenes does not hold. In a fixate-move model of eye movement control (e.g., Findlay & Walker, 1999), such competition would result in greater “move” signals, pulling the eyes to peripheral locations and producing shorter fixations. However, the video stimuli used in this study updated only every 40 ms (25 fps) and were filmed from a static, human-like vantage point, meaning that most naturally occurring motion (e.g., people, animals, traffic, etc.) could not move far across the screen between frames. Unless new objects entered the screen edge, most scenes' motion involves semipredictable continuation of prior motion (Vig et al., 2009), and once fixated, this motion may remain within the foveal region for several frames. Once a fixated object has moved away from the foveal region, compensatory eye movements such as pursuit or catch-up saccades will occur. 
A lot of the scenes used in this study also involved relatively stationary people engaging in tasks such as folding laundry (Scene 23; Figure 1), playing on a swing (Scene 19), playing bagpipes (Scene 20), dancing (Scene 4), or milling around at such a distance from the camera that it takes them several frames to change screen location (Scenes 2, 4, 6, 9, 10, 12, 13, 15, 17, 20). Fixating such regions of low-dynamic but high-social significance may result in greater top-down signals to hold fixation and acquire as much information as possible (Findlay & Walker, 1999; Nuthmann, Smith, Engbert, & Henderson, 2010). This hypothesis is supported by the observation of a greater probability of fixating people and more flicker at fixation. The change in gaze behavior during spot-the-location also confirms this hypothesis as gaze is directed away from people to the static elements of the scene, resulting in shorter fixations. 
Attentional synchrony
The presence of attentional synchrony during free viewing of dynamic scenes and its near eradication by viewing instructions provides insight into what may cause the spontaneous clustering of gaze across multiple viewers. Given that attentional synchrony requires all viewers to make the same decisions about where and when to allocate gaze, it might be assumed that as scene complexity increases and the number of sources of information available for making these decisions increases, so should the variation in the outcome decided by multiple individuals. However, the exact opposite appears to be true. Attentional synchrony increases with realism: static scenes < dynamic scenes. Interobserver agreement during viewing of static scenes is very low (Mannan et al., 1997; Tatler et al., 2005), even though the same objects in a scene tend to be prioritized by multiple viewers (Buswell, 1935; Yarbus, 1967). In dynamic scenes, previous studies have shown a high degree of attentional synchrony in edited and highly composed videos (Carmi & Itti, 2006a, 2006b; Goldstein et al., 2007; Mital et al., 2011; Ross & Kowler, 2013; Sawahata et al., 2008; Smith & Henderson, 2008; 't Hart et al., 2009; Tosi et al., 1997). A large factor in the shaping of attentional synchrony may be shot and sequence composition (Smith, 2012a, 2013). However, here we show a greater degree of attentional synchrony in minimally composed and unedited naturalistic dynamic scenes compared with content-matched static scenes. Averaging across different dynamic scenes reveals a consistently high level of attentional synchrony throughout presentation time. However, examining the attentional synchrony for an individual scene reveals fluctuations in attentional synchrony as scene content changes (also see figure 6 in Dorr et al., 2010). For example, Video 1 (Scene 20 in Figure 1) depicts a street scene in Edinburgh (Princes Street, by the Sir Walter Scott monument). 
During free viewing of the scene, there are several moments of tight gaze clustering, such as when a group of boys enter the scene from screen left (Frame 63), when a woman enters the screen to give money to the bagpiper (Frame 204), and when the woman in red walks toward the camera (Frame 330). Similar moments are evident in all dynamic scenes (also see Videos 2 and 3). Each of these moments is signified by a sudden change in the scene, either an appearance or a change in direction of a person. 
Relative to the static background, the moments when an object suddenly appears in the frame can be identified by a high degree of flicker (Mital et al., 2011). The significance of the event is temporally defined and, as such, absent from static presentations of the same scene. Although the addition of motion increases the potential sources of information in the scene compared with static presentations, the time constraints on when information is available simplify the viewer's task of identifying which elements of the scene are significant at a particular moment. This results in attentional synchrony as all viewers default to the same strategy of prioritizing points of high novelty during free viewing. Such a predilection toward novelty has been a factor in basic attention research for decades (Treisman & Gelade, 1980; Wolfe, 2007) and has recently been applied to gaze prediction in static scenes (Najemnik & Geisler, 2005) and TV clips (Itti & Baldi, 2009). Itti and Baldi (2009) used a Bayesian definition of surprise to capture subjective impressions of the continuity of low-level visual features in TV clips and could predict 72% of all gaze shifts and 84% of the moments when all viewers gazed at the same location. However, in that study, viewers were instructed to “follow the main actors and actions, so that their gaze shifts reflected an active search for non-specific information of subjective interest” (p. 1299), which would bias them toward the moving elements of the scene, which are the elements most likely to exhibit surprising changes. During our free viewing of dynamic scenes, we observe similar spontaneous tracking of moving objects and people, suggesting that this behavior may be a default for most viewers when attending naturalistic dynamic social scenes. 
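For readers unfamiliar with the notion, Bayesian surprise is commonly formalized as the Kullback-Leibler divergence between an observer's posterior and prior beliefs after new data arrive. The toy sketch below illustrates only that general idea over a small discrete set of hypotheses; the function names and numbers are assumptions made for illustration, and it is not a reconstruction of the feature detectors or model used by Itti and Baldi (2009).

```python
import math
from typing import List


def bayes_update(prior: List[float], likelihood: List[float]) -> List[float]:
    """Posterior over discrete hypotheses via Bayes' rule."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("data have zero probability under every hypothesis")
    return [u / total for u in unnormalized]


def surprise_bits(prior: List[float], posterior: List[float]) -> float:
    """KL divergence of the posterior from the prior, in bits.

    Terms with zero posterior mass contribute nothing. A posterior produced
    by bayes_update is zero wherever the prior is zero, so no division by
    zero occurs for such inputs.
    """
    return sum(q * math.log2(q / p) for p, q in zip(prior, posterior) if q > 0)


prior = [0.5, 0.5]  # two toy hypotheses about the scene

# Data equally likely under both hypotheses: beliefs do not move.
expected = bayes_update(prior, [0.5, 0.5])
# Data strongly favoring the first hypothesis: beliefs shift.
unexpected = bayes_update(prior, [0.9, 0.1])

print(surprise_bits(prior, expected))
print(surprise_bits(prior, unexpected))
```

Under this definition, an event is surprising only to the extent that it changes beliefs, which is why a sudden appearance or change of direction scores higher than the continuation of an ongoing motion.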
However, as has been rightly noted before, instructing participants to “free view” a scene does not create a systematic default viewing task across all viewers but instead creates ambiguity in which task each individual is choosing to engage in (Tatler et al., 2005; Tatler et al., 2011). For example, during free viewing of Video 1, the majority of viewers decide to watch the bagpiper until new people appear in the scene. Similarly, in Video 2, all viewers initially look at the boys playing on the swing until one boy runs off and a seagull flies across the top of the scene, creating three areas of interest and dividing the viewers into three gaze clusters. By gazing at areas of greatest change, viewers are allocating their processing resources and area of greatest visual acuity to the areas of the screen that are likely to hold the greatest amount of information, creating an optimal viewing pattern for an unspecified viewing task (Najemnik & Geisler, 2005). 
Once a viewing task is specified that allows viewers to preset their sensitivity to visual features or locations of a scene (Folk, Remington, & Johnston, 1992), the utility of attending to motion or areas of change may be negated. Such presetting can explain the change in gaze behavior observed across our two viewing tasks. It has been proposed that computational models of gaze allocation in dynamic scenes should combine bottom-up salience computations with top-down factors to account for this effect (Itti, 2005). Models adopting this approach attempt to learn task relevance of scene features (Navalpakkam & Itti, 2005) or are hard coded with contextual cues that modulate the strength of the bottom-up salience map (Torralba et al., 2006). Such models might be able to account for the differences in gaze behavior observed across our free-viewing and spot-the-location tasks. The indirect instruction to attend to static features of a scene could decrease sensitivity to dynamic features in the salience map, essentially creating foreground subtraction on the scene so that moving features effectively disappear from the map. Such an approach would be blind to scene semantics. 
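The presetting idea can be made concrete with a deliberately simplified sketch: a combined priority map formed as a static-feature map plus a task-weighted motion map. This is an illustration under stated assumptions, not the model of Navalpakkam and Itti (2005) or Torralba et al. (2006); the function names, the single `motion_weight` parameter, and the toy 2 × 2 maps are all hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]


def combine_maps(static_map: Map, motion_map: Map, motion_weight: float) -> Map:
    """Cell-by-cell sum of a static map and a motion map scaled by a task weight."""
    if not 0.0 <= motion_weight <= 1.0:
        raise ValueError("motion_weight must lie in [0, 1]")
    return [
        [s + motion_weight * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(static_map, motion_map)
    ]


def peak(priority: Map) -> Tuple[int, int]:
    """Return the (row, column) of the highest value in the map."""
    best = (0, 0)
    for r, row in enumerate(priority):
        for c, value in enumerate(row):
            if value > priority[best[0]][best[1]]:
                best = (r, c)
    return best


# Toy maps: motion is concentrated top-left (e.g., a moving person),
# static structure is strongest top-right (e.g., a landmark).
static_map = [[0.2, 0.6], [0.1, 0.3]]
motion_map = [[0.9, 0.0], [0.0, 0.0]]

free_view_peak = peak(combine_maps(static_map, motion_map, motion_weight=1.0))
task_peak = peak(combine_maps(static_map, motion_map, motion_weight=0.0))
print(free_view_peak, task_peak)  # (0, 0) (0, 1)
```

Setting the motion weight to zero removes moving features from the map entirely, which corresponds to the foreground-subtraction reading above; like that reading, the sketch has no access to scene semantics.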
Two pieces of evidence argue against a simple motion presetting hypothesis: (a) attentional synchrony is still greater in dynamic spot-the-location than static and (b) the probability of fixating people remains greater than chance. These results may be spurious, reflecting participants' failing to follow the instructions to respond whether or not they know the location and instead reverting to free viewing–like behavior before responding (see Figures 7 and 11). However, looking at individual examples of gaze behavior in dynamic spot-the-location tasks, it appears that the first couple of saccades are always directed to the screen center and then to nearby people before heading into the periphery to locate indexical scene features such as a clock (Video 3), the Sir Walter Scott monument (Video 1), or a tower block (Video 2). This initial bias toward central people is confirmed by Figure 11. This bias could be evidence of motion cutting through top-down presetting, an initial default bias toward people that is hard to override, or simply a coincidence of motion and people with the screen center, as is also observed in static scenes (Tatler, 2007). Further experiments dissociating motion from people are required to test these hypotheses. 
Finally, our results further demonstrate online endogenous control of gaze in naturalistic visual scenes (Ballard, Hayhoe, & Pelz, 1995; Hayhoe & Ballard, 2005; Tatler et al., 2011) but extend the findings to situations in which salience has been thought to be the strongest predictor of gaze (i.e., noninteractive dynamic scenes). These findings mirror similar recent findings using dynamic scenes. Gaze behavior has been shown to change during the viewing of videos depicting people talking to a camera as the availability of audio information changes (Lansing & McConkie, 2003; Võ et al., 2012); gaze is tightly linked to the social cues of a magician (Kuhn et al., 2008) and impervious to distraction by sudden onsets during a magic trick (Smith, Lamont, & Henderson, in press). Recent evidence of gaze behavior during the perception of humans engaged in tasks also suggests that viewers distribute their attention to areas of the scene deemed relevant by their expectations about future events (Smith, 2012b). Such behaviors go far beyond the predictions of current saliency-based models of gaze and highlight the need for further work into the computational modeling of higher-order factors such as viewing task, social features, and event dynamics. 
Conclusion
The present study demonstrates how the degree of attentional synchrony during free viewing of prerecorded naturalistic dynamic scenes can be decreased by a viewing task prioritizing scene semantics defined by static, background features. Top-down control modulates bottom-up biases toward moving objects and changes which parts of a dynamic scene are fixated and the parameters of the eye movements used to do so (e.g., fixation durations and saccade amplitudes). During free viewing, a default tendency toward dynamic features leads to a high degree of attentional synchrony across multiple viewers. However, the similarity in the initial bias toward people during free viewing of both static and dynamic scenes suggests that attentional synchrony may not be due only to exogenous control of attention by motion but also to a correlation between low-level features and default cognitive relevance (e.g., interest in people). Such interactions between exogenous and endogenous influences on gaze behavior in dynamic scenes must be explored in future studies. 
Acknowledgments
Thanks to John Henderson, George Malcolm, Antje Nuthmann, Annabelle Goujon, and the rest of the Edinburgh University Visual Cognition lab for their feedback during early presentations of these data. These data were presented at the Eyetracking in Dynamic Scenes workshop (Edinburgh, UK; September 7, 2007), the Vision Sciences Society annual meetings (Naples, FL; May 9–14, 2008, and May 6–11, 2011), and the European Conference on Eye Movements (Marseille, France; August 21–25, 2011). 
Commercial relationships: none. 
Corresponding author: Tim J. Smith. 
Email: tj.smith@bbk.ac.uk. 
Address: Department of Psychological Sciences, Birkbeck, University of London, London, UK. 
References
Antes J. R. (1974). Time course of picture viewing. Journal of Experimental Psychology, 103 (1), 62–70.
Baddeley R. J. Tatler B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Ballard D. H. Hayhoe M. M. Pelz J. B. (1995). Memory representations in natural tasks. Journal of Cognitive Neuroscience, 7 (1), 66–80.
Berg D. J. Boehnke S. E. Marino R. A. Munoz D. P. Itti L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9 (5): 19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19.
Birmingham E. Bischof W. F. Kingstone A. (2008). Gaze selection in complex social scenes. Visual Cognition, 16, 341–355.
Bishop C. M. (2007). Pattern Recognition and Machine Learning. New York: Springer.
Buswell G. T. (1935). How people look at pictures: A study of the psychology of perception in art. Chicago: The University of Chicago Press.
Carmi R. Itti L. (2006a). The role of memory in guiding attention during natural vision. Journal of Vision, 6 (9): 4, 898–914, http://www.journalofvision.org/content/6/9/4, doi:10.1167/6.9.4.
Carmi R. Itti L. (2006b). Visual causes versus correlates of attention selection in dynamic scenes. Vision Research, 46, 4333.
Castelhano M. S. Henderson J. M. (2008). Stable individual differences across images in human saccadic eye movements. Canadian Journal of Experimental Psychology, 62 (1), 1–14.
Castelhano M. S. Mack M. Henderson J. M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, 9 (3): 6, 1–15, http://www.journalofvision.org/content/9/3/6, doi:10.1167/9.3.6.
Castelhano M. S. Wieth M. S. Henderson J. M. (2007). I see what you see: Eye movements in real-world scenes are affected by perceived direction of gaze. In Paletta L. Rome E. (Eds.), Attention in cognitive systems (pp. 252–262). Berlin: Springer.
Cerf M. Frady E. P. Koch C. (2009). Faces and text attract gaze independent of the task: Experimental data and computer model. Journal of Vision, 9 (12): 10, 1–15, http://www.journalofvision.org/content/9/12/10, doi:10.1167/9.12.10.
Cristino F. Baddeley R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Dorr M. Martinetz T. Gegenfurtner K. R. Barth E. (2010). Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, 10 (10): 28, 1–17, http://www.journalofvision.org/content/10/10/28, doi:10.1167/10.10.28.
Einhauser W. Spain M. Perona P. (2008). Objects predict fixations better than early saliency. Journal of Vision, 8 (14): 18, 11–26, http://www.journalofvision.org/content/8/14/18, doi:10.1167/8.14.18.
Elazary L. Itti L. (2008). Interesting objects are visually salient. Journal of Vision, 8 (3): 3, 1–15, http://www.journalofvision.org/content/8/3/3, doi:10.1167/8.3.3.
Findlay J. M. Walker R. (1999). A model of saccade generation based on parallel processing and competitive inhibition. Behavioral and Brain Sciences, 22, 661–674.
Folk C. L. Remington R. W. Johnston J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030–1044.
Foulsham T. Underwood G. (2008). What can saliency models predict about eye movements? Spatial and sequential aspects of fixations during encoding and recognition. Journal of Vision, 8 (2): 6, 1–17, http://www.journalofvision.org/content/8/2/6, doi:10.1167/8.2.6.
Goldstein R. B. Woods R. L. Peli E. (2007). Where people look when watching movies: Do all viewers look at the same place? Computers in Biology and Medicine, 37, 957–964.
Hasson U. Landesman O. Knappmeyer B. Valines I. Rubin N. Heeger D. J. (2008). Neurocinematics: The neuroscience of film. Projections: The Journal of Movies and Mind, 2 (1), 1–26.
Hayhoe M. Ballard D. (2005). Eye movements in natural behavior. Trends in Cognitive Sciences, 9, 188–194.
Hayhoe M. M. Land M. F. (1999). Coordination of eye and hand movements in a normal visual environment. Investigative Ophthalmology & Visual Science, 40, S380–S380.
Hayhoe M. M. Shrivastava A. Mruczek R. Pelz J. B. (2003). Visual memory and motor planning in a natural task [Abstract]. Journal of Vision, 3 (1): 6, http://www.journalofvision.org/content/3/1/6.abstract, doi:10.1167/3.1.6.
Henderson J. M. (2003). Human gaze control during real-world scene perception. Trends in Cognitive Sciences, 7, 498–504.
Henderson J. M. (2005). Introduction to real-world scene perception. Visual Cognition, 12, 849–851.
Henderson J. M. (2007). Regarding scenes. Current Directions in Psychological Science, 16, 219–222.
Henderson J. M. Brockmole J. R. Castelhano M. S. Mack M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In van Gompel R. Fischer M. Murray W. Hill R. (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Oxford, UK: Elsevier.
Henderson J. M. Malcolm G. L. Schandl C. (2009). Searching in the dark: Cognitive relevance drives attention in real-world scenes. Psychonomic Bulletin & Review, 16, 850–856.
Henderson J. M. Smith T. J. (2009). How are eye fixation durations controlled during scene viewing? Evidence from a scene onset delay paradigm. Visual Cognition, 17, 1055–1082.
Henderson J. M. Weeks P. A. Hollingworth A. (1999). The effects of semantic consistency on eye movements during complex scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 25 (1), 210–228.
Itti L. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506.
Itti L. (2005). Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Visual Cognition, 12, 1093–1123.
Itti L. Baldi P. F. (2009). Bayesian surprise attracts human attention. Vision Research, 49, 1295–1306.
Itti L. Koch C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Jovancevic-Misic J. Hayhoe M. (2009). Adaptive gaze control in natural environments. Journal of Neuroscience, 29, 6234–6238.
Jovancevic-Misic J. Sullivan B. Hayhoe M. (2006). Control of attention and gaze in complex environments. Journal of Vision, 6 (12): 9, 1431–1450, http://www.journalofvision.org/content/6/12/9, doi:10.1167/6.12.9.
Kosnik W. Fikre J. Sekuler R. (1986). Visual fixation stability in older adults. Investigative Ophthalmology & Visual Science, 27, 1720–1725, http://www.iovs.org/content/27/12/1720.
Kuhn G. Tatler B. W. Findlay J. M. Cole G. G. (2008). Misdirection in magic: Implications for the relationship between eye gaze and attention. Visual Cognition, 16, 391–405.
Laidlaw K. E. W. Foulsham T. Kuhn G. Kingstone A. (2011). Social attention to a live person is critically different than looking at a videotaped person. Proceedings of the National Academy of Sciences, USA, 108, 5548–5553.
Land M. F. Hayhoe M. M. (2001). In what ways do eye movements contribute to everyday activities? Vision Research, 41, 3559–3565.
Land M. F. Lee D. N. (1994). Where we look when we steer. Nature, 369, 742–744.
Land M. F. Mennie N. Rusted J. (1999). The roles of vision and eye movements in the control of activities of daily living. Perception, 28, 1311–1328.
Lansing C. R. McConkie G. W. (2003). Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception & Psychophysics, 65, 536–553.
Le Meur O. Le Callet P. Barba D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Mannan S. K. Ruddock K. H. Wooding D. S. (1995). Automatic control of saccadic eye movements made in visual inspection of briefly presented 2-D images. Spatial Vision, 9, 363–386.
Mannan S. K. Ruddock K. H. Wooding D. S. (1997). Fixation sequences made during visual examination of briefly presented 2D images. Spatial Vision, 11, 157–178.
Meyer C. H. Lasker A. G. Robinson D. A. (1985). The upper limit of human smooth pursuit velocity. Vision Research, 25, 561–563.
Mills M. Van der Stigchel S. Hollingworth A. Hoffman L. Dodd M. D. (2011). Examining the influence of task-set on eye movements and fixations. Journal of Vision, 11 (8): 17, 1–15, http://www.journalofvision.org/content/11/8/17, doi:10.1167/11.8.17.
Mital P. K. Smith T. J. Hill R. L. Henderson J. M. (2011). Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3 (1), 5–24.
Najemnik J. Geisler W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Navalpakkam V. Itti L. (2005). Modeling the influence of task on attention. Vision Research, 45, 205–231.
Nuthmann A. Smith T. J. Engbert R. Henderson J. M. (2010). CRISP: A computational model of fixation durations in scene viewing. Psychological Review, 117, 382–405. [CrossRef] [PubMed]
Nyström M. Holmqvist K. (2010). An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42, 188–204. [CrossRef]
Peters R. J. Iyer A. Itti L. Koch C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, 45, 2397–2416. [CrossRef] [PubMed]
Rajashekar U. Cormack L. K. Bovik A. C. (2004). Point of gaze analysis reveals visual search strategies. Paper presented at the SPIE Human Vision and Electronic Imaging IX, Bellingham, WA.
Rayner K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422. [CrossRef] [PubMed]
Risko E. F. Laidlaw K. E. Freeth M. Foulsham T. Kingstone A. (2012). Social attention with real vs. reel stimuli: Toward an empirical approach to concerns about ecological validity. Frontiers in Human Neuroscience, 6, 143. [CrossRef] [PubMed]
Ross N. M. Kowler E. (2013). Eye movements while viewing narrated, captioned and silent videos. Journal of Vision, 13 (4): 1, 1–19, http://www.journalofvision.org/content/13/4/1, doi:10.1167/13.4.1. [PubMed] [Article] [CrossRef] [PubMed]
Sawahata Y. Khosla R. Komine K. Hiruma N. Itou T. Watanabe S. (2008). Determining comprehension and quality of TV programs using eye-gaze tracking. Pattern Recognition, 41, 1610–1626. [CrossRef]
Smith T. J. (2006). An attentional theory of continuity editing (Unpublished doctoral dissertation). Edinburgh, UK: University of Edinburgh.
Smith T. J. (2012a). Attentional theory of cinematic continuity. Projections: The Journal for Movies and the Mind, 6 (1), 1–27. [CrossRef]
Smith T. J. (2012b). The relationship between overt attention and event perception during dynamic social scenes. Journal of Vision, 12 (9): 407, http://www.journalofvision.org/content/12/9/407, doi:10.1167/12.9.407. [Abstract] [CrossRef]
Smith T. J. (2013). Watching you watch movies: Using eye tracking to inform cognitive film theory. In Shimamura A. P. (Ed.), Psychocinematics: Exploring cognition at the movies (pp. 165–191). New York: Oxford University Press.
Smith T. J. Henderson J. M. (2008). Attentional synchrony in static and dynamic scenes. Journal of Vision, 8 (6): 773, http://www.journalofvision.org/content/8/6/773, doi:10.1167/8.6.773. [Abstract] [CrossRef]
Smith T. J. Lamont P. Henderson J. M. (in press). Change blindness in a dynamic scene due to endogenous override of exogenous attentional cues. Perception.
Stampe D. M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25, 137–142. [CrossRef]
't Hart B. M. Vockeroth J. Schumann F. Bartl K. Schneider E. König P. (2009). Gaze allocation in natural stimuli: Comparing free exploration to head-fixed viewing conditions. Visual Cognition, 17, 1132–1158. [CrossRef]
Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7 (14): 4, 1–17, http://www.journalofvision.org/content/7/14/4, doi:10.1167/7.14.4. [PubMed] [Article] [CrossRef] [PubMed]
Tatler B. W. (2009). Current understanding of eye guidance. Visual Cognition, 17, 777–789. [CrossRef]
Tatler B. W. Baddeley R. J. Gilchrist I. D. (2005). Visual correlates of fixation selection: Effects of scale and time. Vision Research, 45, 643–659. [CrossRef] [PubMed]
Tatler B. W. Hayhoe M. M. Land M. F. Ballard D. H. (2011). Eye guidance in natural vision: Reinterpreting salience. Journal of Vision, 11 (5): 5, 1–23, http://www.journalofvision.org/content/11/5/5, doi:10.1167/11.5.5. [PubMed] [Article] [CrossRef] [PubMed]
Taya S. Windridge D. Osman M. (2012). Looking to score: The dissociation of goal influence on eye movement and meta-attentional allocation in a complex dynamic natural scene. PLoS One, 7, e39060. [CrossRef] [PubMed]
Torralba A. Oliva A. Castelhano M. S. Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 766–786. [CrossRef] [PubMed]
Tosi V. Mecacci L. Pasquali E. (1997). Scanning eye movements made when viewing film: Preliminary observations. International Journal of Neuroscience, 92, 47–52. [CrossRef] [PubMed]
Treisman A. M. Gelade G. (1980). Feature-integration theory of attention. Cognitive Psychology, 12 (1), 97–136. [CrossRef] [PubMed]
Tseng P. H. Carmi R. Cameron I. G. M. Munoz D. P. Itti L. (2009). Quantifying centre bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9 (7): 4, 1–16, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article] [CrossRef] [PubMed]
Unema P. J. A. Pannasch S. Joos M. Velichkovsky B. M. (2005). Time course of information processing during scene perception: The relationship between saccade amplitude and fixation duration. Visual Cognition, 12, 473–494. [CrossRef]
Vig E. Dorr M. Barth E. (2009). Efficient visual coding and the predictability of eye movements on natural movies. Spatial Vision, 22, 397–408. [CrossRef] [PubMed]
Vig E. Dorr M. Martinetz T. Barth E. (2011). Eye movements show optimal average anticipation with natural dynamic scenes. Cognitive Computation, 3 (1), 79–88. [CrossRef]
Võ M. L.-H. Smith T. J. Mital P. K. Henderson J. M. (2012). Do the eyes really have it? Dynamic allocation of attention when viewing moving faces. Journal of Vision, 12 (13): 3, 1–14, http://www.journalofvision.org/content/12/13/3, doi:10.1167/12.13.3. [PubMed] [Article] [CrossRef] [PubMed]
Wang H. X. Freeman J. Merriam E. P. Hasson U. Heeger D. J. (2012). Temporal eye movement strategies during naturalistic viewing. Journal of Vision, 12 (1): 16, 1–27, http://www.journalofvision.org/content/12/1/16, doi:10.1167/12.1.16. [PubMed] [Article] [CrossRef] [PubMed]
Wolfe J. M. (2007). Guided Search 4.0: Current progress with a model of visual search. In Gray W. (Ed.), Integrated models of cognitive systems (pp. 99–119). New York: Oxford University Press.
Wolfe J. M. Horowitz T. S. (2004). What attributes guide the deployment of visual attention and how do they do it? Nature Reviews Neuroscience, 5, 495–501. [CrossRef]
Yarbus A. L. (1967). Eye movements and vision. New York: Plenum Press.
Zacks J. M. Braver T. S. Sheridan M. A. Donaldson D. I. Snyder A. Z. Ollinger J. M. (2001). Human brain activity time-locked to perceptual event boundaries. Nature Neuroscience, 4, 651–655. [CrossRef] [PubMed]
Figure 1
 
Stills taken from the 26 scenes used in this experiment. For each participant, half of the scenes were presented as static images and half as videos.
Figure 2
 
Mean fixation durations across the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location before the response. Error bars represent ±1 SE.
Figure 3
 
Mean fixation duration (ms) over time (1000-ms time bins) for each scene type (static = green lines, dynamic = blue lines) and viewing task (free view = solid lines, spot-the-location = dashed lines). Fixations are analyzed for the full 20-s viewing duration in both tasks and not just up to the response in spot-the-location (as in Figure 2). Error bars represent ±1 SE.
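The time-binned summary described in this caption can be illustrated with a minimal sketch. It assumes fixations are available as (onset, duration) pairs in milliseconds and pools all fixations within a bin; the function name `bin_mean_se` and the input layout are hypothetical, and the published error bars may instead have been computed across participants or scenes.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def bin_mean_se(fixations, bin_ms=1000):
    """Summarize fixation durations per time bin.

    fixations: iterable of (onset_ms, duration_ms) pairs (assumed layout).
    Returns {bin_index: (mean_duration, standard_error, n)}.
    """
    bins = defaultdict(list)
    for onset_ms, duration_ms in fixations:
        # A fixation is assigned to the bin that contains its onset.
        bins[int(onset_ms // bin_ms)].append(duration_ms)

    summary = {}
    for index in sorted(bins):
        values = bins[index]
        n = len(values)
        # Standard error of the mean; undefined for a single value, so use 0.0.
        se = stdev(values) / sqrt(n) if n > 1 else 0.0
        summary[index] = (mean(values), se, n)
    return summary


# Hypothetical data: two fixations in the first second, one in the second.
example = bin_mean_se([(0, 200), (500, 300), (1200, 400)])
```

Assigning a fixation to the bin of its onset is one possible convention; fixations that span a bin boundary could also be split or assigned by midpoint.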
Figure 4
 
Mean saccade amplitudes (degrees of visual angle) across the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location. Error bars represent ±1 SE.
Figure 5
 
Mean saccade amplitudes (degrees) over time (1000-ms time bins) for each scene type (static = green lines, dynamic = blue) and viewing task (free view = solid lines, spot-the-location = dashed lines). Saccades are analyzed for the full 20-s viewing duration in both tasks and not just up to the response in spot-the-location (as in Figure 4). Error bars represent ±1 SE.
Figure 6
 
Mean gaze clustering (degrees of visual angle as described by a spherical Gaussian kernel with a radius of 1 standard deviation) across all scenes for the two stimulus types (static vs. dynamic scenes; x-axis) and viewing tasks (lines): free view versus spot-the-location. Lower values indicate tighter clustering (i.e., greater attentional synchrony). Error bars represent ±1 SE.
Figure 7
 
Gaze cluster covariance over time (divided into 500-ms bins). Scene type is represented by line color (green = static, blue = dynamic) and viewing task by line type (solid = free view, dashed = spot-the-location). Error bars represent ±1 SE for mean across all scenes.
Figure 8
 
The percentage of each trial in which the spatial distribution of gaze was optimally described as a specific number of clusters (x-axis = number of clusters). Scene type is represented by line color (green = static, blue = dynamic) and viewing task by line type (solid = free view, dashed = spot-the-location). Error bars represent ±1 SE for mean percentage across all scenes.
Figure 9
 
Mean flicker in a 2° region around fixation during free viewing of dynamic scenes (“Free view” = green bar) and during the spot-the-location task before a response was made (“STL + Before” = middle blue bar) and after (“STL + After” = right blue bar). A baseline rate of flicker around control locations is presented (solid red line = mean flicker, dashed lines = ±1 SE). The control locations control for gaze biases (such as screen center) by being randomly sampled from different frames in the same scene. Error bars represent ±1 SE.
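The comparison between flicker at fixation and flicker at control locations can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: flicker is taken here as the absolute luminance difference between consecutive frames, the 2° region is approximated by a square window of `radius` pixels, and the control value pairs each gaze location with the flicker map of a randomly chosen different frame from the same scene. All function names are hypothetical.

```python
import random


def flicker_map(prev_frame, frame):
    """Absolute luminance change per pixel between two consecutive frames."""
    return [[abs(now - before) for now, before in zip(row_now, row_prev)]
            for row_now, row_prev in zip(frame, prev_frame)]


def mean_in_window(values, x, y, radius):
    """Mean of a square window centered on (x, y), clipped at the borders."""
    rows = range(max(0, y - radius), min(len(values), y + radius + 1))
    cols = range(max(0, x - radius), min(len(values[0]), x + radius + 1))
    window = [values[r][c] for r in rows for c in cols]
    return sum(window) / len(window)


def fixation_and_control_flicker(frames, fixations, radius, rng):
    """Return (mean flicker at gaze, mean flicker at control locations).

    frames: list of 2D luminance grids (needs at least 3 frames).
    fixations: list of (frame_index, x, y) with frame_index >= 1.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    # maps[k] holds the flicker between frame k and frame k + 1.
    maps = [flicker_map(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    at_gaze, at_control = [], []
    for frame_index, x, y in fixations:
        current = frame_index - 1
        at_gaze.append(mean_in_window(maps[current], x, y, radius))
        # Control: same screen location, flicker from a different frame,
        # which preserves spatial gaze biases such as the screen center.
        other = rng.choice([k for k in range(len(maps)) if k != current])
        at_control.append(mean_in_window(maps[other], x, y, radius))
    return sum(at_gaze) / len(at_gaze), sum(at_control) / len(at_control)
```

Because the control samples reuse the observed gaze positions, any difference between the two means reflects the timing of flicker rather than where viewers tend to look overall.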
Figure 10
 
Mean probability of each fixation falling within a dynamic person region across all scenes for the two stimulus types (static = green vs. dynamic = blue; bars) and viewing tasks (x-axis): free view (left), spot-the-location before response (middle), spot-the-location after response (right). Solid red line indicates chance fixation probability given the size of the people regions. Error bars and dashed red lines represent ±1 SE.
Figure 11
 
Mean probability of fixating people over time (averaged into 500-ms bins; x-axis) across all scenes for the two stimulus types (static = green dashed lines, dynamic scenes = blue solid line) and viewing tasks (free view = top, before the spot-the-location response = middle, after the spot-the-location response = bottom). Solid red line indicates chance given the size of the people regions. Error bars and dashed red lines represent ±1 SE.
Table 1
 
Mean oculomotor measures for participants across two viewing tasks (free view vs. spot-the-location) and static versus dynamic stimuli. Notes: Significance levels of paired t tests between static and dynamic stimuli are reported as asterisks (* p < .05, *** p < .001; n.s. = not significant).
| Measure | Free view: Static | Free view: Dynamic | Free view: t test (p value) | Spot-the-location: Static | Spot-the-location: Dynamic | Spot-the-location: t test (p value) |
| Fixation duration (ms) | 290.68 (46.10) | 339.29 (58.02) | *** | 231.18 (30.16) | 253.93 (40.06) | *** |
| Percentage of trial in fixation (%) | 72.89 (16.05) | 73.32 (17.41) | n.s. | 61.82 (16.29) | 65.62 (14.53) | .107 |
| Saccade amplitude (°) | 4.07 (0.63) | 4.21 (0.64) | * | 4.91 (0.62) | 5.07 (0.57) | * |
| Percentage of trial in saccade (%) | 8.02 (2.14) | 6.48 (1.92) | *** | 10.36 (2.93) | 9.98 (2.28) | n.s. |