Abstract
A common simplifying assumption when dealing with vast amounts of raw eye-tracking data is to focus on spatial rather than temporal analyses. This assumption is supported by studies with still images, which showed that spatial, rather than temporal, correlations provide the only source of information in eye-tracking data. Here we establish the extent to which this assumption is violated during the inspection of dynamic scenes.
We collected 50 video clips depicting a heterogeneous collection of natural scenes. These clips were cut into segments, which were then reassembled into 50 scene-shuffled ("MTV-style") clips. Human observers inspected either the continuous or the scene-shuffled clips, and inter-observer agreement in gaze position was quantified across conditions and over time.
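The shuffling step can be pictured with a short sketch. The following Python snippet is illustrative only: the function name, the fixed `segment_len`, and the round-robin reassembly are assumptions, and the published procedure may have imposed constraints (e.g., on consecutive segments from the same scene) not modeled here.

```python
import random

def make_scene_shuffled_clips(clips, segment_len, seed=0):
    """Cut each clip into fixed-length segments, pool the segments,
    and reassemble the pool into new, scene-shuffled clips.

    clips: list of frame sequences (each a list of frames).
    segment_len: number of frames per segment (illustrative choice).
    """
    rng = random.Random(seed)
    # Pool fixed-length segments from all source clips.
    segments = [
        clip[i:i + segment_len]
        for clip in clips
        for i in range(0, len(clip), segment_len)
    ]
    rng.shuffle(segments)
    # Deal the shuffled segments round-robin into as many clips as
    # there were originals, then concatenate their frames.
    n_clips = len(clips)
    return [sum(segments[k::n_clips], []) for k in range(n_clips)]
```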
On average, the instantaneous eye positions of 4 human observers were clustered within a rectangle covering 8.51% and 6.04% of the display area in the continuous and scene-shuffled conditions, respectively. These values increased to 11.48% (p<0.01) and 9.36% (p<0.01) when eye positions were sampled from the same eye traces in random temporal order. The average cluster area increased further to 35.88% (p<0.01) when 4 eye positions were drawn at random from a uniform distribution over spatial locations. Moreover, preserving time information revealed previously unreported patterns of inter-observer agreement.
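A minimal sketch of how such a statistic can be computed, assuming the agreement measure is the per-frame bounding rectangle of the observers' simultaneous eye positions, expressed as a fraction of display area and averaged over frames. The function names and the two baselines below are illustrative, not the authors' verified code.

```python
import numpy as np

def cluster_area_fraction(gaze, display_wh):
    """Mean fractional area of the bounding rectangle enclosing
    the simultaneous eye positions of all observers.

    gaze: array of shape (n_observers, n_frames, 2) with (x, y)
          eye positions in pixels.
    display_wh: (width, height) of the display in pixels.
    """
    w, h = display_wh
    spans = gaze.max(axis=0) - gaze.min(axis=0)   # (n_frames, 2): x/y extent
    areas = spans[:, 0] * spans[:, 1] / (w * h)   # per-frame area fraction
    return areas.mean()

def shuffled_baseline(gaze, display_wh, rng):
    """Same eye traces, but each observer's samples taken in random
    temporal order, destroying moment-to-moment alignment."""
    shuffled = np.stack(
        [obs[rng.permutation(obs.shape[0])] for obs in gaze]
    )
    return cluster_area_fraction(shuffled, display_wh)

def uniform_baseline(n_observers, n_frames, display_wh, rng):
    """Eye positions drawn uniformly at random over the display."""
    w, h = display_wh
    gaze = rng.uniform(0.0, 1.0, size=(n_observers, n_frames, 2))
    gaze *= np.array([w, h], dtype=float)
    return cluster_area_fraction(gaze, display_wh)

# Example with placeholder data: 4 observers, 1000 frames, 640x480 display.
rng = np.random.default_rng(0)
gaze = rng.uniform(0.0, 1.0, size=(4, 1000, 2)) * np.array([640.0, 480.0])
print(cluster_area_fraction(gaze, (640, 480)))
print(shuffled_baseline(gaze, (640, 480), rng))
print(uniform_baseline(4, 1000, (640, 480), rng))
```

This reading of the metric is consistent with the numbers above: for 4 independent uniform samples, the expected range along each axis is (n-1)/(n+1) = 0.6, so the expected rectangle covers about 36% of the display, matching the reported 35.88%.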
These results demonstrate that, as stimulus dynamics increase, eye-movement patterns diverge increasingly from previous accounts based on still images. The limited scalability of conclusions based on still images is likely to be further accentuated by future enhancements in the realism of laboratory stimuli, such as a larger field of view and reduced central bias.