Individuals use knowledge about recent events to guide their behavior. For example, we avoid touching a dish that we know has just been taken out of a hot oven. Although situations in which knowledge accumulation occurs are omnipresent, it remains unclear how this recently accumulated knowledge (what we call prior knowledge) guides further information-seeking. In particular, relatively little is known about how it shapes the way humans acquire visual information by shifting their gaze. To examine this issue, we conducted an eye-tracking study that used unique stimuli: sequences of static film frames that allowed for the accumulation of knowledge about unfolding events.
Typical experimental procedures used to study visual exploration involve presenting participants with sequences of images that are unrelated to each other. These procedures are usually conducted in laboratory conditions and, as a consequence, afford good control over most aspects of the experiments. They are also versatile because they allow almost any set of images to be presented as stimuli. Owing to these features, these procedures have been used to tackle a wide range of research problems. To name a few, they have allowed researchers to characterize factors influencing oculomotor behavior while viewing emotionally laden images (
Pilarczyk, Schwertner, Wołoszyn, & Kuniecki, 2019;
Pilarczyk, Kuniecki, Wołoszyn, & Sterna, 2020;
Pilarczyk & Kuniecki, 2014) or images depicting people (
Birmingham, Bischof, & Kingstone, 2008;
Cerf, Paxon Frady, & Koch, 2009;
End & Gamer, 2017;
Flechsenhar & Gamer, 2017), as well as to uncover idiosyncrasies in the gaze behavior of individuals (
Broda & de Haas, 2022;
de Haas, Iakovidis, Schwarzkopf, & Gegenfurtner, 2019;
Linka & de Haas, 2020;
Zangrossi, Cona, Celli, Zorzi, & Corbetta, 2021) and to create robust methods of predicting fixation locations (
Kümmerer, Bethge, & Wallis, 2017;
Kümmerer, Wallis, Gatys, & Bethge, 2022).
An assumption embedded in these procedures is that viewing any image is independent of viewing the other images. Participants are usually shown images in random order to ensure that this assumption is met. Consequently, typical image viewing experiments do not allow for investigating the influence of recently acquired knowledge on oculomotor behavior. In these experiments, participants do not have the opportunity to accumulate information relevant to upcoming images and, therefore, view each image without prior knowledge about its content. This situation starkly contrasts with typical real-world experience, where visual input is usually continuous—people observe events as they unfold, so the current visual input is usually related to the input that immediately preceded it. This continuity enables the accumulation of information about the current environment and what is happening in it. As a result, almost all visual input individuals receive is embedded in prior knowledge relevant to it. However, owing to the above-mentioned methodological reasons, the consequences of possessing that knowledge for oculomotor behavior during static image viewing remain underexplored. Only a handful of studies have investigated them, and these studies either used contrived stimuli (
Król & Król, 2018;
Pedziwiatr, von dem Hagen, & Teufel, 2023) or relied only upon a very limited number of images (
Dorr, Martinetz, Gegenfurtner, & Barth, 2010;
Hutson, Chandran, Magliano, Smith, & Loschky, 2022). What these studies have in common is that they investigated how gaze is affected by information that is inaccessible via eye movements at a given moment. This property distinguishes them from typical image viewing experiments, in which information accumulation occurs exclusively while viewing a particular image. For example, in a visual search task, participants may inspect a certain location, determine that it does not contain the target, accumulate that information, and inspect this location again if they wish to. The possibility of reacquiring previously accumulated information by re-inspecting an image region makes that kind of knowledge accumulation distinct from the kind we are interested in here.
Nevertheless, there are at least two important strands of research that touch on how different types of continuity of visual input—a necessary condition for knowledge accumulation—affect eye movements. The first strand investigates real-world tasks that extend over time, such as putting up a tent (
Sullivan et al., 2021), making a sandwich (
Hayhoe, Shrivastava, Mruczek, & Pelz, 2003), or walking in various settings (
Bonnen et al., 2021;
Ghiani, Van Hout, Driessen, & Brenner, 2023;
Matthis, Yates, & Hayhoe, 2018;
Patla & Vickers, 2003;
Rothkopf, Ballard, & Hayhoe, 2016). These studies highlighted the interactions between eye movements and processes that require knowledge accumulation, such as planning, representing the environment, manual actions, or navigation (
Tatler & Land, 2011). The second strand of research that explicitly acknowledges the importance of continuity of visual input explores how people process and understand visual narratives, such as comics (
Foulsham & Cohn, 2021;
Foulsham, Wybrow, & Cohn, 2016;
Hutson, Magliano, & Loschky, 2018;
Kirtley et al., 2018) and films (
Kirkorian & Anderson, 2018;
Loschky, Larson, Magliano, & Smith, 2015;
Loschky, Larson, Smith, & Magliano, 2020;
Rider, Coutrot, Pellicano, Dakin, & Mareschal, 2018;
Smith, 2012;
Valuch, König, & Ansorge, 2017;
Wang, Freeman, Merriam, Hasson, & Heeger, 2012). In these media, high-level narrative continuity is ensured by the plot and, especially in film, is preserved even in the presence of abrupt changes in visual input, such as cuts and shifts in camera position. The studies cited above demonstrated that following a plot—which requires accumulating information—has consequences for the oculomotor behavior of viewers.
Both these research strands demonstrate that prior knowledge matters for how visual input is sampled using eye movements. However, both rely on stimuli and/or experimental settings that are very different from those of typical image viewing experiments. Therefore, relating findings from these two strands to experiments that rely on viewing static images is nontrivial. To illustrate, consider that real-life tasks usually require manual action (which results in visible changes in the environment), often accompanied by head or body motion (which changes the available field of view); participants therefore do not inspect the environment as a fixed static snapshot equivalent to a single static image displayed in a laboratory-based experiment. Comics, in turn, usually consist of simplified drawings that contain specific graphical elements (e.g., speech balloons) across multiple panels and, therefore, differ considerably from typical scenes. Finally, films are obviously different because they contain motion. Overall, it is challenging to directly relate insights into the role of prior knowledge gained from experiments on real-world tasks, comics, and films to static image viewing. Hence, the effects of recently gained prior knowledge in this context remain underexplored.
To address this gap, we recently created a unique stimulus set comprising 80 sequences of static frames from films directed by Alfred Hitchcock (
Pedziwiatr, Heer, Coutrot, Bex, & Mareschal, 2023). The sequences were arranged in such a way that participants viewed identical frames after acquiring knowledge that was either relevant to the content of these frames or not. Thus, we retained the well-controlled nature of typical image viewing experiments while creating conditions that enabled knowledge accumulation. We found that participants who had accumulated relevant prior knowledge exhibited less exploratory gaze behavior than those who had not. Our conclusion was based on the analysis of seven characteristics of eye movements: the number of fixations, average fixation duration, average interfixation distance, interobserver consistency, the probability of blinking, first saccade latency, and heatmap entropy. This fine-grained approach, however, had its limitations. First, all of our metrics largely ignored the temporal order of fixations. Each scanpath (i.e., a raw gaze trace of an individual participant on an individual frame) generated in a 2-second trial was collapsed over time into one number per metric (e.g., average fixation duration). Second, our metrics were decoupled from the spatial distributions of fixations on the frames. For example, they could all have identical values for two sets of fixations inspecting different regions of a frame. Third, relying on these metrics required extracting fixations from the raw gaze trace. This step involves researcher degrees of freedom (
Simmons, Nelson, & Simonsohn, 2011;
Wicherts et al., 2016), that is, analytical decisions, such as selecting a fixation extraction algorithm out of many available (
Birawo & Kasprowski, 2022) and fine-tuning its parameters, which might inadvertently affect the results. Fourth, although our approach was well suited for testing hypotheses, it left little room for exploring the spatiotemporal structure of the rich data we collected.
To bypass these limitations, we analyzed the same data using a very different approach. Specifically, we modelled individual scanpaths as hidden Markov models (HMMs) and applied machine-learning techniques to them. An HMM is a mathematical construct describing—in a probabilistic fashion—the temporal evolution of a system with a finite number of possible states. It generates probable trajectories of the system's behavior, expressed as sequences of the system's states. Modelling a scanpath as an HMM involves assuming that each data sample belongs to one state and finding the HMM's parameters for which the model is most likely to generate the sequence of samples constituting the scanpath (a minimal sketch of this fitting step is given below). Methods based on this modelling have proven insightful and are commonly used in various research contexts (
Chuk, Chan, & Hsiao, 2014;
Coutrot, Binetti, Harrison, Mareschal, & Johnston, 2016;
Coutrot, Hsiao, & Chan, 2018;
Haji-Abolhassani & Clark, 2014;
Hsiao, Lan, Zheng, & Chan, 2021;
Hsiao, Liao, & Tso, 2022;
Liu et al., 2013).
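To make the fitting step described above concrete, the following minimal sketch fits a Gaussian HMM to a single scanpath. It uses the Python package hmmlearn purely for illustration (this is not necessarily the toolbox used in our analysis), and the gaze samples, the number of hidden states, and all parameter values are hypothetical.

```python
import numpy as np
from hmmlearn import hmm  # illustrative choice of HMM library

# Hypothetical raw gaze trace (x, y screen coordinates in pixels) from one
# 2-second trial; real data would come from the eye tracker, not a simulation.
rng = np.random.default_rng(seed=1)
scanpath = np.vstack([
    rng.normal(loc=(512, 384), scale=10, size=(400, 2)),  # samples around one image region
    rng.normal(loc=(820, 300), scale=10, size=(300, 2)),  # then around a second region
    rng.normal(loc=(400, 550), scale=10, size=(300, 2)),  # then around a third region
])

# Each hidden state emits gaze positions from its own 2D Gaussian; the number
# of states is a modelling choice.
model = hmm.GaussianHMM(
    n_components=3,
    covariance_type="full",
    n_iter=200,
    random_state=0,  # EM fitting depends on the initialization, hence on the random seed
)
model.fit(scanpath)  # estimates state means, covariances, and the transition matrix

states = model.predict(scanpath)        # most likely hidden state for each gaze sample
log_likelihood = model.score(scanpath)  # how well the fitted HMM explains this scanpath
print(states[:10], log_likelihood)
```

In this framing, the fitted state means and covariances summarize where the gaze dwelled, the transition matrix summarizes how it moved between those regions, and a whole scanpath is thereby compressed into a small set of interpretable parameters.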
Using HMMs has several advantages over the more traditional approach, based on multiple oculomotor metrics, that we used previously. First, HMMs offer a succinct way to holistically capture both the spatial and temporal dynamics of a scanpath. Second, analyses combining HMMs and machine-learning algorithms (classifiers) offer data-driven ways of measuring the relative importance of different aspects of scanpaths for distinguishing between the conditions in which they were recorded (e.g., see
Coutrot et al., 2018). Third, HMMs can be analyzed in multiple ways, which facilitates sophisticated data exploration. Fourth, HMMs can be fit to raw data, which eliminates the aforementioned researcher degrees of freedom related to fixation extraction. However, using HMMs does not eliminate the problem of researcher degrees of freedom completely. Therefore, we guarded against this concern by preregistering the general outline of our HMM analysis.
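To illustrate the second of these advantages, the sketch below shows one way the parameters of already fitted HMMs could be turned into feature vectors and fed to an off-the-shelf linear classifier with cross-validation. The use of scikit-learn, the particular choice of features, the PCA step, and the stand-in data are all assumptions made for illustration and do not describe the preregistered analysis itself.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def hmm_to_features(model):
    """Flatten a fitted HMM's parameters (state means, covariances, and
    transition matrix) into one feature vector; which parameters to include
    is itself an analytical choice."""
    return np.concatenate([
        model.means_.ravel(),
        model.covars_.ravel(),
        model.transmat_.ravel(),
    ])


# Stand-in data: in a real analysis, each row of X would come from
# hmm_to_features() applied to the HMM of one scanpath, and y would encode
# the viewing condition (1 = relevant prior knowledge, 0 = none).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(120, 27))    # 120 hypothetical HMMs, 27 parameters each
y = rng.integers(0, 2, size=120)  # hypothetical condition labels

# Off-the-shelf linear classifier; the PCA step marks where dimensionality
# reduction could enter the pipeline.
classifier = make_pipeline(PCA(n_components=10), LinearSVC())
accuracies = cross_val_score(classifier, X, y, cv=5)
print(accuracies.mean())  # ~0.5 for random labels; above chance suggests condition-related structure
```

Cross-validated accuracy reliably above chance would indicate that the fitted HMMs carry condition-related information; inspecting the classifier weights (or the retained components) then helps to identify which aspects of the scanpaths drive the distinction.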
To foreshadow our results, we found that an off-the-shelf linear classifier could distinguish between HMMs fitted to scanpaths of participants who viewed identical frames in different conditions: either with or without prior knowledge relevant to the frames' content. An exploratory analysis of the classification results revealed a link between this content and the variability of eye movements along the horizontal dimension, to which the classifier was sensitive. In addition to these main results, we also describe two methodological caveats we encountered when working with HMMs: one related to the role of data dimensionality reduction in the classification analysis and one related to the influence of the random number generator on the outcomes of the HMM fitting procedure. To our knowledge, these caveats have not yet been described in the literature on modelling gaze patterns with HMMs.