Our interest in this problem comes from the recently developed theoretical framework of the Scene Perception & Event Comprehension Theory (SPECT; Loschky, Hutson, Smith, Smith, & Magliano, 2018; Loschky, Larson, Smith, & Magliano, 2019; Magliano, Loschky, Clinton, & Larson, 2013). SPECT considers scene-gist recognition as one part of the larger problem of perceiving and understanding scenes and events in the real world. According to SPECT, the extraction of a scene's gist is essential to understanding an
event (i.e., a period of time with a distinct beginning and ending). SPECT proposes a distinction between front-end and back-end mechanisms. Front-end mechanisms are involved in processing information within single eye fixations, including visual processing of a scene's gist (i.e., information extraction). Back-end mechanisms are involved in processing information in memory across multiple fixations, particularly in constructing the current event model in working memory (i.e., one's online representation of what is happening now). Scene gist plays an important role in both front- and back-end processes. The gist of a scene is recognized within a single eye fixation during the process of information extraction in the front end (Greene & Oliva, 2009; Larson, 2012), which is important for laying the foundation of the current event model in the back end (Loschky et al., 2019). Importantly, SPECT proposes that back-end processes involved in event-model construction influence front-end processes involved in information extraction. Suppose a viewer is watching a handheld video of someone walking from their office to a parking lot. Within the first fixation, the viewer recognizes the gist of the first scene category, for instance, that it is an office. Will the viewer need to extract the same gist on each fixation (office, office, office, etc.) even when the scene category repeats? As noted earlier, many have claimed that scene-gist recognition processes are automatic and feed-forward (e.g., Joubert et al., 2007); therefore, observers may extract the gist of the scene anew on every fixation. We call this hypothesis the
feed-forward gist hypothesis. Nevertheless, it would seem to be a waste of cognitive resources to extract the same gist on many consecutive fixations. Thus, fewer resources may be required to extract the scene location on subsequent fixations if the scene information is consistent with what one expects (i.e., when the scene category is not expected to change). We call this hypothesis the scene-gist priming hypothesis.