Abstract
Past research has argued that scene gist, a holistic semantic representation of a scene acquired within a single fixation, is extracted using purely feed-forward mechanisms. Accordingly, scene gist recognition studies have presented scenes from multiple categories in randomized sequences. We tested whether rapid scene categorization could be facilitated by priming from sequential expectations. We created more ecologically valid, first-person-viewpoint image sequences along spatiotemporally connected routes (e.g., from an office to a parking lot). Participants identified target scenes at the end of rapid serial visual presentations. Critically, we manipulated whether targets appeared in coherent or randomized sequences. Target categorization was more accurate in coherent sequences than in randomized sequences. Furthermore, categorization was more accurate for a target following one or more images from the same category than for a target following a switch between categories. For targets that followed a switch between scene categories, accuracy was also higher after two primes from the same category than after only one (e.g., multiple office primes facilitated recognition of a hallway). Likewise, accuracy was higher for targets that were more visually similar to their immediately preceding primes. This suggested that prime-to-target visual similarity may explain the coherent sequence advantage. We tested this hypothesis in a second experiment, which was identical to the first except that target images were removed from the sequences and participants were asked to predict the scene category of the missing target. Missing images in coherent sequences were predicted more accurately than missing images in randomized sequences, and images that were more predictable in Experiment 2 were identified more accurately in Experiment 1. Importantly, partial correlations revealed that image predictability and prime-to-target visual similarity each contributed independently to rapid scene gist categorization accuracy.