Abstract
Observers can categorize a novel scene within the first 100 ms of its onset. Researchers have suggested this is accomplished through a rapid feed-forward sweep of neural activation. Consequently, researchers have focused on examining the minimal perceptual information diagnostic of a scene's semantic category, rather than investigating the role that top-down processes play in rapid scene categorization. Thus, most scene gist studies present scenes from multiple categories in randomized sequences. In contrast, in this experiment we tested whether scene gist recognition is facilitated by sequential expectations. To do this, we created more ecologically valid spatio-temporal "narrative" sequences of images along spatially connected routes from starting points to destinations (e.g., office, hallway, stairwell, sidewalk, parking lot). Scene images were presented one at a time; based on pilot testing, each image was briefly flashed (24 ms) and then masked (48 ms), after which participants selected the scene category from an 8-AFC array. To reduce the predictability of subsequent images, each category appeared as a subsequence of 1-4 scenes, with the subsequence length chosen at random (e.g., 1-4 office images followed by 1-4 hallways, etc.). Critically, scenes were shown in either coherent or randomized narrative sequences to test two competing hypotheses: 1) "Narrative coherence": accuracy is higher for images in coherent narrative sequences because scene category expectations prime representations for to-be-presented scenes; 2) "Feed-forward": accuracy does not differ between coherent and randomized image sequences because scene gist recognition is a purely feed-forward process. Results: images presented in randomized sequences were categorized just as accurately as images presented in coherent sequences, consistent with the "Feed-forward" hypothesis.
Furthermore, accuracy did not increase as a function of the number of sequential exposures to a single scene category (e.g., 1-4 office images in a row), suggesting that the mechanisms responsible for perceiving a scene's meaning are so rapid that prior exposures do not support extracting the gist of subsequent scenes.
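The sequence design described above can be illustrated with a minimal sketch. It is not the authors' stimulus code: the route uses only the five example categories named in the abstract (the full set is not listed), the function names are hypothetical, and the randomized condition is assumed here to be a simple shuffle of the same trials, which the abstract does not specify.

```python
import random

# Example route from the abstract; the study's full category set is not listed.
ROUTE = ["office", "hallway", "stairwell", "sidewalk", "parking lot"]


def coherent_sequence(route, rng, min_run=1, max_run=4):
    """Return category labels in route order, each repeated a random 1-4 times."""
    sequence = []
    for category in route:
        run_length = rng.randint(min_run, max_run)  # inclusive on both ends
        sequence.extend([category] * run_length)
    return sequence


def randomized_sequence(sequence, rng):
    """Return a shuffled copy of the same trials (assumed randomization scheme)."""
    shuffled = list(sequence)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed so the sketch is reproducible
coherent = coherent_sequence(ROUTE, rng)
randomized = randomized_sequence(coherent, rng)
```

Both conditions contain the same trials, so any accuracy difference between them would be attributable to sequence order rather than to the images themselves.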
Meeting abstract presented at VSS 2017