Abstract
Historically, research on scene understanding has focused on individual objects and their locations. However, recent computational approaches have shown that regularities in spatial layout can mediate scene recognition. These approaches are not mutually exclusive, as the layout of a scene (e.g. a bedroom) is often constructed from the arrangement of large diagnostic objects (e.g. a bed). In Experiment 1, we set up a pairwise comparison matrix of eight indoor categories, testing categorization performance across a range of short presentation times (20–80 msec). These data provide a measure of confusion between one indoor category and another, and suggest that categories whose objects are diagnostic of the category and/or form a regular layout (e.g. conference room) were more easily discriminable than categories whose objects were not diagnostic and/or whose layout varied (e.g. office). How much of this performance is due to the generalized layout formed by these objects, and how much is due to the objects themselves? In Experiment 2, we presented the images for 200 msec at varying degrees of blur, maintaining coarse layout but making individual objects unrecognizable. We found that categories whose objects form a diagnostic layout were recognized at a coarser scale than categories for which layout was not diagnostic. Altogether, the results suggest that scene gist understanding is mediated by a hierarchy of regularities, in both spatial layout and objects. We show the extent to which statistical regularities of objects and their layout across scene categories predict human performance in rapid scene categorization tasks.