Abstract
Humans can grasp the gist of complex natural scenes very quickly and can remember extraordinarily rich details from thousands of scenes, each viewed for only a very brief period. This remarkable capacity for rapid scene perception challenges both the traditional view of image-based, bottom-up visual processing and more recent models of scene categorization based on global visual features and features at low spatial frequencies. We developed ultra-sparse structural representations of natural scenes using natural scene structures as encoding units. Each natural scene structure is a spatial concatenation of a set of structured patches in natural scenes, and each structured scene patch is a concatenation of independent components of natural scenes. Natural scene structures convey varying amounts of information about scene identity and category: general structures are shared by many scenes, whereas specific structures are shared by only a few. Thus, any natural scene or category can be represented by a probability distribution over a set of natural scene structures and their spatial concatenations. These structural representations require neither isolation of objects, figure-background segmentation, nor computation of global scene features. We tested this model of rapid scene categorization. We compiled a set of informative natural scene structures for each scene category and then, from a set of natural scenes, constructed experimental stimuli consisting of only the selected natural scene structures. We either maintained or shuffled the spatial locations of the natural scene structures in the original scenes. We found that the subjects' categorization performance was significantly above chance even when the selected scene structures covered only a few percent of the area of the scenes. Furthermore, shuffling the spatial locations of the scene structures significantly reduced the subjects' performance.
These results support our model of probabilistic, ultra-sparse, structural representations of natural scene categories.
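The core representational idea can be illustrated with a toy sketch (not the authors' code): a scene is described as a probability distribution over a hypothetical dictionary of scene structures, and a category is assigned by comparing that distribution to per-category distributions. All structure indices and category names below are invented for illustration.

```python
# Toy sketch of a probabilistic, sparse structural representation.
# Assumption: some detector has already mapped a scene to a list of
# indices into a dictionary of natural scene structures.
from collections import Counter
import math

def structure_histogram(structure_ids, dictionary_size):
    """Normalized histogram (probability distribution) over structure indices."""
    counts = Counter(structure_ids)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(dictionary_size)]

def cosine(p, q):
    """Cosine similarity between two distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def categorize(scene_hist, category_hists):
    """Assign the category whose structure distribution is most similar."""
    return max(category_hists, key=lambda c: cosine(scene_hist, category_hists[c]))

# Hypothetical structure detections for one test scene and two categories.
scene = structure_histogram([0, 0, 2, 3, 3, 3], dictionary_size=4)
categories = {
    "beach":  structure_histogram([0, 0, 0, 2, 3, 3], dictionary_size=4),
    "forest": structure_histogram([1, 1, 1, 2, 2, 1], dictionary_size=4),
}
print(categorize(scene, categories))  # prints "beach"
```

Note that this sketch ignores the spatial concatenation of structures, which the behavioral results show also matters: shuffling structure locations degraded performance, so a fuller model would weight spatial arrangement as well.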
Meeting abstract presented at VSS 2012