Abstract
People can accurately categorize natural scenes from brief presentations (Potter, 1975). It has been suggested that this efficient processing is mediated by global texture, e.g., orientation and spatial frequency (Oliva & Torralba, 2001). Recent work, however, showed that local structure, e.g., vertex types and angles, can explain human scene categorization (Shen & Walther, 2012). Here, we independently manipulated the availability of global texture and local structure in scenes. In Experiment 1, we selectively removed either category-specific amplitude information or category-specific phase information from each image by normalizing amplitude or phase across all images. These manipulated images, along with intact images, were presented briefly (27-80 ms, M = 43 ms), and participants categorized them as beaches, city streets, forests, highways, mountains, or offices. Although removing phase and removing amplitude each caused a significant decrease in accuracy, removing phase was significantly more detrimental than removing amplitude. In addition, error patterns were significantly correlated between the intact and amplitude-removed images, but not between the intact and phase-removed images. In Experiment 2, we manipulated the visual features available in line drawings. Specifically, we used intact line drawings of natural scenes, contour-shifted line drawings, in which vertex information was distorted, and rotated line drawings, in which orientation information was distorted. The task was identical to that of Experiment 1, except that the presentation time was 27 ms for all participants. Accuracy was higher for rotated than for contour-shifted images, and error patterns were significantly correlated between intact and rotated images but not between intact and contour-shifted images.
Moreover, error patterns were significantly correlated between phase-removed and contour-shifted images, as well as between amplitude-removed and rotated images, in spite of a large difference in pixel-by-pixel similarity. Together, our findings suggest that human scene categorization relies more on local structure than on global texture.
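The amplitude and phase manipulations of Experiment 1 can be illustrated with a minimal NumPy sketch. The abstract does not specify how normalization was carried out, so the choices here are assumptions made only for illustration: amplitude is "removed" by giving every image the mean amplitude spectrum of the set, and phase is "removed" by giving every image one shared phase spectrum drawn from a noise image. The function names `split_spectrum`, `recombine`, `remove_amplitude`, and `remove_phase` are hypothetical and not taken from the study.

```python
import numpy as np


def split_spectrum(image):
    """Return the amplitude and phase of a grayscale image's 2-D Fourier transform."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)


def recombine(amplitude, phase):
    """Rebuild an image from an amplitude spectrum and a phase spectrum."""
    # The imaginary part is only floating-point residue when the spectrum
    # comes from real images, so it is discarded.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))


def remove_amplitude(images):
    """Assumed normalization: every image gets the set's mean amplitude
    spectrum and keeps its own phase spectrum."""
    parts = [split_spectrum(img) for img in images]
    mean_amplitude = np.mean([amp for amp, _ in parts], axis=0)
    return [recombine(mean_amplitude, phase) for _, phase in parts]


def remove_phase(images, rng):
    """Assumed normalization: every image gets one shared phase spectrum
    and keeps its own amplitude spectrum."""
    parts = [split_spectrum(img) for img in images]
    # Taking the phase from the FFT of a real noise image keeps the usual
    # conjugate symmetry, so the rebuilt images are real-valued.
    noise = rng.standard_normal(images[0].shape)
    shared_phase = np.angle(np.fft.fft2(noise))
    return [recombine(amp, shared_phase) for amp, _ in parts]
```

Under these assumptions, amplitude-removed images no longer differ in their amplitude spectra, while phase-removed images differ only in their amplitude spectra. All images must be grayscale arrays of the same shape.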
Meeting abstract presented at VSS 2013