Abstract
Natural images contain information at multiple spatial scales. To extract this information, the visual system parses these images using independent, spatial-frequency-selective channels. However, it is not well understood how these channels are recombined to produce a unified visual percept. To illuminate this process, we measured how well human observers discriminate the presence or absence of naturalistic structure in texture images. We used the method of Portilla & Simoncelli (2000) to generate families of texture images, based on a single ancestral image, that span the range from fully naturalistic structure to spectrally matched noise. We then measured how observers' sensitivity to this “naturalness” depended on signals in different spatial frequency bands, using three different experimental manipulations. First, we removed high frequencies using low-pass filtering. Second, we shifted the frequency spectrum by rescaling the images (as if changing viewing distance). Third, we presented images at different eccentricities. The effects of all three manipulations can be explained if sensitivity to image naturalness depends disproportionately on high-frequency information. Analysis of the image statistics present in the image sets indeed shows that high-frequency features provide a more reliable naturalness signal than low-frequency features. Different families of texture images depended on different frequency bands in idiosyncratic ways. To ask how information in different frequency bands combines to support texture discrimination, we measured naturalness sensitivity with textures made from mixtures of bandpass-filtered components. Information in nearby bands (within an octave) combines efficiently, while information in bands spaced further apart is processed more independently. Our results suggest that the perception of natural image structure is most strongly mediated by information at fine spatial scales and in nearby frequency bands.