Abstract
Texture information plays a critical role in the immediate perception of scenes, objects, and materials. Past studies on human texture vision have postulated two major models following Julesz's conjecture that the visual system discriminates textures on the basis of low-level image statistics. The filter–rectify–filter (FRF) model explains a range of psychophysical data on texture segregation by assuming second-order filtering that extracts spatial variations in the energy output of first-order filters tuned to spatial frequency and orientation. The Portilla–Simoncelli (PS) texture-statistics model describes the perception of natural textures, and synthesizes them, by assuming multiple classes of image statistics, including moments, subband energy, and auto-correlation in subband energy across position, orientation, and spatial frequency. The two models appear quite different, but they differ only in mathematical expression. Given that the power spectrum is the Fourier transform of the auto-correlation function, the high-level PS statistics (auto-correlations) can be expressed as a four-dimensional (x–y space / orientation / spatial frequency) energy spectrum, and this representation is equivalent to a four-dimensional extension of the FRF model in the Fourier domain. On this basis, we unify the two theories and propose a conceptually simpler model in which texture perception is described only by the luminance spectrum and the four-dimensional energy spectrum (plus pixel moment statistics). To verify this model, we synthesized 'energy-phase randomized (ePR)' images for a variety of natural textures (>300 images, such as images of leaves, gravels, and fabrics), in the same manner as classical luminance phase randomization. As a final step, only the pixel skewness was matched to that of the original. Our psychophysical experiments on similarity rating and peripheral discrimination clearly showed that the ePR images perceptually mimicked the original natural textures with a quality comparable to that of PS-synthesized images.
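For readers unfamiliar with the classical luminance phase randomization that the ePR procedure is modeled on, the following is a minimal illustrative sketch in Python with NumPy. It is not the authors' implementation and does not perform the four-dimensional energy-phase randomization itself; the function name `phase_randomize` and the use of a white-noise image as the source of random phases are assumptions made for illustration. The sketch keeps the amplitude spectrum (and hence the power spectrum and auto-correlation) of an image while replacing its phase spectrum.

```python
import numpy as np


def phase_randomize(image, rng=None):
    """Return an image with the same amplitude spectrum but random phases.

    Hypothetical helper for illustration only (not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)

    # Remove the mean so the DC term does not depend on the random phase.
    mean = image.mean()
    spectrum = np.fft.fft2(image - mean)
    amplitude = np.abs(spectrum)

    # Take the phases from the spectrum of a real white-noise image.
    # Because the noise is real, its phase spectrum is conjugate-symmetric,
    # so the inverse transform below is real up to rounding error.
    noise = rng.standard_normal(image.shape)
    random_phase = np.angle(np.fft.fft2(noise))

    randomized = amplitude * np.exp(1j * random_phase)
    return np.fft.ifft2(randomized).real + mean


# Usage on a synthetic stand-in for a texture image (assumed data).
rng = np.random.default_rng(0)
texture = rng.random((64, 64))
scrambled = phase_randomize(texture, rng)
```

Drawing the phases from a real noise image is one common way to guarantee a real-valued output; sampling phases directly would require enforcing the conjugate symmetry by hand.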