Christina Funke, Thomas Wallis, Alexander Ecker, Leon Gatys, Felix Wichmann, Matthias Bethge; A parametric texture model based on deep convolutional features closely matches texture appearance for humans. Journal of Vision 2017;17(10):1081. doi: https://doi.org/10.1167/17.10.1081.
© ARVO (1962-2015); The Authors (2016-present)
Much of our visual environment consists of texture ("stuff" like cloth, bark or gravel, as distinct from "things" like dresses, trees or paths), and we humans are adept at perceiving textures and their subtle variation. How does our visual system achieve this feat? Here we psychophysically evaluate a new parametric model of texture appearance (the CNN texture model; Gatys et al., 2015) that is based on the features encoded by a deep convolutional neural network (deep CNN) trained to recognise objects in images (the VGG-19; Simonyan and Zisserman, 2015). By cumulatively matching the correlations of deep features up to a given layer (using up to five convolutional layers), we were able to evaluate models of increasing complexity. We used a three-alternative spatial oddity task to test whether model-generated textures could be discriminated from original natural textures under two viewing conditions: when test patches were briefly presented to the parafovea ("single fixation") and when observers were able to make eye movements to all three patches ("inspection"). For 9 of the 12 source textures we tested, the models using more than three layers produced images that were indiscriminable from the originals even under foveal inspection. The venerable parametric texture model of Portilla and Simoncelli (2000) was also able to match the appearance of these textures in the single fixation condition, but not under inspection. Of the three source textures our model could not match, two contain strong periodicities. In a second experiment, we found that matching the power spectrum in addition to the deep features used above (Liu et al., 2016) greatly improved matches for these two textures. These results suggest that the features learned by deep CNNs encode statistical regularities of natural scenes that capture important aspects of material perception in humans.
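To make "cumulatively matching the correlations of deep features up to a given layer" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the correlations are Gram matrices of the feature maps, averaged over spatial positions, and it uses random arrays as stand-ins for VGG-19 activations. The function names (`gram_matrix`, `texture_statistics`, `texture_loss`), the layer shapes, and the normalisation are illustrative assumptions, not taken from the abstract.

```python
import numpy as np


def gram_matrix(features):
    """Correlations between feature maps, averaged over spatial positions.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) matrix.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def texture_statistics(layer_features, n_layers):
    """Cumulative model: Gram matrices for the first n_layers layers."""
    return [gram_matrix(f) for f in layer_features[:n_layers]]


def texture_loss(stats_a, stats_b):
    """Sum over layers of the mean squared difference between Gram matrices."""
    return sum(float(np.mean((a - b) ** 2)) for a, b in zip(stats_a, stats_b))


rng = np.random.default_rng(0)

# Hypothetical stand-ins for activations at three layers (assumed shapes).
shapes = [(4, 16, 16), (8, 8, 8), (16, 4, 4)]
original = [rng.standard_normal(s) for s in shapes]
candidate = [rng.standard_normal(s) for s in shapes]


def shuffle_positions(features, rng):
    """Permute spatial positions identically across all channels."""
    c, h, w = features.shape
    perm = rng.permutation(h * w)
    return features.reshape(c, h * w)[:, perm].reshape(c, h, w)


shuffled = [shuffle_positions(f, rng) for f in original]

stats_original = texture_statistics(original, n_layers=3)
loss_candidate = texture_loss(stats_original, texture_statistics(candidate, 3))
loss_shuffled = texture_loss(stats_original, texture_statistics(shuffled, 3))
```

In this sketch the statistics discard spatial arrangement, so `loss_shuffled` is zero up to floating-point error while `loss_candidate` is not. A synthesis procedure would then adjust an image until its loss against the source texture is small; using a larger `n_layers` corresponds to the more complex models in the sequence described above.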
Meeting abstract presented at VSS 2017