Abstract
An important hypothesis to emerge from crowding research is that the perception of image structure in the periphery is texture-like. We investigate this hypothesis by measuring perceptual properties of a family of naturalistic textures generated using deep neural networks (DNNs), a class of algorithms that can identify objects in images with near-human performance. DNNs function by stacking repeated convolutional operations in a layered feedforward hierarchy. Our group has recently shown how to generate shift-invariant textures that reproduce the statistical structure of natural images increasingly well, by matching the DNN representation at an increasing number of layers. Here, observers discriminated original photographic images from DNN-synthesised images in a spatial oddity paradigm. In this paradigm, low psychophysical performance means that the model matches the appearance of the original scenes well. For photographs of natural textures (a subset of the MIT VisTex dataset), discrimination performance decreased as the DNN representations were matched to higher convolutional layers. For photographs of natural scenes (containing inhomogeneous structure), discrimination performance was nearly perfect until the highest layers were matched, at which point performance declined (but never to chance). Performance was only weakly related to retinal eccentricity (from 1.5 to 10 degrees) and depended strongly on the individual source image (some images were always hard, others always easy). Surprisingly, performance showed little relationship to size: within a layer-matching condition, images farther from the fovea were somewhat harder to discriminate, but this result was invariant to a three-fold change in image size (achieved by up- or down-sampling). The DNN stimuli we examine here can match texture appearance but are not yet sufficient to match the peripheral appearance of inhomogeneous scenes.
In the future, we can leverage the flexibility of DNN texture synthesis for testing different sets of summary statistics to further refine what information can be discarded without affecting appearance.
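As an illustration of what a shift-invariant summary statistic matched over several layers might look like, the sketch below computes spatially averaged correlations between feature maps (Gram matrices) and sums their squared differences across layers. This is an assumption for illustration only: the abstract does not state which statistics were matched, the function names `gram_matrix` and `texture_distance` are hypothetical, and random arrays stand in for the activations of a trained network.

```python
import numpy as np

def gram_matrix(features):
    """Spatially averaged correlations between feature maps.

    features: array of shape (channels, height, width), standing in for
    the activations of one DNN layer (an assumption, not the paper's code).
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Averaging over all positions discards where each feature occurred.
    return flat @ flat.T / (h * w)

def texture_distance(layers_a, layers_b):
    """Sum of squared Gram-matrix differences over the matched layers."""
    return sum(
        float(np.sum((gram_matrix(a) - gram_matrix(b)) ** 2))
        for a, b in zip(layers_a, layers_b)
    )

rng = np.random.default_rng(0)
# Random stand-in activations for two layers of a hypothetical network.
original = [rng.standard_normal((4, 16, 16)), rng.standard_normal((8, 8, 8))]
# The same activations, circularly shifted in space.
shifted = [np.roll(f, shift=(3, 5), axis=(1, 2)) for f in original]
# Unrelated activations with the same shapes.
other = [rng.standard_normal(f.shape) for f in original]

d_shifted = texture_distance(original, shifted)
d_other = texture_distance(original, other)
# Matching only the first layer imposes fewer constraints than matching both.
d_first_layer_only = texture_distance(original[:1], other[:1])
```

Because the statistic averages over spatial position, a circular shift of the feature maps leaves it essentially unchanged, whereas unrelated activations yield a clearly nonzero distance. Including more layers in the sum can only add constraints, which is one way to picture matching the representation at an increasing number of layers.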
Meeting abstract presented at VSS 2016