Abstract
Human visual abilities vary dramatically with location. For example, grating acuity is several hundred times higher at the fovea than in the far periphery (Frisen and Glansholm, Invest. Ophthalmol. Vis. Sci., 1975). This decline in performance with eccentricity generalizes to most visual tasks. A parsimonious hypothesis is that the representation of any visual feature is accompanied by blurring (spatial pooling) that increases with eccentricity, where the spatial scale of the blur depends on the choice of image feature. Here, we built models of two types of visual representation: luminance and spectral energy. Both models average the corresponding feature in pooling windows whose diameters grow linearly with eccentricity. We then synthesized sets of "model metamers", physically distinct images with matched model responses, which served as stimuli in psychophysical experiments to measure the critical scaling: the window-size-to-eccentricity ratio at which human and model discrimination abilities match. These stimuli were much larger than those used previously (Freeman and Simoncelli, Nat Neurosci, 2011; Wallis et al., eLife, 2019), subtending 53.6 by 42.2 degrees of visual angle. We found that the critical scaling for the luminance model was about one-quarter that of the energy model. Further, consistent with earlier studies (Wallis et al., eLife, 2019; Deza et al., ICLR, 2019), we found a smaller critical scaling when observers discriminated a synthesized image from a natural image than when they discriminated two synthesized images. We also demonstrated that initializing metamer synthesis with a natural image (as opposed to a sample of white noise) decreases the critical scaling when discriminating two synthesized images, but not when discriminating a synthesized image from a natural image. We provide a coherent explanation of all these perceptual results in terms of interactions among the choice of local features, the synthesis algorithm, and the initialization.
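As a rough illustration of the pooling idea (not the authors' implementation, which uses smooth, overlapping windows in the style of Freeman and Simoncelli, 2011), the key property — window size growing linearly with eccentricity — can be sketched with hard annular bins whose radial width equals a scaling factor times their eccentricity. The function name and all parameters below are hypothetical, chosen only for this sketch:

```python
import numpy as np

def pool_luminance(image, fixation, scaling, min_ecc=0.5):
    """Crude radial pooling sketch: average pixel values within annular
    bins whose radial width grows linearly with eccentricity
    (width = scaling * eccentricity). Each pixel is replaced by the
    mean of its bin. This is an illustration only, not the smooth
    overlapping-window model used in the actual study."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity of each pixel (in pixels) relative to fixation.
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    # Geometric bin edges give linear growth of bin width with eccentricity:
    # e_{k+1} = e_k * (1 + scaling), so e_{k+1} - e_k = scaling * e_k.
    edges = [min_ecc]
    while edges[-1] < ecc.max():
        edges.append(edges[-1] * (1 + scaling))
    bins = np.digitize(ecc, edges)
    pooled = np.empty_like(image, dtype=float)
    for b in np.unique(bins):
        mask = bins == b
        pooled[mask] = image[mask].mean()
    return pooled
```

A larger `scaling` yields fewer, coarser bins in the periphery, mimicking the stronger peripheral blur; replacing the pixel values with local spectral energy before pooling would give the analogous sketch for the energy model.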