Abstract
Human abilities to discriminate and identify many visual attributes vary across the visual field and are notably worse in the periphery than in the fovea. This is true of acuity, as well as of more complex features or objects such as letters or faces. Statistical pooling models have been proposed to explain these variations (Balas et al., 2009). These models posit that the early visual system computes summary statistics that are locally averaged over pooling windows whose diameters grow in proportion to eccentricity. Here, we examine two pooling models over a wide field of view (FOV): one for retinal ganglion cells, which pools pixel intensity, and one for primary visual cortex, which pools local spectral energy as measured with oriented receptive fields. To validate these models, we generate model “metamers”: stimuli that are physically different but whose pooled model responses are identical (Freeman & Simoncelli, 2011; Keshvari & Rosenholtz, 2016), and present them to subjects in a psychophysical experiment. The stimuli for both models are generated in a common computational framework that can easily be adapted to match a variety of image statistics within pooling windows. The synthetic stimuli have a large FOV of 82 by 47.6 degrees and a resolution of 3528 by 2048 pixels. We vary the model scaling values (the ratio between the pooling window diameter and eccentricity), testing values far lower than those previously reported (Freeman & Simoncelli, 2011; Wallis et al., 2019). Subjects are asked to discriminate pairs of synthesized images, as well as reference vs. synthesized images. Visual inspection of the images indicates that the threshold scaling values (at which the images become indistinguishable) for the two models differ by an order of magnitude, in rough correspondence with the receptive field sizes of neurons in the corresponding visual areas.
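For concreteness, one minimal way to formalize the framework described above; the notation (s, e, M_s, w_k, phi, x-hat) is ours and not drawn verbatim from the abstract:

\[
d(e) = s\,e, \qquad
M_s(x) = \Big[\textstyle\sum_i w_k(i)\,\phi(x)_i\Big]_{k=1}^{K}, \qquad
\hat{x} \approx \arg\min_{y}\; \big\lVert M_s(y) - M_s(x) \big\rVert_2^2 .
\]

Here $d(e)$ is the pooling window diameter at eccentricity $e$, with the scaling value $s$ as the constant of proportionality; $M_s(x)$ collects the chosen statistics $\phi(x)$ (pixel intensity for the retinal ganglion cell model, oriented spectral energy for the V1 model) averaged within each window $w_k$; and a model metamer $\hat{x}$ of a reference image $x$ is an image matching those pooled statistics, which, as in prior work on metamer synthesis, can be approximated by gradient-based minimization of the squared difference in model responses.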