Abstract
The visual system has been shown to summarize scenes efficiently, capturing, for instance, the average hue of tree leaves or the average facial expression in a crowd, a phenomenon known as ensemble perception. Previous studies have shown that the visual system estimates the average size of multiple objects from their perceived sizes rather than their retinal sizes; that is, when representing size ensembles, averaging is performed after the visual system rescales the retinal size of each object according to its distance. However, these studies did not examine whether, and to what extent, retinal size influences the averaging process. Here, we investigated whether the retinal size of objects at different distances influences average-size responses. Observers viewed multiple spheres rendered across three distinct depth planes in a Virtual Reality environment. Each sphere's physical size was scaled with its distance so that retinal size was matched across the three depth planes, and observers reported the mean size by adjusting a probe sphere positioned in the middle plane. In Experiment 1, the spheres were displayed in an empty setting against a black background; in Experiment 2, perspective cues were added by presenting the spheres within a simulated room with textured walls. In both experiments, object sizes were overestimated in the near plane but underestimated in the far plane. In contrast to prior research, our findings suggest that average size is not determined solely by the perceived size of objects but is also influenced by their retinal size. These results underscore the complexity of the size-averaging mechanism and point to a more intricate interaction between depth and size processing in the construction of ensemble representations of object size.
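As a point of clarification of the stimulus manipulation, the retinal-size equalization follows from the geometry of visual angle; the sketch below uses illustrative symbols ($\theta$ for visual angle, $s$ for physical diameter, $d$ for viewing distance, $k$ for a constant) that are not defined in the abstract itself, and relies on the small-angle approximation ($d \gg s$).
\[
\theta = 2\arctan\!\left(\frac{s}{2d}\right) \approx \frac{s}{d},
\qquad
s_i = k\, d_i \;\Rightarrow\; \theta_i \approx k \;\; \text{for every depth plane } i.
\]
Scaling each sphere's physical diameter in proportion to its distance therefore holds retinal size approximately constant across the near, middle, and far planes, so any plane-dependent bias in the reported average cannot be attributed to differences in retinal size of the individual items at a given plane.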