Abstract
Ensemble perception refers to the ability to rapidly and accurately extract summary statistics of multiple objects, such as the mean feature or feature variance. Recent research suggests that richer distributional information, such as the shape of a feature distribution, can also be accessed. The computational mechanism of ensemble perception is still under debate, and existing models typically address mean perception (with a few exceptions that also consider variability perception). Here, we propose a simple, neurally plausible model of ensemble representation that generalizes across statistical summaries, feature spaces, and tasks. The model is implemented as a two-layer neural network. Layer 1 (L1) represents individual items as independent units corrupted by early noise, resembling small, V1-like receptive fields (RFs). These noisy items converge at Layer 2 (L2), which simulates spatial pooling of local signals, as in neural populations with large RFs (e.g., in V4). Each noisy L1 input elicits a Gaussian population response in L2, centered on the neuron whose preferred feature matches that input. L2 then pools these component population responses by averaging. The resulting pooled distribution represents the ensemble. The distributional properties of the pooled response are determined only by the stimulus and two free parameters: the amount of early noise and the width of the tuning curve. The peak of the pooled response is decoded as the average, its width is decoded as variability, and its shape is roughly isomorphic to the true feature distribution. We used the model to simulate data from multiple published studies of ensemble perception and found that it successfully predicted various patterns across different feature domains. The model accounted for average and variance discriminability as a function of stimulus variance and set size, as well as for sensitivity to the shape of the feature distribution.
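As an illustration only (not part of the paper itself), the two-layer pooling scheme described above can be sketched numerically. The function below is a hypothetical implementation under simplifying assumptions: a linear (non-circular) feature axis, Gaussian early noise at L1, Gaussian tuning at L2, and decoding of the ensemble mean and variability as the peak and spread of the pooled response. All names and parameter values are illustrative.

```python
import numpy as np

def pooled_response(items, early_noise_sd=2.0, tuning_sd=10.0, rng=None):
    """Sketch of the two-layer ensemble model (illustrative, linear feature axis).

    items          -- individual feature values (e.g., orientations in degrees)
    early_noise_sd -- SD of Gaussian early noise corrupting each L1 item
    tuning_sd      -- width of the Gaussian tuning curves of L2 neurons
    """
    rng = np.random.default_rng(rng)
    # Preferred features of the L2 population (illustrative range and resolution)
    feature_axis = np.linspace(-90, 90, 721)
    # L1: each item is represented independently, corrupted by early noise
    noisy = np.asarray(items, float) + rng.normal(0, early_noise_sd, len(items))
    # L2: each noisy item elicits a Gaussian population response centered on the
    # unit preferring that value; the component responses are pooled by averaging
    responses = np.exp(-(feature_axis[None, :] - noisy[:, None]) ** 2
                       / (2 * tuning_sd ** 2))
    pooled = responses.mean(axis=0)
    # Decoding: the peak is read out as the ensemble average,
    # the spread of the pooled response as the ensemble variability
    mean_est = feature_axis[np.argmax(pooled)]
    p = pooled / pooled.sum()
    var_est = np.sum(p * (feature_axis - np.sum(p * feature_axis)) ** 2)
    return pooled, mean_est, var_est
```

In this sketch, a more variable item set widens the pooled response without shifting its peak, so decoded variability grows with stimulus variance while the decoded average stays near the true mean, consistent with the behavior the abstract describes.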