Many studies have shown that observers can accurately estimate the average feature of a group of objects. However, how the visual system relies on information from each individual item is still under debate. Some models suggest that some or all items are sampled and averaged arithmetically. Another strategy implies "robust averaging," whereby middle elements gain greater weight than outliers. One version of a robust averaging model was recently suggested by Teng et al. (2021), who studied motion direction averaging in skewed feature distributions and found systematic biases toward their modes. They interpreted these biases as evidence for robust averaging and suggested a probabilistic weighting model based on the minimization of a virtual loss function. In four experiments, we replicated systematic skew-related biases in another feature domain, namely, orientation averaging. Importantly, we show that the magnitude of the bias is not determined by the locations of the mean or mode alone but is substantially defined by the shape of the whole feature distribution. We test a model that accounts for such distribution-dependent biases and robust averaging in a biologically plausible way. The model is based on well-established mechanisms of spatial pooling and population encoding of local features by neurons with large receptive fields. Both the loss function model and the population coding model with a winner-take-all decoding rule accurately predicted the observed patterns, suggesting that the pooled population response model can be considered a neural implementation of the computational algorithms of information sampling and robust averaging in ensemble perception.

The SD of the early noise (σ_{early}) is one of the model's parameters. Each noisy item is then translated into a population response of a set of neurons ordered by their feature preferences at a top, pooling layer with a large RF. The population response is obtained by applying a normal (for linear feature dimensions) or a wrapped normal (for circular or semicircular spaces, such as orientation or motion direction) probability density function peaked at the neuron preferring the transmitted feature (given the noise) and with an SD corresponding to the width of the tuning curve. When several such stimuli are presented, the pooled response is the average of the population responses to the individual stimuli, or the normalized sum of the individual responses. As Figure 1A shows, when the population responses to the individual features overlap, the resulting average population response accumulates more signal in the middle of the range than at the edges, so the middle is represented more strongly than the extremes (which is reminiscent of the main point of robust averaging). The amount of overlap is defined externally by the features physically presented and internally by the width of the tuning curve. The SD of the Gaussian tuning curve (σ_{tuning}) is therefore a crucial parameter of the model. Finally, the peak of the pooled population response can be used to decode the best representative feature (which should be approximately the mean for a symmetrical feature distribution).

Figure 1B shows how the peak of a pooled response changes its location as a function of σ_{tuning}. If the tuning curve is narrow, the overall shape of the population response approaches the actual feature distribution and, hence, the peak stays close to the mode. When the tuning curve gets wider, the peak shifts toward the middle of the distribution, that is, toward the mean. This occurs because neurons that preferentially respond to stimuli from the longer tail of a skewed distribution gain additional activation from the prevailing features owing to their wide tuning curves, which can eventually shift the population peak away from the mode.
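This tuning-width effect can be illustrated numerically. The sketch below (standard-library Python; the item values, the 180° orientation period, and the integer preference grid are illustrative assumptions, not the study's actual stimuli or code) pools wrapped-normal population responses to a right-skewed set and locates the peak:

```python
import math

def pooled_response(items_deg, sigma_tuning, prefs=range(180)):
    """Average of the population responses to individual items: each item
    contributes a wrapped-normal tuning profile over neurons' preferred
    orientations (period 180 deg for orientation)."""
    def wrapped_normal(theta, mu, sigma):
        return sum(
            math.exp(-((theta - mu + k * 180) ** 2) / (2 * sigma ** 2))
            for k in (-2, -1, 0, 1, 2)
        ) / (sigma * math.sqrt(2 * math.pi))

    return {
        theta: sum(wrapped_normal(theta, s, sigma_tuning) for s in items_deg)
        / len(items_deg)
        for theta in prefs
    }

# A right-skewed set: mode at 80 deg, arithmetic mean at 93.75 deg.
items = [80, 80, 80, 85, 90, 100, 110, 125]
narrow = pooled_response(items, sigma_tuning=5)
wide = pooled_response(items, sigma_tuning=40)
peak_narrow = max(narrow, key=narrow.get)   # stays near the mode
peak_wide = max(wide, key=wide.get)         # shifted toward the mean
```

With the narrow tuning curve the pooled peak stays near the mode (80°), and widening the curve drags it toward the mean, mirroring the pattern described for Figure 1B.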

*bias*, our dependent variable of interest. To analyze the change of the bias as a function of feature distribution skewness, we used a linear mixed-effects model with the fixed effect of mode–mean distance and random effects of participant identity on slope and intercept, via the lme4 package for R (Bates, Mächler, Bolker, & Walker, 2014). To calculate *p*-values for the regression, we used Satterthwaite's (1946) method.

*S* ∈ {*S*_{1}, *S*_{2}, … *S*_{n}}, where *S*_{1...n} are individual orientations in degrees. Each orientation from *S* is corrupted by Gaussian early noise in layer 1 of the model (σ_{early}), so that the output of layer 1 is *S*′ = *S* + *η*, where *η* are independent random samples from a normal distribution with a mean of 0 and a variance of σ^{2}_{early}. We set σ_{early} = 4° as a fixed parameter of the model, based on the average fits from previous studies of orientation averaging (Dakin, 2001; Solomon et al., 2011). Each output number of layer 1 is a proxy for neural activity within a small V1-like RF responding to a single orientation falling onto this RF. Although biologically plausible neural activity in each such RF is itself a noisy population response of multiple neurons, we used these proxies for the sake of computational simplicity. We consider this simplification acceptable because our model is focused on the shape of the population responses at layer 2. Because the tuning curve of any neuron of interest can be expressed directly as a function of the stimulus, there is no need to model the whole population response of the intermediate (here, layer 1) neurons. Such a representation of the layer 1 output can also be interpreted as intermediate noisy decoding of individual orientations before this information is carried over to the pooling stage.
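The layer 1 stage amounts to one line of arithmetic per item; a minimal Python sketch (function and constant names are ours, not from the study's code):

```python
import random

SIGMA_EARLY = 4.0  # deg; fixed from earlier orientation-averaging fits

def layer1(stimulus_deg):
    """Return S' = S + eta: each presented orientation corrupted by an
    independent Gaussian early-noise sample with SD SIGMA_EARLY."""
    return [s + random.gauss(0.0, SIGMA_EARLY) for s in stimulus_deg]
```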

*f*_{WN}) scaled to convert probability density units into spike frequency units, so that each item elicits a distribution of spikes across the whole layer 2 population in accordance with this tuning function:

*A*(θ_{j}; *S*′_{i}, σ_{tuning}) = *M* · *f*_{WN}(θ_{j}; *S*′_{i}, σ_{tuning}),  (1)

where θ_{j} is the orientation preference of the *j*-th layer 2 neuron, *S*′_{i} is the noisy layer 1 representation of the *i*-th item from the stimulus display as defined elsewhere in this article, σ_{tuning} is the width of the tuning curve of layer 2 neurons approximated with a wrapped normal distribution, and *M* = 60/max(*f*_{WN}[θ_{j}; *S*′_{i}, σ_{tuning}]), which converts probability densities into spike frequencies, so that a neuron at the peak of its preference responds with 60 spikes/s. The σ_{tuning} was a free parameter of the model that we fit to the data.
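Equation 1 can be written out directly; the sketch below (Python; the truncation of the wrapped normal to a few wrap terms and the function names are our assumptions) rescales the density so that a preference-matched neuron fires at 60 spikes/s:

```python
import math

def wrapped_normal_pdf(theta, mu, sigma, period=180):
    """Wrapped normal density on the semicircular orientation space."""
    return sum(
        math.exp(-((theta - mu + k * period) ** 2) / (2 * sigma ** 2))
        for k in (-3, -2, -1, 0, 1, 2, 3)
    ) / (sigma * math.sqrt(2 * math.pi))

def tuning_response(pref_deg, item_deg, sigma_tuning, peak_rate=60.0):
    """Mean rate A(theta_j; S'_i, sigma_tuning) of a layer 2 neuron.

    M = peak_rate / max(f_WN) converts density into spikes/s, so a neuron
    whose preference matches the (noisy) item fires at peak_rate."""
    m = peak_rate / wrapped_normal_pdf(item_deg, item_deg, sigma_tuning)
    return m * wrapped_normal_pdf(pref_deg, item_deg, sigma_tuning)
```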

When *n* items are presented, the *j*-th layer 2 neuron shows pooled activation as follows:

*A*(θ_{j}) = (1/*n*) Σ_{i=1..n} *A*(θ_{j}; *S*′_{i}, σ_{tuning}),

where *n* is the number of individual items presented in a stimulus.

The final activation *A*′(θ_{j}; *S*′, σ_{tuning}) of a neuron preferring orientation θ_{j} with a tuning function width of σ_{tuning} was randomly drawn from a Poisson distribution (*Pois*) with a mean of *A*(θ_{j}; σ_{tuning}) + 4, where 4 was added as the average default spike rate of each neuron (baseline noise):

*A*′(θ_{j}) ~ *Pois*(*A*(θ_{j}; σ_{tuning}) + 4).
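Because the Python standard library has no Poisson sampler, a sketch of this spiking stage can use Knuth's algorithm (adequate for the modest rates involved; the helper names are ours, not the study's):

```python
import math
import random

BASELINE = 4.0  # average default spike rate added to every neuron

def poisson(lam):
    """Knuth's Poisson sampler; fine for small means like tens of spikes/s."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

def noisy_activation(mean_rate):
    """A'(theta_j): a spike count drawn around the tuning-defined mean rate
    plus the baseline rate."""
    return poisson(mean_rate + BASELINE)
```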

Under the winner-take-all (WTA) decoding rule, the maximally activated neuron among the *m* layer 2 neurons is found, and its preference, defined by the peak of its tuning curve, is taken as the perceived mean (*PM*), so that:

*PM* = θ_{j*}, where *j** = argmax_{j} *A*′(θ_{j}).

Under the vector average (VA) rule, the perceived mean was computed as the circular average of the preferences θ_{j} weighted by the activation (*A*′(θ_{j})) of each of the *m* layer 2 neurons.

Under the maximum likelihood estimation (MLE) rule, the decoded mean was the orientation θ_{k} whose presentation would be most likely to cause the observed population response given the shape of the tuning curves (σ_{tuning}), as defined by equation 1. The log-likelihood of the population response under each θ was determined by the Poisson density function (*f*_{Pois}) applied to the spike rate *A*′(θ_{j}) of each layer 2 neuron:

LL(θ_{k}) = Σ_{j=1..m} log *f*_{Pois}(*A*′(θ_{j}); *A*(θ_{j}; θ_{k}, σ_{tuning}) + 4).
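The three decoding rules can be sketched as follows (standard-library Python; the doubled-angle trick in the vector average and the exhaustive 1° grid in the MLE search are our assumptions, not details specified in the text):

```python
import math

def decode_wta(prefs, rates):
    """Winner-take-all: the preference of the maximally activated neuron."""
    return max(zip(prefs, rates), key=lambda pr: pr[1])[0]

def decode_va(prefs, rates):
    """Vector average: activation-weighted circular mean of preferences.
    Orientation repeats every 180 deg, so angles are doubled before averaging."""
    x = sum(r * math.cos(math.radians(2 * p)) for p, r in zip(prefs, rates))
    y = sum(r * math.sin(math.radians(2 * p)) for p, r in zip(prefs, rates))
    return (math.degrees(math.atan2(y, x)) / 2) % 180

def decode_mle(prefs, rates, expected_rate):
    """MLE: the orientation whose expected population response makes the
    observed rates most likely under independent Poisson noise.
    `expected_rate(pref, theta)` is the tuning-defined mean rate
    (equation 1 plus the baseline)."""
    def log_lik(theta):
        total = 0.0
        for p, r in zip(prefs, rates):
            lam = expected_rate(p, theta)
            total += r * math.log(lam) - lam - math.lgamma(r + 1)
        return total
    return max(range(180), key=log_lik)
```

On a noise-free population response all three rules recover the encoded orientation; they diverge once Poisson noise and skewed pooling come into play.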

We simulated the model's predicted biases for a range of tuning widths (σ_{tuning}) for each stimulus distribution from our experiment. The range of tuning widths that we simulated was from 20° to 90°, in steps of 1°. For each stimulus mode–mean distance *d* ∈ {−20.0°, −13.7°, −7.0°, 0°, 7.0°, 13.7°, 20.0°}, each tuning width σ_{tuning} ∈ {20°, 21°, …, 90°}, and each decoding rule *r* ∈ {WTA, VA, MLE}, we ran 5,000 Monte Carlo simulations with a fixed σ_{early} parameter, as specified elsewhere in this article. Including the early noise applied to every single item is important because such noise can systematically shift the peak location of an encoded distribution if the generative stimulus distribution is skewed. The mean output of the 5,000 simulations, termed *m*(*d*, σ_{tuning}, *r*), was taken as the model's predicted bias (for each of the decoding rules) for a given combination of *d*, σ_{tuning}, and *r*. The SD of the simulation outputs, termed σ(*d*, σ_{tuning}, *r*), was taken as a measure of the decoding noise for a given combination of *d*, σ_{tuning}, and *r*.
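The structure of these simulations can be sketched end to end. The version below is deliberately stripped down (WTA only, no Poisson spiking stage, a coarse preference grid, far fewer runs, and made-up item values), so only the early noise varies across runs:

```python
import math
import random
import statistics

def simulate_bias(items, sigma_tuning, n_sims=200, sigma_early=4.0,
                  prefs=range(0, 180, 2)):
    """Monte Carlo estimate of the predicted bias m(.) and the decoding
    noise sigma(.) for one stimulus set under WTA decoding."""
    def wn(theta, mu, sigma):
        return sum(math.exp(-((theta - mu + k * 180) ** 2) / (2 * sigma ** 2))
                   for k in (-2, -1, 0, 1, 2))
    true_mean = sum(items) / len(items)
    errors = []
    for _ in range(n_sims):
        noisy = [s + random.gauss(0.0, sigma_early) for s in items]
        pooled = {t: sum(wn(t, s, sigma_tuning) for s in noisy) for t in prefs}
        errors.append(max(pooled, key=pooled.get) - true_mean)
    return statistics.mean(errors), statistics.stdev(errors)

random.seed(1)
skewed = [80, 80, 80, 85, 90, 100, 110, 125]   # mode 80, mean 93.75
m_narrow, sd_narrow = simulate_bias(skewed, sigma_tuning=10)
m_wide, sd_wide = simulate_bias(skewed, sigma_tuning=45)
# narrow tuning -> strong bias toward the mode; wide tuning -> bias shrinks
```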

In addition to σ_{tuning}, which determines the bias and the decoding noise, our model included two more parameters to account for other sources of systematic and nonsystematic response variability: the constant error and the late noise. The constant error reflects an overall clockwise or counterclockwise bias that observers could systematically show in all responses regardless of the feature distribution. In our model, it was defined as a constant added to all predicted average decoded means in all distributions, shifting them either in a positive (clockwise) or a negative (counterclockwise) direction. The range of tested constant errors was from −3° to 3°, in steps of 0.1°. The late noise (σ_{late}) is a parameter that accounts for response variability beyond the early input noise and the decoding noise in our model (e.g., memory distortions, decision and motor factors). The range of tested σ_{late} was from 0° to 40°, in steps of 1°.

We used maximum likelihood estimation to find the best fitting σ_{tuning}, σ_{late}, constant error, and decision rule *r* for each observer. For each *k*-th trial completed by the observer, we estimated the parameter likelihood as the probability of the observed response error *E*_{k} (as defined in the Design and data analysis section) under a normal distribution with a mean of *m*(*d*_{k}, σ_{tuning}, *r*) + constant error and an SD defined by combining the variances of the decoding noise and the late noise. The log-likelihood over all trials, therefore, was defined as follows:

LL = Σ_{k} log φ(*E*_{k}; *m*(*d*_{k}, σ_{tuning}, *r*) + constant error, √[σ^{2}(*d*_{k}, σ_{tuning}, *r*) + σ^{2}_{late}]),

where φ is the normal probability density function.
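That likelihood can be written compactly (a sketch; the variable names are ours, with the combined variance taken as the sum of the decoding and late-noise variances):

```python
import math

def response_log_likelihood(errors, pred_biases, decode_sds,
                            const_error, sigma_late):
    """Summed log-likelihood of per-trial response errors E_k under a normal
    with mean m(d_k, sigma_tuning, r) + constant error and variance equal to
    the decoding variance plus the late-noise variance."""
    total = 0.0
    for e, m, sd in zip(errors, pred_biases, decode_sds):
        var = sd ** 2 + sigma_late ** 2
        total += (-0.5 * math.log(2 * math.pi * var)
                  - (e - m - const_error) ** 2 / (2 * var))
    return total
```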

The loss function model has two free parameters: the width of the loss function (σ_{loss}) and the size of a random sample of items (*N*) to which the loss function is applied. The tested σ_{loss} ranged from 1° to 50°, in steps of 1°. The tested *N* ranged from 2 to 24 or 36 (depending on the total number of items in a set), in steps of 1. For each combination of parameters, we ran 5,000 simulations. There were two modifications, however, as to which noise parameters we included in our simulations. Whereas Teng et al. (2021) did not include any early noise in their model, we did include it in our version of their model (σ_{early} = 4°) as a fixed parameter, which is important, in our opinion, for the reason explained elsewhere in this article. As in the population coding model, we also included the late noise and the constant error. The most likely model parameters were found with MLE in the same way as described for the population coding model, although the search was made across four rather than three parameters.
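A sketch of this class of model is below (Python); note that the inverted-Gaussian loss and the 0.5° minimization grid are generic stand-ins chosen for illustration — the actual loss function is the one specified by Teng et al. (2021):

```python
import math
import random

def loss_function_estimate(items, sigma_loss, n_sample, sigma_early=4.0):
    """Draw a random subsample of n_sample items, corrupt each with early
    noise, and return the orientation minimizing the summed loss.
    An inverted-Gaussian loss is assumed; wrapping is ignored for brevity."""
    sample = [s + random.gauss(0.0, sigma_early)
              for s in random.sample(items, n_sample)]

    def total_loss(theta):
        return sum(1.0 - math.exp(-((theta - s) ** 2) / (2 * sigma_loss ** 2))
                   for s in sample)

    candidates = [0.5 * i for i in range(360)]   # 0.0 .. 179.5 deg
    return min(candidates, key=total_loss)
```

With a wide loss function the minimizer behaves like the subsample mean; narrowing σ_{loss} makes the estimate increasingly dominated by the densest cluster of items, which is what produces mode-ward biases for skewed sets.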

*p* < 0.001. In addition to the bias caused by the skew of the distribution, there was a small intercept (−0.3°, based on the best LMM fit) that we interpret as a counterclockwise constant error. Both the population coding model and the loss function model showed very similar performance (Figure 4A). Model comparison showed that the loss function model provided the best overall fit to our data (ΔAIC = 8.18) (Figure 5A). The population coding model with the WTA decoding rule (*PCM*_{WTA}) showed the best fit (ΔAIC = 25.73) among the three versions of the population coding model (Figure 5A). The participants' median best fit parameters of the *PCM*_{WTA} were as follows: σ_{tuning} = 35°, constant error = −0.1°, and σ_{late} = 6° (here and in the other experiments, we provide the parameters only for the best version of the population coding model). The parameters of the loss function model were as follows: σ_{loss} = 27.5°, *N* = 5, constant error = −0.05°, and σ_{late} = 12°.

Both the *PCM*_{WTA} and the loss function model captured this pattern quite accurately; yet, the latter model was preferable in terms of likelihood.

*p* < 0.01, suggesting a systematic bias toward the mode of the orientation distribution that grew with the mode–mean distance (Figure 4B). Model comparison showed that the *PCM*_{WTA} fit the data better, ΔAIC = 32.79, than the loss function model, ΔAIC = 42.12, and that the loss function model was followed by the other two models (Figure 5B). The median best fit parameters of the two best models were as follows: for the *PCM*_{WTA}, σ_{tuning} = 42°, constant error = 0.2°, and σ_{late} = 7°; for the loss function model, σ_{loss} = 20°, *N* = 4, constant error = 0.2°, and σ_{late} = 20°.

The model predicts that the magnitude of the bias is determined by the interplay between the tuning width (σ_{tuning}) and the feature distribution of a stimulus. In Experiment 3, we tested this prediction of the model by using new shapes of feature distributions while keeping the mode–mean distances almost the same as in Experiments 1 and 2. Specifically, we generated distributions expected to yield a greater bias away from the mean than in Experiments 1 and 2, given equal mode–mean distances. In Figure 6, we show an example of two such distributions (one from Experiment 1 and another from Experiment 3), with the predicted population responses to these distributions. Note that these population responses are predicted based on tuning properties within the range expected from our model fits in these experiments. Accordingly, we were also interested in how well both the population coding model and the loss function model would account for the data obtained from these new distributions.

*x*° was approximately proportional to the probability density within the interval [*x*° − 10°; *x*° + 10°] of the generative mixture distribution. Using this method, we created distributions whose mode–mean distances were most similar to those from Experiments 1 and 2, namely, 0°, ±13.3°, and ±20.9°. For these distributions, our preliminary simulations predicted biases approximately three to four times as large in magnitude as in the corresponding distributions from Experiments 1 and 2, given tuning curves of model neurons in the range of 40° to 45°. We did not use the distributions with the smallest mode–mean distance from Experiments 1 and 2 (±7°), because these distributions failed to produce a sufficient bias in Experiment 2 to make a reliable prediction. Overall, therefore, we had five fixed distributions with various amounts and directions of skewness. Histograms of these distributions are shown in Figure 7. The mean orientation of the ensemble was randomly assigned on each trial in the same way as described in Experiment 1. Observers gave their responses using the adjustment method (see Experiment 1 for a description).

*p* < 0.001, suggesting a bias toward the mode growing in magnitude as a function of mode–mean distance (Figure 4C). Importantly, this slope was also steeper than the slope observed in Experiment 2, 0.16 vs. 0.05, t(94) = 2.69, *p* < 0.01. The loss function model showed the best fit to the data, ΔAIC = 34.46 (Figure 5C). The *PCM*_{WTA} was the best among the three versions of the population coding model, ΔAIC = 80.23 (Figure 5C). The median best fit parameters across participants of the best population coding model and the loss function model were as follows: for the *PCM*_{WTA}, σ_{tuning} = 51.5°, constant error = 0.6°, and σ_{late} = 14.5°; for the loss function model, σ_{loss} = 31.5°, *N* = 5, constant error = 0.4°, and σ_{late} = 14°.

*p* < 0.001. The intercept of the model was 2.43°, suggesting a considerable constant error. Most importantly, when the sample and the test distributions had opposite skew directions, there were strong biases in opposite directions (negative when the test–sample mode distance was −40° and positive when it was 40°), but there was much less bias (beyond that associated with the constant error) when the sample and the test had the same direction of skew (Figure 4D).

the *PCM*_{WTA}, ΔAIC = 99.23, and they were then followed by the other two versions of the population coding model (Figure 5D). The median best fit model parameters for the *PCM*_{WTA} and the loss function model were as follows: for the *PCM*_{WTA}, σ_{tuning} = 84°, constant error = 3°, and σ_{late} = 36°; for the loss function model, σ_{loss} = 26°, *N* = 3, constant error = 3°, and σ_{late} = 18°.

**Open practices statement:** The data and scripts to analyze them are available at the Open Science Framework (https://osf.io/enwym/?view_only=a0f5821fe2614b558104b600281ae2f4), and none of the experiments were preregistered.

*Vision Research,*83, 25–39, https://doi.org/10.1016/j.visres.2013.02.018. [PubMed]

*Psychological Science,*12(2), 157–162. JSTOR. [PubMed]

*Perception & Psychophysics,*70(7), 1325–1326, https://doi.org/10.3758/PP.70.7.1325. [PubMed]

*Psychonomic Bulletin & Review,*27, 1–5, https://doi.org/10.3758/s13423-020-01718-7.

*Attention, Perception, & Psychophysics,*82(1), 63–79, https://doi.org/10.3758/s13414-019-01827-z. [PubMed]

*ArXiv:1406.5823 [Stat]*, http://arxiv.org/abs/1406.5823.

*Psychological Record,*59(2), 171–185, https://doi.org/10.1007/BF03395657.

*Neural Computation,*30(2), 428–446, https://doi.org/10.1162/neco_a_01037. [PubMed]

*PeerJ,*8, e9414, https://doi.org/10.7717/peerj.9414. [PubMed]

*Cognition,*153, 196–210, https://doi.org/10.1016/j.cognition.2016.04.018. [PubMed]

*Journal of Vision,*17(2), 21–21, https://doi.org/10.1167/17.2.21. [PubMed]

*Psychological Science,*28(10), 1510–1517, https://doi.org/10.1177/0956797617713787. [PubMed]

*Vision Research,*140, 144–156, https://doi.org/10.1016/j.visres.2017.08.003. [PubMed]

*Cognition,*196, 104075, https://doi.org/10.1016/j.cognition.2019.104075. [PubMed]

*Spatial learning and attention guidance,*vol 151. New York: Humana, https://doi.org/10.1007/7657_2019_20.

*Behavior Research Methods,*48(3), 1086–1099, https://doi.org/10.3758/s13428-015-0632-x. [PubMed]

*Vision Research,*43(4), 393–404, https://doi.org/10.1016/S0042-6989(02)00596-5. [PubMed]

*Tutorials in Quantitative Methods for Psychology,*1(1), 42–45, https://doi.org/10.20982/tqmp.01.1.p042.

*Journal of the Optical Society of America A,*18(5), 1016–1026, https://doi.org/10.1364/JOSAA.18.001016.

*Vision Research,*45(24), 3027–3049, https://doi.org/10.1016/j.visres.2005.07.037. [PubMed]

*Nature Neuroscience,*2(8), 740–745. [PubMed]

*Quarterly Journal of Experimental Psychology,*62(9), 1716–1722, https://doi.org/10.1080/17470210902811249.

*Journal of Experimental Psychology: Human Perception and Performance,*46(11), 1267–1279, https://doi.org/10.1037/xhp0000857. [PubMed]

*Behavioral and Brain Sciences,*39, e229, https://doi.org/10.1017/S0140525X15000965.

*Proceedings of the National Academy of Sciences of the United States of America,*108(32), 13341–13346, https://doi.org/10.1073/pnas.1104517108. [PubMed]

*Science,*233(4771), 1416–1419, https://doi.org/10.1126/science.3749885. [PubMed]

*Psychological Science,*32(3), 437–450, https://doi.org/10.1177/0956797620970561.

*Journal of Vision,*14(9), 22–22, https://doi.org/10.1167/14.9.22. [PubMed]

*Current Biology,*17(17), R751–R753, https://doi.org/10.1016/j.cub.2007.06.039.

*Attention, Perception, & Psychophysics,*72(7), 1825–1838, https://doi.org/10.3758/APP.72.7.1825. [PubMed]

*From perception to consciousness: Searching with Anne Treisman*(pp. 339–349). Oxford, UK: Oxford University Press.

*Scientific Reports,*11(1), 3899, https://doi.org/10.1038/s41598-021-83358-y. [PubMed]

*Journal of Vision,*18(13), 12–12, https://doi.org/10.1167/18.13.12. [PubMed]

*Attention, Perception, & Psychophysics,*83(3), 1251–1262, https://doi.org/10.3758/s13414-020-02089-w. [PubMed]

*Attention, Perception, & Psychophysics,*75(2), 278–286, https://doi.org/10.3758/s13414-012-0399-4. [PubMed]

*Nature Neuroscience,*9(5), 690–696, https://doi.org/10.1038/nn1691. [PubMed]

*Proceedings of the Royal Society B: Biological Sciences,*285(1879), 20172770, https://doi.org/10.1098/rspb.2017.2770.

*Journal of Vision,*18(9), 23–23, https://doi.org/10.1167/18.9.23. [PubMed]

*Attention, Perception, & Psychophysics,*81(8), 2850–2872, https://doi.org/10.3758/s13414-019-01792-7. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance,*46(9), 1013–1028, https://doi.org/10.1037/xhp0000804. [PubMed]

*Nature Communications,*7, 13186, https://doi.org/10.1038/ncomms13186. [PubMed]

*PLoS Computational Biology,*13(8), e1005723, https://doi.org/10.1371/journal.pcbi.1005723. [PubMed]

*Nature Neuroscience,*9(11), 1432–1438, https://doi.org/10.1038/nn1790. [PubMed]

*Acta Psychologica,*142(2), 245–250, https://doi.org/10.1016/j.actpsy.2012.11.002. [PubMed]

*Vision: A computational approach.* San Francisco: Freeman & Co.

*Journal of Vision,*15(4), 6–6, https://doi.org/10.1167/15.4.6. [PubMed]

*Journal of the Optical Society of America A,*33(3), A22–A29, https://doi.org/10.1364/JOSAA.33.000A22.

*Psychological Science,*11(6), 502–506, https://doi.org/10.1111/1467-9280.00296. [PubMed]

*Proceedings of the National Academy of Sciences of the United States of America,*111(21), 7873–7878, https://doi.org/10.1073/pnas.1308674111. [PubMed]

*Perception & Psychophysics,*70(5), 772–788, https://doi.org/10.3758/PP.70.5.772. [PubMed]

*Journal of Neuroscience,*22(21), 9530–9540.

*Nature Neuroscience,*4(7), 739–744. [PubMed]

*Behavior Research Methods,*51(1), 195–203, https://doi.org/10.3758/s13428-018-01193-y. [PubMed]

*Nature Reviews Neuroscience,*1(2), 125–132, https://doi.org/10.1038/35039062. [PubMed]

*Biometrics Bulletin,*2(6), 110–114, https://doi.org/10.2307/3002019.

*Behavior Research Methods,*49(4), 1241–1260, https://doi.org/10.3758/s13428-016-0783-4. [PubMed]

*Proceedings of the National Academy of Sciences of the United States of America,*90(22), 10749–10753. [PubMed]

*Journal of Vision,*10(14), 19–19, https://doi.org/10.1167/10.14.19. [PubMed]

*Journal of Vision,*11(12), 13–13, https://doi.org/10.1167/11.12.13. [PubMed]

*Journal of Vision,*21(5), 2, https://doi.org/10.1167/jov.21.5.2. [PubMed]

*Visual Cognition,*14(4–8), 411–443, https://doi.org/10.1080/13506280500195250. [PubMed]

*Nature Neuroscience,*3(3), 270–276, https://doi.org/10.1038/72985. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance,*46(5), 458–473, https://doi.org/10.1037/xhp0000727. [PubMed]

*A population response model of ensemble coding.*bioRxiv, https://doi.org/10.1101/2022.01.19.476871.

*Vision Research,*32(5), 931–941, https://doi.org/10.1016/0042-6989(92)90036-I. [PubMed]

*Vision Research,*29(1), 47–59, https://doi.org/10.1016/0042-6989(89)90173-9. [PubMed]

*Proceedings of the National Academy of Sciences of the United States of America,*104(9), 3532–3537, https://doi.org/10.1073/pnas.0611288104. [PubMed]

*Vision Research,*50(22), 2274–2283, https://doi.org/10.1016/j.visres.2010.04.019. [PubMed]

*Psychonomic Bulletin & Review,*18(3), 484–489, https://doi.org/10.3758/s13423-011-0071-3. [PubMed]

*Annual Review of Psychology,*69(1), 105–129, https://doi.org/10.1146/annurev-psych-010416-044232. [PubMed]

*Journal of Experimental Psychology. General,*149, 1811–1822, https://doi.org/10.1037/xge0000745. [PubMed]