**Ensemble perception, the extraction of a statistical summary of multiple instances of a feature, enables efficient processing of information. Here we investigated whether ensemble representations can be formed for facial attractiveness, a socially important complex feature. After verifying that our face stimuli produced by geometric morphing represented a valid continuum of attractiveness (Experiment 1), we asked participants to compare the average attractiveness of four faces with a single probe face. Performance was highly similar whether the four faces were homogeneous or heterogeneous, suggesting that the visual system could extract an ensemble representation of the attractiveness of a heterogeneous group of faces. Statistical simulations with human-level bias and noise indicated that participants did not rely on subsampling one random face or the most/least attractive face from the array (Experiment 2). Ensemble perception of facial attractiveness was not affected by variance in the stimulus array (Experiment 3), did not depend on memory of individual faces in the array (Experiment 4), and could be extended to larger arrays with faces asymmetrically distributed around the set mean (Experiment 5). Our findings provide further evidence for the prevalence of perception of statistical regularities in vision.**

*SD* = 1.92) from Sun Yat-sen University participated with informed consent and received monetary compensation. All participants had normal or corrected-to-normal vision.

*contrast* level (left − right). Whether the face on the left or on the right was more attractive was randomized and balanced. After 2,000 ms, the two faces disappeared, and the participants were instructed to indicate which face was more attractive with a key press. There was no intertrial interval (ITI). After 20 practice trials, participants completed 360 trials (30 trials for each contrast level), which took approximately 15 min. Responses and response times were recorded.

*SD* from the individual means (mean discard rate: 1.83% ± 0.73%). For the remaining trials, we define a “positive response” as judging the face on the left as more attractive and a “negative response” otherwise. We used the logistic psychometric function to describe the positive response rate as a function of the contrast level, defined by Wichmann and Hill (2001):

$$\psi(x; \alpha, \beta, \gamma, \lambda) = \gamma + (1 - \gamma - \lambda)\,F(x; \alpha, \beta) \tag{1}$$

with *F*(*x*; *α, β*) being the logistic function:

$$F(x; \alpha, \beta) = \frac{1}{1 + e^{-\beta (x - \alpha)}} \tag{2}$$

where *x* is the contrast level, *γ* is the guess rate (the probability of giving a positive response when the stimulus is not detected), and *λ* is the lapse rate (the probability of missing a positive response independent of stimulus intensity). In a two-alternative, forced-choice (2AFC) paradigm, the psychometric function models the proportion of one particular response over the other, and the guess rate *γ* should be set to zero in model fitting to estimate *α*, *β*, and *λ* (Wichmann & Hill, 2001). In the logistic function, *α* is the contrast level that corresponds to a 50% positive response rate, i.e., *F*(*x* = *α*; *α, β*) = 0.5, which will be used as the threshold of the psychometric function. *β* is the slope of the psychometric function that indicates the rate of change.
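As a minimal sketch of the psychometric function described above, the Python below evaluates the response rate from a guess rate, a lapse rate, and a logistic core, and inverts it to find the contrast level that yields a given response rate. The function names and the multiplicative-slope form of the logistic are assumptions made for illustration, not the authors' implementation.

```python
import math


def logistic(x, alpha, beta):
    """Logistic core F(x; alpha, beta): 0.5 at x == alpha, steeper for larger beta.

    The multiplicative-slope parameterization is an assumption of this sketch.
    """
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))


def psychometric(x, alpha, beta, gamma=0.0, lam=0.02):
    """Positive response rate: gamma + (1 - gamma - lam) * F(x; alpha, beta)."""
    return gamma + (1.0 - gamma - lam) * logistic(x, alpha, beta)


def level_for_rate(rate, alpha, beta, gamma=0.0, lam=0.02):
    """Contrast level at which the positive response rate equals `rate`."""
    f = (rate - gamma) / (1.0 - gamma - lam)  # undo the guess/lapse scaling
    return alpha + math.log(f / (1.0 - f)) / beta  # invert the logistic
```

With the guess rate at zero, `psychometric(alpha, alpha, beta)` returns half of `1 - lam`, and `level_for_rate(0.75, ...)` and `level_for_rate(0.25, ...)` give the contrast levels at the 75% and 25% response rates.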

*α*, *β*, and *λ*) were estimated for each participant using maximum likelihood estimation (MLE). To search for the globally optimal estimates, MLE was run with a grid of different initializing values. We used the log-likelihood ratio to measure the goodness of fit, defined by Wichmann and Hill (2001, equation 5):

$$D = 2 \ln\left(\frac{\mathcal{L}_2}{\mathcal{L}_1}\right) \tag{3}$$

where *ℒ*_{1} is the likelihood of the best-fitting model, and *ℒ*_{2} is the likelihood of the saturated model with no residual errors. The best-fitting model is one that has a unique combination of estimates for *α, β*, and *γ*, such that it generates the greatest log-likelihood given the inputs (the contrast level in each trial) and the outputs (the observed positive response rates). The likelihood (*ℒ*_{1} or *ℒ*_{2}) is calculated as

$$\mathcal{L} = \prod_{k=1}^{N} p(y_k \mid x_k; \alpha_0, \beta_0, \gamma_0) \tag{4}$$

where *N* is the total number of trials, and *p* is the probability of observing positive response rate *y*_{k} given contrast level *x*_{k} and the parameter estimates *α*_{0}, *β*_{0}, and *γ*_{0} in the *k*th trial. The *p* value associated with *D* was estimated by 400 Monte Carlo simulations from the best-fitting model to obtain the distribution of the simulated *D*s. The standard errors of the parameters were estimated with 400 runs of bootstrapping.
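The fitting and goodness-of-fit steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: an exhaustive grid search stands in for MLE with multiple initial values, the logistic uses a multiplicative slope, the saturated model predicts the observed response rate at each contrast level, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def psi(x, alpha, beta, lam, gamma=0.0):
    """Psychometric function with a logistic core (multiplicative slope assumed)."""
    return gamma + (1.0 - gamma - lam) / (1.0 + math.exp(-beta * (x - alpha)))


def log_likelihood(trials, prob_of):
    """Sum of Bernoulli log-probabilities; prob_of(x) gives P(positive | x)."""
    eps = 1e-12
    total = 0.0
    for x, y in trials:
        p = min(max(prob_of(x), eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def fit_grid(trials, alphas, betas, lams):
    """Return (log-likelihood, alpha, beta, lam) of the best grid point."""
    best = None
    for a in alphas:
        for b in betas:
            for l in lams:
                ll = log_likelihood(trials, lambda x: psi(x, a, b, l))
                if best is None or ll > best[0]:
                    best = (ll, a, b, l)
    return best


def deviance(trials, model_ll):
    """D = 2 * (ln L_saturated - ln L_model)."""
    counts = defaultdict(lambda: [0, 0])  # contrast level -> [positives, total]
    for x, y in trials:
        counts[x][0] += y
        counts[x][1] += 1
    rate = {x: pos / n for x, (pos, n) in counts.items()}
    saturated_ll = log_likelihood(trials, lambda x: rate[x])
    return 2.0 * (saturated_ll - model_ll)


def monte_carlo_p(trials, params, observed_d, n_sim=400, seed=0):
    """Share of datasets simulated from the fitted model whose D >= observed D."""
    rng = random.Random(seed)
    a, b, l = params
    exceed = 0
    for _ in range(n_sim):
        sim = [(x, int(rng.random() < psi(x, a, b, l))) for x, _ in trials]
        sim_ll = log_likelihood(sim, lambda x: psi(x, a, b, l))
        if deviance(sim, sim_ll) >= observed_d:
            exceed += 1
    return exceed / n_sim
```

Here `trials` is a list of `(contrast_level, response)` pairs with binary responses. A small simulated *p* value indicates that the observed deviance is larger than expected if the fitted function had generated the data.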

*t*_{75} and *t*_{25} are the contrast levels corresponding to the 75% and 25% positive response rates, respectively, derived from the psychometric function.

*D*s ≤ 19.91, *p*s ≥ 0.578) except Participant LHL (*D* = 32.20, *p* = 0.045), possibly due to a high level of noise in the data.

*encoding conditions*: the *homogeneous* condition, where the four faces were identical, and the *heterogeneous* condition, where they differed from their mean by ±5° and ±10°. Positions of the four faces in the grid were randomly assigned. The probe was a single face placed at the center of the screen (Figure 3B), which differed from the mean of the encoding array by positive or negative 3°, 4°, 8°, 9°, or 12°. This difference defined the *contrast level* (probe − mean). To ensure each contrast level could be tested multiple times against a different set mean, the choice of set means was restricted to between 11° and 40°.

*SD* from the individual mean were discarded (mean discard rate: 1.79% ± 0.53%). We defined a “positive response” as judging the probe as more attractive than the set mean. The same psychometric function specified by Equations 1 and 2 was fitted to the positive response rate for each participant and each encoding condition with the independent variable *x* being the contrast level. Only the threshold (*α*) and slope (*β*) parameters indicate properties of the underlying sensory mechanisms. The guess rate (*γ*) is set to zero, determined by the 2AFC procedure. The lapse rate (*λ*), which is independent of the stimulus level, can be assumed constant between the two encoding conditions. We thus fixed *λ* at 0.02 in fitting the psychometric functions. We compared the threshold (*α*) and slope (*β*) between the two encoding conditions to examine differences in performance.

*α* and the slope *β* were estimated in three ways: (a) the two encoding conditions fitted with a single function (i.e., one *α* and one *β*), (b) the two encoding conditions fitted with *β* fixed and *α* free to vary (i.e., two *α*s and one *β*), and (c) the two encoding conditions fitted with both *α* and *β* free to vary (i.e., two *α*s and two *β*s). We computed the log-likelihood ratio *ℒ*_{a,b} between models a and b to assess difference in the threshold and the log-likelihood ratio *ℒ*_{b,c} between models b and c to assess difference in the slope. Four hundred runs of Monte Carlo simulations were used to estimate distribution of the log-likelihood ratios and the associated *p* values.
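The nested comparison of models a, b, and c can be illustrated with a short sketch. It assumes a grid search in place of the actual optimizer, a lapse rate fixed at 0.02, a multiplicative-slope logistic, and a ratio statistic of twice the log-likelihood difference; the scaling and all function names are assumptions of this example.

```python
import math
from itertools import product


def psi(x, alpha, beta, lam=0.02):
    """Psychometric function with guess rate 0 and a fixed lapse rate."""
    return (1.0 - lam) / (1.0 + math.exp(-beta * (x - alpha)))


def loglik(trials, alpha, beta):
    """Bernoulli log-likelihood of (contrast, response) pairs."""
    total = 0.0
    for x, y in trials:
        p = min(max(psi(x, alpha, beta), 1e-12), 1.0 - 1e-12)
        total += math.log(p if y else 1.0 - p)
    return total


def fit_nested(cond1, cond2, alphas, betas):
    """Maximized log-likelihoods of models a, b, and c over a parameter grid."""
    grid = list(product(alphas, betas))
    # Model a: one threshold and one slope shared by both conditions.
    ll_a = max(loglik(cond1, a, b) + loglik(cond2, a, b) for a, b in grid)
    # Model b: a separate threshold per condition, one shared slope.
    ll_b = max(
        max(loglik(cond1, a, b) for a in alphas)
        + max(loglik(cond2, a, b) for a in alphas)
        for b in betas
    )
    # Model c: separate threshold and slope per condition.
    ll_c = (max(loglik(cond1, a, b) for a, b in grid)
            + max(loglik(cond2, a, b) for a, b in grid))
    return ll_a, ll_b, ll_c


def lr_statistic(ll_simple, ll_complex):
    """Twice the log-likelihood gain of the richer model (scaling assumed)."""
    return 2.0 * (ll_complex - ll_simple)
```

`lr_statistic(ll_a, ll_b)` then isolates the effect of freeing the threshold, and `lr_statistic(ll_b, ll_c)` the effect of freeing the slope. Because each model nests the previous one, both statistics are nonnegative on a shared grid.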

*p*). (d) A Bernoulli sampler then generated a binary response (*k = 0, 1*) with the probability

*D*, defined in Equation 3, between the likelihood of the actual psychometric function and the likelihood of the saturated model for the simulated data.
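One way to picture a simulated observer of this kind is sketched below. It assumes that the observer reduces the array to a single reference value (the average, one random face, or the most or least attractive face), that the contrast between the probe and that reference is passed through a psychometric function with human-level threshold and slope, and that larger values on the continuum mean higher attractiveness. The strategy names, function names, and parameter values are hypothetical.

```python
import math
import random


def psi(contrast, alpha, beta, lam=0.02):
    """Positive response probability (guess rate 0, multiplicative slope assumed)."""
    return (1.0 - lam) / (1.0 + math.exp(-beta * (contrast - alpha)))


def reference_value(faces, strategy, rng):
    """Value the simulated observer compares the probe against."""
    if strategy == "average":
        return sum(faces) / len(faces)
    if strategy == "random_one":
        return rng.choice(faces)
    if strategy == "most_attractive":
        return max(faces)  # assumes larger values are more attractive
    if strategy == "least_attractive":
        return min(faces)
    raise ValueError(f"unknown strategy: {strategy}")


def simulate_response(faces, probe, strategy, alpha, beta, rng):
    """Bernoulli sample: 1 if the probe is judged more attractive, else 0."""
    p = psi(probe - reference_value(faces, strategy, rng), alpha, beta)
    return int(rng.random() < p)
```

Running `simulate_response` over the same trials that a participant saw, once per strategy, yields simulated response rates that can be compared with the observed ones.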

*t* test), *t*(15) = 0.69, *p* = 0.497, 95% CI = [−0.01, 0.02] for the difference between the means [heterogeneous − homogeneous], Cohen's *d* = 0.17.

*D*s ≤ 19.70, *p*s ≥ 0.30). Results from a representative participant (Participant HYF) are shown in Figure 4A. Supplementary Table S1 summarizes model-fitting results for each individual participant. Nested hypothesis testing revealed no significant difference in the threshold for 14 out of 16 participants (*ℒ*_{a,b}s ∈ [0.003, 2.720], *p*s ∈ [0.105, 0.957]), and no significant difference in the slope for 14 out of 16 participants (*ℒ*_{b,c}s ∈ [0.004, 0.530], *p*s ∈ [0.440, 0.957]). Participants 2-1 and 2-8 had (marginal) differences in the threshold (*ℒ*_{a,b} = 6.57, *p* = 0.014 for Participant 2-1; *ℒ*_{a,b} = 2.78, *p* = 0.079 for Participant 2-8), and Participants 2-5 and 2-11 had (marginal) differences in the slope (*ℒ*_{b,c} = 3.60, *p* = 0.067 for Participant 2-5; *ℒ*_{b,c} = 4.83, *p* = 0.038 for Participant 2-11). However, combining the effect of threshold and slope, none of these participants showed a significant difference between the two encoding conditions (*ℒ*s ∈ [2.64, 5.50], *p*s ∈ [0.103, 0.323]). Confirming the subject-based comparisons, paired-sample *t* tests for all participants combined showed no difference between the homogeneous and heterogeneous conditions for either the threshold, *t*(15) = 1.26, *p* = 0.228, 95% CI = [−2.41, 0.62], Cohen's *d* = 0.31, or the slope, *t*(15) = −0.83, *p* = 0.421, 95% CI = [−0.01, 0.02], Cohen's *d* = 0.21. This result strongly supported our hypothesis that the participants could perform the task in the heterogeneous condition as well as they could in the homogeneous condition.

*t* test showed a significant difference in positive response rate between “probe < mean” and “probe > mean,” *t*(15) = 5.27, *p* < 0.001, 95% CI = [0.13, 0.31], Cohen's *d* = 1.32. Importantly, among these trials, although the probe itself was always below the midpoint of the attractiveness continuum, the positive response rate was still greater when the probe was above the set mean compared to when it was below the set mean (mean difference = 22.26%). Similarly, we performed a *t* test on probes that were above the midpoint of the continuum (ranging between 26° and 31°) and found a significant difference between the “probe > mean” and “probe < mean” conditions, *t*(15) = 5.13, *p* < 0.001, 95% CI = [0.09, 0.22], Cohen's *d* = 1.28. Among these trials, even though the probe was always above the midpoint, the positive response rate was lower when it was below the set mean (mean difference = 15.89%). These results suggested that participants did derive their response from comparison between the probe and the encoding array.

*small* variance condition), ±5° and ±15° (*medium* variance condition), or ±5° and ±20° (*large* variance condition). To ensure the same set means could be tested in all three conditions, we restricted the choice of set mean to the range 21°–30°. The probe was a single face placed at the center of the screen, separated from the mean by positive or negative 2° to 20° in a step of 2°.

*SD* from the individual mean (mean discard rate: 2.02% ± 0.68%). Difference in the threshold and the slope between the three variance conditions was examined using nested hypothesis testing.

*F*(2, 32) = 1.88, *p* = 0.169,

*D*s ≤ 24.90, *p*s ≥ 0.168) except for Participant XH in the small variance condition (*D* = 24.93, *p* = 0.077) and Participant 2-8 in the large variance condition (*D* = 25.81, *p* = 0.025) (Supplementary Table S3). The psychometric functions for a representative participant (Participant JXY) are shown in Figure 5. All 17 participants had a nonsignificant difference in the slope across variance conditions (*ℒ*_{b,c}s ∈ [0.06, 4.04], *p*s ∈ [0.151, 0.972]). In terms of the threshold, 15 out of 17 participants showed a nonsignificant difference (*ℒ*_{a,b}s ∈ [0.07, 4.08], *p*s ∈ [0.122, 0.956]). Participant 2-1 had a significant difference (*ℒ*_{a,b} = 8.66, *p* = 0.012), and Participant MZF had a marginally significant difference (*ℒ*_{a,b} = 4.68, *p* = 0.085). However, both showed no difference across the three variance conditions if effects of the threshold and the slope were combined in testing (*ℒ* = 6.59, *p* = 0.401 for Participant 2-1; *ℒ* = 8.75, *p* = 0.414 for Participant MZF). At the group level, repeated-measures ANOVA showed variance had a nonsignificant effect on the threshold, *F*(2, 32) = 1.98, *p* = 0.155,

*F*(2, 32) = 4.48, *p* = 0.019,

*t* tests with Bonferroni correction: small vs. medium: *p* = 1.000; medium vs. large: *p* = 0.149; small vs. large: *p* = 0.077).

*member* condition) or a new face that differed from any face in the encoding set by at least 7° (*nonmember* condition). Participants indicated with a key press whether the probe was a member of the encoding array or not. Response time was unlimited, and the probe remained on the screen until a response was given. There was no ITI.

*member* condition and 80 trials of the *nonmember* condition. Each participant completed 360 trials in total.

*SD* from the individual mean were discarded (mean discard rate: 2.07% ± 0.93%). We used *d′* as a metric to compare participants' sensitivity in the ensemble perception task (Experiment 3) and in the working memory task (the current experiment). In the working memory task, we define the *hit* rate as the proportion of the *member* trials being correctly identified and the *false alarm* rate as the proportion of the *nonmember* trials being mistaken as a *member*. The *d′* is computed as

$$d' = \Phi^{-1}(p_{\mathrm{HIT}}) - \Phi^{-1}(p_{\mathrm{FA}}) \tag{7}$$

where Φ^{−1} is the normal inverse cumulative distribution, *p*_{HIT} is the hit rate, and *p*_{FA} is the false alarm rate.

*hit* rate for this task as the proportion of trials in which the probe face is more attractive than the average and the participant correctly identifies the probe as more attractive and the *false alarm* rate as the proportion of trials in which the probe is less attractive than the average, but the participant mistakenly reports it as more attractive. *d′* is computed following Equation 7.
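The sensitivity index can be computed in a few lines, as in the sketch below, which uses the inverse normal cumulative distribution from Python's standard library. The function name is hypothetical, and the sketch applies no correction for hit or false alarm rates of exactly 0 or 1, for which the inverse is undefined.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false alarm rate), with z the inverse normal CDF.

    Rates of exactly 0 or 1 raise statistics.StatisticsError here;
    any correction for such rates is left to the caller.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Equal hit and false alarm rates mean no sensitivity.
print(d_prime(0.5, 0.5))  # 0.0
```

The same function serves both tasks once the hit and false alarm rates are defined for each of them as described above.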

*F*(2, 32) = 21.14, *p* < 0.001,

*d′* showed significant main effects of task, *F*(1, 16) = 487.75, *p* < 0.001,

*F*(2, 32) = 4.51, *p* = 0.002,

*t* tests revealed *d′* in the ensemble perception task was greater than that in the working memory task at all variance levels, *t*s(16) ≥ 13.96, *p*s < 0.001, Cohen's *d*s ≥ 3.39.

*F*(2, 32) = 10.25, *p* < 0.001,

*d′* in the ensemble task, *F*(2, 32) = 1.76, *p* = 0.188,

*F*(2, 32) = 9.86, *p* < 0.001,

*d′* in the memory task, post hoc paired *t* tests with Bonferroni correction showed a difference between the small and the large variance conditions, *t*(16) = 4.11, *p* = 0.027, Cohen's *d* = 1.00, 95% *CI* = [0.28, 0.88], and between the medium and the large variance conditions, *t*(16) = 2.97, *p* = 0.003, Cohen's *d* = 0.72, 95% *CI* = [0.14, 0.82]. In fact, mean *d′* was not different from chance level zero in the large variance condition, *t*(16) = −0.65, *p* = 0.522, Cohen's *d* = 0.16, 95% *CI* = [−0.25, 0.13], but was significantly different from zero in the small and medium variance conditions, *t*s(16) ≥ 3.91, *p*s ≤ 0.001, Cohen's *d*s ≥ 0.95.

*positive bias* condition, the four faces were at −20°, +5°, +15°, and +20° around the nominal set mean, which is the average of the two extreme faces (±20°); in the *negative bias* condition, the four faces were at −20°, −15°, −5°, and +20° around the nominal set mean. Thus, in the positive bias condition, the true set mean was more attractive than the nominal set mean and vice versa in the negative bias condition. The nominal means were chosen in the range 21°–30° on the attractiveness continuum. The probe face was separated from the nominal set mean by positive or negative 3°, 4°, 8°, 9°, or 12°. Note that this made the negative bias condition have more trials with the probe more attractive than the true set mean and vice versa for the positive bias condition. However, the psychometric functions, fitted against the contrast between the probe and the nominal set mean, would not be affected by this asymmetry.
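A short worked example makes the asymmetry concrete for the four-face arrays described above. The variable and function names are illustrative only; the offsets and probe contrasts are the ones given in the text.

```python
# Face offsets (in degrees) relative to the nominal set mean.
POSITIVE_BIAS = [-20, 5, 15, 20]
NEGATIVE_BIAS = [-20, -15, -5, 20]

# Probe contrasts (in degrees) relative to the nominal set mean.
PROBE_CONTRASTS = [-12, -9, -8, -4, -3, 3, 4, 8, 9, 12]


def true_mean_offset(offsets):
    """Offset of the true (arithmetic) set mean from the nominal set mean."""
    return sum(offsets) / len(offsets)


def probes_above_true_mean(offsets):
    """Number of probe contrasts that exceed the true set mean."""
    shift = true_mean_offset(offsets)
    return sum(1 for c in PROBE_CONTRASTS if c - shift > 0)


print(true_mean_offset(POSITIVE_BIAS))        # 5.0
print(true_mean_offset(NEGATIVE_BIAS))        # -5.0
print(probes_above_true_mean(POSITIVE_BIAS))  # 3
print(probes_above_true_mean(NEGATIVE_BIAS))  # 7
```

The true mean sits 5° above the nominal mean in the positive bias condition and 5° below it in the negative bias condition, so 3 of the 10 probe contrasts exceed the true mean in the former and 7 of 10 in the latter.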

*SD* from the individual mean were discarded (mean discard rate = 1.67% ± 0.67%). For each set size × bias condition, we fitted a psychometric function to the positive response rate against the contrast between the probe and the nominal set mean (probe − nominal mean) following the procedure described in Experiment 2.

*F*(1, 19) = 52.51, *p* < 0.001,

*F*(2, 38) = 0.89, *p* = 0.421,

*F*(2, 38) = 1.48, *p* = 0.240,

*p*s ≥ 0.638; see Figure 7 for plots from two representative participants). Supplementary Table S4 summarizes psychometric model fitting for each individual participant. A two-way repeated-measures ANOVA on the threshold (Figure 8) revealed a significant main effect of the bias condition, *F*(1, 19) = 47.22, *p* < 0.001,

*F*(2, 38) = 2.74, *p* = 0.077,

*F*(2, 38) = 0.38, *p* = 0.687,

*t* tests (with Bonferroni correction) showed no significant difference in the threshold between set sizes four and eight (*p* = 1.000) or between set sizes eight and 12 (*p* = 0.254). The main difference was between set sizes four and 12 (*p* = 0.048). Importantly, when we examined the effect of bias at each set size independently, paired-sample *t* tests revealed a significant difference between the two bias conditions at all set sizes, set size four: *t*(19) = 5.35, *p* < 0.001, 95% CI = [1.12, 2.76], Cohen's *d* = 1.20; set size eight: *t*(19) = 4.64, *p* < 0.001, 95% CI = [9.98, 2.69], Cohen's *d* = 1.04; set size 12: *t*(19) = 4.68, *p* < 0.001, 95% CI = [0.88, 2.31], Cohen's *d* = 1.05, when the difference was computed as “positive bias condition − negative bias condition”. In contrast, the slope (Figure 8) did not seem to be affected by either the bias conditions, *F*(1, 19) = 0.39, *p* = 0.541,

*F*(2, 38) = 0.64, *p* = 0.532,

*F*(2, 38) = 0.56, *p* = 0.575,

*p*s ≥ 0.245).

*t* tests) regardless of the set size. This is consistent with the hypothesis that when the middle faces skew the set mean to a more positive value, participants are less likely to judge the probe as more attractive than the (true) set mean than when the middle faces skew the set mean to a more negative value. Thus, not only the extreme faces, but also faces in the middle of the range, are pooled when average attractiveness is computed.

*not* biased toward judging the ensemble representation to be more attractive than the arithmetic mean. Attractive faces are not always average (DeBruine, Jones, Unger, Little, & Feinberg, 2007; Halberstadt, Pecher, Zeelenberg, Ip Wai, & Winkielman, 2013). This casts doubt on Walker and Vul's premise and suggests an alternative explanation is needed for the cheerleader effect.

*Trends in Cognitive Sciences*, 15 (3), 122–131, https://doi.org/10.1016/j.tics.2011.01.003.

*Psychological Science*, 19 (4), 392–398, https://doi.org/10.1111/j.1467-9280.2008.02098.x.

*Psychological Science*, 12 (2), 157–162.

*Psychological Record*, 59 (2), 171–185.

*Psychological Science*, 22 (3), 384–392, https://doi.org/10.1177/0956797610397956.

*Spatial Vision*, 10 (4), 433–436.

*Perception & Psychophysics*, 57 (2), 125–135.

*Vision Research*, 43 (4), 393–404.

*Perception & Psychophysics*, 67 (1), 1–13.

*Visual Cognition*, 20 (2), 211–231, https://doi.org/10.1080/13506285.2012.657261.

*Journal of the Optical Society of America A, Optics, Image Science, and Vision*, 18 (5), 1016–1026.

*Journal of Vision*, 9 (11): 28, 1–16, https://doi.org/10.1167/9.11.28.

*Journal of Experimental Psychology: Human Perception and Performance*, 33 (6), 1420–1430, https://doi.org/10.1037/0096-1523.33.6.1420.

*Quarterly Journal of Experimental Psychology*, 62 (9), 1716–1722, https://doi.org/10.1080/17470210902811249.

*Psychonomic Bulletin & Review*, 20 (3), 468–473, https://doi.org/10.3758/s13423-013-0381-8.

*Perception & Psychophysics*, 70 (5), 789–794.

*Perception & Psychophysics*, 70 (3), 456–464, https://doi.org/10.3758/PP.70.3.456.

*Journal of Experimental Social Psychology*, 20 (5), 409–424, https://doi.org/10.1016/0022-1031(84)90035-0.

*Vision Research*, 41 (6), 711–724.

*Journal of Comparative Psychology*, 108 (3), 233–242.

*Social Cognition*, 24 (2), 187–206, https://doi.org/10.1521/soco.2006.24.2.187.

*Journal of Experimental Psychology: General*, 144 (2), 432–446, https://doi.org/10.1037/xge0000053.

*Journal of Vision*, 15 (4): 16, 1–11, https://doi.org/10.1167/15.4.16.

*Current Biology*, 17 (17), R751–R753, https://doi.org/10.1016/j.cub.2007.06.039.

*Journal of Experimental Psychology: Human Perception and Performance*, 35 (3), 718–734, https://doi.org/10.1037/a0013899.

*Psychological Science*, 24 (11), 2343–2346, https://doi.org/10.1177/0956797613491969.

*Attention, Perception & Psychophysics*, 75 (2), 278–286, https://doi.org/10.3758/s13414-012-0399-4.

*Journal of Experimental Psychology: Human Perception and Performance*, 34 (3), 556–568, https://doi.org/10.1037/0096-1523.34.3.556.

*Perception & Psychophysics*, 70 (8), 1581–1591, https://doi.org/10.3758/PP.70.8.1581.

*Social Psychology Quarterly*, 48 (1), 61–70.

*Perception*, 36 (14), 1–16.

*Psychological Science*, 1 (2), 115–121, https://doi.org/10.1111/j.1467-9280.1990.tb00079.x.

*Sex Roles*, 28 (11/12), 729–743.

*Nature*, 390 (6657), 279–281, https://doi.org/10.1038/36846.

*Acta Psychologica*, 142 (2), 245–250, https://doi.org/10.1016/j.actpsy.2012.11.002.

*Introduction to the theory of statistics* (3rd ed.). New York: McGraw-Hill.

*Perception & Psychophysics*, 70 (5), 772–788.

*Cognition*, 128 (1), 56–63, https://doi.org/10.1016/j.cognition.2013.03.006.

*Vision Research*, 31 (6), 1073–1078.

*Nature Neuroscience*, 4 (7), 739–744, https://doi.org/10.1038/89532.

*Spatial Vision*, 10 (4), 437–442.

*Research in Organizational Behavior*, 34, 101–127, https://doi.org/10.1016/j.riob.2014.10.001.

*International Journal of Computer Vision*, 40 (1), 49–70, https://doi.org/10.1023/A:1026553619983.

*Spatial Vision*, 3 (3), 179–197.

*Perception*, 34 (3), 319–340.

*Journal of Vision*, 10 (14): 19, 1–16, https://doi.org/10.1167/10.14.19.

*Journal of Vision*, 11 (12): 13, 1–11, https://doi.org/10.1167/11.12.13.

*Psychological Science*, 25 (10), 1903–1913.

*Developmental Science*, 18 (4), 556–568.

*Personality and Social Psychology Bulletin*, 41 (4), 559–574.

*Psychological Science*, 25 (1), 230–235, https://doi.org/10.1177/0956797613497969.

*Acta Psychologica Sinica*, 47 (1), 108–118.

*Perception & Psychophysics*, 63 (8), 1293–1313.

*Behavior Research Methods*, 42 (3), 671–684, https://doi.org/10.3758/BRM.42.3.671.

*Personality and Social Psychology Bulletin*, 28 (2), 238–249, https://doi.org/10.1177/0146167202282009.

*Nature*, 453 (7192), 233–235, https://doi.org/10.1038/nature06860.