Open Access
Article  |   March 2023
Perceptual history biases in serial ensemble representation
Author Affiliations
  • Noam Khayat
    ELSC Edmond & Lily Safra Center for Brain Research & Life Sciences Institute, Hebrew University, Jerusalem, Israel
    noamkhayat@gmail.com
  • Merav Ahissar
    ELSC Edmond & Lily Safra Center for Brain Research & Psychology Department, Hebrew University, Jerusalem, Israel
    msmerava@gmail.com
  • Shaul Hochstein
    ELSC Edmond & Lily Safra Center for Brain Research & Life Sciences Institute, Hebrew University, Jerusalem, Israel
    shaulhochstein@gmail.com
Journal of Vision March 2023, Vol.23, 7. doi:https://doi.org/10.1167/jov.23.3.7
Abstract

Ensemble perception refers to the visual system's ability to efficiently represent groups of similar objects as a unified percept using their summary statistical information. Most studies have focused on extraction of current trial averages, giving little attention to effects of prior experience, although a few recent studies found that ensemble mean estimations contract toward previously presented stimuli; most of these focused on explicit perceptual averaging of simultaneously presented item ensembles. Yet the time element is crucial in real, dynamic environments, where we encounter ensemble items over time, aggregating information until reaching summary representations. Moreover, statistical information about objects and scenes is learned over time, often implicitly, and then used for predictions that shape perception, promoting environmental stability. We therefore focus here on temporal aspects of ensemble statistics and test whether prior information, beyond the current trial, biases implicit perceptual decisions. We designed methods to separate current trial biases from those of previously seen trial ensembles. In each trial, six circles of different sizes were presented serially, followed by two test items, and participants were asked to choose which had been present in the sequence. Participants unconsciously rely on ensemble statistics, choosing stimuli closer to the ensemble mean. To isolate the influence of earlier trials, the two test items were sometimes equidistant from the current trial mean. Results showed membership judgment biases toward the current trial mean, when informative (the largest effect). On equidistant trials, judgments were biased toward previously experienced stimulus statistics. Comparison of similar conditions with a shifted stimulus distribution ruled out a bias toward an earlier, presession, prototypical diameter. 
We conclude that ensemble perception, even for temporally experienced ensembles, is influenced not only by current trial mean but also by means of recently seen ensembles and that these influences are somewhat correlated on a participant-by-participant basis.

Introduction
Ensemble perception has been suggested as a cerebral strategy to manage the limited capacity (Cowan, 2001; Luck & Vogel, 1997) of attentional and cognitive resources (Cohen et al., 2016). While real-life environments comprise complex and dynamic objects, instead of representing all of them in detail, the visual system summarizes similar and redundant information as statistical properties such as the ensemble mean (Ariely, 2001; Corbett & Oriet, 2011; Khayat & Hochstein, 2018), range (Khayat & Hochstein, 2018), variance (Chetverikov et al., 2016; Semizer & Boduroglu, 2021; Solomon et al., 2011), and distribution (Chetverikov et al., 2016, 2017, 2020). These summarized representations are used to perceive the “gist” or the essence of visual ensembles (Alvarez & Oliva, 2009; Hochstein & Ahissar, 2002). They are formed at multiple processing levels, from those representing low-level features (Ariely, 2001; Corbett & Oriet, 2011; Khayat & Hochstein, 2018) to high-level categorical (Khayat et al., 2021; Khayat & Hochstein, 2019) and social information (Haberman & Whitney, 2007, 2009), thus serving multiple sensory functions (for recent reviews, see Bauer, 2015; Cohen et al., 2016; Corbett, Utochkin, & Hochstein, 2022; Whitney & Yamanashi Leib, 2018). 
Perceptual averaging is the most common measure of ensemble representation—the mean of a set of items was shown to be extracted and judged better than the features of individual set items (Alvarez & Oliva, 2008; Ariely, 2001; Haberman & Whitney, 2007). There is much debate concerning whether ensemble perception entails integrating information from the entire presented set or subsampling a limited number of its items, and we relate to this issue at length in the Discussion and attached Appendix. We shall use the term “ensemble perception” in either case, whether the ensemble is perceived by subsampling or by sampling the entire set (see reviews, Corbett, Utochkin, & Hochstein, 2022; Whitney & Yamanashi Leib, 2018). 
The mean was also shown to be effectively represented implicitly—observers tend to estimate items as members of a sequence as a function of their proximity to the ensemble mean (Ariely, 2001; Khayat & Hochstein, 2018), and their memory of individual item properties tends to be biased toward the ensemble mean (Brady & Alvarez, 2011; Utochkin & Brady, 2020). 
Despite extensive research on ensemble perception over the past two decades, little attention has been given to the question of how perceptual averaging is affected by prior information. Numerous studies pointed out that perception, in general, is not merely the processing of current sensory input but a consequence of integrating stimulus information with internal models based on prior knowledge of stimuli and environmental statistics (e.g., Ashourian & Loewenstein, 2011; Fischer & Whitney, 2014; Raviv et al., 2012; Raviv et al., 2014). These contraction biases were shown to occur early, in the perceptual stage, rather than in a late decisional stage (Loewenstein et al., 2021). Real-world scenes typically contain massive amounts of sensory information. Top-down processing allows us to quickly select and understand sensory information (Hochstein & Ahissar, 2002). In addition, low-level perceptual regions can learn from prior experience as well (e.g., Ahissar & Hochstein, 1993) and bias higher-level perception through bottom-up processing. In both cases, in the context of visual ensembles, these processes benefit from previous experience that supports the visual stability of complex environments (Fischer & Whitney, 2014). 
The contraction of perception toward past stimuli—namely, the tendency to perceive stimuli as more similar to those expected based on previous experience—has in recent years been discussed mainly in terms of serial dependence (Fischer & Whitney, 2014; Fritsche et al., 2017). The various effects of recent events were shown to occur at different processing levels, from early perception to working memory (Bliss et al., 2017; Kiyonaga et al., 2017), and for different visual features such as orientation (Fritsche et al., 2017), position (Manassi et al., 2018), and face expression (Liberman et al., 2018). While most studies describe the common attraction (positive) bias toward recent stimuli (e.g., Cicchini et al., 2018; Fischer & Whitney, 2014; Liberman et al., 2018), others found it to be negative (i.e., a repulsion bias) under specific conditions (Fritsche et al., 2017; Son et al., 2021). Serial dependence was also found to occur in ensemble statistics judgments, with both attraction (Manassi et al., 2017; Son et al., 2021; Suárez-Pinilla et al., 2018; Tanrıkulu et al., 2021) and repulsion (Son et al., 2021; Suárez-Pinilla et al., 2018) biases by recently perceived ensembles, depending on different factors. 
These studies investigated serial dependence of ensemble perception when each ensemble set was presented simultaneously, spatially distributed over the screen. We now ask whether there are trial-to-trial serial dependences also when each trial consists of a temporal sequence of stimuli presented serially, despite the serial nature of each trial's sequence, which may overshadow the trial-to-trial effects. Moreover, most previous studies determined this bias in explicit mean estimation tasks, by measuring the difference between the estimated mean (reported via an adjustment response) and the true mean. In contrast, our current paradigm was designed to measure serial dependence using an implicit measure of ensemble statistics representation, with participants instructed to remember individual stimuli within the ensemble sequences, while the mean feature is never mentioned. 
Another type of longer-term bias originates from experience acquired over our lifetime, prior to any experimental session. Such prototypical percepts have been characterized for different features, such as the cardinal axes—the horizontal and vertical orientations that reflect environmental statistics (Girshick et al., 2011)—slow velocities (Ullman, 1979; Weiss et al., 2002), and color prototypes (Douven, 2019). These priors can bias perception in a way that may interfere with our desired assessment of recent history influence. Thus, an additional experimental session was conducted in order to exclude the possibility of a long-term, preexperiment bias, as detailed in the Methods section. 
In the present study, we use serial presentation of ensembles of circles and ask participants to remember their individual diameters for a following two-alternative membership task. By manipulating the distances of the test item diameters from the current trial mean diameter and from the previous trial mean, we are able to compare the measured effects on the ensemble summary representation by the recent trial mean compared to those of the current trial mean. Our methods were designed to isolate the influence of recent history from the currently perceived ensemble information and to control for biases of any presession preferred size. 
Methods
Participants
In total, 185 master workers from the Amazon Mechanical Turk (MTurk) platform participated in three experiments that took place over two sessions (100 participants for the first session, which included Experiments 1–2, and 85 participants for the second session, Experiment 3). MTurk is a crowdsourcing platform enabling coordination of online participants performing uploaded human information tasks. Masters are a group of workers who have demonstrated superior performance while completing a wide range of human intelligence tasks across the crowdsourcing marketplace. All participants were compensated for participation (a few dollars per session). All participants were naive as to the purpose of the experiment; none were excluded. The first screen provided instructions, including the two-alternative choices, use of full-screen mode, a request for rapid but accurate replies, and so on. 
Stimuli
Stimuli were created using Python 3.7, and the experiment was designed using JavaScript and uploaded to the online MTurk platform. Ensemble sequences consisted of hollow circles with different diameters, expressed in arbitrary units (a.u.) of 1 to 40, where each unit represents a radius of 6 pixels (the overall range across the two sessions is thus a 6- to 240-pixel radius, or 12- to 480-pixel diameter). The actual size of stimuli depends on screen size and resolution; for a typical 24-in. monitor with 1080p (1920 × 1080 pixel) resolution, the range of circle diameters is 0.324 to 12.96 cm. The six stimuli of each trial were limited to a span (maximum minus minimum) of 7 to 17 units, and the gap between size-adjacent stimuli within a trial was <5 units. To create a clear differentiation between consecutive ensembles and to distinguish the effects of the current trial mean from the recent trial mean, a gap of at least 4 units separated consecutive trial means. The distributions of stimuli were specifically designed for each experiment, as detailed in the sections below. 
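The stimulus constraints above can be sketched as rejection sampling. This is a minimal illustration in Python (the language used for stimulus creation), not the experiment's actual generation code; the function name, the interpretation of the 7- to 17-unit limit as the trial's span, and the default 1- to 30-unit range (Session 1) are our assumptions:

```python
import random

def draw_trial(prev_mean=None, lo=1, hi=30):
    """Draw six circle sizes (arbitrary units) for one trial, hypothetically
    satisfying the stated constraints: trial span (max - min) of 7 to 17
    units, gaps between size-adjacent stimuli < 5 units, and a trial mean
    at least 4 units away from the previous trial mean."""
    while True:
        sizes = sorted(random.sample(range(lo, hi + 1), 6))
        span_ok = 7 <= sizes[-1] - sizes[0] <= 17
        gaps_ok = all(b - a < 5 for a, b in zip(sizes, sizes[1:]))
        mean = sum(sizes) / 6
        mean_ok = prev_mean is None or abs(mean - prev_mean) >= 4
        if span_ok and gaps_ok and mean_ok:
            return sizes, mean
```

Rejection sampling is only one way to satisfy these joint constraints; any generator that respects them would serve.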
Experimental procedure
Trial design
We implemented the implicit ensemble averaging paradigm used in Khayat and Hochstein (2018, 2019) and Khayat, Fusi, and Hochstein (2021), as demonstrated in Figure 1a. Each trial began with a 500-ms fixation cross, then a sequence of six stimuli (circles with different sizes) presented serially for 100 ms/stimulus with an additional 100-ms interstimulus interval, followed by a 100-ms masking stimulus to limit within-trial recency effects. After a 250-ms delay, a two-alternative forced-choice membership task was administered, where two test items were presented side by side and participants were instructed to indicate which one was a member of the sequence by pressing the keyboard's left or right arrow. Response time was unlimited. There was always one test item that had been present in the sequence (i.e., the “SEEN” item—the correct choice) and another one that was not (i.e., the “NEW” item). SEEN and NEW test items were randomly located on the left or right side of the display. In total, each trial lasted on average (onset-to-onset) 2,837 ms, with average reaction time (RT, from test item presentation to keyboard response) of 887 ± 451 ms (standard deviation). Thus, trial-to-trial influences must overcome a delay of at least several seconds. Different trial conditions were determined by test item characteristics, relative to the current trial statistics, and to the recent trial mean, as detailed below. The basic method of estimating the different biases was a comparison of the membership task accuracy and RT for different trial conditions, assuming that performance will depend on current trial and previous trial statistics. 
Figure 1.
 
Experiment 1 design. (a) Illustration of trial procedure for two consecutive trials—six circles are presented serially in each trial with time intervals (100 ms + 100-ms interstimulus interval), and then, following a masking stimulus, a two-alternative forced-choice membership task is presented; Δ parameters: upper example, trial “t –1,” standard trial with nonequidistant SEEN and NEW test items. (In this example, the NEW and SEEN test items are 3 and 1 units from current trial mean, respectively, so that Δ(Tmean) = 3 – 1 = 2, with the SEEN item closer to the current mean); lower example, trial “t,” an equidistant trial where SEEN and NEW test items are the same distance from the current trial mean, so that the different distances from the recent trial mean, Rmean, can influence current trial choice of test item. (In the example shown, both SEEN and NEW are 2 units from the current trial mean, so that Δ(Tmean) = 2 – 2 = 0; here the difference in distances from the recent (t – 1) mean, Rmean, is Δ(Rmean) = 2 – 6 = –4, with the NEW test item closer to the recent mean). (b) Size probability distributions of stimuli and means of Experiments 1 and 2 (Session 1) are represented by the black and red curves, respectively; stimulus and mean distributions of Experiment 3 (Session 2) are represented by the gray and blue curves, respectively. Stimuli are normally distributed (Gaussian fits), while the trial means are distributed in a trapezoidal shape, with the 10 most central means of each session with uniform probability (dots represent the actual probability of presenting each size). (c) Trial subtypes and conditions. “in” = in trial range, not mean; “out” = out of trial range.
Experiment 1. Trial subtype effect on performance
For evaluation of the biases by current trial statistics, as already established using this paradigm (Khayat & Hochstein, 2018, 2019; Khayat, Fusi, & Hochstein, 2021), we applied four different trial subtypes, with a difference only in the SEEN and NEW test item characteristics, as outlined in Figure 1c. These could be one of the following options: “mean,” a test element equal to the current trial mean, called Tmean; “in,” an item in the range (but not the mean) of that trial ensemble; “out,” an element outside the ensemble range (necessarily a NEW test item); and “baseline,” a trial where neither SEEN nor NEW test item equals the trial mean. Trial subtypes were pseudo-randomly mixed, and participants were not aware of this division. 
Gradual mean effect by Δ
In order to assess the gradual effect of the difference in distances from the mean of the two test items, we measured performance as a function of the parameter Δ, which represents the difference of the test item absolute distances from the mean (see Figure 1a). Δ is calculated as the absolute distance of the NEW item from the mean minus the absolute distance of the SEEN item from the mean. Thus, positive Δs correspond to trials where the SEEN item is closer to the mean, and negative Δs correspond to trials where the NEW item is closer to the mean. This measure is more informative and detailed than the rough division into trial subtypes, in which the test items are either exactly equal to the mean or not, as it incorporates the distances of both test items from the mean. Baseline trials were designed to be counterbalanced by their relative test item distances from the trial mean, Tmean (so that the mean over all baseline trials would be close to Δ(Tmean) = 0; actually, it was −1.02); the choice of test items was otherwise random, close to the Gaussian distribution of stimuli. Analysis of membership task performance by Δ was performed for trials where both test items were within the trial range, to dissociate the result from the robust range effect (i.e., rejection of NEW test items that were outside the trial sequence range). 
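The Δ measure can be written directly as a small worked example (variable names are ours, following the definition above):

```python
def delta(seen, new, mean):
    """Delta = |NEW - mean| - |SEEN - mean|: positive when the SEEN test
    item is closer to the mean, negative when the NEW item is closer."""
    return abs(new - mean) - abs(seen - mean)

# Figure 1a, trial "t - 1": the NEW item is 3 units and the SEEN item
# 1 unit from the current trial mean, so Delta(Tmean) = 3 - 1 = 2.
print(delta(seen=14, new=18, mean=15))  # -> 2
```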
Experiments 2–3. Isolating the influence of prior information using the equidistant trial condition
The above paradigm (Experiment 1) was found to provide very robust effects of the current trial mean (Tmean) and range on performance (see Results below and Khayat & Hochstein, 2018), and we were concerned that this Tmean effect might override potential biases from prior trial information. We therefore designed an additional condition to isolate prior trial biases. In these trials, the SEEN and NEW test items are at equal distances from the current trial mean (Δ(Tmean) = 0), neutralizing the robust preference for test items that are closer to the trial mean. This manipulation enabled us to measure biases by the relative distances of the test items from the most recent mean (Rmean), as illustrated in Figures 1a (bottom) and 1c (right). We term these equidistant trials and apply the manipulation in Experiments 2 and 3. 
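A sketch of how an equidistant pair can be constructed (hypothetical helper names; the mirror-placement rule follows from the equal-distance requirement rather than being stated explicitly):

```python
def equidistant_new(seen, t_mean):
    """Place the NEW test item at the SEEN item's mirror position about
    the current trial mean, so both are equidistant from Tmean."""
    return 2 * t_mean - seen

def delta_rmean(seen, new, r_mean):
    """Difference in test item distances from the recent (t - 1) trial mean."""
    return abs(new - r_mean) - abs(seen - r_mean)

# With Tmean = 15 and SEEN = 13, the NEW item is placed at 17: both are
# 2 units from Tmean (Delta(Tmean) = 0), yet their distances from a
# recent mean Rmean = 11 differ, giving Delta(Rmean) = 6 - 2 = 4.
```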
Shifting the stimulus distribution
To exclude the possibility of long-term, preexperiment history effects (i.e., biases toward any particular size), the equidistant trial blocks of Experiment 2 were repeated in another experimental session with a shifted stimulus distribution (Experiment 3). The distribution shape was identical but shifted toward larger sizes by 10 units (of the total 30), retaining an overlap of two thirds with the distribution of Experiment 2. Using different circle diameters, we compare the trends of task performance and the patterns of test item selection with respect to the different stimulus probability distributions and verify that the observed effects are a result of serial dependence and free from any other priors. 
Experimental design and structure
The study is divided into three experiments. Experiments 1 and 2 were conducted in a single session and consisted of 400 trials in eight blocks of 50 trials. Four blocks were designed to measure the standard effect of current trial statistics (Experiment 1), using unequal (and only occasionally equal) distances of the SEEN and NEW test items from the trial mean, with some trials having the NEW test item outside the trial range. In the other four blocks, all trials were of the equidistant condition, designed to measure the influence of prior experience (Experiment 2). These different block types were interleaved. Stimulus sizes were in the range of 1 to 30 arbitrary units (global mean = 15.5; Figure 1b: black points and curve, distribution of stimulus circle sizes; red, distribution of trial means). Experiment 3 was conducted in a separate session, with a shifted stimulus range of 11 to 40 (global mean = 25.5; Figure 1b: gray points and curve, distribution of stimulus circle diameters; blue, distribution of trial means), with 200 equidistant trials in four blocks of 50 trials. Thirty-six observers participated in both sessions. 
Data analysis and statistical tests
Statistical tests
One-way repeated-measures analysis of variance (RM-ANOVA) was performed on task accuracy (percentage of selecting the SEEN item) with trial subtype as a within-subject factor, to assess how accuracy was modulated by current and previous trial statistics. We then conducted one-tailed t tests to compare the means of different trial subtype pairs and computed Cohen's d to assess effect sizes. 
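A minimal sketch of such a paired comparison (the paper does not state which Cohen's d variant was used; here d is the mean of the paired differences divided by the standard deviation of the differences):

```python
import numpy as np
from scipy import stats

def paired_comparison(cond_a, cond_b):
    """One-tailed paired t test (H1: a > b) plus a paired-samples
    Cohen's d. Inputs: per-participant accuracies in two trial subtypes."""
    a = np.asarray(cond_a, dtype=float)
    b = np.asarray(cond_b, dtype=float)
    t, p_two = stats.ttest_rel(a, b)          # two-sided p value
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2
    diffs = a - b
    d = diffs.mean() / diffs.std(ddof=1)      # d for paired differences
    return t, p_one, d
```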
Curve fitting
The gradual mean effects by the distances of test items from the mean (either current or recent mean) were fitted to the following sigmoid function: \(y = min + \frac{max - min}{1 + e^{-slope \cdot (x - c)}}\), with the coefficients min and max representing the sigmoid minimum and maximum, respectively; c representing the point on the x-axis where the y value is midway between max and min; and slope representing the slope of the sigmoid at point x = c. y is the fraction of (correctly) choosing the SEEN image; x is Δ, the difference in test item distances from the current or recent trial mean. 
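The fitted function can be written directly (a sketch; parameter names follow the text, and the curve_fit call with starting values p0 is our illustration, not the authors' fitting script):

```python
import numpy as np

def sigmoid(x, y_min, y_max, c, slope):
    """Sigmoid rising from y_min to y_max, crossing their midpoint at
    x = c; x is Delta, y the fraction of choosing the SEEN item."""
    return y_min + (y_max - y_min) / (1 + np.exp(-slope * (x - c)))

# Such a curve could be fitted with scipy.optimize.curve_fit, e.g.:
# params, _ = curve_fit(sigmoid, deltas, accuracy, p0=[0.3, 0.7, 0.0, 0.5])
```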
Data were analyzed using MATLAB 2020b, SPSS 28.0, and Excel. Trials with RT below 200 ms or above 3 s were excluded from the analysis, reducing the number of valid trials by 3% and 6% in the first and second sessions, respectively. Data, detailed instructions to participants, and Python code will be shared online following publication. 
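The RT-based exclusion can be expressed as a simple filter (the trial record format and field name are our assumptions):

```python
def valid_trials(trials, rt_key="rt_ms", lo=200, hi=3000):
    """Keep only trials with reaction times between 200 ms and 3 s,
    mirroring the exclusion criterion described above."""
    return [t for t in trials if lo <= t[rt_key] <= hi]

# e.g., a 150-ms anticipatory response and a 3.5-s lapse are dropped:
# valid_trials([{"rt_ms": 150}, {"rt_ms": 887}, {"rt_ms": 3500}])
```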
Results
Experiment 1. Effect of the mean of the current trial—unequal test item distances
Experiment 1 (standard, nonequal distance trials), testing membership accuracy as a function of trial subtype, showed a significant current trial mean, or Tmean, effect, as shown in Figure 2a. This effect is seen in the differences in average performance for the three subtype conditions: “Baseline” trial performance is near chance, 0.5, where neither SEEN nor NEW equals the mean of the stimulus sequence, Tmean; performance is >0.5 (more frequent correct choice of the SEEN test item) for trials where the SEEN item equals Tmean; and performance is <0.5 (less frequent choice of the SEEN test item) for trials where the NEW item equals Tmean. The robust range effect is demonstrated by the difference from baseline associated with performance in trials where the NEW item is outside the trial range, suggesting that participants easily rejected NEW items, increasing task accuracy. These subtype performance differences are highly significant (RM-ANOVA, F(3, 297) = 378.7, p < 0.001; paired t test, p < 0.001 for all subtype pairs; effect size [Cohen's d] = 1.86: NEW = Tmean vs. SEEN = Tmean; 0.96: NEW = Tmean vs. baseline; 1.41: SEEN = Tmean vs. baseline; 2.16: NEW = out vs. baseline). 
Figure 2.
 
Experiment 1: effects of current trial test item statistics on performance. (a) Accuracy in the membership task for the various trial subtypes. Each symbol represents the average accuracy of a single participant; horizontal bars denote group averages, and error bars represent standard error of the mean. Performance when the SEEN item equals the trial sequence mean, Tmean, is significantly above baseline and when the NEW item equals Tmean, significantly below baseline; in baseline trials, neither SEEN nor NEW equals Tmean; performance is best when the NEW test item is outside the trial sequence range. (b) Gradual effect by relative difference of absolute distances of the SEEN and NEW test items from the current trial mean (Δ(Tmean)), as defined above and in Figure 1a, with a best-fit sigmoid curve. Performance depends parametrically on the relative proximity of the SEEN test item and relative distance of the NEW test item from the trial sequence mean. When they are the same distance (0 difference), there is no average bias. When the SEEN is more distant, participants fail to detect its presence in the set and score below chance, whereas when it is closer, they score well above chance. Data for this plot were collected from in-range trials only. (c) RT for correct (green) and incorrect (orange) trials in the various trial subtypes (e.g., fastest correct response is when the NEW is out of range). Data collected from four blocks with all distances. Error bars in (a) and (c) are standard error of the mean; in (b), they were <0.02, not shown. *Indicates t test result of p < 0.05. ***Indicates p < 0.001.
Figure 2c shows that RT analysis of these trial subtypes reflects a complementary effect of trial mean: significantly faster correct responses (choosing the SEEN test item) when SEEN = Tmean than correct responses when NEW = Tmean and the opposite for incorrect responses, where the NEW item is chosen. Similarly, for the range effect: significantly faster correct responses when NEW is outside the trial range than baseline trials and the opposite for incorrect responses. RTs for correct and incorrect trials are significantly different (NEW = Tmean: 902 ms vs. 871 ms, one-tail t test, p < 0.05; SEEN = Tmean: 878 ms vs. 929 ms, p < 0.001; NEW = out: 851 ms vs. 1,029 ms, p < 0.001), except for baseline trials (915 ms vs. 922 ms, ns). RTs are faster for SEEN = Tmean than NEW = Tmean for correct trials (878 ms vs. 902 ms, p < 0.05) and slower for incorrect trials (929 ms vs. 871 ms, p < 0.001). RTs for NEW = out are markedly different from baseline (correct: 851 ms vs. 915 ms, p < 0.001; incorrect: 1,029 ms vs. 922 ms, p < 0.001). Taken together with Figure 2a, these results indicate that there is no speed–accuracy trade-off. These mean effects were also apparent in the gradual response accuracy dependence on the relative distances of the test items from the mean (Δ(Tmean)), as illustrated in Figure 2b (max = 0.73, min = 0.26, c = 0.04, and slope = 0.32). These results confirm previous findings of the implicit effects of current trial statistics, including both set mean and set range (Khayat & Hochstein, 2018). 
Experiments 2 and 3. Contraction toward the previous trial mean—equidistant test items
The results of this section combine data collected from Experiments 2 and 3, which had similar designs but tested different stimulus ranges (see below). Since the results were similar across the different distribution sessions, they are sometimes shown together using all the data from both sessions (the effects shown here were similar and significant for each experiment separately as well). In these experiments, we analyze performance when the SEEN and NEW test items were equidistant from the trial mean, Tmean. In these equidistant trial blocks, while the two test items were the same distance from the current trial mean, they were differently distant from the recent trial (t – 1) mean, Rmean. 
Figure 3 presents the Rmean effect. First, the rate of selecting the SEEN test item on trials of different Rmean subtypes showed a significant effect: the SEEN test item was selected more often when it equaled Rmean and less often when the NEW item equaled Rmean (RM-ANOVA, F(2, 368) = 51.08, p < 0.001; paired t test, p < 0.001 for all trial subtype pairs), with small to medium effect sizes (Cohen's d = 0.57 for NEW = Rmean vs. SEEN = Rmean, 0.45 for NEW = Rmean vs. baseline, 0.45 for SEEN = Rmean vs. baseline), as shown in Figure 3a. The larger variance in participant performance in trials where the SEEN or NEW items are equal to the mean is due to the lower number of these trials (30 or 10 per participant for Experiments 2 and 3, respectively, compared to 140 or 180 baseline trials per participant). The average rate of selecting the test item closer to Rmean across all trial subtypes (of equidistant trials) was significantly above chance level (p < 0.001, d = 0.49), illustrating the bias toward previously seen items, as plotted per participant in Figure 3b. This bias evolves gradually as a function of the relative distances of test items from the previous trial mean (ΔRmean), with task accuracy increasing for larger positive Δs and decreasing for larger negative Δs (Figure 3c). The best-fit sigmoid curve (max = 0.55, min = 0.43, c = 0.83, and slope = 0.51) demonstrates a more modest bias relative to the effect of the current trial mean (Figure 2b), with a smaller max − min difference (although a larger slope at point c). The smaller contraction toward history statistics relative to the present ensemble statistics is reflected in the trial subtype performance as well, with a weaker modulation of task accuracy when test items are equal to Rmean (Figure 3a) compared to the standard unequal-distances condition when test items are equal to Tmean (Figure 2a; the difference between SEEN = mean and NEW = mean is 0.16 for Rmean, compared to 0.26 for Tmean). 
We note that the results from the two stimulus ranges of Experiments 2 and 3 showed similar trends, with, respectively, an average fraction of selecting the item closer to Rmean of 0.541 versus 0.539 (Figure 3b), a significant dependence of performance on trial subtype (p < 0.05 for all subtype pairs in both experiments), and similar ΔRmean effects, as shown by the Experiment 2 (red) and Experiment 3 (blue) dots in Figure 3c. 
Figure 3.
 
Contraction toward the recent trial mean, Rmean, in equidistant trials. (a) Accuracy in the membership task for the three trial subtypes. Each symbol represents the average accuracy of a single participant; horizontal bars denote group averages, and error bars represent standard error of the mean. Performance when the SEEN item equals the recent trial sequence mean, Rmean, is significantly above baseline, and when the NEW item equals Rmean, it is significantly below baseline; in baseline trials, neither SEEN nor NEW equals Rmean. (b) Rate of selecting the test item closer to the recent mean. Each circle reflects average performance of a single participant; horizontal line corresponds to average across participants; error bars, standard error of the mean. (c) Average accuracy as a function of the difference (Δ) of the test items’ distance from the recent trial mean (Rmean). Red symbols represent data from Experiment 2; blue symbols represent data from Experiment 3; sigmoid curve is calculated across data of both experiments. Standard error of the mean was <0.02.
A direct comparison of the recent history bias and the current trial effect was also made by plotting the ΔTmean and ΔRmean sigmoid curves from the same participants of the first experimental session (Experiments 1 and 2), as shown in Figure 4a. The dominance of the current trial mean is clearly observed in the larger slope and max − min difference of the ΔTmean curve compared with the ΔRmean curve. A within-subject measure of the relation between the current trial and recent trial effects is the subtraction of the NEW = mean from the SEEN = mean trial subtypes in Experiment 1 (i.e., the Tmean effect) and Experiment 2 (i.e., the Rmean effect). This analysis was made for each subject, as illustrated in Figure 4b. Confirming the above conclusion, most participants had a larger Tmean effect (dots above the diagonal dashed line of equivalence). The two effects are also correlated: a Spearman correlation analysis of these per-participant effects was significant (Spearman r = 0.21, p = 0.03), demonstrating that participants who had a relatively large current trial effect also had a relatively large recent trial effect. To assess the within-subject consistency of the effects of the current trial mean and the recent trial mean, these effects were measured separately in odd and even blocks per participant. These test–retest reliability correlations were highly significant for both the current trial mean (Figure 4c; Spearman r = 0.57, p < 0.001) and the recent trial mean (Figure 4d; Spearman r = 0.64, p < 0.001). 
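The odd/even split-half reliability analysis described above can be sketched as follows, using simulated per-block effect values (all numbers, including the block count and noise levels, are illustrative, not the study's data):

```python
# Sketch: odd/even split-half (test-retest) reliability of a per-subject
# effect, correlated across halves with Spearman's rho. Simulated data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_subj, n_blocks = 100, 8
true_effect = rng.normal(0.15, 0.05, n_subj)              # per-subject effect
blocks = true_effect[:, None] + rng.normal(0, 0.05, (n_subj, n_blocks))

odd = blocks[:, 0::2].mean(axis=1)                        # blocks 1, 3, 5, 7
even = blocks[:, 1::2].mean(axis=1)                       # blocks 2, 4, 6, 8
rho, p = spearmanr(odd, even)
print(f"Spearman r = {rho:.2f}, p = {p:.3g}")
```

A high, significant rho here indicates that the size of the effect is a stable characteristic of each participant, as found for both the Tmean and Rmean effects.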
Figure 4.
 
Comparing current trial mean and recent trial mean effect on task performance within the subjects of the first experimental session. The data for these analyses are taken from the same 100 participants of Experiments 1 and 2. (a) Sigmoid curve fitting of the fraction of selecting SEEN item as a function of the distances of test items from the current (purple) and recent (red) trial mean. Data are calculated and averaged across participants for each Δ value. (b) Within-subject correlation of the current trial mean effect with the recent trial mean effect. Tmean and Rmean effects are calculated by subtraction of the task performance in trial subtype NEW = mean from SEEN = mean, for each subject. Most data points are above the dashed line of equal Tmean and Rmean effects, showing a larger Tmean effect for most subjects. (c) Test–retest reliability of the Tmean effect. Data are divided into odd blocks versus even blocks. (d) Test–retest reliability of the Rmean effect. Data are divided into odd blocks versus even blocks. Significant Spearman correlation shows within-subject consistency. Each hollow symbol in (b, c, d) corresponds to a single subject; filled circles in (a) represent data averaged across participants; solid line is the correlation between participants' performance.
Experiments 2 vs. 3: Comparing sessions with different stimulus distribution ranges shows similar effects and no presession bias
To exclude a longer-term presession bias toward any particular circle diameter, the equidistant blocks of Session 1, Experiment 2, were repeated in another session with a similar distribution shape over a different stimulus range (Experiment 3; see Methods, Figure 1b). The overall distribution was shifted by 10 units (of the total of 30) relative to the first, so that two-thirds of the two session distributions overlapped. As noted, these blocks were designed to measure the influence of prior experience: the test items were equidistant from the current trial mean, Tmean, precluding its robust effect. The resulting contraction (described in the previous section) did not differ between the two distribution ranges (see the red and blue data points in Figure 3c). We now wish to confirm that the perceptual judgments were indeed biased by the previously experienced stimuli within the experimental session rather than by earlier internal models of environmental statistics. 
For this purpose, we assessed the tendency to select test items as a function of their diameter, regardless of their relative distance from the recent trial mean. This differs from Figures 3c and 4a, where we plotted responses as a function of the relative distances of the test items from the preceding trial's mean. Here we are interested in performance as a function of the value of the chosen test item, which should shift according to the shifted probabilities of the stimuli used, unless there is a bias toward a long-term internal preference. This analysis yielded a similar pattern of responses in the two experiments, with higher fractions of selecting items near the middle of the overall size distributions, as demonstrated in Figure 5. This pattern reflects a contraction toward the central part of the distribution from any position on the diameter scale, since there is statistically a greater chance that the previous trial mean was located there than near the distribution edges. The responses are plotted irrespective of choice accuracy. That is, choice of the SEEN item is correct and choice of the NEW item is an error, but we are interested here in the rate of these choices in the two experiments, not their accuracy. In such a difficult membership task, where the representation of individual items is poor and the bias by the current trial mean is limited by the equidistant condition, task performance is likely to be influenced by prior experience. Specifically, since memory of the SEEN item identity is only marginally (if at all) better than chance, both the SEEN and NEW items are chosen according to their similarity to the previously seen stimuli. Performance is therefore now expressed as selection rate rather than accuracy: selecting the NEW item is measured as error rate (i.e., fraction select NEW), and selecting the SEEN item as accuracy (i.e., fraction select SEEN). 
Figure 5.
 
Rate of test item selection as a function of test item size in equidistant trials, in two session distributions of Experiments 2 and 3, with a best-fit Gaussian curve. Red and pink dots and curves correspond to rates of selecting the SEEN (membership accuracy) and NEW (membership error rate) test items in Experiment 2 (with smaller size circles). Blue and light blue points and curves denote the rate of selecting these items in Experiment 3 (with larger circle sizes). Each symbol corresponds to the average selection of a specific size of test item across participants. Curves were calculated over the entire data rather than over the averages at each test item value.
Results for the two sessions roughly follow the shift in the stimulus and mean distributions (in both cases with slightly smaller diameters than the actual distribution). The shift between the results for the two distributions matches the shift of the distribution ranges themselves (Figure 5; best-fit Gaussians shifted by 11 units). Note the difference in results for the sizes overlapping between the two sessions (i.e., diameters 10–30), indicating that performance does not depend on participants' preexperiment history (as found by other studies for different variables, e.g., Douven, 2019; Girshick et al., 2011; Ullman, 1979; Weiss et al., 2002) but rather on the context of the recent percepts within the session. Participants more often selected the test item closer to the recent trial mean, which is statistically positioned more centrally between the two test items. This observed shift follows the stimulus probability distribution's shape and range, reflecting a pulling effect in which perception is contracted toward the center of the distribution, to the regions that cover the majority of the previously experienced stimuli. 
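A best-fit Gaussian of selection rate versus test item diameter, of the kind compared across the two sessions above, might be computed along these lines. The three-parameter Gaussian form (amplitude, center, width) and the noise-free toy data are illustrative assumptions, not the study's data:

```python
# Sketch: fitting a Gaussian to the rate of selecting a test item as a
# function of its diameter. Data and parameterization are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

sizes = np.arange(5, 30, 2, dtype=float)          # test item diameters
select = gaussian(sizes, 0.6, 15.0, 6.0)          # toy selection rates

params, _ = curve_fit(gaussian, sizes, select, p0=[0.5, 14, 5])
print("fitted center (peak diameter):", round(params[1], 1))
# prints: fitted center (peak diameter): 15.0
```

Fitting one such curve per session and subtracting the two fitted centers gives the shift between response distributions, which the text compares to the 10-unit shift in the stimulus distributions.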
Discussion
Many studies over the past two decades have investigated the efficient nature of perceptual averaging over a set of items (see Introduction). While the main focus of this literature is rapid feature integration within the current trial to form a summary representation, only a few studies tested the influence of history, finding contraction biases in mean estimation tasks for facial expressions (Crawford et al., 2019), orientation (Manassi et al., 2017; Son et al., 2021; Tanrikulu et al., 2021), and brightness (Crawford et al., 2019). In these experiments, ensembles are presented all at once for a short time, but in real-world environments, groups of items are often perceived dynamically over longer time scales. In the current study, we expand these findings, using a rapid serial presentation mode to test size ensemble perception over time rather than over space. Moreover, we focus on contraction biases by short-term history via an implicit measure of ensemble representation and compare two time windows: the current trial (t) sequence, presented a few hundred milliseconds before the test stimuli, and the recent trial (t – 1) sequence, presented a few seconds before the test stimuli. 
To test these biases, we implemented an implicit averaging paradigm using a challenging visual memory task (Khayat & Hochstein, 2018, 2019; Khayat et al., 2021), in which participants are subjected to manipulations of stimulus statistics. Although participants are asked to report which of two test items was present in the previously presented set, which would require maintaining a reliable individual-item representation, we find that they instead (unconsciously) tend to respond in this membership task on the basis of trial summary representations, which are formed immediately within a trial (the shortest time window in this study), or of stimulus statistics formed over previously experienced ensemble sequences. 
The greatest effect found was that of the current trial statistics, confirming the results of Khayat and Hochstein (2018) and illustrating a robust, immediate extraction of ensemble summary representations (Experiment 1). This condition (nonequidistant trials) manipulated task performance to the largest degree and created the largest difference from chance level, as illustrated in Figure 2. Next, we evaluated the effect of previous trials using the equidistant condition of Experiments 2 and 3. This condition avoided the robust effect of the current trial mean, which could overshadow the effects of previous trials that we wished to explore. Although this paradigm is designed for an implicit evaluation of ensemble representation, the equidistant condition is analogous to having high uncertainty or low confidence regarding the actual stimulus (i.e., neither test item is closer to the current trial mean), leading to greater dependence on previous experience (Weiss et al., 2002; Xiang et al., 2021). Results show that task performance (rate of SEEN item selection) was contracted toward the most recent trial (t – 1), although to a lesser extent than the current trial mean effect. This is reflected in the comparative sigmoid curve parameters and the effect sizes between trial subtypes (Figures 2a, b vs. Figures 3a, c), as well as in the direct participant-by-participant comparison in Figure 4b. We attribute this effect to serial dependence on recently perceived ensemble means. 
To exclude a possible preexperiment bias, we compared the tendency to select the test items (either SEEN accuracy or NEW error rate) across the different diameter distribution ranges of Experiments 2 and 3 (Figure 5). As previously found by Ashourian and Loewenstein (2011) in a line length comparison task and by Crawford et al. (2019) for ensemble mean estimation of face expressions and square brightness, our results showed that the response curve was independent of the physical range of the stimuli. The differences in the overlapping region of the two distributions illustrate how responses are affected by the context of each session's distribution, which is the cause of the resulting contraction bias. This bias differs from "presession" biases, which are based on experiences over the longer time scales of a lifetime, such as the bias toward slower velocities (Ullman, 1979; Weiss et al., 2002), cardinal axes (Girshick et al., 2011), or color prototypes (Douven, 2019). In contrast to the latter biases, our results are explained by stimulus statistics learned over the very short-term history of the last few seconds. 
We have written this report in terms of ensemble perception, accepting the view that observers implicitly perceive the mean of sets and judge presence of test items on the basis of their (relative) distance from that mean. This was the point of view of many previous studies, including Ariely (2001, 2008); Chong and Treisman (2003, 2005a, 2005b); Chong, Joo, Emmanouil, and Treisman (2008); Alvarez and Oliva (2008); Corbett and Oriet (2011); Cha and Chong (2018); Khayat and Hochstein (2018, 2019); Semizer and Boduroglu (2021); and Khayat, Fusi, and Hochstein (2021). The averaging model was challenged by an alternative, suggesting that subsampling of one or a few items might be sufficient to lead to similar results, as proposed, for example, by Dakin (2001), Myczek and Simons (2008), Solomon (2010), Maule and Franklin (2016), Solomon and Morgan (2017), and Zepp et al. (2021); see also Brady and Alvarez (2015) and Chetverikov, Campana, and Kristjánsson (2017, 2020). This issue has also been discussed in recent reviews (e.g., Corbett, Utochkin, & Hochstein, 2022; Whitney & Yamanashi Leib, 2018). Evidence against the subsampling model includes studies reporting ensemble perception even when information to perceive individual elements is unavailable due to crowding (Fischer & Whitney, 2011; Parkes et al., 2001; but see Solomon, 2010). 
This issue is especially pertinent for our studies where observers are presented with two test items and asked to decide which was present in the previously presented set (a two-alternative forced-choice decision). We discuss this issue in the Appendix, where we present evidence that randomly selecting one item (out of the set of six) would not produce the same pattern of results as perceiving, inherently, the mean of the set and as found in this report. 
An additional issue to be addressed in future studies is the dissociation between short-term and longer-term within-session biases. The latter is known in the literature as the central tendency bias (i.e., the contraction of perceptual judgments toward the mean, or midpoint, of the overall stimulus distribution), first described more than a century ago by Hollingworth (1910). Although serial dependence and central tendency have generally been discussed separately (e.g., Crawford, Corbin, & Landy, 2019; Manassi et al., 2017; Son et al., 2021), they are statistically correlated in many cases, as they contract perception in the same direction (see Tong & Dubé, 2022). Whether these biases by priors over different time windows originate from different underlying mechanisms, as suggested by the observation that their relative contributions to perception differ in specific populations (Lieder et al., 2019), or from a single mechanism (Boboeva et al., 2022; Tong & Dubé, 2022), it remains to be studied whether they have different characteristics, weights, and impacts on behavior. 
Conclusion
We conclude that performance of an orthogonal task is influenced by ensemble perception biases due to the within-trial mean (Figure 2) and, to a lesser degree, the means of past trials (Figure 3). We find that this serial dependence effect is also present for ensemble perception with temporal presentation of the ensemble stimuli. Furthermore, the current trial and recent trial effects were found to be correlated: participants who had a relatively large current trial effect also had a relatively large recent trial effect (Figure 4b), suggesting a possible shared mechanism (although this correlation was weaker than the odd- vs. even-block correlation within each effect; Figures 4c, d). Finally, we found no indication of an inherent bias toward a particular circle size: the shift in the response distribution equaled the tested shift in the stimulus distribution (Figure 5). Further study is needed to distinguish between serial dependence and central tendency effects. 
Acknowledgments
The authors thank Yuri Maximov for programming, analysis, and participant communication. 
Supported by a grant from the Israel Science Foundation (ISF). 
The authors dedicate this study to the memory of Lily Safra, a great supporter of brain research. 
Commercial relationships: none. 
Corresponding author: Shaul Hochstein. 
Email: shaulhochstein@gmail.com. 
Address: ELSC Edmond & Lily Safra Center for Brain Research & Life Sciences Institute, Hebrew University, Jerusalem, Israel. 
References
Ahissar, M. & Hochstein, S. (1993). Attentional control of early perceptual learning. Proceedings of the National Academy of Sciences USA, 90(12), 5718–5722.
Alvarez, G. A., & Oliva, A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19(4), 392–398. [PubMed]
Alvarez, G. A., & Oliva, A. (2009). Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proceedings of the National Academy of Sciences USA, 106(18), 7345–7350.
Ariely, D. (2001). Seeing sets: Representation by statistical properties. Psychological Science, 12, 157–162. [PubMed]
Ariely, D. (2008). Better than average? When can we say that subsampling of items is better than statistical summary representations? Perception & Psychophysics, 70(7), 1325–1326. [PubMed]
Ashourian, P., & Loewenstein, Y. (2011). Bayesian inference underlies the contraction bias in delayed comparison tasks. PLoS ONE, 6(5), e19551. [PubMed]
Bauer, B. (2015). A selective summary of visual averaging research and issues up to 2000. Journal of Vision, 15(4), 14, https://doi.org/10.1167/15.4.14. [PubMed]
Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific Reports, 7(1), 1–13. [PubMed]
Boboeva, V., Pezzotta, A., Akrami, A., & Clopath, C. (2022). From recency to the central tendency bias in working memory: A unifying attractor network model. bioRxiv, https://doi.org/10.1101/2022.05.16.491352.
Brady, T. F., & Alvarez, G. A. (2011). Hierarchical encoding in visual working memory: Ensemble statistics bias memory for individual items. Psychological Science, 22(3), 384–392. [PubMed]
Brady, T. F., & Alvarez, G. A. (2015). Contextual effects in visual working memory reveal hierarchically structured memory representations. Journal of Vision, 15(15), 6, https://doi.org/10.1167/15.15.6. [PubMed]
Cha, O., & Chong, S. C. (2018). Perceived average orientation reflects effective gist of the surface. Psychological Science, 29(3), 319–327 [PubMed]
Chetverikov, A., Campana, G., & Kristjánsson, Á. (2016). Building ensemble representations: How the shape of preceding distractor distributions affects visual search. Cognition, 153, 196–210. [PubMed]
Chetverikov, A., Campana, G., & Kristjánsson, Á. (2017). Set size manipulations reveal the boundary conditions of perceptual ensemble learning. Vision Research, 140, 144–156. [PubMed]
Chetverikov, A., Campana, G., & Kristjansson, A. (2020). Probabilistic rejection templates in visual working memory. Cognition, 196, 104075. [PubMed]
Chong, S. C., Joo, S. J., Emmanouil, T.-A., & Treisman, A. (2008). Statistical processing: not so implausible after all. Perception & Psychophysics, 70(7), 1327–1334. [PubMed]
Chong, S. C., & Treisman, A. (2003). Representation of statistical properties. Vision Research, 43, 393–404. [PubMed]
Chong, S. C., & Treisman, A. (2005a). Attentional spread in the statistical processing of visual displays. Perception & Psychophysics, 67(1), 1–13. [PubMed]
Chong, S. C., & Treisman, A. (2005b). Statistical processing: Computing the average size in perceptual groups. Vision Research, 45, 891–900 [PubMed]
Cicchini, G. M., Mikellidou, K., & Burr, D. C. (2018). The functional role of serial dependence. Proceedings of the Royal Society (London) B, 285(1890), 20181722.
Cohen, M. A., Dennett, D. C. & Kanwisher, N. (2016). What is the bandwidth of perceptual experience? Trends in Cognitive Science, 20, 324–335.
Corbett, J. E., & Oriet, C. (2011). The whole is indeed more than the sum of its parts: Perceptual averaging in the absence of individual item representation. Acta Psychologica, 138(2), 289–301. [PubMed]
Corbett, J. E., Utochkin, I., & Hochstein, S. (2022). The pervasiveness of ensemble perception: Not just your average review. In Enns, J. T. (Ed.), Elements in Perception (pp. 1–96). Cambridge, UK: Cambridge University Press.
Cowan, N. (2001). Metatheory of storage capacity limits. Behavioral and Brain Sciences, 24(1), 154–176.
Crawford, L. E., Corbin, J. C., & Landy, D. (2019). Prior experience informs ensemble encoding. Psychonomic Bulletin & Review, 26(3), 993–1000. [PubMed]
Dakin, S. C. (2001). Information limit on the spatial integration of local orientation signals. Journal of the Optical Society of America A, 18(5), 1016–1026.
Douven, I. (2019). Putting prototypes in place. Cognition, 193, 104007. [PubMed]
Fischer, J., & Whitney, D. (2011). Object-level visual information gets through the bottleneck of crowding. Journal of Neurophysiology, 106(3), 1389–1398. [PubMed]
Fischer, J., & Whitney, D. (2014). Serial dependence in visual perception. Nature Neuroscience, 17(5), 738–743. [PubMed]
Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590–595.
Girshick, A. R., Landy, M. S., & Simoncelli, E. P. (2011). Cardinal rules: Visual orientation perception reflects knowledge of environmental statistics. Nature Neuroscience, 14(7), 926–932. [PubMed]
Haberman, J., & Whitney, D. (2007). Rapid extraction of mean emotion and gender from sets of faces. Current Biology, 17(17), R751–R753.
Haberman, J., & Whitney, D. (2009). Seeing the mean: Ensemble coding for sets of faces. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 718. [PubMed]
Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36(5), 791–804. [PubMed]
Hollingworth, H. L. (1910). The central tendency of judgment. The Journal of Philosophy, Psychology and Scientific Methods, 7(17), 461–469.
Khayat, N., Fusi, S., & Hochstein, S. (2021). Perceiving ensemble statistics of novel image sets. Attention, Perception, & Psychophysics, 83(3), 1312–1328. [PubMed]
Khayat, N., & Hochstein, S. (2018). Perceiving set mean and range: Automaticity and precision. Journal of Vision, 18(9), 23, https://doi.org/10.1167/18.9.23. [PubMed]
Khayat, N., & Hochstein, S. (2019). Relating categorization to set summary statistics perception. Attention, Perception, & Psychophysics, 81(8), 2850–2872. [PubMed]
Kiyonaga, A., Scimeca, J. M., Bliss, D. P., & Whitney, D. (2017). Serial dependence across perception, attention, and memory. Trends in Cognitive Sciences, 21(7), 493–497. [PubMed]
Liberman, A., Manassi, M., & Whitney, D. (2018). Serial dependence promotes the stability of perceived emotional expression depending on face similarity. Attention, Perception, & Psychophysics, 80(6), 1461–1473. [PubMed]
Lieder, I., Adam, V., Frenkel, O., Jaffe-Dax, S., Sahani, M., & Ahissar, M. (2019). Perceptual bias reveals slow-updating in autism and fast-forgetting in dyslexia. Nature Neuroscience, 22(2), 256–264. [PubMed]
Loewenstein, Y., Raviv, O., & Ahissar, M. (2021). Dissecting the roles of supervised and unsupervised learning in perceptual discrimination judgments. Journal of Neuroscience, 41(4), 757–765. [PubMed]
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390(6657), 279–281. [PubMed]
Manassi, M., Liberman, A., Chaney, W., & Whitney, D. (2017). The perceived stability of scenes: serial dependence in ensemble representations. Scientific Reports, 7(1), 1–9. [PubMed]
Manassi, M., Liberman, A., Kosovicheva, A., Zhang, K., & Whitney, D. (2018). Serial dependence in position occurs at the time of perception. Psychonomic Bulletin & Review, 25(6), 2245–2253. [PubMed]
Maule, J., & Franklin, A. (2016). Accurate rapid averaging of multihue ensembles is due to a limited capacity subsampling mechanism. Journal of the Optical Society of America A, 33(3), A22–A29.
Myczek, K., & Simons, D. J. (2008). Better than average: Alternatives to statistical summary representations for rapid judgments of average size. Perception and Psychophysics, 70(5), 772.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4(7), 739–744. [PubMed]
Raviv, O., Ahissar, M., & Loewenstein, Y. (2012). How recent history affects perception: the normative approach and its heuristic approximation. PLoS Computational Biology, 8(10), e1002731. [PubMed]
Raviv, O., Lieder, I., Loewenstein, Y., & Ahissar, M. (2014). Contradictory behavioral biases result from the influence of past stimuli on perception. PLoS Computational Biology, 10(12), e1003948. [PubMed]
Semizer, Y., & Boduroglu, A. (2021). Variability leads to overestimation of mean summaries. Attention, Perception, & Psychophysics, 83(3), 1129–1140. [PubMed]
Solomon, J. A. (2010). Visual discrimination of orientation statistics in crowded and uncrowded arrays. Journal of Vision, 10(14), 19, https://doi.org/10.1167/10.14.19. [PubMed]
Solomon, J. A. & Morgan, M. J. (2017). Orientation-defined boundaries are detected with low efficiency. Vision Research, 138, 66–70. [PubMed]
Solomon, J. A., Morgan, M., & Chubb, C. (2011). Efficiencies for the statistics of size discrimination. Journal of Vision, 11(12), 13, https://doi.org/10.1167/11.12.13. [PubMed]
Son, S., Lee, J., Kwon, O. S., & Kim, Y. J. (2021). Effect of spatiotemporally changing environment on serial dependence in ensemble representations. bioRxiv, https://doi.org/10.1101/2021.11.30.470662.
Suárez-Pinilla, M., Seth, A. K., & Roseboom, W. (2018). Serial dependence in the perception of visual variance. Journal of Vision, 18(7), 4, https://doi.org/10.1167/18.7.4. [PubMed]
Tanrıkulu, Ö. D., Chetverikov, A., & Kristjánsson, Á. (2021). Testing temporal integration of feature probability distributions using role-reversal effects in visual search. Vision Research, 188, 211–226. [PubMed]
Tong, K., & Dubé, C. (2022). A tale of two literatures: A fidelity-based integration account of central tendency bias and serial dependency. Computational Brain & Behavior, 5(1), 103–123.
Ullman, S. (1979). The interpretation of visual motion. Cambridge, MA: MIT Press.
Utochkin, I. S., & Brady, T. F. (2020). Individual representations in visual working memory inherit ensemble properties. Journal of Experimental Psychology: Human Perception and Performance, 46(5), 458. [PubMed]
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598–604. [PubMed]
Whitney, D., & Yamanashi Leib, A. (2018). Ensemble perception. Annual Review of Psychology, 69, 105–129. [PubMed]
Xiang, Y., Graeber, T., Enke, B., & Gershman, S. J. (2021). Confidence and central tendency in perceptual judgment. Attention, Perception, & Psychophysics, 83(7), 3024–3034. [PubMed]
Zepp, J., Dubé, C., & Melcher, D. (2021). A direct comparison of central tendency recall and temporal integration in the successive field iconic memory task. Attention, Perception, & Psychophysics, 83(3), 1337–1356. [PubMed]
Appendix: Ensemble perception vs. subsampling
We present evidence that randomly selecting a single item (out of the set of six) would not produce the same pattern of results as inherently perceiving the mean of the set. While overall performance may not differentiate between these two models, we present detailed analyses of particular trial types that do. We do not discuss the possibility of selecting two or more items, since this is beyond the scope of the present article and, as Whitney and Yamanashi Leib (2018) point out, as long as more than one item is subsampled, an averaging mechanism is still required. 
In brief, we show that the probability of choosing one test item over the other does not depend on the number of set items that are closer to it, as the sampling model would predict (Figure A2, right); rather, the choice of a test item increases monotonically as its distance from the average decreases, relative to the other item's distance (Figures A1 and A2, left). For these analyses, we introduce the term “skewness,” the percentage of set items that are closer to one of the two test items. Skewness depends on the distribution of set items and on the sizes of the two test items relative to this distribution; both (distribution and test items) vary from trial to trial. For example, if the two test items fall between the two central set items, then half the set items are closer to one test item and half to the other, giving 50% skewness. If four set items are smaller (or, alternatively, larger) than both test items and two set items are larger (or smaller) than both, then skewness is 67%. If a single set item is perceived (sampled), observers would be expected to choose the test item closer to that sampled item. If the sampled set item is chosen at random, then the probability of selecting a given test item is a function of the trial's skewness. Thus, according to the sampling model, experimental results should depend on trial skewness. To test this hypothesis, we separate our data by trial skewness and plot the results for each skewness level. 
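The skewness measure defined above can be made concrete with a short sketch. This is an illustrative reconstruction, not the authors' analysis code; the function name and the arbitrary size units are ours. Under the sampling model, the probability of choosing a given test item is simply the fraction of set items closer to it, so this quantity is also the model's predicted choice rate.

```python
import numpy as np

def trial_skewness(set_items, test_a, test_b):
    """Percentage of set items closer to one test item than to the other.

    Returns the larger of the two proportions, as a percentage, matching
    the Appendix convention (50% = the set splits evenly between the two
    test items). Illustrative only; not the original analysis code.
    """
    set_items = np.asarray(set_items, dtype=float)
    closer_to_a = np.abs(set_items - test_a) < np.abs(set_items - test_b)
    frac_a = closer_to_a.mean()
    return 100.0 * max(frac_a, 1.0 - frac_a)

# Example from the text: both test items fall between the two central
# set items, so the set splits 3/3.
print(trial_skewness([1, 2, 3, 4, 5, 6], 3.4, 3.6))   # -> 50.0
# Four set items smaller than both test items, two larger than both:
# two thirds of the set is closer to one test item (the text's 67% case).
print(trial_skewness([1, 2, 3, 4, 5, 6], 4.4, 4.6))
```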
In Figure A1, we plot Experiment 1 performance, the choice of the SEEN test item, as a function of Δ, the difference between the distances of the two test items from the mean circle size of the set distribution. This is similar to the graph of text Figure 2b, but here we separate the data by trial skewness. The ensemble perception model predicts dependence on Δ, as indeed found in the graph. The sampling model, in contrast, predicts a consistent dependence on skewness at each point of the graph, which is not seen. Thus, this graph supports the ensemble perception model. 
Another comparison of the two models is shown in Figure A2. In the left graph, we compare Experiment 1 choice of the item closer to Tmean only at skewness = 67% (where we have the most data per participant), contrasting performance for small absolute Δ (1–4) with that for large absolute Δ (5–8). Performance is significantly better for large Δ (0.68 ± 0.01 vs. 0.61 ± 0.008; t test p < 0.001; d = 0.62). In the right graph of Figure A2, we compare Experiment 1 choice of the item closer to Tmean only at absolute Δ = 2 (where we have the most data), contrasting performance for small skewness (50–58%) with that for large skewness (67–83%). While performance is significantly above the 0.5 chance level at both skewness levels, the difference between them is not in the direction predicted by the sampling model and is only marginally significant (0.62 ± 0.015 vs. 0.57 ± 0.014; t test p = 0.02; d = 0.09). 
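The contrast between the two models' predictions can also be sketched with a small Monte Carlo simulation. This is a hypothetical illustration: the size range and the logistic slope are assumptions chosen for the sketch, not the experiment's actual stimulus distribution or fitted parameters. It captures the qualitative difference exploited above: the sampling model's choice probability tracks the fraction of set items closer to SEEN (i.e., skewness), while the ensemble model's choice probability is a sigmoidal function of Δ alone.

```python
import numpy as np

def simulate(n_trials=5000, slope=1.5, seed=0):
    """Monte Carlo sketch of the two candidate models (illustrative only).

    The size range (1-20) and logistic slope are assumptions for this
    sketch, not the experiment's actual parameters.
    """
    rng = np.random.default_rng(seed)
    sizes = np.arange(1, 21)
    out = []
    for _ in range(n_trials):
        items = rng.choice(sizes, size=6, replace=False).astype(float)
        seen = float(rng.choice(items))                       # a true set member
        new = float(rng.choice(np.setdiff1d(sizes, items)))   # a foil size
        m = items.mean()
        # Delta(Tmean): positive when SEEN is nearer the set mean than NEW.
        delta = abs(new - m) - abs(seen - m)
        # Sampling model: one set item is recalled at random and the test
        # item nearer to it is chosen, so P(choose SEEN) is the fraction
        # of set items closer to SEEN -- it tracks trial skewness.
        p_sampling = float(np.mean(np.abs(items - seen) < np.abs(items - new)))
        # Ensemble model: a noisy comparison of the two distances from the
        # mean -- a sigmoidal dependence on Delta and none on skewness.
        p_ensemble = 1.0 / (1.0 + np.exp(-delta / slope))
        out.append((delta, p_sampling, p_ensemble))
    return np.array(out)

delta, p_sampling, p_ensemble = simulate().T
```

Binning the simulated ensemble-model choices by Δ reproduces a sigmoid like that of Figure A1, whereas the sampling-model choices depend on Δ only through whatever incidental covariation skewness has with it.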
In conclusion, although we cannot rule out that observers also recall one or more specific set items, which may have an auxiliary effect on their responses, the ensemble mean effect clearly predominates. We conclude that observers implicitly perceive the ensemble mean. 
Figure A1. Contrasting the ensemble perception and the sampling models. Fraction of Experiment 1 observer choice of the SEEN item as a function of the difference between the distances of the two test items from the mean of the set distribution. Data points reflect different trial “skewness.” The lack of constant order in these points suggests little effect of memory of a random sample of one set element. The sigmoid performance dependence on Δ supports the ensemble perception model.
Figure A2. Contrasting the ensemble perception and the sampling models. Fraction of observer choice of the test item closer to Tmean or (equivalently) to most set individuals for small and large absolute Δ (left) and for small and large skewness (right). The dependence on absolute Δ is highly significant and the dependence on skewness is in the unexpected direction. See Appendix text. Error bars are standard error.
Figure 1. Experiment 1 design. (a) Illustration of trial procedure for two consecutive trials—six circles are presented serially in each trial with time intervals (100 ms + 100-ms interstimulus interval), and then, following a masking stimulus, a two-alternative forced-choice membership task is presented; Δ parameters: upper example, trial “t – 1,” standard trial with nonequidistant SEEN and NEW test items. (In this example, the NEW and SEEN test items are 3 and 1 units from current trial mean, respectively, so that Δ(Tmean) = 3 – 1 = 2, with the SEEN item closer to the current mean); lower example, trial “t,” an equidistant trial where SEEN and NEW test items are the same distance from the current trial mean, so that the different distances from the recent trial mean, Rmean, can influence current trial choice of test item. (In the example shown, both SEEN and NEW are 2 units from the current trial mean, so that Δ(Tmean) = 2 – 2 = 0; here the difference in distances from the recent (t – 1) mean, Rmean, is Δ(Rmean) = 2 – 6 = –4, with the NEW test item closer to the recent mean). (b) Size probability distributions of stimuli and means of Experiments 1 and 2 (Session 1) are represented by the black and red curves, respectively; stimulus and mean distributions of Experiment 3 (Session 2) are represented by the gray and blue curves, respectively. Stimuli are normally distributed (Gaussian fits), while the trial means are distributed in a trapezoidal shape, with the 10 most central means of each session with uniform probability (dots represent the actual probability of presenting each size). (c) Trial subtypes and conditions. “in” = in trial range, not mean; “out” = out of trial range.
Figure 2. Experiment 1: effects of current trial test item statistics on performance. (a) Accuracy in the membership task for the various trial subtypes. Each symbol represents the average accuracy of a single participant; horizontal bars denote group averages, and error bars represent standard error of the mean. Performance when the SEEN item equals the trial sequence mean, Tmean, is significantly above baseline and when the NEW item equals Tmean, significantly below baseline; in baseline trials, neither SEEN nor NEW equals Tmean; performance is best when the NEW test item is outside the trial sequence range. (b) Gradual effect by relative difference of absolute distances of the SEEN and NEW test items from the current trial mean (Δ(Tmean)), as defined above and in Figure 1a, with a best-fit sigmoid curve. Performance depends parametrically on the relative proximity of the SEEN test item and relative distance of the NEW test item from the trial sequence mean. When they are the same distance (0 difference), there is no average bias. When the SEEN is more distant, participants fail to detect its presence in the set and score below chance, whereas when it is closer, they score well above chance. Data for this plot were collected from in-range trials only. (c) RT for correct (green) and incorrect (orange) trials in the various trial subtypes (e.g., fastest correct response is when the NEW is out of range). Data collected from four blocks with all distances. Error bars in (a) and (c) are standard error of the mean; in (b), they were <0.02, not shown. *Indicates t test result of p < 0.05. ***Indicates p < 0.001.
Figure 3. Contraction toward the recent trial mean, Rmean, in equidistant trials. (a) Accuracy in the membership task for the three trial subtypes. Each symbol represents the average accuracy of a single participant; horizontal bars denote group averages, and error bars represent standard error of the mean. Performance when the SEEN item equals the recent trial sequence mean, Rmean, is significantly above baseline, and when the NEW item equals Rmean, it is significantly below baseline; in baseline trials, neither SEEN nor NEW equals Rmean. (b) Rate of selecting the test item closer to the recent mean. Each circle reflects average performance of a single participant; horizontal line corresponds to average across participants; error bars, standard error of the mean. (c) Average accuracy as a function of the difference (Δ) of the test items’ distance from the recent trial mean (Rmean). Red symbols represent data from Experiment 2; blue symbols represent data from Experiment 3; sigmoid curve is calculated across data of both experiments. Standard error of the mean was <0.02.
Figure 4. Comparing current trial mean and recent trial mean effect on task performance within the subjects of the first experimental session. The data for these analyses are taken from the same 100 participants of Experiments 1 and 2. (a) Sigmoid curve fitting of the fraction of selecting SEEN item as a function of the distances of test items from the current (purple) and recent (red) trial mean. Data are calculated and averaged across participants for each Δ value. (b) Within-subject correlation of the current trial mean effect with the recent trial mean effect. Tmean and Rmean effects are calculated by subtraction of the task performance in trial subtype NEW = mean from SEEN = mean, for each subject. Most data points are above the dashed line of equal Tmean and Rmean effects, showing a larger Tmean effect for most subjects. (c) Test–retest reliability of the Tmean effect. Data are divided into odd blocks versus even blocks. (d) Test–retest reliability of the Rmean effect. Data are divided into odd blocks versus even blocks. Significant Spearman correlation shows within-subject consistency. Each hollow symbol in (b, c, d) corresponds to a single subject; filled circles in (a) represent data averaged across participants; solid line is the correlation between participants' performance.
Figure 5. Rate of test item selection as a function of test item size in equidistant trials, in two session distributions of Experiments 2 and 3, with a best-fit Gaussian curve. Red and pink dots and curves correspond to rates of selecting the SEEN (membership accuracy) and NEW (membership error rate) test items in Experiment 2 (with smaller size circles). Blue and light blue points and curves denote the rate of selecting these items in Experiment 3 (with larger circle sizes). Each symbol corresponds to the average selection of a specific size of test item across participants. Curves were calculated over the entire data rather than over the averages at each test item value.