Growing empirical evidence shows that ensemble information (e.g., the average feature or the feature variance of a set of objects) affects visual working memory for individual items. Recently, Harrison, McMaster, and Bays (2021) used a change detection task to test whether observers explicitly rely on ensemble representations to improve their memory for individual objects. They found that sensitivity to simultaneous changes in all memorized items (which also globally changed the set summary statistics) rarely exceeded the level predicted by the so-called optimal summation model within the signal-detection framework. This model implies simple integration of evidence for change from all individual items, with no additional evidence coming from the ensemble. Here, we argue that performance at the level of optimal summation does not rule out the use of ensemble information. First, in two experiments, we show that, even if evidence from only one item is available at test, the statistics of the whole memory set affect performance. Second, we argue that optimal summation itself can be conceptually interpreted as one strategy of holistic, ensemble-based decision making. We also redefine the reference level for the item-based strategy as the so-called “minimum rule,” which predicts performance far below the optimum. We found that both our observers and those of Harrison et al. (2021) consistently outperformed this level. We conclude that observers can rely on ensemble information when performing visual change detection. Overall, our work clarifies and refines the use of signal-detection analysis in measuring and modeling working memory.

*d′*:

*d′* = *z*(H) − *z*(FA), (Equation 1)

where *z*(H) and *z*(FA) are the *z*-scores of the hit probability (answering “yes” given that a change is present) and the false alarm probability (answering “yes” given that a change is absent). Here, *d′* can be interpreted as a measure of separation between the distributions of *evidence for change* when a change is present (signal distribution) and when it is absent (noise distribution).
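As a quick numerical illustration (our own sketch in Python, not code from the article), Equation 1 can be computed with the inverse normal CDF from the standard library:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equation 1: d' = z(H) - z(FA), with z the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Symmetric example: 84% hits and 16% false alarms give d' close to 2.
print(round(d_prime(0.84, 0.16), 2))  # → 1.99
```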

*d′*_{i}), then the full-set discriminability (*d′*_{total}) can be expressed using the Pythagorean theorem, as follows (Hautus, Macmillan, & Creelman, 2022):

*d′*_{total} = √(*d′*_{1}² + *d′*_{2}² + … + *d′*_{n}²), (Equation 2a)

where *n* is the number of items in the full set. This equation implies that *d′*_{total} is a distance between two *n*-variate distributions, one corresponding to all items being sampled from *n* noise distributions (no-change condition) and another corresponding to all items being sampled from their respective signal distributions (change in all *n* components). As Harrison et al. (2021) suggest, Equation 2a predicts the *d′*_{total} if change detection is based on *optimal summation* of evidence only from individual items. On the other hand, the *d′*_{total} based on the summation of evidence from individual items can be defined via the distribution of sums of independent samples drawn from each of the *n* distributions (we will refer to this as the statistical solution). Defined this way, *d′*_{total} is the sum of the individual *d′* (*d′*_{i}) normalized by their pooled standard deviation (assuming the standard deviation of each individual item's distribution is 1):

*d′*_{total} = (*d′*_{1} + *d′*_{2} + … + *d′*_{n}) / √*n*. (Equation 2b)

If all individual *d′*_{i} are equal, the *d′*_{total} can be found using a single one-item *d′* (*d′*_{one}) measured for any of the items:

*d′*_{total} = √*n* × *d′*_{one}. (Equation 3)
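A minimal check (our illustration, not the authors' code) confirms that Equations 2a and 2b coincide with Equation 3 when all individual *d′*-s are equal:

```python
import math

def optimal_summation(d_primes):
    """Equation 2a: full-set d' as the Euclidean (Pythagorean) combination."""
    return math.sqrt(sum(d * d for d in d_primes))

def statistical_solution(d_primes):
    """Equation 2b: summed evidence normalized by its pooled SD, sqrt(n)."""
    return sum(d_primes) / math.sqrt(len(d_primes))

# With equal individual d'-s, both reduce to Equation 3: sqrt(n) * d'_one.
d_one, n = 1.2, 4
assert math.isclose(optimal_summation([d_one] * n), math.sqrt(n) * d_one)
assert math.isclose(statistical_solution([d_one] * n), math.sqrt(n) * d_one)
```

With unequal individual *d′*-s the two definitions diverge (Equation 2a is never smaller than Equation 2b), which matters for the generalization discussed later in the article.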

If the observer derives evidence for change not only from the *n* individual items but also from ensemble statistics (e.g., tracks whether the average color changes across the displays), then ensemble statistics should form at least one more axis in the evidence space. This additional axis should yield an additional benefit to the *d′*_{total}, provided non-zero ensemble sensitivity (*d′*_{ensemble}). Harrison et al. (2021) express this in the following Pythagorean theorem equation:

*d′*_{total} = √(*n* × *d′*_{one}² + *d′*_{ensemble}²), (Equation 4)

where σ_{ensemble} is the standard deviation of the ensemble noise distribution in proportion to the individual item's noise used as the unit of the discriminability space.

**Is the one-item *d′* always a measure of ensemble-free memory?**

Harrison et al. (2021) assumed that the *d′* measured in one-item change detection (*d′*_{one}) represents ensemble-free working memory performance. Indeed, if the observer memorizes a set of items and then sees a single test item, the observer cannot estimate a change in an ensemble statistic of the whole set. However, that does not automatically imply that the observer cannot compare the test item to the ensemble representation of the memory set, even if, and especially if, it is not obvious whether this individual item has changed: “I do not remember whether this particular disc has changed its color, but now it looks redder than all the original items on average.” If this is the case, then the observed *d′*_{one} combines a component coming from that item and another component coming from the ensemble. Therefore, the *d′*_{one} is not necessarily a baseline for perfectly independent representations of individual items. In the present study, we will show that variation in the ensemble properties of a memory set and the direction of change relative to the original feature distribution affect the *d′* in both one-item and full-set change detection.

The *d′*_{ensemble} of change detection can effectively elevate performance above the level of optimal summation (Equation 3) if the ensemble dimension is orthogonal to the individual-item dimensions in the multidimensional evidence space. This implies that ensemble information should be sampled independently from the information used to represent individual items. In other words, the observer has to sample each pre-change and post-change item twice: one set of samples to encode individual features and another to encode ensemble summaries. Although such double sampling is not prohibited by the formal model, it does not appear to be a parsimonious or plausible strategy for working memory, given its limitations. On the other hand, if the item-based and ensemble-based sources of evidence come from the same set of samples, then the sum of individual changes and the ensemble change are perfectly correlated, such that the predicted *d′*_{total} would always be at the level of optimal summation and could never exceed it. In other words, the model presented in Equation 4 in fact cannot distinguish between optimal summation with and without an ensemble summary.

To illustrate these models, we used the *d′*_{one} as a model input and obtained the *d′*_{total} as a model output. As Figures 1A and 1B show, the model with ensemble memory and double sampling indeed predicted a boost in performance relative to the model with no ensemble memory, whereas the model with ensemble memory and single sampling performed at the same level as the model with no ensemble memory.
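The single- versus double-sampling contrast can be sketched analytically (our simplified illustration with the ensemble noise set to 1; `w` is a hypothetical decision weight on the ensemble evidence, not a parameter from the article). With double sampling, a well-chosen weight pushes *d′*_{total} above the optimal-summation level; with single sampling, the ensemble term is a rescaling of the same evidence and the weight has no effect:

```python
import math

def d_total(n, d_one, d_ens, w, double_sampling):
    """d' of the decision variable DV = sum(item evidence) + w * ensemble evidence.
    Double sampling: the ensemble evidence is an independent N(d_ens, 1) sample.
    Single sampling: the ensemble evidence is the mean of the same item samples."""
    if double_sampling:
        mean = n * d_one + w * d_ens
        variance = n + w * w
    else:
        # Ensemble term = mean of the item samples -> DV = (1 + w/n) * sum(items),
        # a mere rescaling of the summed evidence that cannot change d'.
        mean = (1 + w / n) * n * d_one
        variance = (1 + w / n) ** 2 * n
    return mean / math.sqrt(variance)

n, d_one, d_ens = 4, 1.0, 1.0
base = math.sqrt(n) * d_one                                  # Equation 3
boosted = d_total(n, d_one, d_ens, w=d_ens / d_one, double_sampling=True)
flat = d_total(n, d_one, d_ens, w=d_ens / d_one, double_sampling=False)
print(round(base, 3), round(boosted, 3), round(flat, 3))     # → 2.0 2.236 2.0
```

The double-sampling optimum here equals √(*n* × *d′*_{one}² + *d′*_{ensemble}²), the Pythagorean combination of Equation 4.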

Some noise can be applied independently to each individual item's representation during memory retention (σ_{memory}). This additional independent noise should reduce the correlation between the optimally integrated evidence and the evidence from ensemble summary statistics, which, in turn, might yield some benefit from using the summary statistics. On the other hand, some noise can also be applied at the integration stage (for example, when ensemble summary statistics are calculated). This integration noise (to which we can also add memory noise applied to the summary representation during retention, terming it all “ensemble noise,” σ_{ensemble}) counteracts the gain from applying the independent memory noise to the individual representations. The total potential benefit from using ensemble summary statistics in addition to optimal summation therefore depends on (1) the ratio between the sampling noise and the other sources of memory noise that are unrelated to sampling but applied to each individual item and (2) the ratio between the individual memory noise and the ensemble noise that occurs when ensemble summaries are computed and stored in memory (Figure 1C).

In the general model, the total evidence is the sum of the two sources, so that

*d′*_{total} = (*n* × *d′*_{one} + *d′*_{ensemble} × σ_{ensemble}) / √(*n* + σ_{ensemble}² + 2 × *cov*(*S*_{optsum}, *S*_{ensemble})), (Equation 5)

where *cov*(*S*_{optsum}, *S*_{ensemble}) is the covariation between the random variables *S*_{optsum} and *S*_{ensemble}, respectively sampled from the distribution of optimally summed evidence for individual changes, *S*_{optsum} ∼ *N*(µ = *n* × *d′*_{one}, σ = √*n*), and the distribution of evidence for an ensemble summary change, *S*_{ensemble} ∼ *N*(µ = *d′*_{ensemble} × σ_{ensemble}, σ = σ_{ensemble}). Model 1 (only optimal summation) is the case of this general model where *d′*_{ensemble} = 0, σ_{ensemble} = 0, and, hence, *cov*(*S*_{optsum}, *S*_{ensemble}) = 0. Model 2 (optimal summation with ensemble memory and double sampling) assumes non-zero *d′*_{ensemble} and σ_{ensemble} but *cov*(*S*_{optsum}, *S*_{ensemble}) = 0, because *S*_{optsum} and *S*_{ensemble} are sampled independently. Model 3 (optimal summation with ensemble memory and single sampling) is the same as Model 2, but the covariation term is simply the product of the standard deviations of the two distributions, *cov*(*S*_{optsum}, *S*_{ensemble}) = √*n* × σ_{ensemble}. In Model 4 (optimal summation with ensemble memory, single sampling, and independent memory noise), 0 < *cov*(*S*_{optsum}, *S*_{ensemble}) < √*n* × σ_{ensemble}, and the noise of the individual item's evidence for change is decomposed into two components: one component, σ_{sampling}, related to the error in feature sampling from the presented items, and another component, σ_{memory}, related to the error accumulated during memory retention, such that √(σ_{sampling}² + σ_{memory}²) = 1. Because the sampling error is perfectly correlated between evidence summation and ensemble summary computation, whereas the memory errors are independent, the proportion between σ_{sampling}² and σ_{memory}² determines the magnitude of *cov*(*S*_{optsum}, *S*_{ensemble}). Finally, Model 5 is the same as Model 4, but σ_{ensemble} is explicitly allowed to vary in a range far exceeding 1 (the noise associated with sampling and remembering individual items) because of the additional integration error.
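The general model (and Models 1–3 as its special cases) reduces to a few lines, assuming the decision variable is the simple sum *S*_{optsum} + *S*_{ensemble} (our sketch, not the authors' code; note that perfect correlation in Model 3 also constrains *d′*_{ensemble} to √*n* × *d′*_{one}, which is why it lands exactly at the optimal-summation level):

```python
import math

def general_d_total(n, d_one, d_ens=0.0, sigma_ens=0.0, cov=0.0):
    """d' of the summed decision variable S_optsum + S_ensemble, with
    S_optsum ~ N(n * d_one, sqrt(n)) and S_ensemble ~ N(d_ens * sigma_ens, sigma_ens)."""
    mean = n * d_one + d_ens * sigma_ens
    variance = n + sigma_ens ** 2 + 2 * cov
    return mean / math.sqrt(variance)

n, d_one = 4, 1.0
model1 = general_d_total(n, d_one)                 # optimal summation only
# Single sampling (Model 3): cov = sqrt(n) * sigma_ens, and perfect correlation
# also fixes d_ens at sqrt(n) * d_one.
model3 = general_d_total(n, d_one, d_ens=math.sqrt(n) * d_one, sigma_ens=1.0,
                         cov=math.sqrt(n) * 1.0)
# Double sampling (Model 2): same sensitivities, but zero covariation.
model2 = general_d_total(n, d_one, d_ens=math.sqrt(n) * d_one, sigma_ens=1.0,
                         cov=0.0)
print(round(model1, 3), round(model3, 3), round(model2, 3))  # → 2.0 2.0 2.683
```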

Under the optimal decision rule, the observer sums the evidence from all items and compares this sum against a single *criterion*. That is, on any individual trial, a random number is independently produced for each item from either the signal distribution (change present, mean = *d′*_{one}) or the noise distribution (change absent, mean = 0), and then these numbers are summed to provide the cumulative evidence for change. This cumulative evidence is then compared against the criterion to deliver a “yes” or a “no” answer. Figures 2A and 2B visualize this decision rule as a diagonal boundary *C* in an example two-dimensional space (corresponding to change detection at memory set size 2, although this logic can be extended to other dimensionalities and set sizes). The boundary *C* is a locus of the criterion because the sum of coordinates at every point of this diagonal is the same. Each dot in the space is the evidence for a change obtained in a single trial; the dot's coordinates correspond to the amount of evidence for change obtained from each of the items.

An alternative, item-based strategy is the so-called *minimum rule* (Hautus et al., 2022). That is, if there is enough evidence that at least one item has changed, the observer answers “yes”; otherwise, they answer “no.” Figures 2C and 2D illustrate this rule: any point to the right of the vertical borderline or above the horizontal borderline warrants a “yes” answer and is shown in more saturated colors. Of course, the minimum rule is an extreme model, and different observers can use more conservative rules (say, at least two or three items should provide enough evidence for change to warrant a “yes” answer). However, as shown in Hautus et al. (2022), even the most conservative, maximum rule (all items have to show enough evidence for change) has only a tiny effect on the overall *d′* (*d′*_{total}) compared to the minimum rule. Most important, Hautus et al. (2022) show that any strategy based on separate criteria for each individual item (as in Figures 2C and 2D) predicts substantially lower performance than the level of optimal summation (Figures 2A and 2B).

Comparing the predictions for the *d′*_{total} based on the same *d′*_{one} provides a way to distinguish between these decision rules. Because the decision space is linearly separable by the criterion under the optimal decision rule (Figures 2A and 2B), the hit (*H*_{total}) and false alarm (*FA*_{total}) rates can be defined as normal cumulative density functions Φ of the distances between the centers of the corresponding multivariate distributions (which is exactly the *d′*_{total}) and the criterion (Hautus et al., 2022):

*H*_{total} = Φ(*d′*_{total} − *C*), (Equation 6)

*FA*_{total} = Φ(−*C*). (Equation 7)

Substituting these hit and false alarm rates into Equation 1 returns the same *d′*_{total} in the result. Therefore, the optimal decision rule is indeed a strategy that provides performance at the level of optimal summation.
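That the criterion placement drops out of Equation 1 under the optimal rule can be checked numerically (our sketch using Python's `statistics.NormalDist`):

```python
from statistics import NormalDist

phi = NormalDist().cdf      # standard normal CDF, Φ
z = NormalDist().inv_cdf    # its inverse, the z-transform

d_total = 2.0
for c in (-1.0, 0.0, 1.0, 2.5):                   # any criterion placement
    h = phi(d_total - c)                          # Equation 6
    fa = phi(-c)                                  # Equation 7
    assert abs((z(h) - z(fa)) - d_total) < 1e-7   # Equation 1 recovers d'_total
```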

Under the minimum rule, the observer answers “no” only if no item exceeds its criterion, so the total misses (1 − *H*_{total}) and correct rejections (1 − *FA*_{total}) are the products of the individual misses (1 − *H*_{i}) or correct rejections (1 − *FA*_{i}) that, in turn, depend on the individual *d′* (*d′*_{i}) and individual criteria (*C*_{i}) as follows:

1 − *H*_{total} = ∏_{i=1}^{n} Φ(*C*_{i} − *d′*_{i}), (Equation 8)

1 − *FA*_{total} = ∏_{i=1}^{n} Φ(*C*_{i}), (Equation 9)

where *n* is the memory set size and *i* is the individual item's number. Consequently, if all items have the same one-item *d′* (*d′*_{one}) and an observer uses the same criterion for all items (*C*_{one}), then the *H*_{total} and the *FA*_{total} are simply:

*H*_{total} = 1 − Φ(*C*_{one} − *d′*_{one})^{n}, (Equation 10)

*FA*_{total} = 1 − Φ(*C*_{one})^{n}. (Equation 11)

The minimum rule model predicts a slower growth of the *d′*_{total} as a function of the *d′*_{one} than the optimal rule model (Equation 3). The difference between the *d′*_{total} predicted by the minimum rule model and that predicted by the optimal rule model increases with memory set size. For example, assuming an observer uses an unbiased decision criterion from the one-item condition (that is, a criterion that keeps the proportion of “yes” answers in that condition at about 0.5), at set size 2 the *d′*_{total} equals ∼1.41 × *d′*_{one} for the optimal rule model and ∼1.25 × *d′*_{one} for the minimum rule model; at set size 4, these are 2 × *d′*_{one} and ∼1.62 × *d′*_{one}, respectively; at set size 6, these are ∼2.45 × *d′*_{one} and ∼1.9 × *d′*_{one}, respectively. However, as Figures 2A and 2C show, keeping the unbiased one-item criterion in the full-set condition should substantially increase the overall proportion of “yes” answers. Both the optimal rule and the minimum rule models require the observer to raise their criterion from the one-item level to stay unbiased in the full-set condition (Figures 2B and 2D). As can be seen from Equations 1, 6, and 7, this adjustment of the criterion does not change the empirically estimated *d′*_{total} if an observer uses the optimal decision rule. However, in the minimum rule model, raising the decision criterion for every single item further decreases the empirically estimated *d′*_{total} (whereas the theoretical *d′*_{total}, as a distance between the multidimensional signal and noise distributions, stays the same). For example, at set size 2, the minimum rule model with an unbiased full-set criterion predicts that the *d′*_{total} equals ∼1.22 × *d′*_{one}; at set size 4, it predicts ∼1.44 × *d′*_{one}; at set size 6, it predicts ∼1.57 × *d′*_{one}.
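The minimum-rule multipliers for the unbiased one-item criterion can be reproduced from Equations 10, 11, and 1 (a sketch assuming *d′*_{one} = 1; the exact multipliers vary slightly with *d′*_{one}):

```python
from statistics import NormalDist

phi = NormalDist().cdf
z = NormalDist().inv_cdf

def min_rule_d_total(n, d_one, c_one):
    """Empirical d'_total implied by the minimum rule."""
    h_total = 1 - phi(c_one - d_one) ** n   # Equation 10
    fa_total = 1 - phi(c_one) ** n          # Equation 11
    return z(h_total) - z(fa_total)         # Equation 1

# Unbiased one-item criterion (C_one = d'_one / 2 keeps the one-item "yes"
# rate at 0.5); multipliers of ~1.25, ~1.62, and ~1.9 for set sizes 2, 4, 6.
d_one = 1.0
for n in (2, 4, 6):
    print(n, round(min_rule_d_total(n, d_one, d_one / 2), 2))
```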

In Harrison et al.'s (2021) experiments, the observed *d′*_{total} did not exceed the predictions based on the optimal rule model. However, under the minimum rule model, which we consider to better fit the strict definition of item-based decisions, the predictions for the *d′*_{total} are substantially lower, and we will further show that Harrison et al.'s data actually exceed these predictions, suggesting that some decisions could in fact be ensemble based.

In the first part of our study, we ran two experiments testing whether the one-item *d′* reflects the “ensemble-free” mode of change detection. In these experiments, we manipulated the range of feature variation, which is related to the precision of ensemble statistics that can be extracted from a set (Dakin, 2001; Fouriezos, Rubenfeld, & Capstick, 2008; Im & Halberda, 2013; Morgan, Chubb, & Solomon, 2008; Solomon, 2010; Marchant, Simons, & de Fockert, 2013; Rosenholtz, 2001; Utochkin & Brady, 2020; Utochkin & Tiurina, 2014; Watamaniuk & Sekuler, 1989). This range manipulation is directly linked to the aforementioned notion of item similarity as a determinant of whether items are perceived as part of the same ensemble (Utochkin, 2015). In the second part, we implemented the two possible decision models (optimal rule and minimum rule) to demonstrate how the same estimated one-item *d′* values can yield substantially different predictions for the upper-bound *d′*_{total}. We then compared the data (from both Harrison et al., 2021, and our Experiment 1) with the model predictions to answer whether these data show evidence against an ensemble component in change detection.

*SD* = 0.7) took part in the experiment for course credit. All participants were tested for normal color vision and normal or corrected-to-normal visual acuity and reported having no neurological problems. Before the experiment, all participants gave informed consent. Harrison et al. (2021) showed that 20 participants is a sufficient sample size to obtain conclusive Bayes factors in a design like ours (using Bayesian statistics); our sample sizes were therefore informed by this estimate.

*d′*, as in Equation 1. To deal with hit rates of 100% (which make a *z*-score undefined), we applied a correction to both hit and false alarm rates suggested by Hautus (1995). Although only one participant showed 100% hits in one condition, the correction was applied to all the data for uniformity.
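The correction recommended by Hautus (1995) is the log-linear rule: add 0.5 to each response count and 1 to each trial count before converting counts to rates. A minimal sketch (the counts here are hypothetical):

```python
from statistics import NormalDist

def d_prime_loglinear(hits, n_signal, fas, n_noise):
    """d' with the log-linear correction (add 0.5 to each count, 1 to each N),
    which keeps the z-scores defined even for 0% or 100% rates."""
    z = NormalDist().inv_cdf
    h = (hits + 0.5) / (n_signal + 1)
    fa = (fas + 0.5) / (n_noise + 1)
    return z(h) - z(fa)

# 40/40 hits would make z(1.0) undefined; the corrected rate is 40.5/41.
print(round(d_prime_loglinear(40, 40, 8, 40), 2))
```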

All critical comparisons were performed using Bayesian *t*-tests with a default prior (Rouder, Speckman, Sun, Morey, & Iverson, 2009). Harrison et al. (2021) used these tests to compare the observed *d′*_{total} against those predicted by the optimal summation model given the observed *d′*_{one}. We will address this comparison later, when we test our and Harrison et al.'s (2021) data against the two decision models (optimal summation and minimum rule). Here, we focus on estimating the effects of the color range of a memory set on change detection. In particular, we ask whether this effect exists in one-item detection and in full-set detection. Therefore, our critical comparisons are between the narrow-range *d′* and the broad-range *d′* within the full-set and one-item conditions. The Bayesian *t*-tests were performed using the package “BayesFactor” for R (Morey, 2018).

*BF*_{10} = 271.9) and for the full-set condition (*BF*_{10} = 29.5). These findings suggest that the distributional properties of the whole memory set, and not only individual item discriminability, contribute to change detection. In both one-item and full-set trials, it is easier to detect a change if the original memory set consists of highly similar items, as in our narrow-range condition. This can be interpreted in terms of a signal-to-noise ratio, whereby the observer evaluates the amount of change across displays (signal) relative to the feature variability within a display (noise). Roughly, the observer evaluates how much the colors differ between the displays compared to how much they differ within a display. In theory, this strategy can be implemented without computing summary statistics: The observer can estimate pairwise differences between some of the pre-change items and then decide whether the post-change difference or differences are bigger than these original differences in the initial displays. Therefore, if color heterogeneity is small, this signal-to-noise ratio should be larger, predicting better performance. When heterogeneity increases, the signal-to-noise ratio decreases, predicting a loss in performance. Alternatively, the advantage of the narrow-range sets can be interpreted as the availability of ensemble information to combine with information about the individuals, as was suggested in the introduction: For example, the impression of the “mean” color is stronger when items are highly similar, and it can be easier to compare a post-change impression with a pre-change one, both in terms of individual change and of change relative to the mean. Whatever the interpretation, it should acknowledge the fact that change detection performance strongly depends on the feature distribution of the memory set, even when memory is tested on a single item.

Observers can also commit *swap* errors when they occasionally compare a given post-change color with a pre-change color sampled from the wrong location. We will refer to this scenario as the “swap” account.

In the narrow-range displays, the color change always shifted the post-change item *away* from all four pre-change colors (this can be seen in Figure 3A, left panel, where the narrow distributions of pre-change (black marks) and post-change (red marks) colors do not overlap on the color wheel). In contrast, our broad-range displays had a 120° range, in which case the 35° step of change pushed the post-change item away from the target but could, at the same time, pull it toward some non-targets (in the same example in Figure 3A, the broad distributions of pre-change (black marks) and post-change (red marks) colors strongly overlap on the color wheel). Specifically, the only possibility for the post-change item to move away from all pre-change colors was when the target was an extreme color in the pre-change color distribution and the change was directed outside this distribution (a 25% chance); in all other cases, the post-change color got closer to at least one of the non-target colors. In other words, changed colors in the broad-range displays were, on average, more similar to the pre-change colors than in the narrow-range displays. This could result in poorer discrimination of novel colors as well as in an increased number of swap errors (e.g., Oberauer & Lin, 2017). In Experiment 2, we balanced the occurrences of target changes toward and away from the whole color distribution, which allowed us to better control the similarities between the post-change target color and the pre-change non-target colors. With this control, we could test whether this “novel-color” or “swap” scenario could account for the narrow-distribution advantage from Experiment 1.

*SD* = 0.45) took part in the experiment. All participants were tested for normal color vision and normal or corrected-to-normal visual acuity and reported having no neurological problems. At the beginning of the experiment, the participants gave written informed consent. Two participants were excluded from the analysis because of low overall performance (less than 60% correct answers).

*BF*_{10} = 16.7 × 10^{4}), suggesting that participants were better at detecting item changes if the original memory set was narrow-range (Figure 4B). Second, we found some evidence that outward changes were detected better than inward changes in the narrow-range color distributions (*BF*_{10} = 2.85). We also found evidence against a difference between the outward and inward changes in the broad-range stimulus distributions (*BF*_{10} = 0.238).

Importantly, the *d*′ values in the inward condition were substantially above 0. Indeed, if the observers simply checked all pre-change colors, then they could not discriminate between no-change trials (the post-change color is the same as the pre-change color at the same location) and change trials (the post-change color is almost the same as one of the pre-change colors, but at a different location). In sum, the pattern of results in Experiment 2 suggests that the range effect on the *d′*_{one} is provided by the combination of information from the target item, with respect to its location, and some integral distributional information that we can broadly refer to as ensemble information.

We included all data sets in which both the *d′*_{one} and the *d′*_{total} were directly measured, treating the narrow-range and the broad-range conditions as separate data points. For each observer, we calculated two predicted *d′*_{total} values based on their *d′*_{one}. One predicted *d′*_{total} was based on the optimal summation model (Equation 3), which is equivalent to the prediction from the optimal decision rule (Equations 6 and 7). The other predicted *d′*_{total} was based on the minimum rule model (Equations 10 and 11). Recall that, for the minimum rule model, the predicted *d′*_{total} depends not only on the *d′*_{one} but also on the decision criterion set on the evidence obtained from each item (*C*_{one}). Therefore, for each participant from our Experiment 1, we used a grid search to fit a *C*_{one} whose substitution into Equations 10 and 11 provided the same “yes” rate (the average of *H*_{total} and *FA*_{total}) as that observed for this participant. The best-fit pair of *H*_{total} and *FA*_{total} from Equations 10 and 11 was then substituted into Equation 1 to find the minimum-rule prediction for the *d′*_{total}. For the Harrison et al. (2021) data set, where only the *d′*_{one} is available without any information about the proportions of hits and false alarms, we simply assumed an unbiased response strategy for all observers and, hence, fit their *C*_{one} to a “yes” rate of 0.5.
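The criterion-fitting step can be sketched as a one-dimensional search (we use bisection for brevity, which is equivalent to a grid search here because the predicted “yes” rate decreases monotonically with the criterion; this is our illustration, not the authors' R code):

```python
from statistics import NormalDist

phi = NormalDist().cdf
z = NormalDist().inv_cdf

def fit_c_one(n, d_one, target_yes_rate, lo=-5.0, hi=5.0):
    """Find the per-item criterion C_one whose minimum-rule hit and false
    alarm rates (Equations 10 and 11) average to the observed 'yes' rate."""
    def yes_rate(c):
        return ((1 - phi(c - d_one) ** n) + (1 - phi(c) ** n)) / 2
    for _ in range(60):  # bisection: yes_rate falls as the criterion rises
        mid = (lo + hi) / 2
        if yes_rate(mid) > target_yes_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def min_rule_prediction(n, d_one, target_yes_rate=0.5):
    """Minimum-rule d'_total for a criterion matched to the observed yes rate."""
    c = fit_c_one(n, d_one, target_yes_rate)
    h = 1 - phi(c - d_one) ** n   # Equation 10
    fa = 1 - phi(c) ** n          # Equation 11
    return z(h) - z(fa)           # Equation 1

# Unbiased full-set criterion (yes rate = 0.5) and d'_one = 1 reproduce the
# multipliers of ~1.22, ~1.44, and ~1.57 for set sizes 2, 4, and 6.
for n in (2, 4, 6):
    print(n, round(min_rule_prediction(n, 1.0), 2))
```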

We compared the observed *d′*_{total} against the predictions from the optimal rule model and from the minimum rule model. As in Harrison et al. (2021), the null hypothesis was that the observed *d′*_{total} values are not greater than the model predictions. We then calculated a meta-analytic Bayes factor (Morey, 2018; Rouder & Morey, 2011) to evaluate the evidence for or against the null hypothesis across the experiments. We found that most of the data provided weak to moderate evidence for the null hypothesis with respect to the optimal rule model (0.07 < *BF*_{10} < 0.41; exceptions included the variance-change conditions of Experiments 5 and 6 and the mean-change conditions of Experiment 6 from Harrison et al., 2021: *BF*_{10} > 5.4). The meta-analysis showed weak overall evidence for the null hypothesis (*BF*_{10} = 0.52). On the other hand, we found strong evidence against the null hypothesis for all data points with respect to the minimum rule model (*BF*_{10} > 38). The meta-analysis showed very strong evidence against the null hypothesis (*BF*_{10} = 4.4 × 10^{38}). We conclude from these analyses that, in most cases, the observers performed no better than a model observer using the optimal decision rule (basically consistent with Harrison et al., 2021) but outperformed a model observer using the minimum rule. This is illustrated in Figure 6, where the data points are mostly concentrated around the prediction line of the optimal rule and, at the same time, lie well above the prediction line of the minimum rule.

For example, a purely ensemble-based strategy would predict a *d′* of 0 in the “inward narrow” condition of Experiment 2, where the target color shifted from one tail of the color distribution to the other but its distance from the mean did not change. There could instead be a more complex mixture of different strategies across trials that eventually resulted in a performance level below that of the pure optimal-rule model. However, a failure to perform optimally does not automatically imply that observers did not use integrated, ensemble-based information.

*d′*should become smaller.

For example, decision evidence can be based on *z*-distances between the average pre-change feature and the individual (in the one-item condition) or the average (in the full-set condition) post-change feature, which involves the representation of summary statistics. Many other ways of integration can be considered in future models and experimental designs. (Beyond the main scope of the present article, multiple previous studies have concluded that ensemble integration may actually be suboptimal, which suggests that the effective contributions of individual items to the ensemble representation can be unequal (Dakin, 2001).)

In the narrow-range condition, the *d′* strongly depended on whether the target change was inward or outward, that is, on whether the target changed its distance from the mean and whether it fell outside the range. In contrast, the direction of change was not important in the wide-range condition, suggesting more of an item-based change detection strategy.

(*d′*_{one}) and its further integration when change detection in multiple items is performed (the job that the current version of the model does). As was said above, the main reason for the lack of such a model is the variety of qualitatively different plausible strategies that can explain the modulatory effect of the feature distribution on the *d′*_{one}. One possible direction for future model development is combined empirical and computational work to disentangle these candidate strategies.

*d′*_{i} in Equations 2a and 2b) is set not only by the difference between the pre-change and the post-change features but also by the direction of change relative to the feature distribution. Hence, a proper prediction for optimal summation or the minimum rule (the full-set *d′*_{total} in Equations 2a, 2b, 8, and 9) should take into account the *d′*_{i}-s measured separately for each item. If all individual *d′*_{i}-s are equal, the predictions for the *d′*_{total} are exactly as in Equation 3; however, if the individual *d′*_{i}-s differ, then the optimal *d′*_{total} should differ from that as well.

(*d′*_{one}) onto the multidimensional detection space is not sufficient to make straightforward predictions about ensemble-free integration of the set into working memory. Furthermore, we presented theoretical arguments for why the optimal summation of evidence from individual items inherently involves ensemble-based decisions. Overall, when the statistical structure of the memory set and the decision rule are taken into account, the data seem to support the idea that ensemble information is used in visual change detection.

**Data availability:** The data and the R code to analyze and visualize them are publicly available on the Open Science Framework at https://osf.io/c3rvn/.

*Vision Research*, 83, 25–39, https://doi.org/10.1016/j.visres.2013.02.018. [CrossRef] [PubMed]

*Trends in Cognitive Sciences*, 15(3), 122–131, https://doi.org/10.1016/j.tics.2011.01.003. [CrossRef] [PubMed]

*Psychological Science*, 15(2), 106–111, https://doi.org/10.1111/j.0963-7214.2004.01502006.x. [CrossRef] [PubMed]

*Proceedings of the National Academy of Sciences,*106(18), 7345–7350, https://doi.org/10.1073/pnas.0808981106. [CrossRef]

*Psychological Science,*12(2), 157–162. [CrossRef] [PubMed]

*Nature Reviews Neuroscience,*7(5), 358–366, https://doi.org/10.1038/nrn1888. [CrossRef] [PubMed]

*Psychological Science*, 18(7), 622–628, https://doi.org/10.1111/j.1467-9280.2007.01949.x. [CrossRef] [PubMed]

*Working memory*. Oxford, UK: Clarendon Press.

*Journal of Neuroscience*, 34(10), 3632–3645. [CrossRef] [PubMed]

*Trends in Cognitive Sciences*, 19(8), 431–438. [CrossRef] [PubMed]

*Psychological Science*, 22(3), 384–392, https://doi.org/10.1177/0956797610397956. [CrossRef] [PubMed]

*Journal of Vision*, 15(15): 6. [CrossRef] [PubMed]

*Journal of Experimental Psychology: Learning, Memory and Cognition*, 41(3), 921–929. [PubMed]

*Psychological Review*, 120(1), 85–109, https://doi.org/10.1037/a0030779. [CrossRef] [PubMed]

*Psychological Science*, 28(1), 12–22, https://doi.org/10.1177/0956797616671524. [CrossRef] [PubMed]

*Acta Psychologica*, 138(2), 289–301, https://doi.org/10.1016/j.actpsy.2011.08.002. [CrossRef] [PubMed]

*The Pervasiveness of Ensemble Perception: Not Just Your Average Review*. In Enns, J. T. (Ed.), *Elements in Perception* (pp. 1–96). Cambridge, UK: Cambridge University Press.

*Behavioral and Brain Sciences*, 24(1), 87–114, https://doi.org/10.1017/S0140525X01003922. [PubMed]

*Psychological Science*, 15(9), 634–640, https://doi.org/10.1111/j.0956-7976.2004.00732.x. [PubMed]

*Journal of the Optical Society of America A*, 18(5), 1016–1026, https://doi.org/10.1364/JOSAA.18.001016.

*Annual Review of Psychology*, 66, 115–142, https://doi.org/10.1146/annurev-psych-010814-015031. [PubMed]

*Perception & Psychophysics*, 70(3), 456–464, https://doi.org/10.3758/PP.70.3.456. [PubMed]

*Journal of Vision*, 14(9), 22–22, https://doi.org/10.1167/14.9.22. [PubMed]

*Psychonomic Bulletin & Review*, 18(5), 855–859, https://doi.org/10.3758/s13423-011-0125-6. [PubMed]

*Current Biology,*25(24), 3213–3219, http://doi.org/10.1016/j.cub.2015.10.052.

*Cognition*, 214, 104763, https://doi.org/10.1016/j.cognition.2021.104763. [PubMed]

*d′*. *Behavior Research Methods, Instruments, & Computers*, 27, 46–51.

*Detection Theory: A User's Guide*. Oxfordshire, UK: Routledge.

*Attention, Perception, & Psychophysics*, 75(2), 278–286, https://doi.org/10.3758/s13414-012-0399-4. [PubMed]

*Attention, Perception, & Psychophysics*, 83(3), 1050–1069, https://doi.org/10.3758/s13414-020-02046-7. [PubMed]

*Journal of Experimental Psychology: Learning, Memory, and Cognition*, 26(3), 683–702, https://doi.org/10.1037/0278-7393.26.3.683. [PubMed]

*Journal of Vision,*18(9), 23, https://doi.org/10.1167/18.9.23. [PubMed]

*Journal of Vision*, 15(4), 10, https://doi.org/10.1167/15.4.10. [PubMed]

*Nature*, 390(6657), 279–281, https://doi.org/10.1038/36846. [PubMed]

*Nature Neuroscience*, 17(3), 347–356, https://doi.org/10.1038/nn.3655. [PubMed]

*Acta Psychologica*, 142(2), 245–250, https://doi.org/10.1016/j.actpsy.2012.11.002. [PubMed]

*Journal of Vision*, 15(4), 6, https://doi.org/10.1167/15.4.6. [PubMed]

*Psychological Review*, 63(2), 81–97. [PubMed]

*Journal of Vision*, 8(11), 9.1–9.8, https://doi.org/10.1167/8.11.9. [PubMed]

*Psychonomic Bulletin & Review*, 1–18, https://doi.org/10.3758/s13423-023-02356-5.

*Psychological Review,*124(1), 21–59, https://psycnet.apa.org/doi/10.1037/rev0000044. [PubMed]

*Psychological Review*, 120(2), 297–328, https://doi.org/10.1037/a0031541. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance*, 46(10), 1127–1147, https://doi.org/10.1037/xhp0000834. [PubMed]

*Nature Neuroscience,*4(7), 739–744. [PubMed]

*Behavior Research Methods,*51(1), 195–203, https://doi.org/10.3758/s13428-018-01193-y. [PubMed]

*Nature Human Behavior*. Advance online publication, https://doi.org/10.1038/s41562-023-01602-z.

*Journal of Experimental Psychology: Human Perception and Performance*, 27(4), 985. [PubMed]

*Psychonomic Bulletin & Review*, 18, 682–689, https://doi.org/10.3758/s13423-011-0088-7. [PubMed]

*Psychonomic Bulletin & Review*, 16(2), 225–237, https://doi.org/10.3758/PBR.16.2.225. [PubMed]

*Journal of Neuroscience*, 37(14), 3913–3925. [PubMed]

*Journal of Neuroscience*, 38(21), 4859–4869. [PubMed]

*Nature Human Behavior*, 4(11), 1156–1172, https://doi.org/10.1038/s41562-020-00938-0.

*Journal of Vision*, 10(14), 1–16, https://doi.org/10.1167/10.14.19. [PubMed]

*Journal of Experimental Psychology: Learning, Memory, and Cognition*, 46(1), 46–59, https://doi.org/10.1037/xlm0000722. [PubMed]

*Nature Neuroscience*, 3(3), 270–276. [PubMed]

*PsyArXiv*. https://psyarxiv.com/ndq9e/.

*Journal of Vision*, 15(4), 1–14, https://doi.org/10.1167/15.4.8.

*Journal of Experimental Psychology: Human Perception and Performance*, 46(5): 458–473, 10.1037/xhp0000727. [PubMed]

*Psychological Review*. Advance online publication, https://doi.org/10.1037/rev0000426.

*Acta Psychologica*, 146, 7–18, https://doi.org/10.1016/j.actpsy.2013.11.012. [PubMed]

*Cognition*, 152, 78–86, http://doi.org/10.1016/j.cognition.2016.01.010. [PubMed]

*Vision Research*, 29(1), 47–59. [PubMed]

*Vision Research,*50(22), 2274–2283, https://doi.org/10.1016/j.visres.2010.04.019. [PubMed]

*Annual Review of Psychology*, 69(1), 105–129, https://doi.org/10.1146/annurev-psych-010416-044232. [PubMed]

*Journal of Vision*, 4(12), 11, https://doi.org/10.1167/4.12.11.

*Cognitive Psychology,*105, 81–114, https://doi.org/10.1016/j.cogpsych.2018.06.001. [PubMed]

*Journal of Experimental Psychology: Human Perception and Performance,*18(1), 34–49, https://doi.org/10.1037/0096-1523.18.1.34. [PubMed]

*Nature*, 453(7192), 233–235, https://doi.org/10.1038/nature06860. [PubMed]

*Psychological Science,*20(4), 423–428, https://doi.org/10.1111/j.1467-9280.2009.02322.x. [PubMed]

*d′*_{total} should change as a function of *d′*_{one} depending on whether the observer encodes ensemble summary statistics in addition to optimally summing evidence from individual items. To this end, we simulated an ideal observer's performance in the full-set change detection task with set size 4, where individual item changes, when present, also changed either the mean or the variance of the whole set, as in the experiments by Harrison et al. (2021). Within the same set of trials, we implemented five versions of multidimensional SDT (all of which are special cases of Equation 5 in the main text): (1) Model 1: optimal summation only (no ensemble memory); (2) Model 2: optimal summation + ensemble statistics with double sampling (different sets of samples are used to encode individual items and ensemble summaries); (3) Model 3: optimal summation + ensemble statistics with single sampling (both individual items and ensemble summary statistics are encoded from the same set of samples); (4) Model 4: optimal summation + ensemble statistics with single sampling and independent memory noise unrelated to sampling; and (5) Model 5: optimal summation + ensemble statistics with single sampling, independent memory noise, and late noise related to the computation of ensemble statistics. The R code to run these simulations and visualize their results (Figure 1) is available online at the same address as the other online materials.
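The five model variants differ only in which components are switched on. A minimal sketch (in Python rather than the authors' R code; the flag names are our own) summarizes the taxonomy:

```python
# Hypothetical flags summarizing the five simulated model variants; the flag
# names are ours, not taken from the original R code.
MODELS = {
    1: dict(ensemble=False, double_sampling=False, memory_noise=False, ensemble_noise=False),
    2: dict(ensemble=True,  double_sampling=True,  memory_noise=False, ensemble_noise=False),
    3: dict(ensemble=True,  double_sampling=False, memory_noise=False, ensemble_noise=False),
    4: dict(ensemble=True,  double_sampling=False, memory_noise=True,  ensemble_noise=False),
    5: dict(ensemble=True,  double_sampling=False, memory_noise=True,  ensemble_noise=True),
}

# Every model performs optimal summation; Models 2-5 add ensemble memory on top,
# and only Model 2 draws a separate set of samples for the ensemble summaries.
assert not MODELS[1]["ensemble"]
assert all(m["ensemble"] for k, m in MODELS.items() if k > 1)
assert MODELS[2]["double_sampling"] and not MODELS[3]["double_sampling"]
```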

We varied *d′*_{one} from 0 to 1.5 in steps of 0.1. Instead of using physical spaces to assign stimulus features, we worked directly in an arbitrary signal-detection space where all stimulus differences were defined as distances between the corresponding Gaussian distributions in *z*-units (σ = 1 for each distribution). Each such Gaussian distribution was used to simulate sampling from an individual item. We arbitrarily assigned a base set of feature values of −0.6, −0.2, 0.2, and 0.6. As in Harrison et al. (2021), we altered the base set in two different ways to produce a change in either the mean or the variance. For mean-change trials, we added *d′*_{one} to each of the base values, such that the whole distribution shifted to the right but its variance stayed the same. For variance-change trials, we added −*d′*_{one} to the two negative values of the base set and +*d′*_{one} to the two positive values, such that the mean stayed the same but the variance increased. On each trial of the change-present condition, the pre-change role was randomly assigned to either the base or the altered set, and the remaining set took the post-change role. On each trial of the change-absent condition, either the base set or the altered set was used as both pre-change and post-change.
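For concreteness, the two alterations of the base set can be sketched and verified as follows (a Python illustration; the *d′*_{one} value is an arbitrary example from the simulated range):

```python
import statistics

d_one = 0.5  # example value of d'_one; any value from the simulated range works
base = [-0.6, -0.2, 0.2, 0.6]

# Mean change: shift every item by d'_one; the variance is unchanged.
mean_change = [x + d_one for x in base]
# Variance change: push negative items down and positive items up by d'_one;
# the mean is unchanged but the spread grows.
var_change = [x + (d_one if x > 0 else -d_one) for x in base]

assert abs(statistics.mean(mean_change) - d_one) < 1e-9  # base mean is 0
assert abs(statistics.pvariance(mean_change) - statistics.pvariance(base)) < 1e-9
assert abs(statistics.mean(var_change)) < 1e-9
assert statistics.pvariance(var_change) > statistics.pvariance(base)
```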

Because *d′*_{one} reflects change discriminability between the pre-change and post-change samples (i.e., sensitivity to the differences between the two samples), the standard deviations of the distributions from which the samples were drawn, σ_{pre} and σ_{post}, were set to 1/√2, such that σ^{2}_{post − pre} = σ^{2}_{pre} + σ^{2}_{post} = 1. In Models 4 and 5, which include additional memory noise independent of sampling, we first set an arbitrary value of the memory noise, σ_{memory}, such that 0 < σ_{memory} < 1. We assumed that the overall change-detection noise (σ = 1) is a combination of the sampling noises in both displays (σ_{pre} and σ_{post}) and the independent memory noise, such that σ^{2}_{pre} + σ^{2}_{post} + σ^{2}_{memory} = 1. Therefore, the sampling noise corrupting each individual pre-change and post-change color in Models 4 and 5 was σ_{sampling} = √[(1 − σ^{2}_{memory})/2] (assuming σ_{pre} = σ_{post}). The first set of samples was used to encode individual feature values in all evidence-integration models and for ensemble encoding in the models with ensemble memory and single sampling (Models 3–5). The second set of samples was used for ensemble encoding only in the model with ensemble memory and double sampling (Model 2). We then calculated the average and standard deviation of the pre-change and post-change sample values (based on the first or the second set of samples).
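A short numeric check (Python, with an example σ_{memory} of our own choosing) confirms that the variances partition as stated:

```python
import math

# Models 4 and 5: total change-detection noise (sigma = 1) decomposes into
# sampling noise in each display plus independent memory noise.
sigma_memory = 0.5  # arbitrary example value, 0 < sigma_memory < 1
sigma_sampling = math.sqrt((1 - sigma_memory**2) / 2)  # per-display sampling noise

# The variances of the post-change minus pre-change difference sum to 1:
total_variance = 2 * sigma_sampling**2 + sigma_memory**2
assert abs(total_variance - 1.0) < 1e-12

# Models 1-3 (no extra memory noise): sigma_pre = sigma_post = 1/sqrt(2),
# so the difference variance is already 1 without a memory term.
assert abs(2 * (1 / math.sqrt(2))**2 - 1.0) < 1e-12
```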

In Models 4 and 5, the memory noise (σ_{memory}) was added to the difference between the pre-change and post-change samples. In all models including ensemble summary statistics (Models 2–5), we subtracted the pre-change summaries from the post-change summaries, as we did for the individual samples. In Model 5, the difference between the summaries was additionally corrupted by ensemble noise (σ_{ensemble}), which accounted for all sources of error in ensemble memory independent of sampling.

*d′*_{one}; the same was true for the summary statistics). To transform this bipolar arrangement into a unipolar one, we flipped the sign of the post-change − pre-change difference whenever the "ground truth" direction of change was negative, both for individual items and for summary statistics. For each *d′*_{one}, a decision criterion was then applied to the cumulative evidence. The model answered "yes" if the cumulative evidence was larger than this criterion and "no" otherwise. Proportions of "yes" answers across all change-present (hits) and change-absent (false alarms) trials were then used to calculate the *d′*_{total} (see Equation 1 in the main text) predicted by each model.
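The full pipeline for the simplest case can be sketched end to end (Python, not the authors' R code). This minimal sketch covers Model 1 on mean-change trials only, skips the sign-flipping step (all items shift in the same known direction), and places an unbiased criterion midway along the summed evidence, which is our assumption; it recovers the optimal-summation prediction *d′*_{total} = √n × *d′*_{one}:

```python
import math
import random
from statistics import NormalDist

random.seed(1)
z = NormalDist().inv_cdf  # z-transform used in Equation 1

d_one = 1.0        # example single-item change discriminability
n_items = 4        # set size, as in the simulations above
n_trials = 20000
sigma = 1 / math.sqrt(2)         # per-display sampling noise (Models 1-3)
criterion = n_items * d_one / 2  # unbiased criterion on summed evidence (our assumption)

def summed_evidence(change_present):
    """Optimal summation: sum post-minus-pre evidence over all items."""
    total = 0.0
    for _ in range(n_items):
        pre = random.gauss(0.0, sigma)
        post = random.gauss(d_one if change_present else 0.0, sigma)
        total += post - pre
    return total

hits = sum(summed_evidence(True) > criterion for _ in range(n_trials)) / n_trials
fas = sum(summed_evidence(False) > criterion for _ in range(n_trials)) / n_trials
d_total = z(hits) - z(fas)  # Equation 1: d' = z(H) - z(FA)

# Optimal summation predicts d'_total = sqrt(n) * d'_one, i.e., 2.0 here.
assert abs(d_total - math.sqrt(n_items) * d_one) < 0.1
```

With 20,000 trials per condition, the Monte Carlo error of the *d′*_{total} estimate is small enough that the √n prediction holds within the asserted tolerance.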