The precision of visual working memory (VWM) representations decreases as time passes. It is often assumed that VWM decay is random and caused by internal noise accumulation. However, forgetting in VWM could occur systematically, such that some features deteriorate more rapidly than others. Only a few studies have tested these two models of forgetting, with conflicting results. Here, the decay of features in VWM was thoroughly tested using signal detection theory methods: psychophysical classification images, internal noise estimation, and receiver operating characteristic (ROC) analysis. A modified same–different memory task was employed with two retention times (500 and 4000 ms). Experiment 1 investigated VWM decay using a compound grating memory task, and Experiment 2 tested shape memory using radial frequency patterns. Memory performance dropped by about 15% with increasing retention time in both experiments. Interestingly, classification images showed virtually indistinguishable weighting of stimulus features at both retention times, suggesting that VWM decay is not feature specific. Instead, we found a 77% increase in stimulus-independent internal noise at the longer retention time. Finally, the slope of the ROC curve plotted as *z*-scores was shallower at the longer retention time, again indicating that the amount of stimulus-independent internal noise increased. Together these findings provide strong support for the idea that VWM decay does not result from a systematic loss of some stimulus features but instead is caused by uniformly increasing random internal noise.

The ROC curves were plotted in normal-deviate (*z*-score) coordinates. The slope of the ROC curve is determined by the ratio of noise variances between different and similar trials (Green & Swets, 1966; Macmillan & Creelman, 2004).

² and mean luminance of 48 cd/m². The viewing distance (110 cm) was controlled using a chin rest. Experiments were run in MATLAB (MathWorks, Natick, MA) with the Psychophysics Toolbox extension version 3.0.11 (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007) and custom-built scripts. Results were analyzed using custom MATLAB scripts. JASP 0.13.1.0 was used for the Bayesian ANOVA analyses and R 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria) for the linear mixed modeling. For the linear mixed model analysis, we used the lme4 and lmerTest packages (Bates, Mächler, Bolker, & Walker, 2015; Kuznetsova, Brockhoff, & Christensen, 2017), which use Satterthwaite's method for *p*-value estimation (Satterthwaite, 1946).

*definitely similar*, *probably similar*, *probably different*, or *definitely different*. Our task differed from a standard same–different task in that the memory and test stimuli were never completely identical; in similar trials, a low level of Gaussian-distributed phase noise (σ = 5.7°) was added to the test stimulus, whereas in different trials the noise level was high (σ = 57°). Phase noise values over ±90° were clipped to prevent angular wrap-around. Observers started with a short practice session (about 50 trials) to become familiar with the stimuli and task. Then, 1120 trials per retention time (2240 in total) were measured in blocks of 80 trials across four sessions on separate days (seven blocks per session). For the double-pass response consistency procedure, each trial was presented twice (in randomized order) within a block.
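The phase-noise manipulation can be sketched as follows. This is an illustrative Python snippet (the experiments themselves were run in MATLAB); the function name `phase_noise` and the component count of 10 are our own, but the σ values and the ±90° clipping come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def phase_noise(n_components, sigma_deg, rng):
    """Gaussian phase noise (in degrees) for each grating component,
    clipped at +/-90 deg to prevent angular wrap-around."""
    return np.clip(rng.normal(0.0, sigma_deg, n_components), -90.0, 90.0)

similar_noise = phase_noise(10, 5.7, rng)     # similar trials: low noise
different_noise = phase_noise(10, 57.0, rng)  # different trials: 10x higher noise
```

With σ = 5.7° the clipping is essentially never triggered, whereas with σ = 57° a substantial fraction of draws would otherwise exceed ±90°.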

*A*_{z}), which can be considered a response-criterion-free version of the proportion of correct responses, where 0.5 equals chance level and 1 is perfect performance. We chose *A*_{z} because it is unambiguous and intuitive and does not require the assumption of equal variance for the decision variable, a consideration that mattered here because the variance of the decision variable was higher in different trials than in similar trials. The ROC curves were formed using standard methods by first computing the cumulative hit and false-alarm rates (a hit being a "similar" response in a similar trial and a false alarm a "similar" response in a different trial) for each criterion the observer used for the rating-scale responses. This was done by computing the cumulative probability of responding less than or equal to *o* for each rating-scale option *o*, separately for similar and different trials. The ROC curves were then transformed to *z*-scores. On visual inspection, the ROC curves were approximately straight lines, consistent with the assumption that the distribution of the decision variable is Gaussian (Green & Swets, 1966; Macmillan & Creelman, 2004). We then fitted ROC lines to determine the slopes *k* and intercepts *b*. It can be shown (Swets & Pickett, 1982) that the area under the ROC curve is given by *A*_{z} = Φ(*b*/√(1 + *k*²)).
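The fitting procedure can be sketched in a few lines. The following Python snippet is illustrative only (the analyses were done with custom MATLAB scripts): it *z*-transforms cumulative hit and false-alarm rates, fits the line by least squares, and computes the area estimate with the Swets and Pickett (1982) formula; the function name `zroc_fit` is ours.

```python
from statistics import NormalDist

def zroc_fit(hit_rates, fa_rates):
    """Least-squares line z(hit) = k * z(fa) + b through the cumulative
    ROC points; returns slope k, intercept b, and the area estimate A_z."""
    nd = NormalDist()
    zh = [nd.inv_cdf(h) for h in hit_rates]   # z-transformed hit rates
    zf = [nd.inv_cdf(f) for f in fa_rates]    # z-transformed false-alarm rates
    n = len(zh)
    mf, mh = sum(zf) / n, sum(zh) / n
    k = sum((f - mf) * (h - mh) for f, h in zip(zf, zh)) \
        / sum((f - mf) ** 2 for f in zf)
    b = mh - k * mf
    a_z = nd.cdf(b / (1.0 + k * k) ** 0.5)    # Swets & Pickett (1982) form
    return k, b, a_z
```

For a four-point rating scale, the three internal criteria yield three cumulative (false alarm, hit) points per condition, which are passed to `zroc_fit`.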

**w**, whose estimate is the CI. We assume that on a given trial *k*, observers make their decision using the decision variable *r*_{s,k}, computed as the weighted sum of the template and the squared values (as the sign of the phase angle difference is not relevant) of the phase noise vector **n**_{k} that was added to the test stimulus, *r*_{s,k} = **w** · **n**_{k}² + *r*_{i}, where *r*_{i} is random internal noise that is assumed to be independent and Gaussian.

The observer compares *r*_{s,k} with a set of internal criteria, so that the observer gives a rating of *j* when the decision variable falls between criteria *c*_{j} and *c*_{j+1}. Assuming that internal noise is normally distributed, the probability that the observer gives a "different" response with confidence *l* is given by the normal probability mass between the corresponding criteria.

**ŵ** (i.e., the CI; see Knoblauch & Maloney, 2008). The GzLM is a generalization of the linear regression model in which the dependent variable belongs to the exponential family; it uses a nonlinear link function to relate the linear model to the dependent variable and allows a nominal dependent variable with multiple criteria. The model had 10 regressors for the decision weights, three regressors for the three internal criterion parameters corresponding to the response categories, and one regressor for the true stimulus category (similar–different). Fitting was done using MATLAB's *mnrfit* function. Finally, as the weights of a CI equal the internal template weights divided by a scalar that depends on the internal noise level (Ahumada, 2002; Knoblauch & Maloney, 2008), we normalized the CIs by dividing the raw weights by their vector length. This allowed us to investigate the memory weights independently of internal noise.
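The logic of this estimation and normalization step can be illustrated with a simplified sketch. The study fit a multinomial GzLM with MATLAB's *mnrfit*; the Python snippet below is not that model but a binary (binomial-link) stand-in on simulated data, with an illustrative internal template `true_w` of our own choosing. The final line is the normalization by vector length described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observer: a hypothetical internal template over 10 noise components.
true_w = np.linspace(1.0, 0.1, 10)
n_trials = 4000
noise = rng.normal(0.0, 1.0, (n_trials, 10))          # external noise regressors
dv = noise @ true_w + rng.normal(0.0, 1.0, n_trials)  # decision variable + internal noise
resp = (dv > 0).astype(float)                         # binary "different" response

# Binomial-link GzLM (logistic regression) fitted by plain gradient ascent;
# the study used a multinomial model (mnrfit) with rating-scale criteria.
w = np.zeros(10)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(noise @ w + b)))
    w += lr * (noise.T @ (resp - p)) / n_trials
    b += lr * np.mean(resp - p)

# Normalization step from the text: dividing the raw weights by their vector
# length removes the scalar that depends on the internal noise level.
w_hat = w / np.linalg.norm(w)
```

The recovered `w_hat` tracks the shape of `true_w` even though its raw magnitude depends on how much internal noise the simulated observer had.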

In the double-pass analysis, we assume that the response on a trial depends on external noise (*r*_{e}) and on internal noise (*r*_{i}), so that *r*_{s} = *r*_{e} + *r*_{i}. For simplicity, we can assume that the standard deviation of the model's response to external noise is 1 and that the internal noise standard deviation is σ_{i}. We let **c** = [–∞, *c*_{1}, *c*_{2}, *c*_{3}] be the response criteria for the four confidence levels and then estimated the ratio of internal to external noise standard deviations, which here is the same as σ_{i}. For two passes of the same stimuli, *r*_{e1} = *r*_{e2} = *r*_{e}. For a given external noise value, the two passes are independent, so the probability of the response pair *a*_{1} = *l* and *a*_{2} = *k* is the product of the two single-pass response probabilities, expressed in terms of Φ(*x*), the standard cumulative normal distribution. The expectation for the response consistency can then be solved by integrating over all external noise values (for details, see Kurki & Eckstein, 2014). As noted previously, strictly speaking the probability density function for *r*_{e} is approximately chi-squared, with the degrees of freedom depending on how many features are weighted. However, as the classification images show that observers use multiple features, with no difference in the number of features between the short and long retention times, we simply approximate the distribution of *r*_{e} by a Gaussian distribution.
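The intuition behind the double-pass estimate can be shown by simulation: the two passes share the same external noise sample but draw independent internal noise, so agreement between passes falls as internal noise grows. This Python sketch uses illustrative σ values and a binary response for simplicity (the study used a four-point rating scale).

```python
import numpy as np

rng = np.random.default_rng(2)

def double_pass_agreement(sigma_i, n_trials=20000):
    """Two passes share the same external noise (SD = 1) but draw
    independent internal noise (SD = sigma_i); returns the proportion
    of trials with the same binary response in both passes."""
    r_e = rng.normal(0.0, 1.0, n_trials)                 # identical in both passes
    a1 = (r_e + rng.normal(0.0, sigma_i, n_trials)) > 0  # pass 1 response
    a2 = (r_e + rng.normal(0.0, sigma_i, n_trials)) > 0  # pass 2 response
    return float(np.mean(a1 == a2))

low_noise = double_pass_agreement(0.5)   # illustrative sigma values only
high_noise = double_pass_agreement(2.0)  # more internal noise, less agreement
```

Inverting this relationship — finding the σ_{i} whose predicted agreement matches the observed agreement — is what the internal noise estimation does.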

The slope of the *z*-transformed ROC curve, *k*, is determined by *k* = σ_{d}/σ_{s}, the ratio of the decision variable standard deviations in different and similar trials (Green & Swets, 1966; Macmillan & Creelman, 2004). Note that similar trials are the "signal" distribution of the SDT model, and different trials, which had 10 times more phase noise, are the "noise" distribution. Thus, we assume that the variance of the decision variable will be greater in different trials than in similar trials and expect *k* to be >1. When inspecting the slopes, note that in many ROC studies the "signal" trials have had more variance than the "noise" trials, giving *k* < 1. Likewise, in recognition memory studies the signal is typically defined as old trials, which have more variance, resulting in *k* values less than 1.

The slope at retention time *t* is thus *k*_{t} = √((σ_{ed}² + σ_{it}²)/(σ_{es}² + σ_{it}²)), where σ_{ed}² and σ_{es}² are the external noise variances in the different and similar trials and σ_{it}² is the internal noise variance at retention time *t*, assumed to be the same in similar and different trials.

If the internal noise variance σ_{it}² grows with retention time while σ_{ed}² and σ_{es}² remain the same, *k* approaches 1. On the other hand, if internal noise does not change with retention time, there should be no change in *k* over short durations. We estimated *k* for each retention time using the standard method of least-squares linear regression.

With an estimate of σ_{it}², it would, in principle, also be possible to directly test the random-decay model by predicting the ROC slopes using Equation 5, as all terms could be estimated from the double-pass data. However, as noted before, the external-to-internal noise ratio could not be reliably estimated in similar trials, and the external noise standard deviation was 10 times higher in different trials. We therefore approximated the model's prediction by assuming that external noise makes a negligible (zero) contribution to responses in similar trials, that is, by setting the parameter σ_{es}² to 0. Performance in similar trials would then be completely limited by internal noise. This enabled us to predict the ROC curve slopes and compare them with the observed values.
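Under this variance-ratio reading of the slope, the prediction can be sketched in a few lines of Python. The σ values below are illustrative, not the fitted ones; the key qualitative behavior is that the slope moves toward 1 as internal noise grows while the external noise terms stay fixed.

```python
import math

def predicted_slope(sigma_ed, sigma_es, sigma_it):
    """z-ROC slope as the ratio of total (external + internal) noise SDs
    in different vs. similar trials (variance-ratio form assumed here)."""
    return math.sqrt((sigma_ed ** 2 + sigma_it ** 2)
                     / (sigma_es ** 2 + sigma_it ** 2))

# Approximation from the text: external noise contributes nothing in similar
# trials (sigma_es = 0), so performance there is internal-noise limited.
k_short = predicted_slope(1.0, 0.0, 1.3)  # illustrative sigma values only
k_long = predicted_slope(1.0, 0.0, 2.2)   # larger internal noise at 4000 ms
```

With equal noise variances in the two trial types the predicted slope is exactly 1, which is the limit the slope approaches as internal noise swamps the external noise difference.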

Memory performance (*A*_{z}) decreased by about 15% between the 500-ms and 4000-ms retention times (*M*_{500} = 0.82, *SD*_{500} = 0.05; *M*_{4000} = 0.70, *SD*_{4000} = 0.08), *t*(8) = 7.18, *p* < 0.001, *d* = 1.78. Measured as *d*′_{e} (Green & Swets, 1966; Macmillan & Creelman, 2004), a version of *d*′ adapted to situations where the signal and distractor response variables have different variances, as was the case here, performance dropped approximately 42% at the longer retention time (*d*′_{e500} = 1.50; *d*′_{e4000} = 0.89), and the difference between retention times was statistically significant in a repeated-measures *t*-test, *t*(8) = 11.08, *p* < 0.001, *d* = 2.10. Because all trials were presented twice for the double-pass method, we also compared performance in the first and second passes of the trials. There was no difference in *A*_{z} between the first pass (*M*_{1} = 0.77, *SD*_{1} = 0.06) and the second pass (*M*_{2} = 0.77, *SD*_{2} = 0.05), *t*(8) = 0.09, *p* = 0.927, *d* = 0.03.

The statistical significance of the memory weights was assessed with *t*-tests in which the *p* values were false discovery rate (FDR) corrected for multiple comparisons (Benjamini & Hochberg, 1995). Seven out of 10 SF components at the 500-ms retention time and five out of 10 at the 4000-ms retention time had *p* values smaller than 0.05 (indicated by asterisks in Figure 3A). This result implies that VWM for compound gratings can maintain a broad range of SF components. Retention time did not change the weighting in any apparent fashion; both retention times showed similar weights. Figure 3B shows CI weights for four representative observers. Although there was greater variation compared with the average results, slightly larger weights were concentrated at the lower frequencies.

For retention time, *F*(1,8) = 0.14, *p* = 0.714, η_{p}² = 0.018; for SF, *F*(1,8) = 1.02, *p* = 0.431, η_{p}² = 0.113; and for the interaction, *F*(1,8) = 0.68, *p* = 0.721, η_{p}² = 0.079. The lack of a main effect of SF shows that there was no evidence of differences in the weighting of the various SF components; that is, no evidence that forgetting in VWM resulted from a change in memory tuning. The lack of a main effect of retention time indicates that there is no evidence of the overall magnitude of the weights changing with time, which was expected because of the normalization. The lack of an interaction suggests that there is no evidence of the weighting changing with retention time.

Internal noise increased with retention time (*M*_{500} = 1.30, *SD*_{500} = 0.15; *M*_{4000} = 2.21, *SD*_{4000} = 1.10), *t*(8) = 2.36, *p* = 0.046, *d* = 1.16. In a one-tailed paired-samples Bayesian *t*-test, in which the alternative hypothesis was that the internal noise level increased at the 4000-ms retention time, the Bayes factor was 8.144 in favor of the alternative hypothesis. This result provides evidence for the idea that internal noise increases over time and causes forgetting.

Performance (*A*_{z}) in the memory task dropped approximately 14% between the 500-ms (*M* = 0.95, *SD* = 0.02) and 4000-ms (*M* = 0.82, *SD* = 0.07) retention times, *t*(8) = 6.18, *p* < 0.001, *d* = 2.70 (Figure 5A). For *d*′_{e}, the difference in performance between retention times was about 45% (*d*′_{e500} = 2.58; *d*′_{e4000} = 1.41), and it was statistically significant in a repeated-measures *t*-test: *t*(8) = 8.67, *p* < 0.001, *d* = 3.17. There was no difference in performance between the first (*M*_{1} = 0.90, *SD*_{1} = 0.03) and second (*M*_{2} = 0.90, *SD*_{2} = 0.03) passes of the trials: *t*(8) = 1.70, *p* = 0.128, *d* = 0.57.

The memory weights were again assessed with *t*-tests with FDR correction for multiple comparisons. Five out of 10 RF components at the 500-ms and six out of 10 RF components at the 4000-ms retention time had memory weights significantly different from zero (*p* < 0.05) (Figure 6A). As in Experiment 1, the CIs appeared to have similar shapes at the short and long retention times.

For retention time, *F*(1,8) = 0.44, *p* = 0.525, η_{p}² = 0.052; however, the main effect of the RF component was statistically significant, *F*(1,8) = 4.44, *p* < 0.001, η_{p}² = 0.357. Thus, there was memory tuning in the RF component weights. Critically, the interaction between retention time and RF component was not significant, *F*(1,8) = 1.47, *p* = 0.177, η_{p}² = 0.155, implying that there is no evidence of RF components decaying selectively at the 4000-ms retention time. We confirmed this observation with a two-way Bayesian repeated-measures ANOVA; the inclusion Bayes factor for radial frequency was 1076.210, indicating a strong effect of RF. The inclusion Bayes factors for retention time and for the interaction of retention time and RF were 0.204 and 0.202, respectively. Based on this analysis, there was no effect of retention time or of the interaction of retention time and RF on CI weighting, indicating that forgetting does not depend on the RF of the shape components.

Internal noise increased from the 500-ms (*M* = 0.94, *SD* = 0.57) to the 4000-ms (*M* = 1.72, *SD* = 0.93) retention time (Figure 5B). The increase was statistically significant in a paired-samples *t*-test, *t*(8) = 2.47, *p* = 0.038, *d* = 1.00. In a one-tailed Bayesian paired-samples *t*-test, the Bayes factor was 4.328, suggesting that the alternative hypothesis was more likely to hold than the null hypothesis. This favors the idea that the internal noise level was affected by retention time.

The ROC curves were plotted as *z*-scores (see the individual ROCs in the Appendix). The mean slope of the observers' ROC curves for the 500-ms retention time (across experiments) was 1.32 (*SD* = 0.26), which differed significantly from 1, *t*(17) = 5.17, *p* < 0.001, *d* = 1.23. The slope of the ROC curve for the 4000-ms retention time was shallower: *M* = 1.15 (*SD* = 0.19). The average intercepts were 2.26 (*SD* = 0.93) for the 500-ms retention time and 1.17 (*SD* = 0.60) for the 4000-ms retention time. To compare ROC slopes across conditions, we ran a linear mixed model analysis with experiment, retention time, and their interaction as fixed effects and observer as a random effect. We found an effect on ROC slopes for both fixed effects: for retention time, β = –0.26, *t*(17) = –3.68, *p* = 0.002; for experiment, β = –0.30, *t*(26.95) = –3.19, *p* = 0.004. There was no effect of the interaction of retention time and experiment, β = 0.19, *t*(16) = 1.84, *p* = 0.085, indicating that there was no evidence of the ROC slopes behaving differently in Experiments 1 and 2. The effect on the intercept was statistically significant, β = 1.47, *t*(26.95) = 21.94, *p* < 0.001, reflecting the fact that performance declined at the longer retention time.

To test the random-decay model directly, we predicted the ROC slopes from the double-pass internal noise estimates, assuming that the external noise contribution in similar trials (σ_{es}) is 0. Predicted slopes were 1.48 for the 500-ms retention time and 1.20 for the 4000-ms retention time. The prediction was reasonably good, with a relatively high correlation between the observers' predicted and observed slopes (ρ = 0.557; *p* < 0.001). In Figure 7B, there appear to be five outliers with an observed slope greater than 1.5; as a check, we calculated the correlation without these outliers and found that it remained high (ρ = 0.531; *p* = 0.003). We also checked that the correlation could not be explained merely by performance (*A*_{z}) in the memory task. The partial correlation controlling for *A*_{z} was lower but remained statistically significant (ρ = 0.386; *p* = 0.039).

We also analyzed the ROC curves plotted as *z*-scores. The slope of the ROC curve, which reflects the ratio of combined external and internal noise in different versus similar trials, decreased with retention time. As the external noise had much greater variance in the different trials, we expected this ratio to be well above 1, as was found at the short retention time. The decrease of the slope toward 1 provides further evidence for the idea that performance is limited by an increase in stimulus-independent noise. As we consistently found that the ROC slopes at the long and short retention times differed, our analysis supports a model in which forgetting is caused by an increase in internal noise. Moreover, we found reasonably good agreement between the slopes estimated from the double-pass data and from the ROC data. This further suggests that a straightforward increase in internal noise with retention time could explain the findings without any extra assumptions. On the other hand, it is not easy to explain these findings by systematic forgetting with no change in internal noise.

A high-threshold model predicts nonlinear *z*-transformed ROC curves (see evidence for the HT model in Rouder, Morey, Cowan, Zwilling, Morey, & Pratte, 2008), which contradicts our observation of linear *z*-transformed ROC curves. Thus, our results support a model in which there is at least a signal detection component in VWM, although we cannot rule out the possibility that our assumptions do not hold. Then again, we also found increasing internal noise in the double-pass analysis, which makes no assumptions about the nature of VWM.

*Journal of Vision,*2(1):8, 121–131, https://doi.org/10.1167/2.1.8.

*The Journal of the Acoustical Society of America,*49, 1751, https://doi.org/10.1121/1.1912577.

*Journal of Statistical Software,*67(1), 1–48, https://doi.org/10.18637/jss.v067.i01.

*Journal of Neuroscience,*34(10), 3632–3645, https://doi.org/10.1523/JNEUROSCI.3204-13.2014.

*Trends in Cognitive Sciences,*19(8), 431–438, https://doi.org/10.1016/j.tics.2015.06.004.

*Journal of Vision,*9(10):7, 1–11, https://doi.org/10.1167/9.10.7.

*Science,*321(5890), 851–854, https://doi.org/10.1126/science.1158023.

*Vision Research,*48(21), 2336–2344, https://doi.org/10.1016/j.visres.2008.07.015.

*Vision Research,*49(8), 843–850, https://doi.org/10.1016/j.visres.2009.03.001.

*Vision Research,*47(11), 1518–1522, https://doi.org/10.1016/j.visres.2007.01.006.

*Journal of the Royal Statistical Society: Series B (Methodological),*57(1), 289–300, https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.

*Journal of Experimental Psychology: Human Perception and Performance,*23(2), 353–369, https://doi.org/10.1037/0096-1523.23.2.353.

*Journal of the Optical Society of America A,*5(4), 617–627, https://doi.org/10.1364/JOSAA.5.000617.

*The Journal of Physiology,*197(3), 551–566, https://doi.org/10.1113/jphysiol.1968.sp008574.

*PLoS One,*13(7), e0200292, https://doi.org/10.1371/journal.pone.0200292.

*Behavioral and Brain Sciences,*24(4), 87–185, https://doi.org/10.1017/S0140525x01003922.

*Annual Review of Psychology,*31, 309–341, https://doi.org/10.1146/annurev.ps.31.020180.001521.

*Current Directions in Psychological Science,*11(1), 19–23, https://doi.org/10.1111/1467-8721.00160.

*Perspectives on Psychological Science,*13(2), 190–193, https://doi.org/10.1177/1745691617720478.

*Journal of Vision,*11(12):3, 1–12, https://doi.org/10.1167/11.12.3.

*Journal of Experimental Psychology: Human Perception and Performance,*37(4), 1051–1064, https://doi.org/10.1037/a0023091.

*The Journal of Physiology,*252(3), 627–656, https://doi.org/10.1113/jphysiol.1975.sp011162.

*Psychological Science,*16(10), 769–774, https://doi.org/10.1111/j.1467-9280.2005.01612.x.

*Visual pattern analyzers*. New York: Oxford University Press.

*Psychological Review,*71(5), 392–407, https://doi.org/10.1037/h0044520.

*Signal detection theory and psychophysics*. New York: Wiley.

*Human memory and cognitive capabilities: Mechanisms and performances* (pp. 173–187). Amsterdam: Elsevier Science.

*Proceedings of the National Academy of Sciences, USA,*117(15), 8391–8397, https://doi.org/10.1073/pnas.1918143117.

*Proceedings of the National Academy of Sciences, USA,*105(19), 6829–6833, https://doi.org/10.3758/s13423-014-0699-x.

*Perception,*36, 1–16, https://doi.org/10.1177/03010066070360S101.

*Journal of Vision,*8(16):10, 1–19, https://doi.org/10.1167/8.16.10.

*Journal of Vision,*14(11):6, 1–18, https://doi.org/10.1167/14.11.6.

*Journal of Statistical Software,*82(13), 1–26, https://doi.org/10.18637/jss.v082.i13.

*Vision Research,*48(20), 2106–2127, https://doi.org/10.1016/j.visres.2008.03.006.

*Journal of Vision,*15(7):1, 1–19, https://doi.org/10.1167/15.7.1.

*Nature,*390(6657), 279–281, https://doi.org/10.1038/36846.

*Trends in Cognitive Sciences,*17(8), 391–400, https://doi.org/10.1016/j.tics.2013.06.006.

*Detection theory: A user's guide*. New York: Psychology Press.

*Trends in Neurosciences,*23(6), 247–251, https://doi.org/10.1016/S0166-2236(00)01569-1.

*Scandinavian Journal of Psychology,*50(6), 535–542, https://doi.org/10.1111/j.1467-9450.2009.00783.x.

*Psychological Research,*62(2–3), 81–92, https://doi.org/10.1007/s004260050043.

*European Journal of Cognitive Psychology,*2(4), 345–362, https://doi.org/10.1080/09541449008406212.

*Journal of Experimental Psychology: Human Perception and Performance,*22(1), 202–212, https://doi.org/10.1037/0096-1523.22.1.202.

*Journal of Cognitive Psychology,*32(4), 391–408, https://doi.org/10.1080/20445911.2020.1767627.

*Journal of Vision,*11(5):2, 1–25, https://doi.org/10.1167/11.5.2.

*Psychological Review,*124(1), 21–59, https://doi.org/10.1037/rev0000044.

*Journal of Experimental Psychology: Learning Memory and Cognition,*43(4), 528–536, https://doi.org/10.1037/xlm0000328.

*Proceedings of the National Academy of Sciences, USA,*112(23), 7321–7326, https://doi.org/10.1073/pnas.1422169112.

*Journal of Experimental Psychology: Learning Memory and Cognition,*46(1), 60–76, https://doi.org/10.1037/xlm0000719.

*Proceedings of the National Academy of Sciences, USA,*105(16), 5975–5979, https://doi.org/10.1073/pnas.0711295105.

*Vision Research,*50(6), 623–629, https://doi.org/10.1016/j.visres.2010.01.014.

*Biometrics Bulletin,*2(6), 110, https://doi.org/10.2307/3002019.

*Journal of Neuroscience,*38(21), 4859–4869, https://doi.org/10.1523/JNEUROSCI.3440-17.2018.

*Nature Human Behaviour,*4(11), 1156–1172, https://doi.org/10.1038/s41562-020-00938-0.

*Perception and Psychophysics,*66(7), 1202–1205, https://doi.org/10.3758/BF03196846.

*Evaluation of diagnostic systems: Methods from signal detection theory*. New York: Academic Press.

*Vision Research,*38(22), 3555–3568, https://doi.org/10.1016/S0042-6989(98)00039-X.

*Psychological Bulletin,*133(5), 800–832, https://doi.org/10.1037/0033-2909.133.5.800.

*Psychological Science,*20(4), 423–428, https://doi.org/10.1111/j.1467-9280.2009.02322.x.

*z*-transformed ROC curves

The appendix figures show the *z*-transformed ROC curves for all observers in Experiments 1 and 2, respectively. On visual inspection, no apparent nonlinearity can be seen in the *z*-transformed ROC curves. Further, the linear model provided very good fits for all observers: the *R*² values in Experiment 1 were on average 0.986 (range, 0.968–0.998; *SD* = 0.011) and 0.994 (range, 0.970–1.000; *SD* = 0.009) for the 500-ms and 4000-ms retention times, respectively. The corresponding values in Experiment 2 were 0.981 (range, 0.937–0.999; *SD* = 0.025) and 0.993 (range, 0.978–0.999; *SD* = 0.007).