Temporary storage of information in visual short-term memory (VSTM) is a key component of many complex cognitive abilities. However, it is highly limited in capacity. Understanding the neurophysiological nature of this capacity limit will require a valid animal model of VSTM. We used a multiple-item color change detection task to measure macaque monkeys' VSTM capacity. Subjects' performance deteriorated and reaction times increased as a function of the number of items in memory. Additionally, we measured the precision of the memory representations by varying the distance between sample and test colors. In trials with similar sample and test colors, subjects made more errors compared to trials with highly discriminable colors. We modeled the error distribution as a Gaussian function and used this to estimate the precision of VSTM representations. We found that as the number of items in memory increases, the precision of the representations decreases dramatically. Additionally, we found that focusing attention on one of the objects increases the precision with which that object is stored and degrades the precision of the remaining objects. These results are in line with recent findings in human psychophysics and provide a solid foundation for understanding the neurophysiological nature of the capacity limit of VSTM.

*Macaca mulatta*) aged 4 to 5 years and weighing 11 to 13 kg at the time of the experiments. Subjects' daily fluid intake was regulated in order to maintain motivation to perform the task. During testing, subjects sat in a primate chair facing a 19-inch LCD computer screen placed at a distance of 32 cm. A pair of computers running NIMH Cortex (http://www.cortex.salk.edu) controlled the timing and presentation of stimuli. We monitored eye movements using an infrared camera with ISCAN software. All procedures were in accord with the National Institutes of Health guidelines and the recommendations of the U.C. Berkeley Animal Care and Use Committee.

$L^*a^*b^*$ color space. We fixed the luminance at $L^* = 70$ and varied the $a^*$ and $b^*$ parameters to produce 24 different colored squares that span a rectangle in this cross-section of the color space (Figure 1, inset). All 24 colors could be used both as sample and as test colors. The distance between two colors in this color space is given by

$$d = \sqrt{(a_1^* - a_2^*)^2 + (b_1^* - b_2^*)^2},$$

where ($a_1^*$, $b_1^*$) and ($a_2^*$, $b_2^*$) are the parameters for colors 1 and 2, respectively. We avoided potential similarity between the colors in the sample array by enforcing a lower bound on the distance between all pairs of colors $i$ and $j$ in the array, $d_{i,j} \geq 50$. Additionally, we restricted the distance between the sample and the test colors ($\Delta E$) to the set $\Delta E \in \{0, 40, 50, 60, 70, 80, 90, 100\}$.
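
The color constraints described above can be illustrated with a minimal sketch. It assumes that, with luminance fixed, the distance between two colors is the Euclidean distance in the a*–b* plane; the function names and the coordinates are hypothetical and are not the palette used in the study.

```python
import math
from itertools import combinations


def color_distance(c1, c2):
    """Euclidean distance between two (a*, b*) pairs at fixed luminance (assumed form)."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])


def array_is_valid(colors, min_distance=50.0):
    """Return True if every pair of colors in the array is at least min_distance apart."""
    return all(
        color_distance(p, q) >= min_distance
        for p, q in combinations(colors, 2)
    )


# Hypothetical (a*, b*) coordinates, chosen only for illustration.
sample_array = [(0.0, 0.0), (60.0, 0.0), (0.0, 80.0)]
print(array_is_valid(sample_array))
print(color_distance((0.0, 0.0), (30.0, 40.0)))
```

A candidate sample array would be rejected and redrawn whenever `array_is_valid` returns `False`; how arrays were actually generated is not specified here.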

$\mu$ and variance $\sigma^2$. Equation 1 assumes that subjects have a noisy internal representation of change and will respond “change” whenever the internal representation exceeds a criterion given by $\mu$. We incorporated a guessing parameter $g$ into this model to account for the fact that on some proportion of trials subjects might fail to detect a change and instead guess that a change has occurred. On a certain proportion of trials, $g$, the subject randomly guesses, while on the remaining proportion of trials, $1 - g$, the subject's behavior can be described by the model given by Equation 1. This leads to the second model:

$b$, which captures any bias that subjects may have toward moving the lever in one direction:

$\sigma^2$. To obtain reliable parameter estimates, we pooled trials together across sessions.
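
As a rough illustration of the three response models, the sketch below assumes a cumulative-Gaussian form for the probability of responding “change”, a guess that is “change” half the time in the second model, and a guess that is “change” with probability b in the third. These functional forms, and all parameter values, are assumptions made for illustration; they are not taken from the equations of the paper.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def model1(delta_e, mu, sigma):
    """Probability of responding 'change': noisy signal exceeds criterion mu (assumed form)."""
    return phi((delta_e - mu) / sigma)


def model2(delta_e, mu, sigma, g, guess_rate=0.5):
    """Guess on a proportion g of trials, otherwise follow model1.

    guess_rate (probability that a guess is 'change') is an assumption.
    """
    return g * guess_rate + (1.0 - g) * model1(delta_e, mu, sigma)


def model3(delta_e, mu, sigma, g, b):
    """Like model2, but guesses are 'change' with probability b (assumed bias form)."""
    return g * b + (1.0 - g) * model1(delta_e, mu, sigma)


# Hypothetical parameter values, for illustration only.
for delta_e in (0, 40, 50, 60, 70, 80, 90, 100):
    print(delta_e, round(model3(delta_e, mu=50.0, sigma=20.0, g=0.2, b=0.4), 3))
```

In a sketch like this, a smaller sigma gives a steeper curve, which is the sense in which the variance parameter indexes the precision of the memory representation.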

$\sigma_0$ (Zhang & Luck, 2008). In this model, the probability of responding “change” when the number of “slots” available, $k$, is greater than or equal to, or less than, the number of items in the display, $N$, is given by (Cowan & Rouder, 2009):

When $k \geq N$, subjects store all the items in VSTM with precision $\sigma_0$, and if there are more slots than items, then an item can be stored in more than one slot. On the other hand, when $k < N$, the subject stores only a fraction of the items ($k/N$) in VSTM with precision $\sigma_0$ and guesses if the probed item is not in memory. Additionally, to allow for a fair comparison with the resources model (Equation 3), we incorporated a bias parameter to account for any bias that the subjects have for moving the lever in one direction. Thus, the full form of the slots model we used is given by

$p$-value as the proportion of shuffled differences whose value was greater than or equal to the value of the observed difference.
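
A permutation test of this kind can be sketched as follows. The sketch assumes that the test statistic is a difference in mean accuracy between two groups of trials and that condition labels are shuffled across trials; the function name, the number of shuffles, and the data are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """One-sided permutation test on the difference of means (group_a minus group_b).

    Returns the proportion of shuffled differences that are greater than or
    equal to the observed difference.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign trials to the two groups at random
        shuffled_a = pooled[:n_a]
        shuffled_b = pooled[n_a:]
        diff = sum(shuffled_a) / n_a - sum(shuffled_b) / len(shuffled_b)
        if diff >= observed:
            count += 1
    return count / n_shuffles


# Hypothetical trial outcomes (1 = correct, 0 = incorrect).
condition_a = [1] * 18 + [0] * 2
condition_b = [1] * 10 + [0] * 10
print(permutation_p_value(condition_a, condition_b))
```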

$\Delta E$. Figure 2 shows that both subjects could remember all colors at a high level of performance. The mean performance across sessions was 89 ± 3% and 80 ± 5% for subjects G and I, respectively (black error bars in Figure 2). For all colors, performance was generally better for larger values of $\Delta E$.

$\Delta E = 100$) in order to determine subjects' maximal performance in this task. Figures 3a and 3b show that for both subjects, as the set size increased, performance decreased significantly (1-way ANOVA; subject G: $F_{3,223} = 280$, $p < 1 \times 10^{-15}$; subject I: $F_{3,187} = 39.0$, $p < 1 \times 10^{-15}$). For set size one, performance was close to 90% for subject G and 79% for subject I. For both subjects, each additional item significantly lowered performance (post-hoc Tukey–Kramer test, $p < 0.05$), with the exception of the drop from three items to four items for subject G.

$F_{3,223} = 82.7$, $p < 1 \times 10^{-15}$; subject I: $F_{3,187} = 18.9$, $p < 1 \times 10^{-9}$) and the false alarm rate significantly increased with increasing set size (1-way ANOVA; subject G: $F_{3,223} = 37.4$, $p < 1 \times 10^{-15}$; subject I: $F_{3,187} = 3.27$, $p < 0.025$). For subject G, each additional object significantly decreased the hit rate and increased the false alarm rate, with the exception of increasing from three to four items (post-hoc Tukey–Kramer tests, $p < 0.05$). For subject I, the hit rate for set size one was significantly higher than for all other set sizes, and the hit rate for set size two was significantly higher than for set size four (post-hoc Tukey–Kramer tests, $p < 0.05$); other differences were not significant. Additionally, for subject I, only the false alarm rate for set size four was significantly different from set sizes one and two (post-hoc Tukey–Kramer tests, $p < 0.05$).

$\Delta E = 100$. As the number of items in memory increased, the subjects took longer to respond (Kruskal–Wallis test, both subjects $p < 1 \times 10^{-16}$). For both subjects, each additional item, up to set size three, significantly slowed reaction times (post-hoc Tukey–Kramer test, $p < 0.05$). The addition of a fourth item did not affect reaction times.

$k = N(h - f)$, where $h$ and $f$ are the hit and false alarm rates, respectively, and $N$ is the number of items in the display. We applied this formula to our data using trials with the highest discriminability between sample and test colors ($\Delta E = 100$) and obtained estimates of $k = 0.88$ for subject G and $k = 0.83$ for subject I. This seems to contradict previous reports that macaques have a capacity of around 2 items. However, this fixed-capacity model of VSTM makes the assumption that information is stored in VSTM in discrete fashion, i.e., that an item is either in memory or not in memory. An alternative account of the nature of VSTM argues that this need not be the case, and instead, it proposes that VSTM is a continuous resource that is split among the different items at the expense of representational fidelity. To distinguish between these two accounts, we looked at how the fidelity of the representations varied as a function of the number of items in memory as we parametrically varied the difference between the sample and test colors.
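
The capacity formula k = N(h − f) is simple enough to state as a short sketch. The function name and the rates in the example are hypothetical and are not the subjects' data.

```python
def capacity_estimate(hit_rate, false_alarm_rate, set_size):
    """Capacity estimate k = N * (h - f) for a change detection task."""
    return set_size * (hit_rate - false_alarm_rate)


# Hypothetical rates for a two-item display.
print(capacity_estimate(hit_rate=0.6, false_alarm_rate=0.2, set_size=2))
```

Because the estimate scales the corrected hit rate by the set size, it stays near one item whenever h − f falls roughly in proportion to 1/N as items are added.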

*E*

$\Delta E = 100$). We would expect performance on trials with small $\Delta E$ to be worse than on trials with large $\Delta E$ because a small difference between sample and test colors makes it more difficult for subjects to detect a color change. For both subjects, and at all set sizes, performance was at chance for the lowest value of $\Delta E$ (Figure 4).

$\Delta E$, as set size increased, performance dropped significantly below chance. This suggests that subjects were unable to discriminate between colors separated by a distance of 40 units or less and were judging both colors to be the same (thus making more incorrect responses than by simply guessing). As the sample and test colors became more discriminable (higher $\Delta E$), performance improved for set size one and eventually plateaued at around 93% for subject G and 84% for subject I. For trials with two and three items, the subjects performed better than chance only when the difference between sample and test colors was large (binomial test; subject G: $p < 0.05$ for set size two at $\Delta E = 80$, for set sizes two and three at $\Delta E = 100$; subject I: $p < 0.05$ for set sizes two and three at $\Delta E \geq 80$).
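
A comparison against chance of the kind reported above can be sketched as an exact binomial test. The sketch assumes a one-sided test against a chance level of 0.5; whether the reported tests were one- or two-sided is not stated here, and the trial counts are hypothetical.

```python
import math


def binomial_p_above_chance(n_correct, n_trials, chance=0.5):
    """Exact one-sided binomial p-value: P(X >= n_correct) under the chance rate."""
    return sum(
        math.comb(n_trials, k) * chance ** k * (1.0 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical counts: 130 correct responses out of 200 trials.
print(binomial_p_above_chance(130, 200))
```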

$\Delta E$) and fitted the models specified by Equations 1–5 to this probability. To obtain as reliable an estimate as possible, we pooled our data across sessions and subjects. We calculated the AIC and the corresponding Akaike weights for each fit (Table 1). For the variable-precision models (Equations 1–3), we found that the evidence overwhelmingly favors Model 3 relative to Models 1 and 2 for all set sizes above 1. Since the Akaike weights overwhelmingly favor Model 3, we base our estimates of precision on Model 3, which we refer to as the “variable-precision” model, and do not consider Models 1 and 2 further.

Table 1. AIC values (Akaike weights in parentheses) for each model and set size.

| Model | Set size 1 | Set size 2 | Set size 3 | Set size 4 |
|---|---|---|---|---|
| Model 1 | 9738 (0.0) | 13,652 (0.0) | 13,050 (0.0) | 5656 (0.0) |
| Model 2 | 9696 (0.28) | 13,649 (0.0) | 13,035 (0.0) | 5645 (0.0) |
| Model 3 | 9697 (0.16) | 13,279 (1.0) | 12,752 (1.0) | 5514 (1.0) |
| Slots (k = 1) | 9696 (0.28) | 13,617 (0.0) | 12,969 (0.0) | 5547 (0.0) |
| Slots (k not fixed) | 9696 (0.28) | 13,606 (0.0) | 12,967 (0.0) | 5548 (0.0) |
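
For readers unfamiliar with Akaike weights, the sketch below shows the standard way they are derived from a set of AIC values: each model's AIC difference from the best model is converted to a relative likelihood and then normalized. The function name is hypothetical, and the input is the set size 1 column of Table 1 as printed (rounded values).

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best model.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Set size 1 column of Table 1, in row order:
# Model 1, Model 2, Model 3, Slots (k = 1), Slots (k not fixed).
aic_set_size_1 = [9738, 9696, 9697, 9696, 9696]
weights = akaike_weights(aic_set_size_1)
print([round(w, 2) for w in weights])
```

With these rounded inputs, the three tied models each receive a weight of about 0.28, in line with the table; small differences for Model 3 may reflect rounding of the printed AIC values.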

$k$ below or above the number of items in the display, a value for $k$ is needed in order to properly fit it to the data. Since all of our behavioral measures show a sharp decline for set sizes larger than 1, we tested a model in which the parameter $k$ was fixed at a value of exactly 1. We set the value of $\sigma_0$ to the value estimated using only set size one trials. Table 1 shows the AIC values and Akaike weights for this model. We found that the variable-precision model (Model 3) outperformed the fixed-precision slots model. In an attempt to improve the slots model fit, we did not restrict $k = 1$, and instead, we recalculated the fit with $k$ as a free parameter determined by MLE. We found the MLE estimate $k = 1.12$. The AIC value (Table 1) indicates that using this value gives a better fit compared to $k = 1$, even with the penalty of an extra parameter. The improved fit of this model, however, was still worse compared to the variable-precision model.

$k$ gave the best fit of the two fixed-precision slots models, we fit these two models separately to both subjects (Figure 5) and calculated the goodness of fit of the model. The fit for both models was significant in both subjects ($F$-test, $p < 0.05$) for set sizes one, two, and three. Furthermore, the fit was consistently better for the variable-precision model relative to the fixed-precision model in both subjects.

$\sigma^2$) of the variable-precision model. For both subjects, the precision became smaller as a function of set size (Figure 6). Subjects showed a significant decrease in precision for trials with more than one item ($\sigma^2$ estimates for set size > 1 fall outside the 99% confidence interval for set size one). This indicates that as soon as a second item was added to memory, the representation of both objects was significantly degraded. Adding a second item to the display caused the precision to drop to almost half of the maximum for subject G and to about 70% of the maximum for subject I. Adding third and fourth items caused the precision to drop further in both subjects.

$p > 0.1$). For set sizes two and above, subjects could make congruent, incongruent, or no microsaccades. We found that there was a significant improvement in performance on congruent trials (permutation test, $p < 0.01$) and a significant decrease in performance on incongruent trials (permutation test, $p < 0.001$) for both set sizes two and three.

$p < 1 \times 10^{-3}$). Additionally, for set sizes two and three, an incongruent microsaccade resulted in a significantly smaller precision compared to congruent microsaccades (permutation test, $p < 1 \times 10^{-3}$).

$d'$ (which is used as a measure of precision) decreases in a power-law fashion. Similarly, our results show that the precision of VSTM representations dramatically decreases as soon as more than one item is loaded into memory. These results point to a model in which VSTM can be thought of as a limited resource that can be flexibly allocated to encode multiple items in memory. According to this view, there is a limited pool of resources that is shared out between remembered items such that adding more items to memory reduces the resources allocated to each item. This reduction is manifested as decreased precision of stored memories, which adversely affects behavioral performance in these tasks. This account of VSTM does not necessarily require that items be stored in an all-or-none fashion leading to a fixed capacity. Rather, this model allows for the flexible distribution of resources among the items in memory so that information about all or most of the items is represented, albeit not in great detail.