The recent controversy surrounding the nature of working memory has focused on what the precision of recall reveals about the mechanisms underlying memory (Alvarez & Cavanagh, 2004; Awh et al., 2007; Bays & Husain, 2008, 2009; Cowan & Rouder, 2009; Wilken & Ma, 2004; Zhang & Luck, 2008). In this study, we examined performance on a task in which subjects were asked to recall the color of an object displayed at a specified location. Our findings show that the precision with which subjects report this color declines as the number of objects in the memory array increases. This finding is consistent with a model of visual working memory in which a common resource must be shared out between all items in the display (Bays & Husain, 2008). In this model, the precision with which an item is stored depends on the fraction of the total resource allocated to its storage. Because observers do not know which item will be probed when they view an array, the resource will, on average, be shared out equally among all items; hence, performance declines with increasing number of items.
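A minimal sketch of the equal-sharing idea follows. It assumes, for illustration only, that precision is directly proportional to an item's share of a total resource and that precision is the inverse variance of recall error. The total resource value and the function names are hypothetical; the exact mapping from resource to precision is an empirical question that this sketch does not settle.

```python
import math


def per_item_precision(total_resource: float, n_items: int) -> float:
    """Precision of each item when a total resource is split equally.

    Assumes precision is proportional to the resource share. The observer
    does not know which item will be probed, so every item gets 1/N.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return total_resource / n_items


def recall_sd(precision: float) -> float:
    """Standard deviation of recall error, treating precision as 1/variance."""
    return 1.0 / math.sqrt(precision)


if __name__ == "__main__":
    R = 16.0  # hypothetical total resource, arbitrary units
    for n in (1, 2, 4, 6):
        p = per_item_precision(R, n)
        print(f"N={n}: precision={p:.2f}, error SD={recall_sd(p):.3f}")
```

Under these assumptions, doubling the number of items halves the precision of each item and increases the error SD by a factor of √2.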
Contrary to this view, performance on this same task has recently been put forward as strong evidence for the existence of a fixed number of discrete object representations, or “slots,” in visual working memory (Zhang & Luck, 2008). The observed effect of the number of objects on precision, and in particular the large difference in precision between one- and two-item arrays (Figure 1b), cannot be reconciled with the traditional model in which each item is stored in a separate slot (Cowan, 2005; Luck & Vogel, 1997; Pashler, 1988; Vogel et al., 2001). Instead, Zhang and Luck (2008) propose a modification to the original slot model whereby slots can “double up” and store the same item, combined with an averaging process to obtain a single estimate per item. This modification allows the slot model to behave like a quantized version of the resource model and hence exhibit the same dependence of precision on the number of items stored, albeit at substantial cost to the parsimony and conceptual power of the original model. One might question the utility of the slot concept if it must be modified so that there is no longer a one-to-one correspondence between a slot and a represented visual object.
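The sense in which “slots plus averaging” behaves like a quantized resource model can be sketched as follows. This is a simplified illustration under assumptions of its own, not Zhang and Luck's fitted model: each slot holds one independent noisy sample of a single item with a fixed, hypothetical precision; averaging k samples yields k times that precision; and slots are spread over items as evenly as possible.

```python
from typing import List


def slot_averaging_precision(n_items: int, n_slots: int,
                             slot_precision: float) -> List[float]:
    """Per-item precision under a simplified 'slots + averaging' scheme.

    Illustrative assumptions: each slot stores one independent noisy sample
    of a single item with precision `slot_precision`; averaging k samples
    gives precision k * slot_precision; slots are spread as evenly as
    possible. Items that receive no slot get precision 0 (they would have
    to be guessed).
    """
    if n_items < 1 or n_slots < 0:
        raise ValueError("need n_items >= 1 and n_slots >= 0")
    base, extra = divmod(n_slots, n_items)
    # The first `extra` items each receive one additional slot.
    return [(base + (1 if i < extra else 0)) * slot_precision
            for i in range(n_items)]


if __name__ == "__main__":
    K = 3  # hypothetical number of slots
    for n in range(1, 6):
        print(n, slot_averaging_precision(n, K, 1.0))
```

Under these assumptions the per-item precisions always sum to `n_slots * slot_precision`, so average precision falls as 1/N, as in an equal-sharing resource model, while each individual item's precision is restricted to whole multiples of the slot precision.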
Despite making many similar predictions for behavior, this modified slot model remains fundamentally different from the resource model and has radically different implications for how the brain solves the problem of storing visual information. Understanding the nature of visual short-term memory is crucial to understanding how observers perceive the world (O'Regan, 2001; Simons & Rensink, 2005), deploy attention to visual items (Awh & Jonides, 2001; Bundesen & Habekost, 2008; de Fockert et al., 2001; Lepsien & Nobre, 2007; Soto & Humphreys, 2006), and dynamically acquire information about a scene from glimpses obtained between eye movements (Henderson, 2008; Irwin, 1991). The color report task provides a key paradigm in which to consider and test these opposing views.
One crucial distinction retained by Zhang and Luck's (2008) modified scheme is that the slot model, unlike the resource model, predicts a fixed upper limit on the number of items that can be held in memory simultaneously. In their analysis of the color report task, Zhang and Luck attributed responses that could not be explained by simple Gaussian variability in memory for the target color (Figure 1c, top) to random guesses (Figure 1c, middle). These random responses were interpreted as evidence for just such an upper limit on the number of items stored. According to this interpretation, random responses occur on trials where no information is stored about the probed item because the number of array items exceeds the maximum number of items that can be stored. As substantial numbers of these responses are observed even with array sizes as small as three items (Zhang & Luck, 2008, 2009), this interpretation implies an average capacity limit of about two items.
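The step from guess rate to capacity can be made explicit. If, as the slot interpretation assumes, K of N items are stored and a probe of any unstored item produces a uniform guess, then the guess rate is g = 1 − K/N, so K = N(1 − g). The sketch below implements this relation; the guess rate in the example is a hypothetical value chosen for illustration, not a figure taken from Zhang and Luck (2008, 2009).

```python
def capacity_from_guess_rate(n_items: int, guess_rate: float) -> float:
    """Implied number of stored items under the slot interpretation.

    Assumes each of the N items is stored with probability K/N and that a
    probe of an unstored item yields a uniform random guess, so that
    guess_rate = 1 - K/N, i.e. K = N * (1 - guess_rate).
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if not 0.0 <= guess_rate <= 1.0:
        raise ValueError("guess_rate must lie in [0, 1]")
    return n_items * (1.0 - guess_rate)


if __name__ == "__main__":
    # Hypothetical example: one response in three is a guess at N = 3.
    print(capacity_from_guess_rate(3, 1 / 3))
```

Because K is read directly off the guess rate, any response that is misclassified as a guess lowers the inferred capacity.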
However, one critical factor that has previously been overlooked on this task (Wilken & Ma, 2004; Zhang & Luck, 2008, 2009) is the need for subjects to remember the locations of the array items as well as their colors. Subjects are instructed to report the color of only one of the items held in memory: the item that matched the location of the probe. Therefore, subjects must compare the probe location with the remembered location of each array item to determine which color to report. The resource model predicts that locations stored in working memory will be corrupted by noise, in the same way as colors. Consequently, observers will sometimes misidentify which item was at the probed location and mistakenly report the remembered color of one of the non-probed items (Figure 1c, bottom).
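To illustrate how noise in location memory alone can generate non-target responses, the following toy simulation makes several simplifying assumptions of its own: items sit at equally spaced angular positions on a ring, each remembered location carries independent Gaussian noise, color memory is noiseless, and the observer reports the item whose remembered location is nearest the probe. It is a hypothetical illustration, not the procedure used to analyze the data.

```python
import math
import random
from typing import Sequence


def circ_dist(a: float, b: float) -> float:
    """Absolute angular distance between two angles (radians)."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def reported_item(item_locs: Sequence[float], probe_index: int,
                  loc_sd: float, rng: random.Random) -> int:
    """Index of the item whose color is reported on one simulated trial.

    Each remembered location is the true location plus independent
    Gaussian noise (SD `loc_sd`, radians); the observer reports the item
    whose remembered location lies closest to the probed location.
    """
    probe = item_locs[probe_index]
    remembered = [loc + rng.gauss(0.0, loc_sd) for loc in item_locs]
    return min(range(len(item_locs)),
               key=lambda i: circ_dist(remembered[i], probe))


def swap_rate(n_items: int, loc_sd: float, n_trials: int = 20000,
              seed: int = 0) -> float:
    """Fraction of trials on which a non-probed item's color is reported."""
    rng = random.Random(seed)
    # Items evenly spaced on a ring; item 0 is always the probed item.
    locs = [2 * math.pi * i / n_items for i in range(n_items)]
    swaps = sum(reported_item(locs, 0, loc_sd, rng) != 0
                for _ in range(n_trials))
    return swaps / n_trials


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(n, swap_rate(n, loc_sd=0.6))
```

Because color memory is noiseless here, every non-target response in the simulation comes from a location error. With the location SD held fixed, such errors cannot occur with a single item and become more frequent as items are packed more closely; in the resource model the location SD would itself also grow with array size.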
Our analysis confirms that subjects' responses are biased toward the colors of non-probed items more than would be expected by chance alone (Figure 2b). Importantly, we have shown that when responses to the non-targets are taken into account, the majority of responses that Zhang and Luck (2008) interpreted as random guesses are in fact due to errors in memory for location, as predicted by a resource model (Figure 3).
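Taking non-target responses into account can be expressed as a mixture likelihood. The sketch below assumes that colors and responses are angles in a circular color space and that remembered feature values are perturbed by von Mises noise with concentration `kappa`. A response then arises from the target with probability `p_t`, from one of the non-targets (each equally likely) with probability `p_n`, or from a uniform random guess with probability `p_u`. The parameter and function names are illustrative, and fitting the parameters (e.g. by maximum likelihood) is not shown.

```python
import math
from typing import Sequence, Tuple


def bessel_i0(x: float) -> float:
    """Modified Bessel function I0 via its power series (fine for moderate x)."""
    total = term = 1.0
    q = (x / 2.0) ** 2
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total


def von_mises_pdf(x: float, mu: float, kappa: float) -> float:
    """Von Mises density on the circle (radians); avoid very large kappa."""
    return math.exp(kappa * math.cos(x - mu)) / (2 * math.pi * bessel_i0(kappa))


def mixture_likelihood(response: float, target: float,
                       nontargets: Sequence[float], kappa: float,
                       p_t: float, p_n: float, p_u: float) -> float:
    """Likelihood of one response under a target + non-target + uniform mixture."""
    if abs(p_t + p_n + p_u - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    if p_n > 0 and not nontargets:
        raise ValueError("a non-target weight requires at least one non-target")
    lik = p_t * von_mises_pdf(response, target, kappa)
    if nontargets:
        lik += p_n * sum(von_mises_pdf(response, nt, kappa)
                         for nt in nontargets) / len(nontargets)
    return lik + p_u / (2 * math.pi)


def log_likelihood(trials: Sequence[Tuple[float, float, Sequence[float]]],
                   kappa: float, p_t: float, p_n: float, p_u: float) -> float:
    """Summed log-likelihood over (response, target, non-targets) trials."""
    return sum(math.log(mixture_likelihood(r, t, nts, kappa, p_t, p_n, p_u))
               for r, t, nts in trials)
```

Setting `p_n = 0` recovers a two-component (target plus uniform) model. Under that restriction, responses clustered around non-target colors can only be absorbed by the uniform term, which inflates the apparent guess rate.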
The resource model proposes that the precision with which an item is stored is determined by the fraction of total memory resources allocated to it. This may have a very simple neural interpretation in terms of population coding: because there is substantial noise in the activity of any individual neuron, the precision of the population estimate of a sensory feature is determined by the number of neurons involved in encoding it (Dayan & Abbott, 2001; Seung & Sompolinsky, 1993; Vogels, 1990). The tuning-curve properties of neurons do not allow a single cell to simultaneously encode two different feature values; therefore, the distribution of a common memory resource in this model may, at the simplest level, correspond to the assignment of a finite pool of memory neurons to encode the different feature values in a scene. An alternative proposal, which makes very similar predictions, is that the resource corresponds to a limit on the total number of spikes expended maintaining a scene in memory (Ma & Huang, 2009).
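The population-coding interpretation can be illustrated with Fisher information. For independent Poisson neurons whose expected spike count in a fixed window is f(s), Fisher information is the sum over neurons of f′(s)²/f(s), and by the Cramér–Rao bound its inverse bounds the variance of any unbiased estimate of s. The sketch assumes von Mises-shaped tuning curves with hypothetical gain, width, and baseline values, preferred values evenly tiling a circular feature space, and a fixed pool of neurons divided equally among the items in a scene.

```python
import math


def tuning(s: float, pref: float, gain: float = 10.0, width: float = 2.0,
           baseline: float = 0.5) -> float:
    """Expected spike count of a neuron with a von Mises-shaped tuning curve."""
    return baseline + gain * math.exp(width * (math.cos(s - pref) - 1.0))


def tuning_slope(s: float, pref: float, gain: float = 10.0,
                 width: float = 2.0) -> float:
    """Derivative of `tuning` with respect to the stimulus value s."""
    return (-gain * width * math.sin(s - pref)
            * math.exp(width * (math.cos(s - pref) - 1.0)))


def fisher_information(s: float, n_neurons: int) -> float:
    """Fisher information of n independent Poisson neurons about s."""
    prefs = [2 * math.pi * i / n_neurons for i in range(n_neurons)]
    return sum(tuning_slope(s, p) ** 2 / tuning(s, p) for p in prefs)


def information_per_item(pool_size: int, n_items: int, s: float = 0.0) -> float:
    """Fisher information per item when a fixed pool is split equally."""
    if n_items < 1 or pool_size < n_items:
        raise ValueError("need 1 <= n_items <= pool_size")
    return fisher_information(s, pool_size // n_items)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        fi = information_per_item(240, n)
        print(n, round(fi, 1), round(1 / math.sqrt(fi), 4))
```

Under these assumptions, halving the number of neurons assigned to an item halves its Fisher information, so the lower bound on error SD grows roughly as √N, in line with the equal-sharing prediction of the resource model.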
Previous results suggest that visual features on different dimensions do not compete for representation in working memory (Luck & Vogel, 1997; Wheeler & Treisman, 2002), so we predict that storage of colors and locations will depend on separate resources. Nonetheless, as the number of items stored in memory increases, the resource model predicts that error will increase in the stored representations of both color and location. This was indeed observed: both variability in memory for color and frequency of errors due to memory for location increased with increasing array size (Figure 3).
An additional source of error that may also contribute to the non-target responses is “misbinding” (Robertson, 2003; Treisman, 1998; Treisman & Schmidt, 1982; Wolfe & Cave, 1999), in which, for example, the colors of two items become inadvertently switched in memory. In this situation, even if the subject correctly identifies which item was at the probed location, he or she will still respond with one of the non-target colors. In healthy people, misbinding has generally been observed only with very brief presentations (e.g., Treisman & Schmidt, 1982), implying that it is an error of encoding rather than of memory; we therefore do not expect these errors to contribute substantially to our results except at the shortest exposures. However, even if some of the responses Zhang and Luck (2008) viewed as random are in fact due to misbinding rather than location errors, this does not support the interpretation that some items have not been stored, and so is equally inconsistent with a slot model.
A small proportion of apparently random responses could be explained by neither uncertainty in color nor uncertainty in location. However, the frequency of these unexplained responses proved highly dependent on the presentation duration of the memory array (Figure 3i). This suggests that these errors occurred when the exposure time was too short for all the visual information in the array to be encoded into working memory (Bundesen, 1998). Although previous studies observed no advantage of increasing array duration above 100 ms for unmasked displays (Luck & Vogel, 1997; Vogel et al., 2001), those tests were based on detection of supra-threshold changes in color and were therefore insensitive to the precision with which items were stored.
The encoding errors observed in this study depended on the number of items in the array, suggesting that individual items or features must compete for entry into memory. This finding is consistent with previous change detection results using brief masked displays (Vogel, Woodman, & Luck, 2006; Woodman & Vogel, 2005). Competition may simply result from the need to allocate attention serially to each item in order to encode it into memory (Desimone & Duncan, 1995; Treisman, 1998): if multiple items are presented very briefly, some may not yet have been attended by the time the display is blanked. Alternatively, encoding may depend on a resource-limited parallel process similar to the one proposed here for storage.
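The serial-attention account can be made concrete with a deliberately simple sketch. It assumes, hypothetically, that items are encoded one at a time, that each takes a fixed time to encode, and that any item not reached before the display is blanked is never encoded. The per-item encoding time used below is an arbitrary placeholder, not an estimate from the data reported here, and the parallel resource-limited alternative is not modeled.

```python
def unencoded_fraction(n_items: int, exposure_ms: float,
                       ms_per_item: float) -> float:
    """Probability that the probed item was never encoded (serial sketch).

    Items are encoded one at a time at a fixed, hypothetical cost of
    `ms_per_item`; the probed item is equally likely to be any of the
    n_items, so the chance it was missed equals the fraction not encoded.
    """
    if n_items < 1 or ms_per_item <= 0 or exposure_ms < 0:
        raise ValueError("need n_items >= 1, ms_per_item > 0, exposure_ms >= 0")
    encoded = min(n_items, int(exposure_ms // ms_per_item))
    return 1.0 - encoded / n_items


if __name__ == "__main__":
    for exposure in (100, 500, 2000):
        print(exposure, [round(unencoded_fraction(n, exposure, 50.0), 2)
                         for n in (1, 2, 4, 6)])
```

In this sketch, missed items become more common both as array size grows and as exposure shortens, the two dependencies described above.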
At the longest exposure times, encoding errors were minimal and the distribution of responses was explained by a combination of errors in memory for color and location, as predicted by the resource model. We conclude that the high frequency of “guessing” reported on this task by Zhang and Luck (2008), and taken to indicate an upper limit on storage, was in fact the result of two factors. First, very brief presentation of the memory array may have led to incomplete encoding of some items, independent of errors in storage. Second, Zhang and Luck's analysis considered only variability in the response feature (color) and overlooked the possibility of errors in the feature by which responses were cued (location).
In summary, we have found no evidence to support a fixed upper limit on the number of visual items that can be held in working memory, despite examining the same task previously used to argue for a “slot” model (Zhang & Luck, 2008). Our findings are equally inconsistent with the several “hybrid” models that have been proposed (Alvarez & Cavanagh, 2004; Awh et al., 2007; Xu & Chun, 2005), in which a fixed upper limit of three or four items coexists with a variable limit on total “information load” or object complexity. These models similarly predict a rapid increase in random responses once the upper limit is exceeded, a prediction that is incompatible with the current results. Instead, performance on the color report task is best explained in terms of a common working memory resource that must be distributed increasingly finely as the number of visual items increases.
The symmetry and simplicity of the memory arrays make equal distribution of resources across items the most likely strategy on this task. However, resources can be allocated more flexibly: in a task where attention was drawn to one item in an array by a flash, memory resources were preferentially allocated to enhance the representation of the salient item, at the cost of reducing the resolution with which other items were stored (Bays & Husain, 2008). Outside the laboratory, the complexity of natural scenes is likely to preclude an even distribution of resources, and resource allocation may similarly prioritize storage of salient or goal-relevant visual objects (Itti & Koch, 2001).
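Flexible allocation generalizes the equal-sharing rule: each item receives a share of the total resource in proportion to a priority weight. In this sketch the weights are hypothetical stand-ins for salience or goal relevance, and precision is again assumed to be proportional to the allocated share.

```python
from typing import List, Sequence


def allocate(total_resource: float, weights: Sequence[float]) -> List[float]:
    """Split a total resource across items in proportion to priority weights."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    s = sum(weights)
    return [total_resource * w / s for w in weights]


if __name__ == "__main__":
    print(allocate(20.0, [1, 1, 1, 1]))  # equal sharing
    print(allocate(20.0, [2, 1, 1, 1]))  # one salient item
```

With a total of 20 units, raising one item's weight from 1 to 2 in a four-item array increases its share from 5 to 8 units while reducing each other item's share from 5 to 4, the trade-off described above.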