Article  |   March 2015
The cost of misremembering: Inferring the loss function in visual working memory
Journal of Vision March 2015, Vol.15, 2. doi:10.1167/15.3.2

      Chris R. Sims; The cost of misremembering: Inferring the loss function in visual working memory. Journal of Vision 2015;15(3):2. doi: 10.1167/15.3.2.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Visual working memory (VWM) is a highly limited storage system. A basic consequence of this fact is that visual memories cannot perfectly encode or represent the veridical structure of the world. However, in natural tasks, some memory errors might be more costly than others. This raises the intriguing possibility that the nature of memory error reflects the costs of committing different kinds of errors. Many existing theories assume that visual memories are noise-corrupted versions of afferent perceptual signals. However, this additive noise assumption oversimplifies the problem. Implicit in the behavioral phenomena of visual working memory is the concept of a loss function: a mathematical entity that describes the relative cost to the organism of making different types of memory errors. An optimally efficient memory system is one that minimizes the expected loss according to a particular loss function, while subject to a constraint on memory capacity. This paper describes a novel theoretical framework for characterizing visual working memory in terms of its implicit loss function. Using inverse decision theory, the empirical loss function is estimated from the results of a standard delayed recall visual memory experiment. These results are compared to the predicted behavior of a visual working memory system that is optimally efficient for a previously identified natural task, gaze correction following saccadic error. Finally, the approach is compared to alternative models of visual working memory, and shown to offer a superior account of the empirical data across a range of experimental datasets.

Introduction
The theoretical construct of a loss function features prominently in research on perception and motor control (Körding, 2007; Körding & Wolpert, 2004; Ma, 2012; Maloney & Zhang, 2010; Wolpert & Landy, 2012). Informally, a loss function encapsulates the idea that the brain prefers certain outcomes over others, and in the presence of noise or stochasticity (Green & Swets, 1989), would prefer to make certain types of errors rather than others. For example, in reaching to grasp or touch an object, overshooting the intended target may be more costly than undershooting, such as when a child accidentally knocks over a glass of milk. In cases such as this, motor noise and error have strong implications for movement planning and execution (Harris & Wolpert, 1998; Knill, Bondada, & Chhabra, 2011; Liu & Todorov, 2007; Trommershäuser, Landy, & Maloney, 2006; Trommershäuser, Maloney, & Landy, 2003), as well as the coordination of eye movements and motor control (Battaglia & Schrater, 2007; Sims, Jacobs, & Knill, 2011). Even in the absence of external constraints, the metabolic expenditure of movement also imposes costs that influence planned movements (Donelan, Kram, & Kuo, 2002; Huang, Kram, & Ahmed, 2012). Experimenter-imposed loss functions have also been demonstrated to influence behavior in perceptual judgment tasks that do not involve motor control or planning (Landy & Mamassian, 2007; Whiteley & Sahani, 2008). 
This paper examines the implicit loss function of visual working memory. Like motor control, visual working memory is a system that is fundamentally subject to noise and error. If briefly shown a visual feature, such as an oriented line or color patch, the subsequent visual memory of the feature will be imprecise. Presumably, misremembering the orientation of a line segment by 6° is worse than a memory error of 3°. But how much worse? Different loss functions imply different relationships between physical error (the difference between the true feature and the recalled feature) and the subjective cost or disutility of error. A fundamental result demonstrated in this paper is that when visual working memory is constrained in capacity, the form of the loss function has strong implications for the optimal distribution of memory errors. 
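The question "how much worse?" can be made concrete with a little arithmetic. The following Python sketch compares the cost of a 6° error with that of a 3° error under three simple loss functions of the absolute error z (linear, quadratic, and inverted cosine, all of which appear later in this paper); the helper name `cost_ratio` is illustrative only.

```python
import math

def cost_ratio(loss, big_deg=6.0, small_deg=3.0):
    """Ratio of the cost of a big_deg error to the cost of a small_deg error."""
    return loss(math.radians(big_deg)) / loss(math.radians(small_deg))

# Three candidate loss functions of the absolute error z (radians)
LOSSES = {
    "linear": lambda z: z,
    "quadratic": lambda z: z ** 2,
    "inverted cosine": lambda z: 1 - math.cos(z),
}

for name, loss in LOSSES.items():
    print(f"{name}: a 6 deg error costs {cost_ratio(loss):.3f} times a 3 deg error")
# prints ratios of 2.000 (linear), 4.000 (quadratic), and 3.997 (inverted cosine)
```

A linear loss says the larger error is twice as bad and a quadratic loss says four times as bad. The inverted cosine ratio is exactly 4·cos²(1.5°), so at these small angles it is nearly quadratic; the cosine and quadratic losses only come apart for larger errors.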
Given the importance of visual working memory in many natural tasks (Hollingworth, Richard, & Luck, 2008), there is good reason to believe that the shape of the error distribution in visual working memory is not arbitrary, but rather adaptive for some loss function, either through the course of evolution, development, or learning in the context of particular tasks. In parallel, in recent years there has been considerable interest in developing computational models that seek to predict the nature of errors in visual working memory. It has been demonstrated that a simple model of visual memory—one that assumes a Gaussian-like distribution of errors—offers a relatively poor account of human memory performance (Bays, 2014; Fougnie, Suchow, & Alvarez, 2012; van den Berg, Awh, & Ma, 2014; van den Berg, Shin, Chou, George, & Ma, 2012; Zhang & Luck, 2008). Recent computational models of visual working memory have sought to explain this property in terms of variability in memory precision (Fougnie et al., 2012; van den Berg et al., 2014; van den Berg et al., 2012), or as a consequence of limits in decoding from populations of spiking neurons (Bays, 2014). In the present paper, I present a complementary perspective: Visual memory errors may be structured in a manner that reduces the expected cost to the organism in behaviorally relevant tasks. This explanation is situated at the computational level of analysis (Marr, 1982), and provides an explanation for behavior in terms of the goals of the organism and the nature of the problem that is being solved. It is worth emphasizing at the outset that an explanation at the computational level does not necessarily contradict explanations at the algorithmic or mechanistic level; particular neural mechanisms or limitations might be adaptive for natural tasks. 
Hence, the current approach contributes an ecological interpretation to memory error in visual working memory, and emphasizes the task-directed nature of visual working memory (see also Hayhoe, Shrivastava, Mruczek, & Pelz, 2003). This enables existing models of visual working memory to be evaluated in terms of their inherent rationality and fitness for a task, in addition to their ability to fit empirical data. 
Concretely, the present paper makes several contributions to research in visual working memory. First, for a task that is sufficiently well-specified—meaning an explicit cost function can be stated—it is possible to derive the corresponding optimal visual working memory system for that task (described in Methods). This result follows from the application of a branch of information theory known as rate–distortion theory (Berger, 1971), which concerns the design of optimal, but capacity-limited information channels. 
Second, by adopting inverse decision theory (Körding, 2007), I show that it is possible to estimate the loss function of visual working memory based on empirical data collected in a standard visual working memory paradigm (Experiment 1). The empirically derived loss function is substantially different from that which has implicitly been assumed by many other models of visual working memory. 
Third, it is demonstrated that a biologically important task, identifying a previously seen object among distractors, naturally predicts a visual memory system that qualitatively resembles human visual memory performance (Experiment 2). The ability to locate a target object based on visual memory is a critical function of the visual system, and may serve as the basis for visual search (Woodman & Luck, 2004) as well as gaze correction following saccadic error (Hollingworth et al., 2008). The present paper demonstrates that an optimal memory system for this class of tasks naturally predicts error distributions that have a sharper peak, and heavier tails, compared to a Gaussian distribution. Both of these properties are observed in human visual memory error distributions (Bays, 2014; Fougnie et al., 2012; van den Berg et al., 2012; Zhang & Luck, 2008). Hence, the present paper contributes the first rational explanation for this phenomenon. 
Finally, I demonstrate that a simple model developed in the decision-theoretic framework offers a precise quantitative explanation for human performance across seven previously published datasets (Experiment 3). This model is shown to compare favorably to a large number of alternative models of visual working memory (van den Berg et al., 2014), while incorporating fewer mechanisms and assumptions. 
In the following section, I briefly review research on two factors known to influence memory error in visual working memory. These factors motivate an explanation of visual working memory in terms of information-theoretic and decision-theoretic constructs, including that of a loss function. 
Understanding error in visual working memory
Unlike a photograph, the contents of visual working memory do not reflect a veridical and complete representation of the visual world (Rensink, 2002). A primary goal of research in visual working memory is to understand how and why this memory system is limited, both at the neural and behavioral level. While the field remains a long way from this goal, considerable progress has been made in recent years (for a recent review, see Brady, Konkle, & Alvarez, 2011). In particular, a growing consensus in the literature indicates that visual working memory performance is determined by the interaction of at least two factors: the number of features that are concurrently stored (the set size effect), and the statistical complexity of visual information. 
Regarding the set size effect, numerous experiments on the recall of simple visual features—such as oriented line segments, Gabor patches, or color patches—have shown that there is a tradeoff between the number of items stored in visual working memory, and the precision of the memory representation. People can store more items in visual working memory, but with lower precision; alternatively, memory can store fewer items with greater precision (Bays & Husain, 2008; Palmer, 1990; van den Berg et al., 2012; Wilken & Ma, 2004; Zhang & Luck, 2008). Although there is much debate concerning the precise nature of this tradeoff (see Luck & Vogel, 2013; Ma, Husain, & Bays, 2014), the basic effect is well established. In many existing models of visual working memory, this change in precision is modeled by increasing the variance of memory noise. Van den Berg et al. (2014) compared 32 distinct models of visual working memory for features in a circular domain (such as line orientation, or color defined within a circular space). In all models considered, the memory estimate of a feature was assumed to be a sample from a von Mises distribution (a circular analog to the Gaussian distribution), and the key question was how the variance of this noise distribution changed with increasing set size. The theoretical justification for a von Mises distribution (vs. some other noise distribution) was not considered, though one class of models (known as the variable precision model, details discussed further below) was motivated by the observation that a von Mises distribution, without positing additional mechanisms, offers a poor fit to the empirical data. 
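To make this modeling assumption concrete, the sketch below simulates a simple fixed-precision model in which each recalled feature is the true value plus von Mises noise whose concentration falls with set size. The power-law rule kappa = kappa_1 · N^(−alpha) and the parameter values are illustrative assumptions, not taken from any specific model in van den Berg et al. (2014); only Python's standard library is used.

```python
import math
import random

def circular_variance(angles):
    """Circular variance: 1 minus the mean resultant length of the angles."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return 1 - math.hypot(c, s)

def simulate_errors(set_size, kappa_1=20.0, alpha=1.0, n=5000, seed=0):
    """Hypothetical fixed-precision model: recall errors ~ von Mises(0, kappa),
    with kappa = kappa_1 * set_size ** -alpha (an illustrative power law)."""
    rng = random.Random(seed)
    kappa = kappa_1 * set_size ** -alpha
    return [rng.vonmisesvariate(0.0, kappa) for _ in range(n)]

for n_items in (1, 2, 4, 8):
    print(n_items, round(circular_variance(simulate_errors(n_items)), 3))
```

Circular variance grows as set size increases, which is the tradeoff between number of items and precision expressed, as in these models, purely as a change in the spread of a von Mises noise distribution.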
Regarding the effects of statistical complexity on memory error, the brain is able to exploit regularities in visual features in order to increase performance. For example, oak leaves all share a basic pattern: the angular orientation and size of the lobes are not entirely unpredictable, but tend to fall within a limited range. Similarly, certain colors are more likely for leaves than others (shades of red and green vs. purple and blue). Implicit or explicit knowledge of these statistical regularities aids visual working memory. In particular, introducing statistical correlations among different features increases memory performance (Brady, Konkle, & Alvarez, 2009), and increases or decreases in the variance of features also impacts the precision of visual memory representations (Sims, Jacobs, & Knill, 2012). Expertise within a domain (such as the ability to recognize birds or cars) also aids visual memory for objects taken from the domain of expertise (Curby, Glazek, & Gauthier, 2009). Alvarez and Cavanagh (2004) adopted a different metric for visual object complexity based on reaction time in a visual search task, but also found that complexity and memory performance are closely connected. All of these findings suggest a close link between statistical learning ability and visual working memory (Orhan, Sims, Jacobs, & Knill, 2014). 
Recently, it has been shown (Sims et al., 2012) that the mathematical framework of information theory (Shannon & Weaver, 1949) offers a parsimonious explanation for simultaneous effects of set size and statistical complexity in visual memory performance. According to this framework, visual working memory is a capacity-limited information storage system, where capacity can be quantified and measured in units of bits.¹ Intuitively, increasing the number of items stored concurrently leaves less capacity available to code each item, and increasing the statistical complexity of visual features requires a greater capacity in order to maintain the same level of memory precision. 
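For one special case, the link between bits and precision has a closed form: a Gaussian feature distribution with variance σ², coded under quadratic loss at R bits, has minimum mean squared error σ²·2^(−2R) (the case exploited by Sims et al., 2012). The sketch below uses this standard result to illustrate both intuitions in the paragraph above. Splitting a fixed total capacity evenly across items, and the 6-bit total, are illustrative assumptions.

```python
import math

def gaussian_distortion(sigma2, rate_bits):
    """Minimum mean squared error for a Gaussian source with variance sigma2
    coded at rate_bits (standard Gaussian rate-distortion result)."""
    return sigma2 * 2 ** (-2 * rate_bits)

def rate_needed(sigma2, target_mse):
    """Bits needed to reach target_mse; zero if no information is needed."""
    return max(0.0, 0.5 * math.log2(sigma2 / target_mse))

TOTAL_BITS = 6.0  # hypothetical total capacity, split evenly across items (assumption)
for n_items in (1, 2, 3, 6):
    per_item = TOTAL_BITS / n_items
    print(f"{n_items} item(s): {per_item:.1f} bits each, "
          f"error variance {gaussian_distortion(1.0, per_item):.4f} x sigma^2")

# Higher feature variance needs more bits for the same precision
print(rate_needed(1.0, 0.25), rate_needed(4.0, 0.25))  # 1.0 2.0
```

With 6 bits spread over 1 to 6 items, the per-item error variance rises from 2^(−12) to 2^(−2) of the feature variance, and reaching an error variance of 0.25 takes 1 bit when σ² = 1 but 2 bits when σ² = 4.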
A fundamental aspect of this theoretical framework is the hypothesis that human visual memory approximates an efficient information storage system. According to this hypothesis, visual working memory is limited in capacity, yet simultaneously efficient, in the sense of making the most of its limited resources (Orhan et al., 2014). The same principle has been highly productive in sensory neuroscience, where it forms the basis of the efficient-coding hypothesis (Barlow, 1961; Geisler, 2008; Simoncelli & Olshausen, 2001). In the present paper, I explore a corresponding efficient memory hypothesis. In particular, if visual working memory is efficient, then by definition it must be efficient according to some particular loss function. In previous work (Sims et al., 2012), the assumption was made that the brain attempts to minimize a quadratic loss function in visual working memory (minimizing the squared error between actual and remembered visual features). This assumption simplified the mathematical development of the model but was not motivated by any theoretical consideration. Thus, the empirical loss function remains an open question. 
In the next section, I formally define the connection between information-theoretic memory capacity, the statistics of visual features, and loss functions. I then demonstrate how these three factors define an optimally efficient visual working memory. 
Information-theoretic and decision-theoretic foundations of visual working memory
If x indicates a particular visual feature (such as a spatial position or orientation), and y indicates the recalled value for the feature, then a loss function is a mathematical function that assigns a cost to the outcome where x is remembered as y: ρ(x, y) → [0,∞). Simple choices for the loss function are linear, ρ(x, y) = |y − x|, or quadratic functions, ρ(x, y) = (y − x)². Further, the present paper will restrict its attention to symmetric difference loss functions, such that ρ(x, y) = f(z), where z = |y − x| is the absolute memory error.² Note that the mathematical framework in general is not limited by this restriction, and examining asymmetric loss functions represents an interesting avenue for further exploration. 
With a particular loss function defined, the presumed goal of visual working memory is to minimize expected loss:

$$L_\rho(q) = \sum_{x} \sum_{y} p(x)\, q(y \mid x)\, \rho(x, y) \tag{1}$$

where p(x) describes the statistical distribution of visual features, and q(y|x) gives the conditional probability distribution for memory; that is, the probability of recalling a visual feature x as the value y. This approach closely follows a large body of previous work that has defined visual perception in the framework of Bayesian decision theory (Ma, 2012). In what follows, I will refer to the distribution q(y|x) as an information channel or memory channel. 
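For discrete features, the expected loss of Equation 1 is a double sum over stimuli and responses that can be computed directly. A minimal Python sketch, assuming the channel is stored as a nested list `q[i][j]` = q(y_j | x_i) and the loss as `loss[i][j]` = ρ(x_i, y_j); both representations and the helper name are hypothetical.

```python
def expected_loss(p_x, q, loss):
    """Equation 1 for discrete features: sum over x and y of p(x) * q(y|x) * rho(x, y).

    p_x[i] = p(x_i); q[i][j] = q(y_j | x_i); loss[i][j] = rho(x_i, y_j).
    """
    return sum(p_x[i] * q[i][j] * loss[i][j]
               for i in range(len(p_x)) for j in range(len(q[i])))

# Two equally likely values, a noisy channel, and a 0/1 loss
p_x = [0.5, 0.5]
q = [[0.9, 0.1],
     [0.2, 0.8]]
zero_one = [[0.0, 1.0],
            [1.0, 0.0]]
print(round(expected_loss(p_x, q, zero_one), 10))  # 0.15
```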
The expected loss is trivially minimized when there is no memory error (y = x for all stimuli x). However, when the feature x is continuously distributed, this goal is unachievable, even in principle (this is a fundamental result of information theory; Shannon & Weaver, 1949). Mathematically, if p(x) and q(y|x) define an information source and information channel, then the average amount of information transmitted by this channel is given by the mutual information,

$$I(q) = \sum_{x} \sum_{y} p(x)\, q(y \mid x) \log \frac{q(y \mid x)}{\sum_{x'} p(x')\, q(y \mid x')} \tag{2}$$
Intuitively, one can think of this quantity as the average amount of uncertainty about the value x that is removed by observing the channel output y. When the logarithm is taken base 2, this quantity is measured in units of bits. The maximum rate of information transmission, across all possible distributions p(x), defines the capacity of a channel, C. For a fixed information source, the channel can be measured via Equation 2 to transmit at an information rate R ≤ C.
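The mutual information of Equation 2 can likewise be computed for a discrete channel. The sketch below (Python, standard library only; `mutual_information` is a hypothetical helper using the same nested-list layout q[i][j] = q(y_j | x_i)) returns bits because it uses log base 2, and checks two extremes: a noiseless channel over four equally likely values transmits 2 bits, and a channel whose output ignores its input transmits 0 bits.

```python
import math

def mutual_information(p_x, q):
    """Equation 2 in bits: sum over x, y of p(x) q(y|x) log2[q(y|x) / q(y)],
    where q(y) = sum_x p(x) q(y|x) is the output distribution."""
    n_x, n_y = len(p_x), len(q[0])
    q_y = [sum(p_x[i] * q[i][j] for i in range(n_x)) for j in range(n_y)]
    return sum(p_x[i] * q[i][j] * math.log2(q[i][j] / q_y[j])
               for i in range(n_x) for j in range(n_y) if q[i][j] > 0)

uniform4 = [0.25] * 4
noiseless = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
useless = [[0.25] * 4 for _ in range(4)]
print(mutual_information(uniform4, noiseless))  # 2.0
print(mutual_information(uniform4, useless))    # 0.0
```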
With these elements in place, an optimally efficient memory channel (labeled q*) is given by

$$q^{*} = \arg\min_{q}\, L_\rho(q) \quad \text{subject to} \quad I(q) \le R \tag{3}$$

where Lρ(q) and I(q) refer to the expected loss and mutual information associated with the channel q(y|x), given by Equations 1 and 2, respectively. This equation states that an optimal memory channel is one that minimizes the expected loss according to a particular loss function, while subject to the constraint that the amount of memory that it can store or transmit is at or below a specified limit. This equation is also the basis for the mathematical field of rate–distortion theory (Berger, 1971), which concerns the design and analysis of optimal, but lossy information channels. 
The above equation defines an optimal memory channel. However, solving this equation is typically not straightforward. When the information source is continuously distributed, there exist analytical solutions only for special cases. Previous work (Sims et al., 2012) exploited one special case, applicable when the input distribution p(x) is Gaussian, and the loss function is assumed to be quadratic. However, a general solution is needed if the goal is to estimate the empirical loss function, rather than assume a particular function. 
Fortunately, the above equations can be solved tractably for the discrete case, that is, when p(x) and q(y|x) are discrete rather than continuous probability distributions. While any convex optimization algorithm will work in principle,³ algorithms have been constructed that are particularly efficient for this application. One elegant algorithm (Blahut, 1972) can be used to efficiently solve for the optimal memory channel for a given information source, loss function, and constraint on information rate. This approach is illustrated in Figure 1.
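The Blahut (1972) iteration alternates between two updates for a fixed trade-off parameter s ≥ 0: the channel q(y|x) ∝ q(y)·exp(−s·ρ(x, y)), and the output marginal q(y) = Σx p(x) q(y|x). Each value of s yields one point on the rate–distortion curve, with larger s giving lower expected loss at a higher rate. The following is a minimal pure-Python sketch of this standard Blahut–Arimoto scheme, not the author's implementation; the function name, starting point, and stopping rule are assumptions.

```python
import math

def blahut_arimoto(p_x, loss, s, n_iter=500, tol=1e-12):
    """One point on the rate-distortion curve for a discrete source.

    p_x[i] = p(x_i); loss[i][j] = rho(x_i, y_j); s >= 0 is the trade-off
    parameter (larger s: lower expected loss, higher rate).
    Returns (q, rate_bits, expected_loss) with q[i][j] = q(y_j | x_i).
    """
    n_x, n_y = len(p_x), len(loss[0])
    r = [1.0 / n_y] * n_y                      # output marginal q(y), start uniform
    for _ in range(n_iter):
        # Channel update: q(y|x) proportional to q(y) * exp(-s * rho(x, y))
        q = []
        for i in range(n_x):
            row = [r[j] * math.exp(-s * loss[i][j]) for j in range(n_y)]
            total = sum(row)
            q.append([v / total for v in row])
        # Marginal update: q(y) = sum_x p(x) q(y|x)
        r_new = [sum(p_x[i] * q[i][j] for i in range(n_x)) for j in range(n_y)]
        converged = max(abs(a - b) for a, b in zip(r, r_new)) < tol
        r = r_new
        if converged:
            break
    rate = sum(p_x[i] * q[i][j] * math.log2(q[i][j] / r[j])
               for i in range(n_x) for j in range(n_y)
               if p_x[i] > 0 and q[i][j] > 0)
    exp_loss = sum(p_x[i] * q[i][j] * loss[i][j]
                   for i in range(n_x) for j in range(n_y))
    return q, rate, exp_loss

# Uniform binary source with 0/1 loss: larger s trades more bits for fewer errors
p = [0.5, 0.5]
zero_one = [[0.0, 1.0], [1.0, 0.0]]
for s in (0.5, 1.0, 2.0, 4.0):
    _, rate, d = blahut_arimoto(p, zero_one, s)
    print(f"s = {s}: rate = {rate:.3f} bits, expected loss = {d:.3f}")
```

For a uniform binary source with 0/1 loss the rate–distortion curve is known in closed form, R = 1 − H(D) with H the binary entropy, so the printed points give a quick check on the iteration.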
Figure 1
 
Illustration of an efficient memory system as minimizing expected loss under a constraint on memory capacity. Rate–distortion theory (Berger, 1971) defines the minimum channel capacity necessary to achieve a desired level of performance. This is illustrated by a rate–distortion curve, shown in red. No physical system can exist in the region below this curve. An optimally efficient memory system is one that minimizes expected loss, subject to a constraint on memory capacity (shown by the horizontal line).
A fundamental property of any information channel is its rate–distortion function, illustrated by the red curve in Figure 1. This function defines the minimum information rate necessary to achieve a desired level of memory error (or distortion) according to a specified loss function. Decreasing expected loss (moving to the left along the x-axis) requires a corresponding increase in the rate of information transmission by the channel, as illustrated by the curve. If an upper bound is placed on channel capacity (illustrated by the horizontal line and shaded region), then an optimally efficient memory system is defined by the intersection of the rate–distortion curve with this horizontal line, illustrated by the plot marker in Figure 1. Blahut (1972, figure 3) derived an iterative algorithm for computing the rate–distortion curve for arbitrary discrete loss functions. This algorithm can be used to search for the point along the rate–distortion curve that satisfies Equation 3. In order to apply this algorithm to typical visual working memory experiments, it is only necessary to discretize the stimuli and responses with a suitably small bin size. 
To illustrate this approach, I defined a discrete uniform distribution over the interval [0, 2π] using 180 equal-sized bins. I then constructed optimal memory channels for storing samples from this distribution, where optimality was defined according to four different loss functions: an inverted cosine, ρ(z) ∼ (1 − cos[z]); linear, ρ(z) ∼ z; a step function; and a quadratic function, ρ(z) ∼ z². Note that only relative cost matters, so the predictions are invariant to multiplying the loss function by a constant. Each loss function was therefore normalized to the range [0, 1]. I then computed the optimal memory channel for each of these loss functions under two different constraints on memory capacity, either 1 or 3 bits. The results of this analysis are shown in Figure 2, which plots each loss function (left column) along with the predicted memory error distribution, that is, the probability distribution of the quantity y − x (right column). 
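The analysis just described can be approximated in a few dozen lines of NumPy. The sketch below discretizes [0, 2π) into 180 bins, builds normalized inverted-cosine, linear, and quadratic losses of the circular error, and bisects on the Blahut–Arimoto trade-off parameter s until the channel's rate meets the capacity constraint of Equation 3. The step loss is omitted because its threshold is not given in the text here; the search range for s, the iteration limits, and all function names are assumptions rather than the author's code.

```python
import numpy as np

N_BINS = 180                                    # 2-degree bins on the circle
theta = 2 * np.pi * np.arange(N_BINS) / N_BINS
p_x = np.full(N_BINS, 1.0 / N_BINS)             # discrete uniform source

# Absolute circular error |y - x| between every stimulus and response bin, in [0, pi]
diff = np.abs(theta[:, None] - theta[None, :])
z = np.minimum(diff, 2 * np.pi - diff)

def normalize(loss):
    """Rescale a loss matrix to [0, 1]; only relative cost matters."""
    return (loss - loss.min()) / (loss.max() - loss.min())

LOSSES = {
    "inverted cosine": normalize(1 - np.cos(z)),
    "linear": normalize(z),
    "quadratic": normalize(z ** 2),
}

def ba_point(loss, s, n_iter=1000, tol=1e-12):
    """Blahut-Arimoto at slope s: returns the channel q[x, y] and its rate in bits."""
    r = np.full(N_BINS, 1.0 / N_BINS)           # output marginal q(y)
    for _ in range(n_iter):
        q = r[None, :] * np.exp(-s * loss)
        q /= q.sum(axis=1, keepdims=True)
        r_new = p_x @ q
        converged = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if converged:
            break
    ratio = np.where(q > 0, q / r[None, :], 1.0)  # zero-probability cells contribute 0
    return q, float(np.sum(p_x[:, None] * q * np.log2(ratio)))

def optimal_channel(loss, capacity_bits, s_hi=500.0, n_steps=60):
    """Bisect on s (rate grows with s) for the best channel with rate <= capacity."""
    s_lo = 0.0
    for _ in range(n_steps):
        s_mid = 0.5 * (s_lo + s_hi)
        if ba_point(loss, s_mid)[1] > capacity_bits:
            s_hi = s_mid
        else:
            s_lo = s_mid
    return ba_point(loss, s_lo)

def error_distribution(q):
    """P(y - x = k bins), averaged over stimuli; index 0 is zero error."""
    rows = np.arange(N_BINS)[:, None]
    cols = (rows + np.arange(N_BINS)[None, :]) % N_BINS
    return (p_x[:, None] * q[rows, cols]).sum(axis=0)

for name, loss in LOSSES.items():
    for bits in (1.0, 3.0):
        q, rate = optimal_channel(loss, bits)
        err = error_distribution(q)
        near = err[0] + err[1] + err[-1]        # errors of 0 or +/-2 degrees
        print(f"{name}, {bits:.0f} bit(s): rate {rate:.3f}, P(|error| <= 2 deg) = {near:.3f}")
```

Because the source is uniform and each loss depends only on circular distance, the iteration settles immediately on q(y|x) ∝ exp(−s·ρ(z)); for the normalized inverted-cosine loss this is a discretized von Mises distribution with concentration s/2.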
Figure 2
 
Comparison of four different loss functions (left column) and the resulting predictions for the optimal distribution of memory errors (right column). From top to bottom, the loss functions are an inverted cosine, linear, step, and quadratic functions. Each panel on the right shows the predicted behavior of an optimally efficient memory system, assuming a capacity constraint of either 1 or 3 bits.
As can be seen in the figure, when visual working memory is limited in capacity, the particular choice of loss function has strong implications for the shape of the memory error distribution. Each of the distributions in Figure 2 represents optimal performance, but they differ in the loss function for which they are optimal. A useful (but inexact) analogy is that of squeezing a balloon. Changing the loss function allows one to decrease the probability of certain errors, at the expense of increasing the probability of others. The memory capacity determines the maximum achievable performance of the model, but the loss function determines the particular pattern of errors that minimizes expected cost. Although the inverted cosine (top row) and quadratic loss functions (bottom row) yield similar memory error distributions, they are subtly different: a cosine loss function leads to a sharper peak and heavier tails than a quadratic loss function. The linear loss function (second row) yields an optimal memory distribution with even heavier tails. These differences in predicted behavior are potentially informative, as it has previously been noted (Fougnie et al., 2012; van den Berg et al., 2014; van den Berg et al., 2012) that the empirical error distribution in visual working memory is also more sharply peaked than the predictions of most existing models of visual working memory. 
Estimating empirical loss functions via inverse decision theory
So far it has been shown that a given loss function, probability distribution over visual features, and constraint on memory capacity jointly define an optimally efficient distribution of memory errors. However, in some tasks the loss function may be undefined, such as in the delayed estimation paradigm (Zhang & Luck, 2008). Alternatively, the empirical loss function may deviate from that specified by the task. The latter possibility is especially likely in cases where laboratory tasks have limited relevance to natural tasks. Rather than assuming a particular loss function, one can infer or estimate an empirical loss function along with memory capacity. This is achieved by starting with the empirical observation of a pattern of memory errors, and assuming that the observed behavior is optimally efficient for some loss function. This approach, known as inverse decision theory, has previously been used to characterize the loss function of sensorimotor learning (Körding & Wolpert, 2004). 
The likelihood of a given dataset, under a particular loss function ρ and constraint on memory capacity R, is given by the optimal memory channel, q*(y | x; ρ,R). By searching through the space of possible loss functions, one can determine the function that maximizes the likelihood of the observed data. 
One complication in applying this approach is that the distribution q* is defined over discrete values. If the available data are continuous, a probability density can be obtained by using a piecewise uniform distribution with density

$$p(y \mid x) = \frac{q^{*}([y] \mid [x]; \rho, R)}{w}$$
The bracket notation [·] indicates a binned version of the data, and w indicates the bin width. With a likelihood function defined, it is possible to recover the loss function by maximum likelihood estimation or Bayesian inference. To reduce the complexity of the inference process, one can specify a parameterized family of loss functions, ρ(z;θ⃗), where θ⃗ are the parameters. An ideal candidate should be flexible (able to capture a wide range of different loss functions), while having a small number of parameters (to facilitate the inference process). In this paper, I adopted the following parametric family of loss functions:    
This function is a reparameterization of that proposed by Gonzalez and Wu (1999) to flexibly model distortions in human probability weighting. The parameter μ determines the error magnitude z at which the loss reaches half of the maximum value, while β controls the steepness of the function around the point μ. Figure 3 illustrates a number of different loss functions constructed from this family by varying the parameters μ and β. By varying the parameters, it is also possible to exactly meet or closely approximate each of the loss functions shown in Figure 2. With a parametric loss function defined, it is straightforward to estimate the parameters of this function, along with memory capacity, via maximum likelihood estimation. Appendix A reports a parameter recovery analysis, in which artificial datasets are generated, and the model-fitting procedure is examined to determine how well it is able to recover the parameters used to generate the data. 
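The two ingredients of the maximum-likelihood fit can be sketched as follows. Because the parametric family itself is not reproduced in this section, `loss_family` is a hypothetical reparameterization of the Gonzalez and Wu (1999) form w(p) = δp^γ / (δp^γ + (1 − p)^γ), with p = z/π, γ = β, and δ chosen so that the loss equals 0.5 at z = μ; it has the properties described above but is not necessarily the author's exact function. `log_likelihood` implements the piecewise uniform density described above, q*([y] | [x]) / w, assuming bins of width w centered on multiples of w.

```python
import math

def loss_family(z, mu, beta):
    """HYPOTHETICAL Gonzalez-Wu-style loss on z in [0, pi], rescaled to [0, 1]:
    loss(mu) = 0.5, and beta sets the steepness around mu (0 < mu < pi, beta > 0)."""
    a = ((math.pi - mu) * z) ** beta
    b = (mu * (math.pi - z)) ** beta
    return a / (a + b)

def log_likelihood(stimuli, responses, q, n_bins):
    """Binned log-likelihood of continuous (stimulus, response) angles in radians:
    each trial contributes log(q[[x]][[y]] / w), where [.] is the nearest bin
    centre and w = 2*pi / n_bins is the bin width."""
    w = 2 * math.pi / n_bins

    def bin_of(angle):
        return round((angle % (2 * math.pi)) / w) % n_bins

    return sum(math.log(q[bin_of(x)][bin_of(y)] / w)
               for x, y in zip(stimuli, responses))

# Sanity check: a channel that ignores the stimulus gives density 1/(2*pi) per trial
n = 36
uniform_q = [[1.0 / n] * n for _ in range(n)]
print(log_likelihood([0.1, 1.0, 2.0], [3.0, 0.5, 6.0], uniform_q, n))  # about -3 * log(2*pi)
```

Dividing by the bin width puts the likelihood on the same per-radian scale as continuous densities, so fits at different bin sizes, or comparisons with continuous models, remain meaningful.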
Figure 3
 
A flexible parametric family of loss functions, after the function used by Gonzalez and Wu (1999). The left panel fixes the parameter […] while varying […] from 0.5 to 5. The right panel fixes […] while varying […] from 0.1 to […].
Experiment 1: Measuring the empirical loss function in visual working memory
An experiment was conducted to examine visual working memory error for two different visual feature dimensions: color and orientation. The goal of the experiment was to apply the formal tools of information theory and inverse decision theory to recover the empirical loss function of visual working memory. 
The experiment closely followed the methodology of a previous study (Fougnie, Asplund, & Marois, 2010; experiment 1). In their experiment, Fougnie et al. examined how the precision of visual working memory varied depending on whether subjects were asked to remember the color, orientation, or both features of visual objects. It was found that when participants were asked to store both color and orientation of an object, memory precision was impaired relative to the case where only a single feature dimension was stored. From the perspective of the theoretical framework developed here, these results are interesting for a number of reasons. First, what is the loss function of visual working memory for a particular visual feature? To date, this question has not been examined. By looking at performance in a visual memory task that involves multiple visual features, it is also possible to examine whether the brain adopts distinct loss functions for color versus orientation. Additionally, the conjunction condition (when subjects are asked to remember both color and orientation) poses an interesting question. Are differences in memory precision due to a change in capacity, change in loss function (the relative weight given to different types of memory errors), or some combination of both? Intuitively, one might expect that attending to two visual features would leave less memory capacity available to store each one. However, predictions for possible changes in loss function are less clear-cut. Does changing the number of attended features influence the relative importance of committing different magnitudes of memory error? 
Methods
Participants
Twelve undergraduate students at the University of Rochester participated in the experiment in exchange for monetary compensation. All were naïve to the purpose of the study. 
Apparatus and stimuli
Participants were seated 40 cm from a 20-inch monitor with resolution set to 1,280 × 1,024 pixels. A chin rest was used to maintain stable head position. Participants were eye-tracked (EyeLink II; SR Research, Mississauga, ON, Canada) to ensure that no eye movements were made during stimulus presentation; saccades were detected online and trials with eye movements were repeated with new stimuli. 
The stimuli consisted of two isosceles triangles (height = 1 cm, vertex angle = 30°), presented at an eccentricity of 6° of visual angle and 45° elevation above the fixation point (Figure 4). The orientation of each triangle was sampled independently from a circular uniform distribution. The color of each triangle was sampled from a circular uniform distribution defined on a circle of constant luminance in CIE L*a*b* color space, centered at (L* = 54, a* = 18, b* = −8) with a radius of 59 in the a*–b* plane. 
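A sketch of the stimulus sampling described above, in Python with the standard library. The center, luminance, and radius come from the text; mapping a hue angle to (a*, b*) with cosine and sine, and the function name `sample_triangle`, are assumptions. Converting the resulting L*a*b* values to monitor RGB would additionally require the display's calibration, which is not covered here.

```python
import math
import random

# Values from the text; the (cos, sin) mapping of hue angle to (a*, b*) is an assumption.
L_STAR, A_CENTER, B_CENTER, RADIUS = 54.0, 18.0, -8.0, 59.0

def sample_triangle(rng):
    """Draw one stimulus: an orientation and a color angle, each uniform on the circle,
    with the color angle mapped onto the constant-luminance circle in CIE L*a*b*."""
    orientation = rng.uniform(0.0, 2 * math.pi)
    hue = rng.uniform(0.0, 2 * math.pi)
    lab = (L_STAR, A_CENTER + RADIUS * math.cos(hue), B_CENTER + RADIUS * math.sin(hue))
    return orientation, hue, lab

rng = random.Random(1)
stimuli = [sample_triangle(rng) for _ in range(2)]  # two independent triangles per trial
for orientation, hue, lab in stimuli:
    print(f"orientation {math.degrees(orientation):6.1f} deg, L*a*b* = {lab}")
```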
Figure 4
 
(a) Stimuli consisted of two colored isosceles triangles presented at an eccentricity of 6° of visual angle from a fixation point. Stimuli were displayed for 1 s, followed by a blank retention interval. (b) During color probe trials, a color wheel was displayed at the former location of one of the triangles. Participants indicated their response by using the mouse to adjust the orientation of the white tick mark to match their memory for the triangle's color. (c) During orientation probe trials, a black ring was used to probe for orientation memory.
Procedure
Trials were organized into three types of blocks: color memory, orientation memory, and conjunction blocks. In color and orientation blocks, participants were instructed to remember only the color or only the orientation of the triangles, and were tested only on that feature. In conjunction blocks, participants were told to remember both the color and orientation of each triangle, and were randomly tested on either color or orientation. Each block consisted of 150 trials. Participants completed two experimental sessions, on separate days. Each session consisted of one orientation block, one color block, and two conjunction blocks. The order of the blocks within each session was randomized. 
On each trial, two triangles were displayed for 1 s. After stimulus presentation, a blank screen was displayed for 500 ms, followed by a response wheel (Figure 4b and 4c). The response wheel was centered on the location of one of the two previously displayed triangles. Participants were instructed to use the computer mouse to adjust a white tick mark to match their memory for either the color or orientation of the triangle that was previously displayed at that location. 
Results and discussion
The primary behavioral measure from this experiment is memory error, defined as the angular difference between the true feature value for a probed item and the recalled value. Figure 5a plots the histogram of memory errors, pooled across all subjects, for each feature dimension (color vs. orientation) and in each memory load condition (single feature vs. conjunction). I also computed the circular variance and circular kurtosis (Fisher, 1995) of the error distribution for each subject. Changes in these summary statistics were tested by means of a 2 (feature) × 2 (memory load) ANOVA. Mean values for these summary statistics are shown in Figure 6. 
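Both summary statistics can be computed from the first and second trigonometric moments of the errors. The following is a minimal sketch, assuming the usual definitions: circular variance V = 1 − R̄, and Fisher's circular kurtosis (R̄₂ cos(μ̄₂ − 2μ̄₁) − R̄₁⁴) / (1 − R̄₁)². The function names and error values are hypothetical, and this is not the analysis code used for Figure 6.

```python
import math

def trig_moment(angles, p=1):
    """Mean resultant length and mean direction of the p-th trigonometric moment."""
    n = len(angles)
    c = sum(math.cos(p * a) for a in angles) / n
    s = sum(math.sin(p * a) for a in angles) / n
    return math.hypot(c, s), math.atan2(s, c)

def circular_variance(angles):
    """Circular variance V = 1 - R1, where R1 is the mean resultant length."""
    r1, _ = trig_moment(angles, 1)
    return 1.0 - r1

def circular_kurtosis(angles):
    """Fisher (1995) circular kurtosis: (R2 cos(mu2 - 2 mu1) - R1^4) / (1 - R1)^2."""
    r1, mu1 = trig_moment(angles, 1)
    r2, mu2 = trig_moment(angles, 2)
    return (r2 * math.cos(mu2 - 2.0 * mu1) - r1 ** 4) / (1.0 - r1) ** 2

# Hypothetical recall errors (radians) for one subject in one condition.
errors = [0.05, -0.12, 0.30, -0.02, 1.9, -0.4, 0.08, -2.7]
print(circular_variance(errors), circular_kurtosis(errors))
```

Both statistics are invariant to a common rotation of all errors, so they describe only the shape of the error distribution. In the analysis described above, one value per subject and condition would enter the ANOVA.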
Figure 5
 
(a) Histogram of memory recall errors, pooled across all subjects. The red and blue distributions correspond to the single feature and conjunction conditions of the experiment, respectively. (b) Difference in frequency of error between the single feature and conjunction conditions. Difference histograms were obtained by subtracting (conjunction – single feature) histograms in (a).
Figure 6
 
Summary statistics computed from the memory error distribution for each subject in each condition. Left panel: circular variance. Right panel: circular kurtosis. The plot markers indicate mean values, averaged across subjects; error bars indicate 95% confidence intervals. The red lines are based on simulated data generated from the best-fitting model.
Figures 5 and 6 illustrate that the shape of the error distribution differs substantially between color and orientation, with the error distribution for orientation more strongly peaked (higher kurtosis) than color, F(1, 11) = 80.31, p < 0.001, while also having lower variance, F(1, 11) = 51.31, p < 0.001. In addition, there were subtle differences between the error distributions for the single feature and conjunction conditions. To illustrate these differences, Figure 5b plots the difference in frequency of error between the single feature and conjunction conditions (conjunction – single feature). Compared to the single feature condition, the conjunction condition exhibits a relative decrease in small errors, with a corresponding increase in the “shoulders” of the error distribution. In terms of summary statistics (Figure 6), these changes correspond to both an increase in variance, F(1, 11) = 12.83, p < 0.01, and a decrease in kurtosis in the conjunction conditions, F(1, 11) = 38.71, p < 0.001, compared to the single feature conditions. 
What factors can explain these changes in memory performance? In particular, are the differences due to changes in memory capacity across conditions, changes in the loss function, or some combination of both? To answer these questions, I fit variants of the model described by Equations 4 and 5 to the data from each participant, estimating the parameters by maximum likelihood. The data were discretized into 1,000 equal-width bins in order to apply the decision-theoretic framework. To investigate how model parameters did or did not vary across conditions, four different model factors were considered: 
  • Factor A: Is memory capacity constant across features (color vs. orientation)?
  • Factor B: Is memory capacity constant across load conditions (single vs. conjunction)?
  • Factor C: Is the loss function constant across features?
  • Factor D: Is the loss function constant across load conditions?
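The parameter counts in Table 1 follow directly from these four factors. As a sketch, assuming one capacity parameter R for each distinct group of conditions and a two-parameter loss function (μ, β) for each distinct group, the 16 models can be enumerated in the same order as Table 1 (factor D varying fastest). The helper `n_params` is hypothetical.

```python
from itertools import product

def n_params(a, b, c, d):
    """Count free parameters for one model.

    a, b: capacity shared across features / across load conditions.
    c, d: loss function shared across features / across load conditions.
    Assumes one capacity R per distinct group and a (mu, beta) loss per group.
    """
    n_capacity = (1 if a else 2) * (1 if b else 2)
    n_loss = (1 if c else 2) * (1 if d else 2)
    return n_capacity + 2 * n_loss

models = []
for idx, (a, b, c, d) in enumerate(product([False, True], repeat=4), start=1):
    label = "".join("Y" if flag else "N" for flag in (a, b, c, d))
    models.append((idx, label, n_params(a, b, c, d)))

for idx, label, k in models:
    print(f"model {idx:2d}  {label}  {k:2d} parameters")
```

Under these assumptions the counts run from 12 (nothing shared) down to 3 (one capacity and one loss function shared by all four conditions).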
Each binary factor distinguishes two different models, and the combination of all possible factors leads to a space of 16 distinct models. The models were evaluated based on their AIC score (Akaike, 1974), a measure of goodness-of-fit that penalizes models with more free parameters:

$$\mathrm{AIC} = 2n - 2\log(L)$$
In the above equation, n is the number of parameters in the model, and log(L) is the maximum log-likelihood value for the model. The model with the lowest AIC score is the preferred explanation for the data, and differences in AIC score can be interpreted in terms of the relative strength of alternative explanations. Burnham and Anderson (2004) suggest as a rough guideline that models with a difference in AIC value (ΔAIC) ≤ 2 have substantial support or evidence, models with 4 ≤ ΔAIC ≤ 7 have limited support, and models with ΔAIC ≥ 10 have “essentially no support” compared to the preferred model. As an additional check on the robustness and consistency of the model comparison, the 16 models were also evaluated using two-fold cross validation, by successively fitting each model to one half of the data via maximum likelihood and examining the log-likelihood of the held-out data. The results of these analyses are provided in Table 1 and illustrated in Figure 7.
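The AIC computation and the Burnham and Anderson guideline fit in a few lines. This is a minimal sketch with hypothetical log-likelihood values; the verbal labels only restate the thresholds quoted above, and ΔAIC values that fall between those thresholds are left unlabeled.

```python
def aic(n_params, log_likelihood):
    """Akaike information criterion: AIC = 2n - 2 log(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def delta_aic(fits):
    """AIC of each model minus the AIC of the best (lowest-AIC) model.

    `fits` maps a model name to (number of parameters, maximum log-likelihood).
    """
    scores = {name: aic(k, ll) for name, (k, ll) in fits.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

def support(delta):
    """Rough verbal labels following the Burnham and Anderson (2004) guideline."""
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "limited"
    if delta >= 10:
        return "essentially none"
    return "between guideline categories"

# Hypothetical single-subject fits: (free parameters, maximum log-likelihood).
fits = {"model 1": (12, -400.0), "model 2": (8, -401.0)}
for name, d in delta_aic(fits).items():
    print(name, d, support(d))
```

Here the larger model fits slightly better (higher log-likelihood) but pays 8 extra AIC units for its 4 extra parameters. Because the models are fit to each participant separately, per-subject AIC values would be computed first and then averaged, as in Figure 7a.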
Figure 7
 
(a) AIC scores for each model relative to the best model (model 2). Scores are averaged across participants. (b) The average AIC score for each model factor, computed based on the mean AIC score for all models with a given factor equal to “Yes,” minus the mean AIC score across all models where the same factor is “No.” In both panels, lower values indicate a preferred model.
Table 1
 
Comparison of 16 models, varying across four binary factors. The models are evaluated based on the Akaike information criterion (AIC) score relative to the best model, as well as two-fold cross validation (CV) of maximum likelihood. For AIC, smaller values indicate a preferred model. For cross validation, higher values indicate a favored model (higher log-likelihood of the validation data).
Model  A  B  C  D  No. of parameters  ΔAIC    CV log-likelihood  Rank (AIC)  Rank (CV)
1      N  N  N  N  12                 2.74    −848.69            2           2
2      N  N  N  Y  8                  0.00    −846.12            1           1
3      N  N  Y  N  8                  9.31    −850.31            4           5
4      N  N  Y  Y  6                  7.75    −849.39            3           3
5      N  Y  N  N  10                 16.13   −850.72            6           6
6      N  Y  N  Y  6                  13.29   −849.52            5           4
7      N  Y  Y  N  6                  23.36   −854.10            8           8
8      N  Y  Y  Y  4                  21.78   −853.17            7           7
9      Y  N  N  N  10                 76.06   −882.39            10          10
10     Y  N  N  Y  6                  73.49   −881.12            9           9
11     Y  N  Y  N  6                  88.37   −887.72            14          14
12     Y  N  Y  Y  4                  86.72   −886.66            12          12
13     Y  Y  N  N  9                  87.22   −887.30            13          13
14     Y  Y  N  Y  5                  85.26   −885.89            11          11
15     Y  Y  Y  N  5                  100.39  −892.88            16          16
16     Y  Y  Y  Y  3                  98.79   −891.84            15          15
The results show that model 2 offers the strongest explanation for the data according to both AIC score and cross-validation. According to this model: (a) memory capacity differs between visual features, (b) memory capacity differs with feature load, (c) the loss function differs between visual features, but (d) the loss function for each feature remains the same regardless of memory load. It is not surprising that memory capacity differs with memory load (see Sims et al., 2012). The finding that memory capacity differs for color and orientation is not predicted by information theory, but neither is this result explainable by alternative models of visual working memory. Allred and Flombaum (2014) have recently argued that studies of color memory may overlook or conflate important effects in color perception, for example by assuming a perceptually uniform color space. These factors may contribute to the finding of lower effective capacity for color than orientation. 
The mean parameters estimated according to the best model are reported in Table 2. Although the evidence favoring this model is modest (ΔAIC relative to the next best model = −2.74), it is worth noting that the assumptions behind the model are also somewhat extreme: namely, that the loss function is exactly identical across conditions, ruling out even unsystematic variation. A conservative interpretation of the present results is that they demonstrate a failure to reject the null hypothesis that the loss function is invariant to changes in memory load. 
Table 2
 
Mean parameter estimates (averaged across subjects) for each model parameter in each condition of Experiment 1, according to the best-fitting model (model 2). Values in parentheses indicate standard deviations.
Memory load     Visual feature  R (bits)     μ            β
Single feature  Color           1.42 (0.27)  0.85 (0.12)  1.79 (0.22)
Single feature  Orientation     2.20 (0.63)  0.68 (0.09)  1.98 (0.30)
Conjunction     Color           1.13 (0.38)  0.85 (0.12)  1.79 (0.22)
Conjunction     Orientation     1.86 (0.66)  0.68 (0.09)  1.98 (0.30)
A two-way repeated measures ANOVA (visual feature × load condition) was performed to compare the estimated parameters from model 2. Memory capacity was significantly lower for color than orientation, F(1, 11) = 40.65, p < 0.001, and lower for the conjunction condition compared to the single feature condition, F(1, 11) = 40.65, p < 0.001. Memory capacity decreased by 20% for color and 15% for orientation in the conjunction condition compared to the single feature condition. A simple model of visual memory, according to which a single memory capacity is evenly shared across all encoded features, might predict that capacity should decrease by half in the conjunction condition (as the number of attended features is doubled). However, this prediction is complicated by several factors. First, perceptual noise and response noise contribute to response variability, but these are independent of memory load. Because the model attributes this load-independent variability to limited capacity, capacity estimates in the single feature condition may underestimate total memory capacity. In addition, it is possible that subjects sometimes encoded both the color and orientation of the objects in the single feature condition. For both reasons, the total capacity of visual memory is likely higher than observed in the single feature condition. The obtained results do, however, rule out strict independence between visual working memory for distinct features: encoding both the color and orientation of a visual object decreases the precision with which either can be recalled. 
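The quoted percentages can be checked against the mean estimates in Table 2. This short sketch uses only the rounded means from that table, so per-subject percentages may differ slightly. The dictionary layout and function name are illustrative.

```python
# Mean capacity estimates (bits) for model 2, copied from Table 2.
capacity = {
    ("color", "single"): 1.42,
    ("color", "conjunction"): 1.13,
    ("orientation", "single"): 2.20,
    ("orientation", "conjunction"): 1.86,
}

def percent_decrease(feature):
    """Relative drop in capacity from the single feature to the conjunction condition."""
    single = capacity[(feature, "single")]
    conjunction = capacity[(feature, "conjunction")]
    return 100.0 * (single - conjunction) / single

for feature in ("color", "orientation"):
    # A strict halving of capacity would correspond to a 50% decrease.
    print(f"{feature}: {percent_decrease(feature):.1f}% decrease")
```

This gives about 20.4% for color and 15.5% for orientation, well short of the 50% that a strict sharing account would predict.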
The estimated loss functions for each subject are shown in Figure 8. The functions are highly similar across subjects for each feature, but differ subtly between features. In particular, the loss function is steeper for orientation than for color, as confirmed by ANOVA on the estimated parameters of the loss function: the parameter μ significantly differed between color and orientation, F(1, 11) = 17.23, p < 0.002, while there were no significant differences in the β parameter. 
Figure 8
 
Inferred loss functions for each participant. (a) Loss functions estimated for color. (b) Loss functions estimated for orientation. The black curves are the estimated loss function for each subject. An inverted cosine loss function is shown in red for comparison.
A separate analysis of the parameters from model 1 (which allowed the loss function to differ between load conditions as well as features) found that the estimated loss function did not significantly differ between the single feature and conjunction conditions, F(1, 11) = 0.11, p = 0.74, n.s. Hence, model comparison based on AIC scores and comparison of the parameters in the unconstrained model lead to the same conclusion. 
Figure 8 also plots an inverted cosine loss function for visual comparison (red curves). A cosine loss function is an important benchmark for comparison, as it can be shown (Harremoës, 2010) that this loss function predicts a von Mises error distribution. In other words, if memory were optimally designed for minimizing a cosine loss function, then it should exhibit von Mises distributed error. Numerous models of visual working memory assume a von Mises error distribution in memory for circular features, or assume a more complex model that builds on this basic assumption (e.g., Anderson, Vogel, & Awh, 2011; Bays, Catalao, & Husain, 2009; Fougnie et al., 2012; van den Berg et al., 2014; van den Berg et al., 2012; Zhang & Luck, 2008). The discrepancy illustrated in Figure 8 questions the appropriateness of this assumption. The empirical loss function of visual working memory is steeper for small errors, and saturates for large errors. 
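This correspondence can be checked numerically. The sketch below assumes the standard rate-distortion result that, for a uniformly distributed circular feature and a loss ρ(z) that depends only on the error z, the optimal channel has the form p(z) ∝ exp(−s ρ(z)) for some s > 0. With the inverted cosine loss ρ(z) = 1 − cos z, this is exactly a von Mises density with concentration κ = s. The code normalizes exp(−s ρ(z)) on a 1,000-bin grid and compares the result with the von Mises formula; the helper names are illustrative.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) via its power series sum (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def optimal_error_density(loss, s, n_bins=1000):
    """Density p(z) ∝ exp(-s * loss(z)), normalized numerically on [-pi, pi)."""
    width = 2 * math.pi / n_bins
    centers = [-math.pi + (i + 0.5) * width for i in range(n_bins)]
    weights = [math.exp(-s * loss(z)) for z in centers]
    norm = sum(weights) * width
    return centers, [w / norm for w in weights]

def von_mises_pdf(z, kappa):
    return math.exp(kappa * math.cos(z)) / (2 * math.pi * bessel_i0(kappa))

def cosine_loss(z):
    """Inverted cosine loss: 0 for a perfect memory, 2 for the maximal error."""
    return 1.0 - math.cos(z)

kappa = 4.0
centers, density = optimal_error_density(cosine_loss, s=kappa)
max_diff = max(abs(p - von_mises_pdf(z, kappa)) for z, p in zip(centers, density))
print(max_diff)
```

Any other loss plugged into `optimal_error_density` yields a non-von Mises error distribution. This is why an empirical loss that departs from the cosine implies an error distribution that departs from the von Mises.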
As a further means of assessing the quantitative fit of the model, the maximum likelihood parameter estimates were used to generate a simulated dataset. Summary statistics (circular variance and kurtosis) were then computed for the simulated data. These values are compared to the empirical data in Figure 6 (red lines). The near-perfect overlap illustrates that the model is able to closely capture the shape of the empirical error distribution. 
Fougnie et al. (2010) found that attending to and remembering both an object's color and its orientation led to a decrease in memory precision compared to the case of remembering only color or orientation. Intuitively, attending to multiple features simultaneously might require sharing a single, fixed memory capacity across each feature. A strong form of this hypothesis would predict that estimated capacity for a given feature should decrease by half as the number of encoded features doubles. However, other possibilities are also conceivable. One such possibility is that attending to multiple features influences the loss function for each feature, but has no effect on memory capacity. The results obtained in the experiment argue against this interpretation. Storing multiple features in visual working memory significantly reduced the capacity with which each was encoded, but had no detectable influence on the shape of the loss function. The observed changes in capacity also argue against previous studies that have claimed that different feature dimensions do not compete for storage in visual working memory (e.g., Luck & Vogel, 1997; Olson & Jiang, 2002). The most likely explanation for this discrepancy is that the current study (along with that by Fougnie et al., 2010) examined memory precision, as opposed to relatively coarse change detection accuracy. 
Experiment 2: From natural tasks to optimal memory systems
The delayed estimation task employed in Experiment 1 is a widely used paradigm in visual working memory research since it enables the experimenter to obtain a continuous measure of memory variability. Although both useful and widely used as a laboratory paradigm, the delayed estimation task has limited ecological validity. In particular, subjects may be instructed to remember visual features “as closely as possible,” but the task does not define the consequences of misremembering. In other words, delayed estimation is an ill-defined task for visual memory. A related point is that, in this paradigm, the contents of visual memory are not used to support or carry out any particular task. Unlike this situation, in most natural tasks visual working memory is used to support ongoing behavior and achieve behaviorally relevant goals (Brouwer & Knill, 2007, 2009; Hayhoe et al., 2003). In natural tasks, memory error translates into failure or difficulty in achieving one's goals, defining a natural cost for misremembering. 
Hollingworth et al. (2008) examined one task in which visual working memory can be shown to play a critical role: gaze correction following saccadic error. In both laboratory and natural tasks, human eye movements are frequently inaccurate, and may miss the intended target on 30% to 40% of saccades (Becker, 1991). When this occurs, the visual system rapidly and automatically generates a corrective saccade. In order to identify where to direct the corrective saccade, the visual system must rapidly compare current visual input with a memory-based representation of the original intended saccade target (Hollingworth et al., 2008). This is a particularly important role for visual working memory, as over the course of a single day the visual system will need to generate thousands of corrective saccades. How should the visual memory for the saccade target be encoded in order to maximize accuracy in this important task? 
Methods
Here I consider a highly simplified version of the gaze correction problem, where objects are defined by a unitary circular-distributed feature, such as angular orientation or color sampled from a color wheel. Following an inaccurate saccade, visual gaze lands between two objects: one is the intended saccade target, and the other is a distractor. This situation is illustrated schematically in Figure 9. Note that this is also closely analogous to the experimental paradigm employed by Hollingworth et al. (2008). The primary difference is that by using a gaze-contingent display, Hollingworth et al. were able to experimentally induce saccade error. 
Figure 9
 
Schematic illustration of the challenge facing the visual system following an inaccurate saccade. Visual working memory representations of the intended target can be used to determine where to direct a corrective saccade (Hollingworth et al., 2008).
The task for the visual system is to determine which of the two objects was the intended target. A corrective saccade will be initiated towards this target. If the goal of the visual system is to accurately direct eye movements, then the cost function for this task should be based around minimizing the probability of misidentification. Note that a similar mechanism could also support a wide range of other natural visual tasks, such as visual search or visually comparing objects when they must be fixated sequentially. 
More formally, let x indicate the true feature value for the target, and y refer to the memory representation of x. The feature value for the distractor is indicated by x_d. Under these circumstances, an identification error will occur when the angular difference between y and x_d is less than the difference between y and x. If the orientation of the distractor is independent of the orientation of the target, the cost function (i.e., the probability of error) for this task can be explicitly derived:

$$\rho(z) = \int_{y-z}^{y+z} p(x_d)\,dx_d, \qquad z = |y - x| \tag{7}$$

where z is the angular magnitude of the memory error and p(x_d) is the distribution of distractor feature values.
If the distribution of distractors is uniform, this integral results in the linear cost function ρ(z) = z/π: an error occurs exactly when the distractor falls within the arc of length 2z centered on y, which has probability 2z/2π. This states that the probability of making an identification error increases linearly with the magnitude of memory error (given by z). The corresponding optimal visual memory channel for minimizing this cost function is shown in the second row, right column of Figure 2. As previously noted, this distribution exhibits a sharper peak and heavier tails than a von Mises distribution matched in variance, a property that is in qualitative agreement with human visual working memory performance (Bays, 2014; van den Berg et al., 2012). Equation 7 yields the task-defined loss function when there is a single distractor item. This loss function can be extended in a straightforward manner to handle the case when there are multiple distractors. Appendix B provides analytical expressions for the relevant loss functions for target identification with up to four distractors. 
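A small simulation makes the derivation concrete. The sketch below assumes that each of n distractors is drawn independently and uniformly on the circle, and counts an error whenever any distractor lies closer to the memory y than the target does. Under that assumption the loss is ρ_n(z) = 1 − (1 − z/π)^n, which reduces to z/π for a single distractor. This closed form is offered as an illustration and is not necessarily the form in which Appendix B states its expressions. The function names are hypothetical.

```python
import math
import random

def task_loss(z, n_distractors=1):
    """Probability of misidentifying the target given a memory error of magnitude z.

    Assumes n independent distractors, each uniform on the circle.
    """
    return 1.0 - (1.0 - abs(z) / math.pi) ** n_distractors

def circ_dist(a, b):
    """Angular distance between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def simulate_loss(z, n_distractors, trials=100_000, seed=1):
    """Monte Carlo estimate of the misidentification probability."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = rng.uniform(0, 2 * math.pi)   # target feature value
        y = x + z                         # memory representation with error z
        distractors = [rng.uniform(0, 2 * math.pi) for _ in range(n_distractors)]
        if any(circ_dist(y, xd) < circ_dist(y, x) for xd in distractors):
            errors += 1
    return errors / trials

for n in range(1, 5):
    print(n, round(task_loss(1.0, n), 4), round(simulate_loss(1.0, n, trials=20_000), 4))
```

The analytic and simulated columns should agree to within Monte Carlo error. With more distractors, even moderate memory errors carry a high probability of misidentification.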
Results and discussion
Figure 10 illustrates the task-defined loss function for one through four distractors (left panel), along with the optimally efficient memory channel for each loss function (right panel) assuming a capacity limit of 1 bit. The qualitative features of each error distribution are largely the same, showing a predicted error distribution that is sharper and heavier-tailed compared to a von Mises or Gaussian distribution of equivalent variance. This qualitative pattern is entirely independent of free parameters, and is instead a consequence of the properties of the natural task. As the number of distractors increases, the optimal error distribution becomes more sharply peaked, with a corresponding increase in the probability of exhibiting large errors. 
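The channels in Figure 10b can be approximated numerically. This is a minimal sketch under two assumptions: first, the loss ρ_n(z) = 1 − (1 − z/π)^n for n independent, uniformly distributed distractors; second, the standard rate-distortion form p(z) ∝ exp(−s ρ_n(z)) for a uniform circular feature. The parameter s is chosen by bisection so that the discretized channel carries exactly 1 bit. It illustrates the idea and is not the code used to produce the figure; the function names are hypothetical.

```python
import math

N_BINS = 1000
WIDTH = 2 * math.pi / N_BINS
CENTERS = [-math.pi + (i + 0.5) * WIDTH for i in range(N_BINS)]

def task_loss(z, n_distractors):
    """Misidentification probability with n independent uniform distractors."""
    return 1.0 - (1.0 - abs(z) / math.pi) ** n_distractors

def channel(loss, s):
    """Bin probabilities of p(z) ∝ exp(-s * loss(z)) on a 1,000-bin grid."""
    weights = [math.exp(-s * loss(z)) for z in CENTERS]
    total = sum(weights)
    return [w / total for w in weights]

def info_bits(probs):
    """Mutual information (bits) of the shift channel with a uniform input."""
    return math.log2(len(probs)) + sum(p * math.log2(p) for p in probs if p > 0)

def optimal_channel(loss, capacity_bits, tol=1e-10):
    """Find s by bisection so that the channel transmits capacity_bits."""
    lo, hi = 0.0, 1.0
    while info_bits(channel(loss, hi)) < capacity_bits:
        hi *= 2.0  # information increases monotonically with s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if info_bits(channel(loss, mid)) < capacity_bits:
            lo = mid
        else:
            hi = mid
    return channel(loss, hi), hi

for n in range(1, 5):
    probs, s = optimal_channel(lambda z, n=n: task_loss(z, n), capacity_bits=1.0)
    print(f"{n} distractor(s): s = {s:.3f}, peak density = {max(probs) / WIDTH:.3f}")
```

Holding capacity fixed at 1 bit isolates the effect of the loss function. Only the shape of ρ_n changes across the four channels, not the amount of information stored.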
Figure 10
 
(a) Task-defined loss functions for an object reidentification task with varying numbers of distractor items (one through four distractors shown as separate curves). (b) Corresponding optimally efficient memory channels for each loss function, assuming a capacity limit of 1 bit.
Note that this analysis simplifies the problem facing the visual system, by excluding the role of perceptual noise and response noise. These factors could be incorporated by convolving the memory error distribution with other noise sources. In addition, the present analysis of the gaze correction problem does not uniquely predict the optimal distribution of memory errors in visual working memory, since the precise distribution of errors depends on memory capacity, the number of possible distractors, as well as the statistical distribution of distractors. A more complex analysis might set about deriving a loss function based on direct measurement of properties from natural scenes (Geisler, 2008) and properties of saccadic error in natural tasks. 
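Adding perceptual or response noise amounts to a circular convolution of the discretized memory error distribution with a noise distribution. The sketch below represents both on the same grid of bin offsets. The memory distribution is hypothetical (the exp(−s|z|/π) channel for the one-distractor loss, with s = 6), as is the von Mises noise (κ = 20). It uses 360 bins rather than 1,000 to keep the pure-Python O(N²) loop quick. Because the first trigonometric moments of independent errors multiply, the mean resultant length of the result equals the product of those of the inputs, so circular variance can only increase.

```python
import math

def offsets(n_bins):
    """Bin offsets m * width in [0, 2*pi); offsets near 2*pi are small negative errors."""
    width = 2 * math.pi / n_bins
    return [m * width for m in range(n_bins)]

def normalized(weights):
    total = sum(weights)
    return [w / total for w in weights]

def circular_convolve(p, q):
    """Distribution of the wrapped sum of two independent discretized errors."""
    n = len(p)
    out = [0.0] * n
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[(i + j) % n] += p_i * q_j
    return out

def mean_resultant_length(p):
    grid = offsets(len(p))
    c = sum(w * math.cos(z) for w, z in zip(p, grid))
    s = sum(w * math.sin(z) for w, z in zip(p, grid))
    return math.hypot(c, s)

N = 360  # coarser than the 1,000 bins used in the paper, for speed
# Hypothetical memory error distribution: exp(-s * |z| / pi) with s = 6.
memory = normalized([math.exp(-6.0 * min(z, 2 * math.pi - z) / math.pi) for z in offsets(N)])
# Hypothetical perceptual/response noise: discretized von Mises with kappa = 20.
noise = normalized([math.exp(20.0 * (math.cos(z) - 1.0)) for z in offsets(N)])
observed = circular_convolve(memory, noise)
print(1 - mean_resultant_length(memory), 1 - mean_resultant_length(observed))
```

The printed values are the circular variances before and after adding noise.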
As it stands, an optimal visual memory system for gaze correction is a reasonable model of human visual working memory performance in standard laboratory delayed estimation tasks. The empirical loss functions previously estimated (Figure 8) are qualitatively similar to those derived in Figure 10: the cost of memory error rises steeply for small errors (and more steeply than predicted by a cosine loss function) and is negatively accelerating for larger memory errors. This result is the first to provide an adaptive explanation for the shape and distribution of memory errors in delayed estimation tasks. 
The four loss functions shown in Figure 10 were also evaluated based on their ability to quantitatively fit the data from Experiment 1. A separate capacity parameter was fit for each visual feature (color vs. orientation) and load condition (single feature vs. conjunction), resulting in four free parameters. As before, the models were fit to the data from each subject separately, and evaluated based on their relative mean AIC score. The ΔAIC values for the loss functions assuming one through four distractors, respectively, were 116.82, 46.14, 65.45, and 127.17, relative to the best model in Table 1 (model 2). Hence, the best-fitting loss function assumes that visual working memory is most efficient at discriminating a target from two distractors (green curves in Figure 10). Based on the AIC values, the theoretically derived loss functions do not explain human performance as well as the empirically measured loss functions in Figure 8.
Critically, however, all four loss functions explain the empirical data substantially better than simply fitting a von Mises distribution to the errors in each memory load and visual feature condition. This simple von Mises model (with four free parameters: the precision of the von Mises distribution in each condition) results in ΔAIC = 281.75. In other words, a von Mises distribution would be rejected relative to a model based on any of the loss functions shown in Figure 10.
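For comparison, the von Mises baseline can be fit by maximum likelihood. The following is a minimal sketch that assumes the error distribution is centred on zero, so only the concentration κ is free in each condition. The estimate then solves I₁(κ)/I₀(κ) = mean cos(error), found here by bisection with series-evaluated Bessel functions and capped at κ = 200. The data are simulated for illustration, and the function names are hypothetical.

```python
import math
import random

def bessel_i(order, x):
    """Modified Bessel function of the first kind, I_0 or I_1, via its power series."""
    if x == 0.0:
        return 1.0 if order == 0 else 0.0
    half = x / 2.0
    term = half ** order  # k = 0 term of the series
    total = term
    k = 0
    while True:
        k += 1
        term *= half * half / (k * (k + order))
        total += term
        if k > x and term < 1e-16 * total:
            return total

def mean_cos_given_kappa(kappa):
    """Expected cos(error) under a von Mises(0, kappa): I1(kappa) / I0(kappa)."""
    return bessel_i(1, kappa) / bessel_i(0, kappa)

def fit_kappa(errors, kappa_max=200.0):
    """ML estimate of the von Mises concentration with the mean fixed at zero."""
    c = sum(math.cos(e) for e in errors) / len(errors)
    if c <= 0.0:
        return 0.0
    if c >= mean_cos_given_kappa(kappa_max):
        return kappa_max
    lo, hi = 0.0, kappa_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_cos_given_kappa(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def von_mises_loglik(errors, kappa):
    log_norm = math.log(2.0 * math.pi * bessel_i(0, kappa))
    return sum(kappa * math.cos(e) for e in errors) - len(errors) * log_norm

# Hypothetical errors for one condition, simulated from a von Mises with kappa = 4.
rng = random.Random(0)
errors = [rng.vonmisesvariate(0.0, 4.0) for _ in range(5000)]
kappa_hat = fit_kappa(errors)
print(kappa_hat, von_mises_loglik(errors, kappa_hat))
```

For the four-parameter baseline described above, one κ would be fit per feature × load condition. The four maximized log-likelihoods would then be summed and entered into AIC = 2·4 − 2 log(L).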
The present analysis assumes that the visual system has no other cues for saccade target identification, such as the distance of the fovea from each item. Hollingworth et al. (2008) found that spatial cues did not aid gaze correction for simple unitary visual features, but did improve the accuracy of corrective saccades when more natural (complex) visual features were examined. The goal of the present analysis is not to provide a complete model of the mechanisms involved in gaze correction. Rather, this analysis illustrates more generally that by taking into account the costs imposed by memory error in natural tasks, it is possible to derive adaptive explanations for the nature of the error distribution in visual working memory. 
Experiment 3: Comparison to alternative models of delayed estimation
In recent years, a number of different models and explanations have been proposed to account for the pattern of memory error and variability observed in delayed estimation tasks (see recent reviews in Luck & Vogel, 2013; Ma, Husain, & Bays, 2014). A key issue in this debate is how and why memory variability increases with increasing memory load—a behavioral phenomenon known as the set size effect. Information theory naturally predicts this qualitative pattern, as illustrated in Figure 2. In fact, the only assumption necessary to explain a decrease in memory precision is that, as more visual features are encoded in memory, less capacity is available to encode each one. An information-theoretic model has previously been shown to offer a close quantitative fit to human performance (Sims et al., 2012). However, this previous work assumed a particular loss function (minimizing the squared error in memory). In addition, several alternative explanations for the set size effect have also been proposed. 
Recently, Bays (2014) proposed a model of visual working memory that explains memory error as a consequence of noise in populations of spiking neurons. In this model, independent populations of visual memory neurons encode each stimulus item, and divisive normalization (Carandini & Heeger, 2012) accounts for the decrease in memory precision with increasing set size. This model not only explains the change in memory variance with set size, but also offers a parsimonious explanation for the deviation of the empirical memory error distribution from a simple von Mises distribution. 
Fougnie et al. (2012), van den Berg et al. (2012), and van den Berg et al. (2014) demonstrated that an alternative mechanism is also capable of explaining these properties. According to the variable precision model (van den Berg et al., 2012), visual working memory is a doubly stochastic process. Each memory item is recalled as a sample from a von Mises distribution, but the precision of this recall distribution is itself a stochastic variable (modeled as a gamma distribution). This variability in encoding precision leads to an aggregate memory error distribution that deviates from a von Mises distribution, even though the error for each individual item is von Mises distributed. 
Van den Berg et al. (2014) conducted a systematic evaluation of 32 distinct models of visual working memory, using data from ten different delayed estimation experiments. Their variable precision model, combined with Poisson variability in the number of items encoded on each trial, was found to offer the best explanation for the combined results. The full details of this model, termed the VP-P model, can be found in van den Berg et al. (2014). A minor extension to this model, labeled the VP-P-NT model, allowed for the possibility of spatial binding errors (i.e., reporting a feature value for the wrong item) and was found to slightly improve the fit of the model. However, the present paper focuses on the simpler version of the model (VP-P rather than VP-P-NT), since spatial binding errors were found to be infrequent and did not substantively alter the model comparison results. The most critical factor in explaining human performance was the assumption of variability in memory precision. Summarizing their model comparisons, van den Berg et al. (2014) wrote:
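The doubly stochastic idea can be illustrated with a small generative sketch. It is deliberately simplified and is not the published VP-P model. It draws the von Mises concentration κ directly from a gamma distribution with a fixed mean, whereas in van den Berg et al.'s formulation the gamma distribution is placed on precision J and mean precision depends on set size. It also assumes that an unstored probed item yields a uniform guess. All parameter names and values are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_vp_errors(set_size, mean_kappa, gamma_scale, mean_items, n_trials, seed=0):
    """Simplified doubly stochastic observer (hypothetical parameterization).

    Each trial: a Poisson number of items is stored; if the probed item is
    stored, its concentration kappa is drawn from a gamma distribution and
    the error from a von Mises(0, kappa); otherwise the response is a guess.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        stored = min(poisson(rng, mean_items), set_size)
        if rng.random() < stored / set_size:
            kappa = rng.gammavariate(mean_kappa / gamma_scale, gamma_scale)
            error = rng.vonmisesvariate(0.0, kappa)
        else:
            error = rng.uniform(0.0, 2 * math.pi)
        errors.append((error + math.pi) % (2 * math.pi) - math.pi)  # wrap to [-pi, pi)
    return errors

errors = simulate_vp_errors(set_size=4, mean_kappa=8.0, gamma_scale=4.0,
                            mean_items=3.0, n_trials=2000)
print(len(errors), min(errors), max(errors))
```

Mixing von Mises distributions with different concentrations produces an aggregate error distribution with a sharper peak and heavier tails than any single von Mises. The decision-theoretic account instead attributes the same departure to the shape of the loss function.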
“Although our results strongly support the notion of variability in mnemonic precision, they do not address the origins of this variability. Many sources are conceivable: fluctuations in attention over trials (Cohen & Maunsell, 2010; Nienborg & Cumming, 2009), fluctuations in attention over space (Lara & Wallis, 2012), differences in precision across stimulus values (Bae, Wilson, & Flombaum, 2013; Girshick, Landy, & Simoncelli, 2011), and variability in memory decay rates (Fougnie, Suchow, & Alvarez, 2012b). It is likely that multiple factors contribute, and distinguishing them will be challenging.” (p. 142) 
The goal of the present analysis is not to challenge this conclusion, but rather to introduce another way of understanding memory variability: as an ecological or rational adaptation to the costs of memory error. Error is unavoidable for capacity-limited systems, but the nature of this error may be structured in a manner that reduces the cost to the organism. To demonstrate the sufficiency of this explanation, it is necessary to show that a model constructed around minimizing expected loss is also capable of explaining human memory performance. 
Methods
The ten reference datasets employed by van den Berg et al. (2014) have been made publicly available.4 Each dataset consists of the results from a standard delayed estimation visual memory task, with set size varying between one and eight items. Here, I examine how a decision-theoretic model compares to the VP-P model. I focus on seven of the ten available datasets, listed in the left column of Table 3. Five of the experiments examine visual memory for color values uniformly sampled from a color wheel, while two examine visual memory for orientation. The remaining three datasets analyzed by van den Berg et al. (2014) are not considered in the present paper, since they employ visual features distributed in the range 0°–180° rather than 0°–360°; fitting these datasets is straightforward but requires specifying a modified family of loss functions. The seven experiments differ in numerous factors, such as visual eccentricity and stimulus presentation time; complete methodological details can be found in the references listed in Table 3.
Table 3
 
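Experimental datasets used for comparing the decision-theoretic (DT) model to the variable precision model (VP-P; van den Berg et al., 2014). Models are compared based on relative AIC values: for each dataset, ΔAIC is the difference from the model with the lower AIC, so 0.00 marks the preferred model.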
| Reference | Feature | Set sizes | ΔAIC (DT) | ΔAIC (VP-P) |
|---|---|---|---|---|
| Wilken & Ma, 2004 | Color | 1, 2, 4, 8 | 0.00 | +14.28 |
| Zhang & Luck, 2008 | Color | 1, 2, 3, 6 | 0.00 | +5.04 |
| Bays, Catalao, & Husain, 2009 | Color | 1, 2, 4, 6 | 0.00 | +0.56 |
| Anderson, Vogel, & Awh, 2011 | Orientation | 1–4, 6, 8 | +1.13 | 0.00 |
| Anderson & Awh, 2012 | Orientation | 1–4, 6, 8 | 0.00 | +0.24 |
| van den Berg et al., 2012 | Color (scrolling) | 1–8 | 0.00 | +5.75 |
| van den Berg et al., 2012 | Color (wheel) | 1–8 | 0.00 | +6.45 |
| Mean ΔAIC (all experiments) | | | 0.0 | +4.45 |
The decision-theoretic model applied to these datasets is identical to the best-fitting model reported in Experiment 1. In particular, it is assumed that the memory capacity available to encode an item potentially varies with set size, while the loss function is assumed to remain fixed. This results in a model with K + 2 free parameters, where K indicates the number of set size conditions, and the remaining two parameters (μ and β) characterize the loss function. Notably, the decision-theoretic model does not assume any variability in model parameters, such as trial-to-trial variability in capacity, the number of items encoded, the parameters of the loss function, or additive response noise. Rather, it is assumed that the variability in observed responses is entirely due to the rational minimization of expected loss for a given capacity constraint and loss function. Model parameters were fit separately to the data from each participant by maximum likelihood estimation. The data were discretized into 1,000 bins before fitting the model. 
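The model's central assumption, that response variability follows entirely from minimizing expected loss at a fixed capacity, can be made concrete with a short sketch. It rests on assumptions that are not taken from this section: stimulus values are uniform on the circle, the optimal encoder is computed with the Blahut-Arimoto algorithm (Blahut, 1972), and an inverted cosine loss stands in for the paper's two-parameter (μ, β) loss family. The code bisects on a trade-off parameter `s` until the channel's mutual information equals a target capacity in bits, returns the predicted error distribution, and scores observed errors by their binned log-likelihood. All function names, the 180-bin grid (coarser than the 1,000 bins used for fitting), and the search bounds are hypothetical choices.

```python
import numpy as np


def wrap(angle):
    """Wrap angles in radians into [-pi, pi)."""
    return (np.asarray(angle) + np.pi) % (2 * np.pi) - np.pi


def blahut_arimoto(distortion, s, n_iter=20):
    """Rate-distortion channel p(xhat | x) for a uniform source at trade-off s.

    Returns the channel matrix and its mutual information in bits.
    """
    n = distortion.shape[0]
    p_x = np.full(n, 1.0 / n)          # uniform distribution of stimulus values
    q = np.full(n, 1.0 / n)            # marginal distribution of responses
    kernel = np.exp(-s * distortion)
    for _ in range(n_iter):
        a = q[None, :] * kernel
        channel = a / a.sum(axis=1, keepdims=True)
        q = p_x @ channel
    ratio = np.where(channel > 0, channel / q[None, :], 1.0)  # log2(1) = 0 for empty cells
    rate_bits = float(np.sum(p_x[:, None] * channel * np.log2(ratio)))
    return channel, rate_bits


def predicted_error_pmf(loss, capacity_bits, n_bins=180, s_max=1e3):
    """Minimum-expected-loss error distribution at a fixed rate (bisection on s)."""
    theta = np.arange(n_bins) * 2 * np.pi / n_bins
    distortion = loss(wrap(theta[None, :] - theta[:, None]))   # [i, j] = L(theta_j - theta_i)
    s_lo, s_hi = 0.0, s_max
    for _ in range(60):
        s = 0.5 * (s_lo + s_hi)
        channel, rate = blahut_arimoto(distortion, s)
        if rate < capacity_bits:
            s_lo = s   # rate too low: weight the loss more heavily
        else:
            s_hi = s
    # Row 0 is p(response | target = 0); entry j is the probability of error theta[j].
    return channel[0], rate


def binned_log_likelihood(observed_errors, pmf):
    """Log-likelihood of observed errors (radians) under a binned error pmf."""
    n_bins = len(pmf)
    idx = np.rint(np.mod(observed_errors, 2 * np.pi) * n_bins / (2 * np.pi)).astype(int) % n_bins
    return float(np.sum(np.log(pmf[idx])))


def cosine_loss(err):
    """Hypothetical stand-in for the paper's loss family: an inverted cosine."""
    return 1.0 - np.cos(err)


pmf, rate = predicted_error_pmf(cosine_loss, capacity_bits=2.0)
print(f"rate {rate:.3f} bits, P(zero error bin) {pmf[0]:.4f}")
```

In a fit of this kind, the capacity for each set size and the loss parameters would be adjusted to maximize the summed log-likelihood, giving the K + 2 parameters described above. Errors falling in bins with zero predicted probability would give a log-likelihood of −∞, which a practical fitting routine would need to guard against.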
Results and discussion
The decision-theoretic model was compared to the VP-P model in terms of its ability to fit the empirical data quantitatively across the seven datasets. AIC scores for the VP-P model (van den Berg et al., 2014) were provided by R. van den Berg. Table 3 compares ΔAIC scores for each model and experiment. Compared to the VP-P model, the decision-theoretic account offers a modest advantage in explaining human visual working memory performance: it achieves the lower AIC in six of the seven datasets (mean ΔAIC = −4.45). This advantage comes in spite of the decision-theoretic model having fewer assumptions and embedded mechanisms. In particular, the only assumptions necessary to provide a close quantitative account of human memory performance are that visual memory is limited in capacity, and that it seeks to make efficient use of that capacity by minimizing the cost of memory error under some loss function. 
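AIC (Akaike, 1974) penalizes a model's maximized log-likelihood by its number of free parameters, AIC = 2k − 2 ln L̂, with lower values preferred. The sketch below shows that computation, the K + 2 parameter count of the decision-theoretic model, and a recomputation of the mean advantage from the rounded per-dataset entries of Table 3. The function and variable names are illustrative.

```python
import numpy as np


def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 ln L; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def dt_param_count(n_set_sizes):
    """Decision-theoretic model: one capacity per set size, plus mu and beta for the loss."""
    return n_set_sizes + 2


# (DT, VP-P) Delta-AIC pairs transcribed from Table 3.
TABLE3 = {
    "Wilken & Ma, 2004": (0.00, 14.28),
    "Zhang & Luck, 2008": (0.00, 5.04),
    "Bays, Catalao, & Husain, 2009": (0.00, 0.56),
    "Anderson, Vogel, & Awh, 2011": (1.13, 0.00),
    "Anderson & Awh, 2012": (0.00, 0.24),
    "van den Berg et al., 2012 (scrolling)": (0.00, 5.75),
    "van den Berg et al., 2012 (wheel)": (0.00, 6.45),
}

# Positive values favor the decision-theoretic model.
advantage = np.array([vp - dt for dt, vp in TABLE3.values()])
print(f"mean AIC advantage of DT: {advantage.mean():.2f}")
print(f"datasets favoring DT: {int(np.sum(advantage > 0))} of {len(advantage)}")
```

Averaging the per-dataset differences gives about 4.46 in favor of the decision-theoretic model, with six of seven datasets favoring it; the small gap from the +4.45 in Table 3 presumably reflects rounding of the tabulated entries.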
The estimated capacity for each experiment and set size condition is plotted in Figure 11a. Individual subjects are shown by plot markers, and the solid black line shows the across-subjects mean capacity. For all experiments, the estimated information-theoretic capacity decreases monotonically with increasing set size, a finding in line with previous work (Sims et al., 2012). 
Figure 11
 
(a) Estimated memory capacity as a function of set size across seven delayed estimation datasets. Plot markers correspond to individual subjects. Black lines indicate the across-subjects average capacity. The smooth red curves indicate the predicted falloff in capacity according to a power law with exponent −1. (b) Estimated loss functions for each participant (black curves) in each dataset. The thick red curve shows an inverted cosine loss function as a reference for comparison across datasets.
Several previous models of visual working memory have assumed that response precision decreases as a power-law function of set size, or of the number of items encoded (Bays & Husain, 2008; Keshvari, van den Berg, & Ma, 2013; van den Berg et al., 2012; van den Berg et al., 2014). Intuitively, the same relationship might be predicted for information-theoretic capacity: doubling the number of features might be expected to halve the capacity allocated to each item, resulting in a power law with exponent −1. However, many factors complicate this simple relationship. For example, motor variability is independent of set size, and so will have a comparatively larger effect at small set sizes. The perceptual discriminability of features may also depend in a complicated fashion on the number of visual features in a display (Allred & Flombaum, 2014). 
Perhaps because of these complicating factors, the measured falloff in capacity with increasing set size is not fully described by a simple power law. For each condition, I multiplied the average capacity for that condition by the set size (i.e., average estimated bits per item × number of items in the display). I then took the maximum value of this measure across all set sizes as a lower bound on total memory capacity, R_total. The red curves in Figure 11a plot R_total/k, where k indicates set size; this is the drop-off in visual memory capacity predicted by a power law with exponent −1. As can be seen, the theoretical prediction resembles the estimated capacity for large set sizes fairly closely, but substantially overestimates capacity in the single-item conditions. In other words, given subjects' performance in the larger set size conditions, they should have performed better than they did in remembering a single item. This discrepancy might be explained by incorporating sensory noise and response noise into the model, as these factors will have the largest relative impact in the small set size conditions. However, this hypothesis remains to be tested in future research. 
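The lower bound on total capacity and the exponent −1 prediction described above amount to a few lines of arithmetic. The sketch below uses toy per-item capacities invented for illustration (they are not the estimates plotted in Figure 11a); the function name is hypothetical.

```python
import numpy as np


def power_law_prediction(set_sizes, mean_capacity_bits):
    """Lower bound on total capacity, and the exponent -1 prediction R_total / k."""
    set_sizes = np.asarray(set_sizes, dtype=float)
    total = set_sizes * np.asarray(mean_capacity_bits, dtype=float)  # bits per item x items
    r_total = total.max()                                            # lower bound on R_total
    return r_total, r_total / set_sizes


# Toy per-item capacities in bits, invented for illustration only.
k = [1, 2, 4, 8]
c = [3.0, 2.2, 1.3, 0.7]
r_total, predicted = power_law_prediction(k, c)
print(r_total, predicted)
```

With these toy values, R_total is 5.6 bits (reached at k = 8), so the power law predicts 5.6 bits for a single item, well above the 3.0 bits assumed: the same kind of overestimate at set size one described above.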
Figure 11b shows the estimated loss function for each subject in each experiment (black curves), along with an inverted cosine loss function as a standard of reference (thick red curve). With one notable exception, the shape of the loss functions is generally consistent across subjects as well as across experiments. The exception is the loss functions estimated from the Wilken and Ma (2004) dataset. Compared to the remaining datasets, the estimated loss function negatively accelerates even for small errors. Functionally, this difference leads to a much sharper peak in the predicted distribution of memory errors (similar to the error distributions illustrated in Figure 10). Curiously, this is also the dataset with the largest difference in AIC scores between the decision-theoretic model and the VP-P model (Table 3). This suggests a general pattern of behavior that the VP-P model is unable to replicate. The shape of the loss function is also consistent across subjects in the Wilken and Ma (2004) experiment, pointing to a systematic difference between this experiment and the remaining six datasets, rather than a peculiarity specific to a small number of subjects. Hence, investigating the extent to which the loss function of visual memory is sensitive or adaptive to properties of a particular task or experiment represents an important goal for future research. 
In the opening of this section, it was emphasized that the goal of the analysis is not to challenge previous conclusions (Fougnie et al., 2012; van den Berg et al., 2014) regarding the presence of variability in visual working memory. Rather, the goal was to introduce a new approach for understanding the nature of that variability. Information theory dictates that memory error is unavoidable for physical information systems that attempt to transmit continuous-valued signals. However, information theory also demonstrates that the distribution of errors may be controlled in order to minimize the negative consequences of error. Minimizing the cost of memory error is a reasonable goal for visual working memory, so it seems appropriate to characterize the brain in terms of its implicit loss function for visual features. The results in this section demonstrate that this adaptive explanation is also able to capture human performance at a detailed quantitative level that compares favorably to the best existing alternative models. Hence, considering the role of loss functions can substantially improve our ability to account for empirical data while simultaneously enriching the theoretical vocabulary for understanding and explaining human visual memory. 
Conclusions
What is the cost of misremembering? This is a question that fundamentally shapes visual working memory, but has not previously been considered. In human motor control, the theoretical construct of a loss function encapsulates our understanding of how and why the brain prefers certain movement patterns over others, and prefers certain movement errors over others. Like motor control, visual working memory is a system that is fundamentally constrained by noise and error. Rather than treating this noise in an atheoretical manner, I argue that significant progress can be made by assuming that the brain is sensitive to the consequences of visual memory error. If this modest hypothesis is correct, then it necessitates the existence of some loss function for visual working memory: a mathematical entity that quantifies the relative costs of making different kinds of memory errors. This function need not be explicitly represented in the brain, but rather is defined implicitly by the pattern of memory errors that the brain does and does not make. 
In the present paper, I have formally defined the construct of a loss function as it relates to visual working memory, and placed it within the theoretical framework of information theory. This framework has previously been demonstrated (Sims et al., 2012) to be useful in explaining several phenomena of human visual working memory, including effects of set size and statistical complexity on visual memory precision. However, previous work has necessitated the use of unfounded assumptions regarding the loss function for visual memory. Indeed, implicit assumptions about the loss function are widespread in models of visual working memory. Any model that assumes the distribution of visual memory error is Gaussian-like (or its circular equivalent, the von Mises distribution) implicitly makes an assumption about the types of memory errors that the brain is attempting to minimize. However, the loss function itself has not previously been a focus for investigation. 
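The implicit link between a von Mises error distribution and a particular loss can be shown with a short derivation. As an illustration (not a reproduction of this paper's own equations), it assumes the standard rate-distortion solution (Berger, 1971; Blahut, 1972) for a stimulus uniform on the circle and a loss L(ε) that depends only on the wrapped error ε = x̂ − x; λ is a Lagrange multiplier, named here to avoid confusion with the loss parameter β. By symmetry, an optimal solution exists with a uniform response marginal q(x̂), which reduces the optimal channel to a distribution over errors alone:

```latex
p(\hat{x} \mid x) = \frac{q(\hat{x})\, e^{-\lambda L(\hat{x} - x)}}{\int_{-\pi}^{\pi} q(\hat{x}')\, e^{-\lambda L(\hat{x}' - x)}\, d\hat{x}'}
\quad\Longrightarrow\quad
p(\varepsilon) = \frac{e^{-\lambda L(\varepsilon)}}{\int_{-\pi}^{\pi} e^{-\lambda L(\varepsilon')}\, d\varepsilon'}

L(\varepsilon) = 1 - \cos\varepsilon
\quad\Longrightarrow\quad
p(\varepsilon) = \frac{e^{-\lambda}\, e^{\lambda\cos\varepsilon}}{e^{-\lambda}\int_{-\pi}^{\pi} e^{\lambda\cos\varepsilon'}\, d\varepsilon'}
= \frac{e^{\lambda\cos\varepsilon}}{2\pi I_0(\lambda)}
```

Here I_0 is the modified Bessel function of the first kind of order zero, so the result is a von Mises distribution with concentration κ = λ. Under these assumptions, modeling memory errors as von Mises is equivalent to assuming an inverted cosine loss.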
By adopting inverse decision theory (Körding & Wolpert, 2004), I demonstrated the feasibility of estimating loss functions from behavior observed in typical visual working memory experiments (Experiment 1). Based on these data, the empirical loss function for both orientation and color increases rapidly for small errors but saturates for larger errors. This loss function is more complicated than the one implicitly assumed by most existing models of visual working memory. The results of the experiment also demonstrate that the empirical loss function can differ for distinct visual features (color versus orientation). There seems little reason to believe that a memory error of 2° in angular orientation and an error of 2° in color, as defined by an arbitrary circular space, should be treated equivalently by the visual system. The present approach allows this intuition to be confirmed, but more importantly it provides a quantitative assessment of precisely how memory error in different visual features may be treated by the brain. Consequently, incorporating the construct of a loss function enables a better quantitative account of empirical data, as well as a richer theoretical vocabulary for describing human memory. 
Working in the opposite direction (Experiment 2), I also demonstrated that it is possible to derive predictions for human performance by starting from properties of natural tasks where visual working memory serves an important role. In particular, I demonstrated that an optimal system for gaze correction following saccadic error (Hollingworth et al., 2008) leads to a predicted visual working memory system that qualitatively captures several important aspects of human memory performance. Gaze correction presents a compelling argument for the function of visual working memory, as it is a problem that is encountered thousands of times per day. The results of the second experiment provide the first ecological explanation for the distribution of memory errors typically observed in human visual working memory. 
In Experiment 3, I examined the ability of this framework to quantitatively explain the results from seven previously published datasets in the visual working memory literature. In all cases, the decision-theoretic approach offers a quantitative explanation for the data that compares favorably to many alternative models of visual working memory (van den Berg et al., 2014). What implication does this result have? In particular, is visual memory error best explained purely by internal mechanisms such as variability in memory precision, or rather ecological accounts based on minimizing expected costs? I argue that this question poses a false dichotomy. 
Identifiability and levels of analysis
The mathematical framework described in this paper represents an explanation for behavior at the computational level (Marr, 1982). An explanation for behavior in terms of its rationality or ecological fitness does not disprove a particular mechanistic explanation, as ecological costs might also influence the evolution of neural mechanisms. At the same time, the results underscore that it may be difficult or impossible to definitively identify hypothesized mechanisms based solely on behavioral data. This is the identifiability problem raised by Anderson (1990). 
Incorporating known biological or neural processing limitations as additional constraints on the development of theories is an important response to the lack of identifiability. The recent model developed by Bays (2014) is one example of this approach. In this model, independent populations of visual memory neurons encode each stimulus item, and divisive normalization (Carandini & Heeger, 2012) accounts for the decrease in memory precision with increasing set size. One of the key features of this model is its ability to produce heavy-tailed error distributions, a consequence in the model of a low signal-to-noise ratio in spiking activity. The strength of this model stems not just from its ability to fit empirical data, but also from the plausibility of its component neural mechanisms (see also Franconeri, Alvarez, & Cavanagh, 2013). 
Just as neural evidence provides an important constraint on the development of theories, the rationality or fitness of a mechanism for the natural world provides an additional source of constraint. This is the approach taken in the present paper, and more generally, in the ideal observer framework (Geisler, 2008). These two approaches are not conflicting but rather offer complementary means of understanding the function of the visual system. The logical next step is the development of rational process models (e.g., Griffiths, Vul, & Sanborn, 2012) that explain visual working memory behavior simultaneously in rational and mechanistic terms. Biological constraints can be incorporated into the present framework, for example, as additional constraint terms in Equation 3, or as explicit assumptions about the mechanisms responsible for memory error. Additionally, costs need not be external. Metabolic costs associated with neural processing (Laughlin, van Steveninck, & Anderson, 1998) can be incorporated into the definition of a loss function, and properties of neural coding might shed light on why memory capacity is as high or as low as it is. However, I emphasize that ecological costs for visual working memory error, and the influence of the statistics of natural tasks, are neglected topics in visual working memory research. 
Variability in memory precision
In recent years, several groups have demonstrated the importance of considering intertrial variability in memory performance (Fougnie et al., 2012; van den Berg et al., 2014; van den Berg et al., 2012). For example, while a von Mises distribution might offer a poor fit to the distribution of memory errors, a mixture of von Mises distributions with varying precision parameters offers a far better explanation. Such variability has a plausible basis in terms of fluctuations in attention, the number of items encoded, or even trial-to-trial variation in the memorability or discriminability of stimuli. On the surface, the decision-theoretic account developed in the present paper appears to ignore these additional sources of variability. However, this is not quite accurate. 
Memory capacity, in an information-theoretic sense, is defined by the average reduction in uncertainty across a large number of trials (this follows from the definition of mutual information, Equation 2). Information channels with higher variance (e.g., noise) will generally have a lower capacity. Hence, variability in memory performance is already encapsulated within the assumption of a fixed and constant capacity. The present paper differs from previous approaches chiefly in that it does not posit any additional assumptions regarding the precise nature of variability. Rather, the nature of variability is constrained only by the goal of minimizing costs. These minimal assumptions are shown to be sufficient to produce an excellent account of a wide range of data. However, these results do not rule out additional mechanistic assumptions regarding memory variability. 
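Because capacity is a property of the whole distribution of errors across trials, a channel whose precision fluctuates from trial to trial still has a single, well-defined information rate. The sketch below illustrates this on a discretized circle, assuming a uniform stimulus distribution and errors that do not depend on the stimulus value, in which case I(X; X̂) = log2(n) − H(error). The bin count, the κ values, and the function names are hypothetical choices for illustration.

```python
import numpy as np


def von_mises_pmf(kappa, n_bins=360):
    """Von Mises error distribution discretized into n_bins bins around the circle."""
    eps = np.arange(n_bins) * 2 * np.pi / n_bins
    w = np.exp(kappa * (np.cos(eps) - 1.0))   # shifted exponent avoids overflow for large kappa
    return w / w.sum()


def circular_channel_information(error_pmf):
    """Mutual information I(X; Xhat) in bits for a uniform circular stimulus.

    Errors are assumed independent of the stimulus value, so responses are
    also uniform and I = H(Xhat) - H(Xhat | X) = log2(n_bins) - H(error).
    """
    p = error_pmf[error_pmf > 0]
    return float(np.log2(error_pmf.size) + np.sum(p * np.log2(p)))


sharp = circular_channel_information(von_mises_pmf(20.0))
broad = circular_channel_information(von_mises_pmf(2.0))

# Variable precision: kappa drawn from a few values with equal probability (illustrative).
components = [von_mises_pmf(k) for k in (1.0, 5.0, 20.0)]
mixed = circular_channel_information(np.mean(components, axis=0))
component_mean = np.mean([circular_channel_information(c) for c in components])
print(f"sharp {sharp:.2f} bits, broad {broad:.2f} bits, mixture {mixed:.2f} bits")
```

Broader noise lowers the information carried, and the variable-precision mixture has its own fixed information value. Because entropy is concave, the mixture carries no more information than the average of its components.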
Previous work using the information-theoretic approach (Sims et al., 2012) tested a model that incorporated trial-to-trial variability in the number of items encoded, dividing information-theoretic capacity evenly between all encoded items. The present paper opted to test a simpler version. This was done to limit the computational demands associated with fitting the model, as well as to maintain focus on the theoretical construct of the loss function as it relates to visual working memory. However, accounting for additional sources of variability may influence the shape of the recovered loss function. For example, the functions reported in Figure 8 and Figure 11 are likely “blurred” by sensory noise and response noise, factors that properly should be separated from the construct of a loss function. If there is variability in memory capacity (more formally, this is equivalent to an information channel with nonstationary capacity) this may also result in the conflation of the visual system's “intended” loss function, and the measured loss function. 
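The blurring effect of response noise can be illustrated under simple assumptions: memory errors follow p(ε) ∝ exp(−λL(ε)) with an inverted cosine loss, independent von Mises response noise is added by circular convolution, and a loss is then read back by inverting the same relation with λ held at its noise-free value. These are simplifications for illustration, not the fitting procedure used in the paper; the concentrations and function names are hypothetical.

```python
import numpy as np

N_BINS = 360
EPS = np.arange(N_BINS) * 2 * np.pi / N_BINS   # bin 0 is zero error


def von_mises_pmf(kappa):
    """Von Mises distribution discretized on the circle (mode at zero error)."""
    w = np.exp(kappa * (np.cos(EPS) - 1.0))
    return w / w.sum()


def add_response_noise(error_pmf, noise_pmf):
    """Distribution of memory error plus independent noise: circular convolution via FFT."""
    total = np.real(np.fft.ifft(np.fft.fft(error_pmf) * np.fft.fft(noise_pmf)))
    return np.maximum(total, 1e-300)   # clip any tiny negative round-off before taking logs


def implied_loss(error_pmf, lam):
    """Invert p(eps) proportional to exp(-lam * L(eps)), normalized so that L(0) = 0."""
    return -(np.log(error_pmf) - np.log(error_pmf[0])) / lam


lam = 5.0
memory = von_mises_pmf(lam)                     # optimal errors for L = 1 - cos(eps)
observed = add_response_noise(memory, von_mises_pmf(10.0))
clean_loss = implied_loss(memory, lam)
blurred_loss = implied_loss(observed, lam)
print(f"max loss without noise {clean_loss.max():.2f}, with noise {blurred_loss.max():.2f}")
```

Without noise the inversion recovers 1 − cos ε exactly; with noise the recovered function is flatter, with a lower maximum, which is the kind of conflation between the "intended" and the measured loss described above.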
Limits of rational explanations
The current work is far from unique in explaining aspects of cognitive or neural processing in terms of a rational adaptation to properties of the external environment (Anderson, 1990; Chater & Oaksford, 1999; Doya, Ishii, Pouget, & Rao, 2007; Geisler, 2008). One critique of this general approach is that it may hide flexibility or arbitrariness in the selection of tasks for which the cognitive system is assumed to be adapted (Marcus & Davis, 2013). Such concerns potentially carry over to the interpretation of a loss function as indicating optimality or near-optimality in visual working memory. 
A similar set of concerns exists for a large body of research in human motor control that seeks to explain properties of human movement in terms of minimizing a loss function. One popular model (Flash & Hogan, 1985) assumes that hand movements are planned to minimize the square of movement jerkiness (the time derivative of acceleration). An important question is whether this assumption is meaningful when made a posteriori to the observation that human movements happen to appear smooth (Engelbrecht, 2001). More recent models assume that the goal in motor planning and execution is to minimize costs that are more closely tied to the task being performed (Knill et al., 2011; Liu & Todorov, 2007; Sims et al., 2011; Todorov, 2004). However, in all of these models, the loss function that behavior is assumed to minimize is to some extent underconstrained by the task, and must be fit to properties of behavior. While unavoidable, and not limited to rational or probabilistic models, this aspect of model development has important implications. In particular, for every behavior, there exists some loss function for which that behavior is optimal. If pursued recklessly, a posteriori theorizing can easily turn into circular reasoning. 
To what extent, then, is an empirically measured loss function for visual working memory a meaningful and real construct? As a purely descriptive tool, there is no circularity in “fitting” a loss function to human data. The resulting construct represents a compact and valid mathematical description of human memory in terms of the task for which it is optimal. This empirically derived loss function can be used to predict the extent to which the visual system will exhibit optimal or suboptimal performance in a new task. Novel descriptive properties of behavior can also be useful for constraining the development of future models. Körding and Wolpert (2004) empirically estimated the loss function implicit in sensorimotor control, and found that the result differed substantially from many existing models in motor control. Similarly, in the case of visual working memory, the majority of existing models assume that memory error follows a Gaussian-like distribution, or is somehow derived from this distribution by introducing additional mechanisms. The empirically derived loss functions demonstrate the inappropriateness of this assumption. 
A closely related concern is that a model with sufficient flexibility in its loss function could provide a close quantitative fit to any pattern of data. In other words, is the current model falsifiable? The present analysis considered only loss functions that are plausible a priori—for example, excluding loss functions in which it is preferable to make larger errors compared to smaller errors. Hence, in literal terms, the model is not capable of reproducing arbitrary error distributions. However, to give a better answer to the question of falsifiability, it is necessary to be clear about the claims that the model does, and does not make. In particular, the current model does not make assumptions about neural mechanisms, or contradict existing implementation-level theories. Rather, the model can be understood in terms of a weak claim, and a stronger claim regarding how costs influence visual working memory. 
The weak claim is that the shape of the error distribution in visual working memory is adaptive for some loss function. This claim is probably not falsifiable, but as discussed above, this doesn't negate the utility or descriptive validity of the approach. The stronger claim is that the implicit loss function of visual working memory is shaped by the costs of memory error in natural tasks. Experiment 2 represents the first empirical test of this claim, but there is substantial room for future work. One important avenue for future research is developing techniques to measure the loss function of visual working memory based directly on measured performance in biologically relevant tasks, rather than separately estimating a loss function from a delayed estimation task. The finding that the measured loss function for visual working memory is substantially suboptimal in a biologically important task that a person is performing, as they are performing it, would constitute strong evidence against the framework. 
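The weak claim's lack of falsifiability, and the role of the a priori restriction on loss functions, can be illustrated together. Assuming, as a simplification, that the optimal error distribution takes the form p(ε) ∝ exp(−L(ε)) with the multiplier absorbed into L, any error distribution is rationalized by the loss L(ε) = −log(p(ε)/p(0)). Requiring that the loss never decrease as the error grows in either direction rules some distributions out. The sketch uses a hypothetical bimodal distribution (30% of responses centered on the opposite feature value) as one that fails this check; all names are hypothetical.

```python
import numpy as np

N_BINS = 360
EPS = np.arange(N_BINS) * 2 * np.pi / N_BINS   # bin 0 is zero error


def von_mises_pmf(kappa):
    """Von Mises error distribution discretized on the circle."""
    w = np.exp(kappa * (np.cos(EPS) - 1.0))
    return w / w.sum()


def loss_that_rationalizes(error_pmf):
    """Loss under which error_pmf is optimal if p(eps) is proportional to exp(-L(eps))."""
    return -np.log(error_pmf / error_pmf[0])


def is_plausible(loss, tol=1e-12):
    """A priori constraint: loss is nondecreasing as error grows in either direction."""
    half = len(loss) // 2
    positive = loss[: half + 1]                                   # eps = 0 ... +pi
    negative = np.concatenate(([loss[0]], loss[: half - 1 : -1]))  # eps = 0 ... -pi
    return bool(np.all(np.diff(positive) >= -tol) and np.all(np.diff(negative) >= -tol))


unimodal = von_mises_pmf(4.0)
# Hypothetical bimodal errors: 30% of responses centered on the opposite feature value.
bimodal = 0.7 * von_mises_pmf(4.0) + 0.3 * np.roll(von_mises_pmf(4.0), N_BINS // 2)

for name, pmf in [("unimodal", unimodal), ("bimodal", bimodal)]:
    loss = loss_that_rationalizes(pmf)
    rebuilt = np.exp(-loss) / np.exp(-loss).sum()   # optimal distribution for that loss
    print(name, np.allclose(rebuilt, pmf), is_plausible(loss))
```

Both distributions are reproduced exactly by their constructed losses, which is why the weak claim cannot be falsified on its own; only the bimodal one is excluded once implausible losses are ruled out.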
In summary, a loss function is not simply a source of free parameters for model fitting, but instead represents a theoretically grounded and interpretable construct. Unlike alternative models, a loss function is uniquely able to connect the distribution of memory errors to the costs of memory error in natural tasks. This leads to novel and testable predictions. In the present paper I have focused on the following questions: For what class of tasks is visual working memory efficient? And can natural tasks be used to generate predictions about, or better understand, the properties of visual working memory? Critically, gaze correction (Hollingworth et al., 2008) was identified as a "model task" for understanding visual working memory a priori to the present results. In addition, the results are not intended to demonstrate that visual working memory is generically "optimal" or "near-optimal." Indeed, one of the strengths of this framework is its ability to identify tasks where human performance might be predicted to be substantially suboptimal. Future work might examine how other natural tasks, such as the use of visual memory for online movement control (Brouwer & Knill, 2007, 2009), relate to behavior observed in more typical "laboratory" tasks. 
Adaptability of the loss function
What factors shape the loss function of visual working memory? At present, there are no definitive answers to this question. In particular, it remains largely unknown to what extent a loss function is fixed by evolution, neural constraints, or development, and to what extent it is subject to adaptation and learning in the context of novel tasks. Several previous experiments have demonstrated that the precision of visual memory can be influenced by task and attentional factors (Gorgoraptis, Catalao, Bays, & Husain, 2011; Maxcey-Richard & Hollingworth, 2012; but see Marshall & Bays, 2013). Other research has shown that perceptual expertise can have a large impact on visual working memory performance (Curby et al., 2009; Herzmann & Curran, 2011). The present results open a new means of understanding these effects. 
If the external environment imposes nontrivial costs on visual memory error, then it is reasonable to hypothesize that these costs may also shape the neural mechanisms that underlie visual working memory. In other words, neural spiking activity may “explain” visual memory error, but the properties of the external world may also contribute to explanations of spiking activity. Dean, Harper, and McAlpine (2005) demonstrated that mammalian auditory neurons rapidly adapt their coding properties (including changes in gain) to relatively complex statistics of auditory input in order to maximize information transmission. This result demonstrates that properties of early perceptual processing are highly adaptive and sensitive to the structure of the external world. It is natural to hypothesize that this same plasticity may extend upstream to perceptual memory. A fascinating area for future investigation is the extent to which neural population activity in visual working memory might also be shaped by changes in externally defined costs. 
Finally, the present results also underscore the need to substantially revise and enrich current models of visual working memory capacity. Understanding the plasticity of visual working memory (Orhan et al., 2014) is at least as important as understanding its limits. The majority of existing models have focused primarily on set size effects for unitary visual features. However, the behavioral phenomena of visual working memory are much richer (Suchow, Fougnie, Brady, & Alvarez, 2014). Set size, statistical structure and complexity, and attentional manipulations all influence memory precision (Brady et al., 2009; Gorgoraptis et al., 2011; Sims, Jacobs, & Knill, 2012), and the present empirical results also demonstrate the need to better understand how visual memory capacity is, or isn't, shared across distinct visual feature dimensions. Other recent models of visual working memory have also argued for the need to move beyond item limits or simple resource pool models, and instead have advocated studying visual working memory as richly and hierarchically structured (Brady & Tenenbaum, 2013; Orhan & Jacobs, 2013). An important area for future research is integrating such models, which attempt to capture a broader range of phenomena in visual working memory, with the neural computations that might plausibly underlie the behavior. 
Figure 12
 
(a) Parameter recovery analysis. Each panel plots the actual value of one model parameter (abscissa) against the recovered maximum likelihood value (ordinate) for 1,400 simulated datasets. Correlation coefficients are shown in the upper left of each panel. (b) Histogram of reconstruction error in the estimated loss function, across all datasets.
Acknowledgments
I would like to thank Robert Jacobs, Daryl Fougnie, and two anonymous reviewers for valuable comments and feedback. I also thank Ronald van den Berg for sharing the VP-P model results described in Experiment 3. David Knill passed away during the preparation of this manuscript. He was a collaborator, mentor, sounding board, and tremendous source of inspiration for this research. This paper is dedicated to his memory. 
Commercial relationships: none. 
Corresponding author: Chris R. Sims. 
Email: chris.sims@drexel.edu. 
Address: Department of Psychology, Drexel University, Philadelphia, PA. 
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Allred, S. R., & Flombaum, J. I. (2014). Relating color working memory and color perception. Trends in Cognitive Sciences, 18(11), 562–565.
Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychological Science, 15(2), 106–111.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Anderson, D. E., & Awh, E. (2012). The plateau in mnemonic resolution across large set sizes indicates discrete resource limits in visual working memory. Attention, Perception, & Psychophysics, 74, 891–910.
Anderson, D. E., Vogel, E. K., & Awh, E. (2011). Precision in visual working memory reaches a stable plateau when individual item limits are exceeded. Journal of Neuroscience, 31(3), 1128–1138.
Barlow, H. (1961). Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press.
Battaglia, P. W., & Schrater, P. R. (2007). Humans trade off viewing time and movement duration to improve visuomotor accuracy in a fast reaching task. Journal of Neuroscience, 27(26), 6984–6994.
Bays, P. M. (2014). Noise in neural populations accounts for errors in working memory. Journal of Neuroscience, 34(10), 3632–3645.
Bays, P. M., Catalao, R. F. G., & Husain, M. (2009). The precision of visual working memory is set by allocation of a shared resource. Journal of Vision, 9(10):7, 1–11, http://www.journalofvision.org/content/9/10/7, doi:10.1167/9.10.7.
Bays, P. M., & Husain, M. (2008). Dynamic shifts of limited working memory resources in human vision. Science, 321, 851–854.
Becker, W. (1991). Saccades. In R. H. S. Carpenter (Ed.), Vision and visual dysfunction: Vol. 8. Eye movements (pp. 93–137). London, UK: Macmillan.
Berger, T. (1971). Rate distortion theory: A mathematical basis for data compression. Englewood Cliffs, NJ: Prentice-Hall.
Blahut, R. E. (1972). Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory, 18(4), 460–473.
Brady, T. F., Konkle, T., & Alvarez, G. A. (2009). Compression in visual working memory: Using statistical regularities to form more efficient memory representations. Journal of Experimental Psychology: General, 138(4), 487–502.
Brady, T. F., Konkle, T., & Alvarez, G. A. (2011). A review of visual memory capacity: Beyond individual items and toward structured representations. Journal of Vision, 11(5):4, 1–34, http://www.journalofvision.org/content/11/5/4, doi:10.1167/11.5.4.
Brady, T. F., & Tenenbaum, J. B. (2013). A probabilistic model of visual working memory: Incorporating higher order regularities into working memory capacity estimates. Psychological Review, 120(1), 85.
Brouwer, A.-M., & Knill, D. C. (2007). The role of memory in visually guided reaching. Journal of Vision, 7(5):6, 1–12, http://www.journalofvision.org/content/7/5/6, doi:10.1167/7.5.6.
Brouwer, A.-M., & Knill, D. C. (2009). Humans use visual and remembered information about object location to plan pointing movements. Journal of Vision, 9(1):24, 1–19, http://www.journalofvision.org/content/9/1/24, doi:10.1167/9.1.24.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304.
Carandini, M., & Heeger, D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13, 51–62.
Chater, N., & Oaksford, M. (1999). Ten years of the rational analysis of cognition. Trends in Cognitive Sciences, 3(2), 57–65.
Curby, K., Glazek, K., & Gauthier, I. (2009). A visual short-term memory advantage for objects of expertise. Journal of Experimental Psychology: Human Perception and Performance, 35(1), 94–107.
Dean, I., Harper, N. S., & McAlpine, D. (2005). Neural population coding of sound level adapts to stimulus statistics. Nature Neuroscience, 8(12), 1684–1689.
Donelan, J. M., Kram, R., & Kuo, A. D. (2002). Mechanical work for step-to-step transitions is a major determinant of the metabolic cost of human walking. The Journal of Experimental Biology, 205, 3717–3727.
Doya, K., Ishii, S., Pouget, A., & Rao, R. P. N. (Eds.). (2007). Bayesian brain: Probabilistic approaches to neural coding. Cambridge, MA: MIT Press.
Engelbrecht, S. E. (2001). Minimum principles in motor control. Journal of Mathematical Psychology, 45, 497–542.
Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge, UK: Cambridge University Press.
Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. The Journal of Neuroscience, 5(7), 1688–1703.
Fougnie, D., Asplund, C. L., & Marois, R. (2010). What are the units of storage in visual working memory? Journal of Vision, 10(12):27, 1–11, http://www.journalofvision.org/content/10/12/27, doi:10.1167/10.12.27.
Fougnie, D., Suchow, J. W., & Alvarez, G. A. (2012). Variability in the quality of visual working memory. Nature Communications, 3:1229, 1–8, doi:10.1038/ncomms2237.
Franconeri, S. L., Alvarez, G. A., & Cavanagh, P. (2013). Flexible cognitive resources: Competitive content maps for attention and memory. Trends in Cognitive Sciences, 17(3), 134–141.
Geisler, W. S. (2008). Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59, 167–192.
Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166.
Gorgoraptis, N., Catalao, R. F., Bays, P. M., & Husain, M. (2011). Dynamic updating of working memory resources for visual objects. Journal of Neuroscience, 31, 8502–8511.
Green, D. M., & Swets, J. A. (1989). Signal detection theory and psychophysics. Los Altos, CA: Peninsula Publishing.
Griffiths, T. L., Vul, E., & Sanborn, A. N. (2012). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, 21(4), 263–268.
Harremoës, P. (2010). Information theory for angular data. In 2010 IEEE Information Theory Workshop (ITW) (pp. 181–185).
Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394, 780–784.
Hayhoe, M. M., Shrivastava, A., Mruczek, R., & Pelz, J. B. (2003). Visual memory and motor planning in a natural task. Journal of Vision, 3(1):6, 49–63, http://www.journalofvision.org/content/3/1/6, doi:10.1167/3.1.6.
Herzmann, G., & Curran, T. (2011). Experts' memory: An ERP study of perceptual expertise effects on encoding and recognition. Memory & Cognition, 39, 412–432.
Hollingworth, A., Richard, A. M., & Luck, S. J. (2008). Understanding the function of visual short-term memory: Transsaccadic memory, object correspondence, and gaze correction. Journal of Experimental Psychology: General, 137(1), 163–181.
Huang, H. J., Kram, R., & Ahmed, A. A. (2012). Reduction of metabolic cost during motor learning of arm reaching dynamics. Journal of Neuroscience, 32(6), 2182–2190.
Keshvari, S., van den Berg, R., & Ma, W. J. (2013). No evidence for an item limit in change detection. PLoS Computational Biology, 9(2), e1002927.
Knill, D. C., Bondada, A., & Chhabra, M. (2011). Flexible, task-dependent use of sensory feedback to control hand movements. Journal of Neuroscience, 31(4), 1219–1237.
Körding, K. (2007). Decision theory: What “should” the nervous system do? Science, 318, 606–610.
Körding, K., & Wolpert, D. M. (2004). The loss function of sensorimotor learning. Proceedings of the National Academy of Sciences, USA, 101(26), 9839–9842.
Landy, M. S., & Mamassian, P. (2007). Visual estimation under risk. Journal of Vision, 7(6):4, 1–15, http://www.journalofvision.org/content/7/6/4, doi:10.1167/7.6.4.
Laughlin, S. B., de Ruyter van Steveninck, R. R., & Anderson, J. C. (1998). The metabolic cost of neural information. Nature Neuroscience, 1(1), 36–41.
Liu, D., & Todorov, E. (2007). Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. Journal of Neuroscience, 27(35), 9354–9368.
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390(6657), 279–281.
Luck, S. J., & Vogel, E. K. (2013). Visual working memory capacity: From psychophysics and neurobiology to individual differences. Trends in Cognitive Sciences, 17(8), 391–400.
Ma, W. J. (2012). Organizing probabilistic models of perception. Trends in Cognitive Sciences, 16(10), 511–518.
Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17(3), 347–356.
Maloney, L. T., & Zhang, H. (2010). Decision-theoretic models of visual perception and action. Vision Research, 50(23), 2362–2374.
Marcus, G. F., & Davis, E. (2013). How robust are probabilistic models of higher-level cognition? Psychological Science, 24(12), 2351–2360.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Marshall, L., & Bays, P. M. (2013). Obligatory encoding of task-irrelevant features depletes working memory resources. Journal of Vision, 13(2):21, 1–13, http://www.journalofvision.org/content/13/2/21, doi:10.1167/13.2.21.
Maxcey-Richard, A. M., & Hollingworth, A. (2013). The strategic retention of task-relevant objects in visual working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 760.
Olson, I. R., & Jiang, Y. (2002). Is visual short-term memory object based? Rejection of the “strong-object” hypothesis. Perception & Psychophysics, 64(7), 1055–1067.
Orhan, A. E., & Jacobs, R. A. (2013). A probabilistic clustering theory of the organization of visual short-term memory. Psychological Review, 120(2), 297.
Orhan, A. E., Sims, C. R., Jacobs, R. A., & Knill, D. C. (2014). The adaptive nature of visual working memory. Current Directions in Psychological Science, 23(3), 164–170.
Palmer, J. (1990). Attentional limits on the perception and memory of visual information. Journal of Experimental Psychology: Human Perception and Performance, 16(2), 332–350.
Rensink, R. A. (2002). Change detection. Annual Review of Psychology, 53, 245–277.
Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Champaign, IL: University of Illinois Press.
Simoncelli, E. P., & Olshausen, B. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
Sims, C. R., Jacobs, R. A., & Knill, D. C. (2011). Adaptive allocation of vision under competing task demands. Journal of Neuroscience, 31(3), 928–943.
Sims, C. R., Jacobs, R. A., & Knill, D. C. (2012). An ideal observer analysis of visual working memory. Psychological Review, 119(4), 807–830.
Suchow, J. W., Fougnie, D., Brady, T. F., & Alvarez, G. A. (2014). Terms of the debate on the format and structure of visual memory. Attention, Perception, & Psychophysics, 76(7), 2071–2079.
Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907–915.
Trommershäuser, J., Landy, M. S., & Maloney, L. T. (2006). Humans rapidly estimate expected gain in movement planning. Psychological Science, 17(11), 981–988.
Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2003). Statistical decision theory and trade-offs in the control of motor response. Spatial Vision, 16(3), 255–275.
van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working memory models. Psychological Review, 121(1), 124–149.
van den Berg, R., Shin, H., Chou, W.-C., George, R., & Ma, W. J. (2012). Variability in encoding precision accounts for visual short-term memory limitations. Proceedings of the National Academy of Sciences, USA, 109(22), 8780–8785.
Whiteley, L., & Sahani, M. (2008). Implicit knowledge of visual uncertainty guides decisions with asymmetric outcomes. Journal of Vision, 8(3):2, 1–15, http://www.journalofvision.org/content/8/3/2, doi:10.1167/8.3.2.
Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4(12):11, 1120–1135, http://www.journalofvision.org/content/4/12/11, doi:10.1167/4.12.11.
Wolpert, D. M., & Landy, M. S. (2012). Motor control is decision-making. Current Opinion in Neurobiology, 22, 996–1003.
Woodman, G. F., & Luck, S. J. (2004). Visual search is slowed when visuospatial working memory is occupied. Psychonomic Bulletin & Review, 11(2), 269–274.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–236.
Footnotes
1  Although “bits” are commonly associated with digital computers and binary coding, this unit of measure is equally applicable to analog systems, and defining capacity in this way makes no assumptions about the nature of information coding. By analogy, a foot is a unit of length, but this does not require that an item measuring 0.75 feet in length be constructed out of (fractions of) physical “feet.”
2  When the features x and y are circular quantities, |y − x| should be interpreted as the smallest absolute angular difference (e.g., |0° − 350°| = 10°).
3  This is feasible because mutual information is a convex function of q.
4  At the time of writing, the datasets and accompanying model code can be obtained from the website of Ronald van den Berg, http://www.ronaldvandenberg.org/code.html.
Appendix A: Parameter recovery analysis
In this appendix, I examine the accuracy and reliability of the procedure for estimating memory capacity and loss functions. To do this, I constructed a large number (1,400) of simulated datasets. Each dataset consisted of 300 simulated trials of a typical delayed recall visual memory experiment with a set size of two items (the same number of trials as in each condition of Experiment 1 in the main text). Each dataset was constructed by randomly sampling memory capacity (R) and the two parameters of the loss function (μ and β). In addition, I examined the robustness of the estimation procedure to nontarget report errors (Bays et al., 2009): on a fraction of trials, governed by a nontarget-report probability parameter, the model reported the visual feature of the nontarget item instead of the target. Hence, the likelihood of each simulated response is a mixture of two components: the memory channel's distribution over remembered feature values given the actual feature value of the target, and the same distribution given the actual feature value of the distractor, weighted by the probabilities of a target report and a nontarget report, respectively.
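The mixture likelihood with nontarget reports can be illustrated with a small simulation and recovery. This is a minimal sketch under loud assumptions. A von Mises distribution stands in for the memory channel, whereas the paper's channel is derived from the loss function and capacity. The concentration `kappa` is held fixed, and only the nontarget-report probability is recovered by grid-search maximum likelihood. All function names (`simulate_trials`, `fit_p_nontarget`, and so on) are hypothetical.

```python
import math
import random


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0, summed from its power series (fine for moderate kappa)."""
    total, term = 1.0, 1.0
    for m in range(1, terms):
        term *= (kappa / 2.0) ** 2 / (m * m)
        total += term
    return total


def vonmises_pdf(y, mu, kappa):
    """Von Mises density, used here only as a stand-in for the memory channel."""
    return math.exp(kappa * math.cos(y - mu)) / (2 * math.pi * bessel_i0(kappa))


def simulate_trials(n_trials, p_nontarget, kappa, rng):
    """Set size 2: a target and one nontarget per trial, both uniform on the circle."""
    trials = []
    for _ in range(n_trials):
        target = rng.uniform(0.0, 2 * math.pi)
        nontarget = rng.uniform(0.0, 2 * math.pi)
        # With probability p_nontarget the response is generated from the nontarget.
        center = nontarget if rng.random() < p_nontarget else target
        trials.append((target, nontarget, rng.vonmisesvariate(center, kappa)))
    return trials


def fit_p_nontarget(trials, kappa, grid_size=101):
    """Grid-search maximum likelihood estimate of p_nontarget on [0, 0.5], kappa held fixed."""
    # The two component densities do not depend on p_nontarget, so compute them once.
    dens = [(vonmises_pdf(y, t, kappa), vonmises_pdf(y, nt, kappa)) for t, nt, y in trials]

    def log_lik(p):
        # Mixture likelihood per trial: (1 - p) * f(y | target) + p * f(y | nontarget).
        return sum(math.log((1 - p) * ft + p * fn) for ft, fn in dens)

    grid = [0.5 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=log_lik)


if __name__ == "__main__":
    rng = random.Random(1)
    trials = simulate_trials(2000, p_nontarget=0.2, kappa=8.0, rng=rng)
    print("recovered p_nontarget:", fit_p_nontarget(trials, kappa=8.0))
```

Precomputing the two component densities keeps the grid search cheap, because only the mixing weight changes across grid points. A full recovery analysis would also fit capacity and the loss-function parameters jointly.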
The goal of the analysis was to recover the parameters used to generate each dataset by maximum likelihood estimation. The ranges of the parameters used in constructing the datasets were as follows: [formula not available]; [formula not available]; [formula not available]; [formula not available]. These ranges were chosen because they encompass the range of parameter values estimated from subjects in Experiment 1.
Figure 12a plots, for each of the four model parameters, the recovered (maximum likelihood) estimate against the actual value used to generate each dataset. Accuracy was extremely high in recovering memory capacity, R (the correlation coefficient for each parameter is shown in its panel of Figure 12a). Recovery performance for the loss-function parameters μ and β was moderately lower. However, this assessment may be overly critical, because different combinations of parameter values can yield highly similar loss functions. To assess this possibility, I defined reconstruction error as the integrated squared difference between the true loss function $\mathcal{L}$ and the recovered loss function $\hat{\mathcal{L}}$, as a function of the angular error $d = |y - x|$:
$$E = \int \big[\mathcal{L}(d) - \hat{\mathcal{L}}(d)\big]^2 \, \mathrm{d}d,$$
with the integral taken over the full range of $d$.
This quantity ranges from 0 (perfect reconstruction) to [formula not available] (a maximal difference of 1 integrated across the entire range of [formula not available]). The mean reconstruction error was extremely low: 0.062 (SD = 0.16). Figure 12b plots the histogram of reconstruction errors across all 1,400 simulated datasets, using 2,000 equal-width bins in the range [formula not available]. The histogram shows that, with high probability, the reconstructed loss function closely matched the loss function used to generate the data. Thus, although the recovered loss-function parameters sometimes differed moderately from their true values, these differences had little effect on the accuracy of the estimated loss function.
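The reconstruction error, defined as the integrated squared difference between the true and recovered loss functions, is simple to compute numerically. The sketch below assumes loss functions scaled to [0, 1] and defined on angular error d in [0, π] radians. Under those assumptions the maximum possible error is π. The two example loss functions (`inverted_cosine` and `linear`) are chosen here for illustration. They are not the paper's fitted family, and the function names are hypothetical.

```python
import math


def reconstruction_error(loss_true, loss_hat, n=10_000):
    """Integrated squared difference between two loss functions of angular error d.

    Assumes d runs over [0, pi] radians; uses the composite trapezoidal rule with n intervals.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        d = k * h
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * (loss_true(d) - loss_hat(d)) ** 2
    return total * h


def inverted_cosine(d):
    """Example loss scaled to [0, 1]: 0 at d = 0, 1 at d = pi."""
    return (1.0 - math.cos(d)) / 2.0


def linear(d):
    """Example loss scaled to [0, 1]: proportional to d."""
    return d / math.pi


if __name__ == "__main__":
    print(reconstruction_error(inverted_cosine, linear))
```

For these two examples, the integral has the closed form 5π/24 − 2/π ≈ 0.018. This is small compared with the maximum of π, even though the two curves have visibly different shapes.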
Appendix B: Task-defined loss functions for gaze correction
Experiment 2 in the main text describes the optimally efficient visual memory system for the problem of gaze correction following saccadic error (Hollingworth et al., 2008). The problem for the visual system is to reidentify the intended saccade target by comparing candidate objects with a visual working memory representation of it. In this task, a correct identification is made whenever the visual memory representation is closer to the actual target than to any of the distractor items. This makes it possible to derive the loss function for this family of tasks.
With a single distractor item, an expression for the loss function equivalent to Equation 7 is
$$\mathcal{L}(x, y) = \int [\![\, |y - x_d| < |y - x| \,]\!]\; p(x_d)\, \mathrm{d}x_d,$$
where $x$ represents the true visual feature value for the target item, $x_d$ is the feature value for the distractor, and $y$ is the noisy visual working memory representation. The double brackets $[\![\cdot]\!]$ denote the Iverson bracket, which equals 1 if its contents are logically true and 0 otherwise. The integral marginalizes over all possible values of the distractor, which in this case is uniformly distributed on the circle, so that $p(x_d) = 1/(2\pi)$. As in the main text, each difference $|\cdot|$ should be read as the smallest absolute angular difference. Minimizing this loss function is equivalent to minimizing the probability of making an incorrect corrective saccade. In the circular domain this integral has the simple analytic solution $\mathcal{L} = d/\pi$, where $d = |y - x|$ is measured in radians: the distractor values that lie closer to $y$ than the target does form an arc of length $2d$, out of a circle of circumference $2\pi$.
For the case of two distractors, the loss function is easily extended:
$$\mathcal{L}(x, y) = \iint [\![\, |y - x_{d_1}| < |y - x| \,\lor\, |y - x_{d_2}| < |y - x| \,]\!]\; p(x_{d_1})\, p(x_{d_2})\, \mathrm{d}x_{d_1}\, \mathrm{d}x_{d_2},$$
with $x_{d_1}$ and $x_{d_2}$ referring to the two distractors. Because the distractors are independent, the probability that neither lies closer to $y$ than the target is $(1 - d/\pi)^2$, so the analytical solution in this case is
$$\mathcal{L} = 1 - \left(1 - \frac{d}{\pi}\right)^{2}.$$
By analogy, the loss function can also be extended to handle more than two distractors. For three distractors,
$$\mathcal{L} = 1 - \left(1 - \frac{d}{\pi}\right)^{3}.$$
For four distractors,
$$\mathcal{L} = 1 - \left(1 - \frac{d}{\pi}\right)^{4}.$$
The present paper does not examine the case of more than four distractor items. If the number of distractors is itself a random variable, the loss function could be obtained by marginalizing over the number of distractors, that is, by averaging the loss functions conditioned on each possible number of distractors, weighted by the probability of each case. The result would be a weighted combination of the loss functions shown in Figure 10a.
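The task-defined loss can be checked numerically. Suppose there are n independent distractors, each uniform on the circle, and the target error is d in radians. An incorrect reidentification occurs when at least one distractor lies closer to the memory representation than the target does. That happens with probability 1 − (1 − d/π)^n. The sketch below compares this closed form with a Monte Carlo average of the Iverson bracket. It also shows the marginalization over a random number of distractors. It assumes the independence and uniformity stated in the text, and the function names are hypothetical.

```python
import math
import random


def circ_dist(a, b):
    """Smallest absolute angular difference between a and b, in radians (range [0, pi])."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def task_loss(d, n_distractors):
    """Closed form: probability that at least one of n uniform, independent
    distractors lies closer to the memory than the target (target error d in [0, pi])."""
    return 1.0 - (1.0 - d / math.pi) ** n_distractors


def task_loss_mc(d, n_distractors, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the same probability: the average of the Iverson
    bracket [some distractor is closer to y than the target] over sampled distractors."""
    rng = random.Random(seed)
    y, x = 0.0, d  # memory at 0 (by rotational symmetry), target at angular distance d
    target_dist = circ_dist(y, x)
    hits = 0
    for _ in range(n_samples):
        if any(circ_dist(y, rng.uniform(0.0, 2 * math.pi)) < target_dist
               for _ in range(n_distractors)):
            hits += 1
    return hits / n_samples


def marginal_task_loss(d, p_n):
    """Loss when the number of distractors is random; p_n maps n -> P(n)."""
    return sum(prob * task_loss(d, n) for n, prob in p_n.items())


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, task_loss(1.0, n), task_loss_mc(1.0, n))
    print(marginal_task_loss(1.0, {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}))
```

For d = π/4, the closed form gives 0.25, 0.4375, 0.578125, and 0.68359375 for one to four distractors. Each additional distractor makes the loss rise more steeply for small errors.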
Figure 1
 
Illustration of an efficient memory system as minimizing expected loss under a constraint on memory capacity. Rate–distortion theory (Berger, 1971) defines the minimum channel capacity necessary to achieve a desired level of performance. This is illustrated by a rate–distortion curve, shown in red. No physical system can exist in the region below this curve. An optimally efficient memory system is one that minimizes expected loss, subject to a constraint on memory capacity (shown by the horizontal line).
Figure 2
 
Comparison of four different loss functions (left column) and the resulting predictions for the optimal distribution of memory errors (right column). From top to bottom, the loss functions are inverted cosine, linear, step, and quadratic. Each panel on the right shows the predicted behavior of an optimally efficient memory system, assuming a capacity constraint of either 1 or 3 bits.
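An optimally efficient memory channel for a given loss function and capacity limit can be computed with the Blahut–Arimoto iteration (Blahut, 1972, cited above). This section does not show the author's own implementation. The sketch below is one way to do it under stated assumptions: a uniform stimulus distribution on a circle discretized into 48 points, an inverted cosine loss scaled to [0, 1], and a bisection on the Lagrange multiplier `s` (bounded by a hypothetical `s_max`) to meet a capacity of 1 or 3 bits. The bisection relies on the channel's rate increasing with `s`. All function names are hypothetical.

```python
import math


def circ_dist(a, b):
    """Smallest absolute angular difference between a and b (radians), in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def inverted_cosine_loss(d):
    """Inverted cosine loss, scaled here to run from 0 (d = 0) to 1 (d = pi)."""
    return (1.0 - math.cos(d)) / 2.0


def blahut_arimoto(loss, s, n=48, max_iter=500, tol=1e-12):
    """One point on the rate-distortion curve via the Blahut-Arimoto iteration.

    The feature circle is discretized into n points with a uniform source p(x).
    s >= 0 is the Lagrange multiplier trading rate against expected loss.
    Returns (rate_bits, expected_loss, channel) with channel[i][j] = Q(y_j | x_i).
    """
    pts = [2 * math.pi * k / n for k in range(n)]
    p = [1.0 / n] * n
    cost = [[loss(circ_dist(x, y)) for y in pts] for x in pts]
    q = [1.0 / n] * n  # marginal distribution of remembered values y
    channel = []
    for _ in range(max_iter):
        # Channel update: Q(y | x) proportional to q(y) * exp(-s * loss(x, y)).
        channel = []
        for i in range(n):
            w = [q[j] * math.exp(-s * cost[i][j]) for j in range(n)]
            z = sum(w)
            channel.append([wj / z for wj in w])
        # Marginal update: q(y) = sum_x p(x) Q(y | x).
        new_q = [sum(p[i] * channel[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    # Mutual information I(X; Y) in bits, and expected loss, under the final channel.
    rate = sum(p[i] * c * math.log2(c / q[j])
               for i in range(n) for j, c in enumerate(channel[i]) if c > 0)
    expected_loss = sum(p[i] * channel[i][j] * cost[i][j]
                        for i in range(n) for j in range(n))
    return rate, expected_loss, channel


def channel_for_capacity(loss, capacity_bits, n=48, s_max=200.0, steps=60):
    """Bisect on s so that the optimal channel's rate meets the capacity limit."""
    lo, hi = 0.0, s_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        rate, _, _ = blahut_arimoto(loss, mid, n=n)
        if rate < capacity_bits:
            lo = mid
        else:
            hi = mid
    return blahut_arimoto(loss, hi, n=n)


if __name__ == "__main__":
    for bits in (1.0, 3.0):
        rate, exp_loss, _ = channel_for_capacity(inverted_cosine_loss, bits)
        print(f"capacity {bits} bits -> rate {rate:.3f}, expected loss {exp_loss:.3f}")
```

Each row of the returned `channel` is the predicted distribution of remembered values for one true value. Raising the capacity from 1 to 3 bits narrows that distribution and lowers the expected loss. With a uniform source and a loss that depends only on angular distance, the output marginal stays uniform, so the iteration converges almost immediately.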
Figure 3
 
A flexible parametric family of loss functions, after the function used by Gonzalez and Wu (1999). The left panel fixes the parameter [symbol not available] while varying [symbol not available] from 0.5 to 5. The right panel fixes [symbol not available] while varying [symbol not available] from 0.1 to [value not available].
Figure 4
 
(a) Stimuli consisted of two colored isosceles triangles presented at an eccentricity of 6° of visual angle from a fixation point. Stimuli were displayed for 1 s, followed by a blank retention interval. (b) During color probe trials, a color wheel was displayed at the former location of one of the triangles. Participants indicated their response by using the mouse to adjust the orientation of the white tick mark to match their memory for the triangle's color. (c) During orientation probe trials, a black ring was used to probe for orientation memory.
Figure 5
 
(a) Histogram of memory recall errors, pooled across all subjects. The red and blue distributions correspond to the single feature and conjunction conditions of the experiment, respectively. (b) Difference in frequency of error between the single feature and conjunction conditions. Difference histograms were obtained by subtracting (conjunction – single feature) histograms in (a).
Figure 6
 
Summary statistics computed from the memory error distribution for each subject in each condition. Left panel: circular variance. Right panel: circular kurtosis. The plot markers indicate mean values, averaged across subjects; error bars indicate 95% confidence intervals. The red lines are based on simulated data generated from the best-fitting model.
Figure 7
 
(a) AIC scores for each model relative to the best model (model 2), averaged across participants. (b) The effect of each model factor on AIC: the mean AIC score across all models with that factor equal to “Yes,” minus the mean AIC score across all models with the same factor equal to “No.” In both panels, lower values indicate a preferred model.
Figure 8
 
Inferred loss functions for each participant. (a) Loss functions estimated for color. (b) Loss functions estimated for orientation. The black curves are the estimated loss function for each subject. An inverted cosine loss function is shown in red for comparison.
Figure 9
 
Schematic illustration of the challenge facing the visual system following an inaccurate saccade. Visual working memory representations of the intended target can be used to determine where to direct a corrective saccade (Hollingworth et al., 2008).
Figure 10
 
(a) Task-defined loss functions for an object reidentification task with varying numbers of distractor items (one through four distractors shown as separate curves). (b) Corresponding optimally efficient memory channels for each loss function, assuming a capacity limit of 1 bit.
Figure 11
 
(a) Estimated memory capacity as a function of set size across seven delayed estimation datasets. Plot markers correspond to individual subjects. Black lines indicate the across-subjects average capacity. The smooth red curves indicate the predicted falloff in capacity according to a power law with exponent [value not available]. (b) Estimated loss functions for each participant (black curves) in each dataset. The thick red curve shows an inverted cosine loss function as a reference for comparison across datasets.
Table 1
 
Comparison of 16 models, varying across four binary factors. The models are evaluated based on the Akaike information criterion (AIC) score relative to the best model, as well as two-fold cross validation (CV) of maximum likelihood. For AIC, smaller values indicate a preferred model. For cross validation, higher values indicate a favored model (higher log-likelihood of the validation data).
Model  Factor A  Factor B  Factor C  Factor D  Parameters  ΔAIC    CV log-likelihood  Rank (AIC)  Rank (CV)
1      N         N         N         N         12          2.74    −848.69            2           2
2      N         N         N         Y         8           0.00    −846.12            1           1
3      N         N         Y         N         8           9.31    −850.31            4           5
4      N         N         Y         Y         6           7.75    −849.39            3           3
5      N         Y         N         N         10          16.13   −850.72            6           6
6      N         Y         N         Y         6           13.29   −849.52            5           4
7      N         Y         Y         N         6           23.36   −854.10            8           8
8      N         Y         Y         Y         4           21.78   −853.17            7           7
9      Y         N         N         N         10          76.06   −882.39            10          10
10     Y         N         N         Y         6           73.49   −881.12            9           9
11     Y         N         Y         N         6           88.37   −887.72            14          14
12     Y         N         Y         Y         4           86.72   −886.66            12          12
13     Y         Y         N         N         9           87.22   −887.30            13          13
14     Y         Y         N         Y         5           85.26   −885.89            11          11
15     Y         Y         Y         N         5           100.39  −892.88            16          16
16     Y         Y         Y         Y         3           98.79   −891.84            15          15
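The factor-effect measure used in Figure 7b is the mean ΔAIC over models with a factor set to “Yes” minus the mean over models with it set to “No.” It can be applied directly to the ΔAIC column of Table 1. This is an illustrative sketch only. Figure 7b's values are averaged across participants, and the result here matches them only if Table 1 uses the same ΔAIC values. The names `TABLE1`, `factor_effects`, and `aic_ranks` are hypothetical.

```python
# Table 1 rows: factor settings for A, B, C, D ("Y" = yes, "N" = no) and ΔAIC.
TABLE1 = [
    ("NNNN", 2.74), ("NNNY", 0.00), ("NNYN", 9.31), ("NNYY", 7.75),
    ("NYNN", 16.13), ("NYNY", 13.29), ("NYYN", 23.36), ("NYYY", 21.78),
    ("YNNN", 76.06), ("YNNY", 73.49), ("YNYN", 88.37), ("YNYY", 86.72),
    ("YYNN", 87.22), ("YYNY", 85.26), ("YYYN", 100.39), ("YYYY", 98.79),
]


def factor_effects(rows):
    """Mean ΔAIC over models with a factor = 'Y' minus mean ΔAIC over models with it = 'N'."""
    effects = {}
    for index, name in enumerate("ABCD"):
        yes = [aic for factors, aic in rows if factors[index] == "Y"]
        no = [aic for factors, aic in rows if factors[index] == "N"]
        effects[name] = sum(yes) / len(yes) - sum(no) / len(no)
    return effects


def aic_ranks(rows):
    """Rank models (1 = best) by ΔAIC, smallest first."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][1])
    ranks = [0] * len(rows)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


if __name__ == "__main__":
    for name, effect in factor_effects(TABLE1).items():
        print(f"Factor {name}: {effect:+.2f}")
```

The effects come out at roughly +75.2, +12.7, +10.3, and −2.1 for factors A to D. Only factor D lowers AIC on average, which is consistent with model 2 (N, N, N, Y) being the best model. The recomputed ranks reproduce the “Rank (AIC)” column.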
Table 2
 
Mean parameter estimates (averaged across subjects) for each model parameter in each condition of Experiment 1, according to the best-fitting model (model 2). Values in parentheses indicate standard deviations.
Memory load     Visual feature  R (bits)     μ            β
Single feature  Color           1.42 (0.27)  0.85 (0.12)  1.79 (0.22)
Single feature  Orientation     2.20 (0.63)  0.68 (0.09)  1.98 (0.30)
Conjunction     Color           1.13 (0.38)  0.85 (0.12)  1.79 (0.22)
Conjunction     Orientation     1.86 (0.66)  0.68 (0.09)  1.98 (0.30)
Table 3
 
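Experimental datasets used for comparing the decision-theoretic (DT) model to the variable precision model (VP-P; van den Berg et al., 2014). Models are compared based on AIC values relative to the better of the two models for each dataset (ΔAIC = 0.00); lower values indicate a preferred model.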
Reference                      Feature            Set sizes  ΔAIC (DT)  ΔAIC (VP-P)
Wilken & Ma, 2004              Color              1, 2, 4, 8 0.00       +14.28
Zhang & Luck, 2008             Color              1, 2, 3, 6 0.00       +5.04
Bays, Catalao, & Husain, 2009  Color              1, 2, 4, 6 0.00       +0.56
Anderson, Vogel, & Awh, 2011   Orientation        1–4, 6, 8  +1.13      0.00
Anderson & Awh, 2012           Orientation        1–4, 6, 8  0.00       +0.24
van den Berg et al., 2012      Color (scrolling)  1–8        0.00       +5.75
van den Berg et al., 2012      Color (wheel)      1–8        0.00       +6.45
Mean ΔAIC (all experiments)                                  0.0        +4.45
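The “Mean ΔAIC” row of Table 3 appears consistent with averaging the per-dataset difference (VP-P minus DT) across the seven datasets and then re-zeroing on the better model. The sketch below recomputes that average from the rounded table entries. It gives about +4.46 against the reported +4.45, a gap plausibly due to rounding of the individual entries. The interpretation and the names `TABLE3`, `mean_advantage`, and `datasets_favoring_dt` are assumptions for illustration.

```python
# Table 3 rows: (reference, feature, ΔAIC for DT, ΔAIC for VP-P).
TABLE3 = [
    ("Wilken & Ma, 2004", "Color", 0.00, 14.28),
    ("Zhang & Luck, 2008", "Color", 0.00, 5.04),
    ("Bays, Catalao, & Husain, 2009", "Color", 0.00, 0.56),
    ("Anderson, Vogel, & Awh, 2011", "Orientation", 1.13, 0.00),
    ("Anderson & Awh, 2012", "Orientation", 0.00, 0.24),
    ("van den Berg et al., 2012", "Color (scrolling)", 0.00, 5.75),
    ("van den Berg et al., 2012", "Color (wheel)", 0.00, 6.45),
]


def mean_advantage(rows):
    """Average per-dataset AIC difference, VP-P minus DT (positive values favor DT)."""
    diffs = [vp - dt for _, _, dt, vp in rows]
    return sum(diffs) / len(diffs)


def datasets_favoring_dt(rows):
    """Number of datasets in which DT has the lower AIC."""
    return sum(1 for _, _, dt, vp in rows if dt < vp)


if __name__ == "__main__":
    print(f"mean ΔAIC (VP-P relative to DT): {mean_advantage(TABLE3):+.2f}")
    print(f"datasets favoring DT: {datasets_favoring_dt(TABLE3)} of {len(TABLE3)}")
```

The DT model has the lower AIC in six of the seven datasets. The exception is Anderson, Vogel, and Awh (2011).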