**Since sensory measurements are noisy, an observer is rarely certain about the identity of a stimulus. In visual perception tasks, observers generally take their uncertainty about a stimulus into account when doing so helps task performance. Whether the same holds in visual working memory tasks is largely unknown. Ten human and two monkey subjects localized a single change in orientation between a sample display containing three ellipses and a test display containing two ellipses. To manipulate uncertainty, we varied the reliability of orientation information by making each ellipse more or less elongated (two levels); reliability was independent across the stimuli. In both species, a variable-precision encoding model equipped with an “uncertainty–indifferent” decision rule, which uses only the noisy memories, fitted the data poorly. In both species, a much better fit was provided by a model in which the observer also takes the levels of reliability-driven uncertainty associated with the memories into account. In particular, a measured change in a low-reliability stimulus was given lower weight than the same change in a high-reliability stimulus. We did not find strong evidence that observers took reliability-independent variations in uncertainty into account. Our results illustrate the importance of studying the decision stage in comparison tasks and provide further evidence for evolutionary continuity of working memory systems between monkeys and humans.**

*N* stimuli changed, the optimal observer would compute from the *N* noisy memories and the *N* test stimuli the probability that any given stimulus changed, and report the test stimulus with the highest probability. Human decisions in VWM-based comparison tasks are well described by an optimal decision rule acting on a variable-precision encoding stage (Devkar, Wright, & Ma, 2015; Keshvari et al., 2012; Mazyar, van den Berg, & Ma, 2012; van den Berg, Shin, Chou, George, & Ma, 2012).

*x*, then the likelihood function over the hypothesized sample stimulus *θ* that produced *x* is *L*(*θ*) = *p*(*x*|*θ*). The likelihood function represents degrees of belief in different hypothesized values of *θ*, and the width of the likelihood function represents uncertainty. If the observer is not only optimal but also Bayesian in a strong sense (engaging in what has been called “probabilistic computation” or “Bayesian transfer”; Ma, 2012; Ma & Jazayeri, 2014; Maloney & Mamassian, 2009), then they will incorporate the full likelihood function (and associated level of uncertainty) over each sample stimulus on each trial, and perform optimally even as uncertainty differs between sample stimuli and across trials. A main alternative to such an “uncertainty-incorporating” rule is an “uncertainty-indifferent” rule, which uses only the measured changes (the differences between the memories and the test stimuli) to make a comparison decision (Donkin, Nosofsky, Gold, & Shiffrin, 2013). To distinguish between uncertainty-indifferent and uncertainty-incorporating decision rules, it is imperative to manipulate the reliability of the stimulus information, as is often done in the study of cue combination (Knill & Pouget, 2004; Trommershauser, Kording, & Landy, 2011; Yuille & Bulthoff, 1996) and recently also in the study of visual search (Ma, Navalpakkam, Beck, van den Berg, & Pouget, 2011). Higher reliability means a narrower likelihood function and lower uncertainty, and the Bayes-optimal observer gives less weight to less reliable (more uncertain) evidence (Yuille & Bulthoff, 1996). In an orientation change detection task with variable reliabilities, the Bayes-optimal decision rule combined with a variable-precision encoding model again best described human subjects' behavior (Keshvari et al., 2012), indicating that the brain treats working memories differently depending on their associated levels of uncertainty.
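For Gaussian noise, the reliability weighting mentioned above takes a familiar closed form in cue combination: each cue is weighted by its inverse variance. A minimal sketch of this idea (our own illustration, not code from the study):

```python
def combine_cues(x1, var1, x2, var2):
    """Reliability-weighted (inverse-variance) combination of two Gaussian cues.

    Returns the combined estimate and its variance; the weight on each cue
    is proportional to its reliability (1/variance).
    """
    r1, r2 = 1.0 / var1, 1.0 / var2   # reliabilities
    w1 = r1 / (r1 + r2)               # weight on cue 1
    estimate = w1 * x1 + (1.0 - w1) * x2
    variance = 1.0 / (r1 + r2)        # combined variance <= min(var1, var2)
    return estimate, variance
```

With equal variances the two cues are simply averaged; as `var2` grows (cue 2 becomes less reliable), the combined estimate moves toward `x1`.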

*Macaca mulatta*; weights: M1 = 16.5 kg and M2 = 13.5 kg; ages: M1 = 17.5 and M2 = 12.5 years) were tested in a change localization task for five days each week. Before daily testing, we restricted the monkeys' food and water intake. After completing the daily experimental sessions, animals were returned to their individual caging room and received a standard diet of primate chow and water. All animal procedures conformed to the National Institutes of Health guidelines, approved by the Institutional Review Board at University of Texas Health Science Center at Houston, and supervised by the Institutional Animal Care and Use Committee. The study adhered to the ARVO (Association for Research in Vision and Ophthalmology) Statement for the Use of Animals in Ophthalmic and Visual Research.

^{2} displayed on a black background. Two types of ellipses, of equal area, were used: “high reliability” (HR; long and narrow) and “low reliability” (LR; short and wide). Based on the average distance of the monkey from the screen (approximately 35 cm), the HR and LR stimuli subtended visual angles of 2.9° × 0.65° and 1.5° × 1.3°, respectively. Stimuli were presented in six possible locations on the screen, arranged on an imaginary circle of radius 7.4°.

*θ*, was drawn independently from a uniform distribution over 18 possible orientations (−90° to 80° in increments of 10°). The orientation of the changed stimulus in the test display was drawn from the same uniform distribution. Testing consisted of 60 sessions of one 192-trial block each, for a total of 11,520 trials per monkey.

*π*] by multiplying all orientations and orientation change magnitudes by 2 before analysis. We consistently follow this convention in all equations that follow; for the figures only, we mapped change magnitudes back to actual orientation space.

*x*_{i} of the *i*^{th} sample orientation, *θ*_{i}, follows a von Mises distribution centered at *θ*_{i} with concentration parameter *κ*_{i}:

Display Formula \(p\left( {{x_i}|{\theta _i}} \right) = {1 \over {2\pi {I_0}\left( {{\kappa _i}} \right)}}{e^{{\kappa _i}\cos \left( {{x_i} - {\theta _i}} \right)}}\) (Equation 1)

where *κ*_{i} is the concentration parameter that controls the width of the distribution and *I*_{0} is the modified Bessel function of the first kind of order 0. Following our previous work, we express encoding precision in terms of Fisher information, denoted by *J* (Keshvari et al., 2013; van den Berg et al., 2012). Fisher information measures the best possible performance of any unbiased decoder through the Cramér-Rao bound (Cover & Thomas, 1991). If *x*_{i} were normally distributed, Fisher information would be equal to the inverse of the variance of the Gaussian distribution. Moreover, Fisher information has a neural interpretation: in Poisson-like neural populations, it is proportional to the amplitude (gain) of activity in the population (Ma, 2010; Seung & Sompolinsky, 1993), which, in turn, can be thought of as the amount of neural resource devoted to the encoding of the memory (Bays, 2014; Ma et al., 2014; van den Berg et al., 2012). For a von Mises distribution (Equation 1), Fisher information is directly related to the concentration parameter *κ*_{i} through

Display Formula \({J_i} = {\kappa _i}{{{I_1}\left( {{\kappa _i}} \right)} \over {{I_0}\left( {{\kappa _i}} \right)}}\) (Equation 2)

where *I*_{1} is the modified Bessel function of the first kind of order 1.
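The mapping between precision *J* and concentration parameter *κ* in Equation 2 has no closed-form inverse, but it is monotonic and easy to invert numerically. One way to do this (a sketch; the function names and the series truncation are ours, and in practice one would use `scipy.special.i0`/`i1`):

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind, I_order(x), via its power
    series; adequate for the moderate arguments used here (0 <= x <= 50)."""
    total = 0.0
    for k in range(terms):
        if x > 0:
            log_term = (2 * k + order) * math.log(x / 2.0)
        else:
            log_term = 0.0 if 2 * k + order == 0 else -math.inf
        total += math.exp(log_term - math.lgamma(k + 1) - math.lgamma(k + order + 1))
    return total

def fisher_info(kappa):
    """Equation 2: J(kappa) = kappa * I_1(kappa) / I_0(kappa)."""
    if kappa == 0.0:
        return 0.0
    return kappa * bessel_i(1, kappa) / bessel_i(0, kappa)

def kappa_from_J(J):
    """Invert Equation 2 by bisection; J(kappa) is monotonically increasing."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fisher_info(mid) < J:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For small *κ*, `fisher_info(kappa)` behaves like *κ*²/2; for large *κ*, it approaches *κ*, consistent with the Gaussian limit.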

*x*_{1}, *x*_{2}, and *x*_{3} through a doubly stochastic process, where each *x*_{i} is drawn from a von Mises distribution (a circular analog of a Gaussian distribution, used because orientation space is periodic) with a precision value that is itself randomly drawn from a gamma distribution with mean Display Formula \(\bar J\) and scale parameter *τ*.

Given the noisy memories *x*_{1} and *x*_{2}, the optimal observer would build beliefs about the underlying sample stimuli, *θ*_{1} and *θ*_{2}, by “inverting” the generative model. Specifically, based on Equation 1, the likelihood function over *θ*_{i} is a von Mises distribution over *θ*_{i} centered at *x*_{i} with concentration parameter *κ*_{i}. The width of this likelihood function is a measure of the observer's uncertainty about *θ*_{i}: the higher the encoding precision, the higher *κ*_{i}, and the lower the optimal observer's uncertainty. Although uncertainty is determined by the sources of variability in precision in the encoding stage, both reliability-driven and reliability-independent, we use the term “uncertainty” only to describe a property of the observer's beliefs about the stimuli, not to describe the encoding stage.

*L* (1 or 2), given the noisy memories, *x*_{1} and *x*_{2}, the associated concentration parameters, *κ*_{1} and *κ*_{2} (corresponding with uncertainty levels), and the two test orientations, *φ*_{1} and *φ*_{2}. We will use the term “measured changes” for the differences *x*_{1} − *φ*_{1} and *x*_{2} − *φ*_{2}.

In the *Variable Precision-Variable Precision (optimal)* model, the observer takes both reliability-driven and reliability-independent variations in uncertainty into account. Specifically, the observer responds that the change occurred at location 1 if

Display Formula \(\log {I_0}\left( {{\kappa _1}} \right) - {\kappa _1}\cos \left( {{x_1} - {\varphi _1}} \right) > \log {I_0}\left( {{\kappa _2}} \right) - {\kappa _2}\cos \left( {{x_2} - {\varphi _2}} \right)\) (Equation 3)

with concentration parameters *κ*_{1} and *κ*_{2}, which are determined by the respective values of precision, *J*_{1} and *J*_{2}, which are in turn determined both by the physical reliability and by random fluctuations in precision.

One can think of the term −cos(*x*_{i} − *φ*_{i}) as a measure of dissimilarity between memory and test, and of the corresponding prefactors *κ*_{1} and *κ*_{2} as weights on dissimilarity, analogous to the reliability weights in cue combination. Two special cases help to build intuition. The first is when the measured change is maximal at both locations, i.e., *x*_{1} − *φ*_{1} = *x*_{2} − *φ*_{2} = *π*. Then, the decision rule becomes log *I*_{0}(*κ*_{1}) + *κ*_{1} > log *I*_{0}(*κ*_{2}) + *κ*_{2}, which simplifies to *κ*_{1} > *κ*_{2}, since the function log *I*_{0}(*y*) + *y* is a monotonically increasing function of *y*. In other words, the observer reports the location where uncertainty is lower; this makes sense, since at this location, the measured change is less likely to have been caused by noise. The other special case is when the measured change is 0 at both locations, i.e., *x*_{1} − *φ*_{1} = *x*_{2} − *φ*_{2} = 0. Then, the decision rule becomes log *I*_{0}(*κ*_{1}) − *κ*_{1} > log *I*_{0}(*κ*_{2}) − *κ*_{2}, which simplifies to *κ*_{1} < *κ*_{2}, since the function log *I*_{0}(*y*) − *y* is a monotonically decreasing function of *y*. In other words, the observer now reports the location where uncertainty is *higher*; this makes sense, since at this location, it is more likely that the measured change of 0 was caused by a true change greater than 0.
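The behavior of the uncertainty-incorporating rule in these two special cases can be checked numerically. A sketch (the per-location evidence term log *I*_{0}(*κ*) − *κ* cos(measured change) follows the simplifications above; the function names are ours):

```python
import math

def log_i0(x, terms=60):
    """log of the modified Bessel function I_0(x), via its power series (x > 0)."""
    return math.log(sum(math.exp(2 * k * math.log(x / 2.0) - 2 * math.lgamma(k + 1))
                        for k in range(terms)))

def change_evidence(kappa, measured_change):
    """Per-location term of the uncertainty-incorporating rule:
    log I_0(kappa) - kappa * cos(measured change)."""
    return log_i0(kappa) - kappa * math.cos(measured_change)

def report_location(kappa1, d1, kappa2, d2):
    """Report the location with the greater evidence for a change."""
    return 1 if change_evidence(kappa1, d1) > change_evidence(kappa2, d2) else 2

# Special case 1: measured change of pi at both locations ->
#   report the lower-uncertainty (higher-kappa) location.
# Special case 2: measured change of 0 at both locations ->
#   report the higher-uncertainty (lower-kappa) location.
```

With `kappa1 = 5` and `kappa2 = 2`, the rule picks location 1 when both measured changes are π and location 2 when both are 0, matching the two special cases in the text.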

In the *Variable Precision-Fixed Precision* model, a suboptimal Bayesian model, the observer takes reliability-driven, but not reliability-independent, variations in uncertainty into account: they behave as if *κ*_{1} and *κ*_{2} are completely determined by the physical reliabilities of the stimuli, and ignore any additional internal variability in encoding precision. This model is the same as the Variable Precision-Equal Precision model in Keshvari et al. (2012), but we found “fixed” a more intuitive description for “lack of variability across trials” than “equal.” The observer thus uses only two levels of assumed precision, with corresponding concentration parameters *κ*_{high} and *κ*_{low}. The decision rule is then identical to Equation 3, but with *κ*_{1} and *κ*_{2} each taking on one of only two possible values, *κ*_{high} and *κ*_{low}, depending on the reliability of that stimulus.

In the *Variable Precision-Single Precision* model, another suboptimal Bayesian model, the observer completely disregards variations in uncertainty and behaves as if *κ*_{1} = *κ*_{2} on every trial. Then, Equation 3 simplifies, and the observer reports location 1 when cos(*x*_{1} − *φ*_{1}) < cos(*x*_{2} − *φ*_{2}), i.e., when the measured change is larger at location 1.

To generate model predictions for a given change magnitude and reliability condition, we proceeded as follows:

- Without loss of generality, we assume *θ*_{1} = *θ*_{2} = 0 and *L* = 1 (the changing stimulus is always the first one), so that *φ*_{1} = Δ and *φ*_{2} = 0.
- We drew 10,000 random values of *J*_{1} and *J*_{2} from a gamma distribution with scale parameter *τ*. Depending on the reliability condition, the means of the gamma distributions are Display Formula \(\left( {{{\bar J}_{{\rm{high}}}},{{\bar J}_{{\rm{high}}}}} \right)\), Display Formula \(\left( {{{\bar J}_{{\rm{high}}}},{{\bar J}_{{\rm{low}}}}} \right)\), Display Formula \(\left( {{{\bar J}_{{\rm{low}}}},{{\bar J}_{{\rm{high}}}}} \right)\), or Display Formula \(\left( {{{\bar J}_{{\rm{low}}}},{{\bar J}_{{\rm{low}}}}} \right)\).
- For each combination of *J*_{1} and *J*_{2}, we computed the corresponding *κ*_{1} and *κ*_{2} through Equation 2, then drew *x*_{1} and *x*_{2} from a von Mises distribution with mean 0 and those concentration parameters.
- We evaluated the decision rule for each of the 10,000 draws, and then computed the proportion of correct responses across all draws. This is our estimate of the probability correct according to the model for a given change magnitude and reliability condition.
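The steps above can be sketched in code. This is our own illustration of the procedure (not the authors' implementation), assuming the decision rule of the optimal model (Equation 3) and the *κ*(*J*) relation of Equation 2:

```python
import math
import random

def bessel_i(order, x, terms=40):
    """I_order(x) via its power series (adequate for 0 < x <= 50)."""
    return sum(math.exp((2 * k + order) * math.log(x / 2.0)
                        - math.lgamma(k + 1) - math.lgamma(k + order + 1))
               for k in range(terms))

def kappa_from_J(J):
    """Invert Equation 2, J = kappa * I_1(kappa) / I_0(kappa), by bisection."""
    lo, hi = 1e-6, 50.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if mid * bessel_i(1, mid) / bessel_i(0, mid) < J:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def evidence(kappa, measured_change):
    # Per-location term of the optimal decision rule (Equation 3).
    return math.log(bessel_i(0, kappa)) - kappa * math.cos(measured_change)

def p_correct(delta, J_bar_1, J_bar_2, tau, n_draws=10_000, seed=0):
    """Monte Carlo estimate of proportion correct for one change magnitude and
    reliability condition (theta_1 = theta_2 = 0, change at location 1, so
    phi_1 = delta and phi_2 = 0)."""
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(n_draws):
        J1 = rng.gammavariate(J_bar_1 / tau, tau)  # gamma with mean J_bar_1, scale tau
        J2 = rng.gammavariate(J_bar_2 / tau, tau)
        k1, k2 = kappa_from_J(J1), kappa_from_J(J2)
        x1 = rng.vonmisesvariate(0.0, k1)          # noisy memories of theta = 0
        x2 = rng.vonmisesvariate(0.0, k2)
        if evidence(k1, x1 - delta) > evidence(k2, x2):
            n_correct += 1
    return n_correct / n_draws
```

With a large change magnitude and high mean precision, the estimate approaches 1; with Δ = 0 (no change anywhere), it is at chance.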

*c*) and for each parameter combination.

**t**. The likelihood of **t** is the probability of the trial-to-trial subject responses given the trial-to-trial stimuli and **t**. In our task, each response is uniquely defined by its correctness. Moreover, the model's prediction for the probability of a correct response depends only on the reliability condition *c* and the change magnitude Δ. Finally, we assume that all trials are independent. Then, the log likelihood of **t** is

Display Formula \(\log L\left( {\bf{t}} \right) = \sum\limits_{i = 1}^{{n_{{\rm{trials}}}}} {\log p\left( {{\rm{correctness}}_i|{c_i},{\Delta _i},{\bf{t}}} \right)} \)

where the sum runs over trials (*i* = 1, ..., *n*_{trials}) and correctness_{i} is 1 if the subject was correct on the *i*^{th} trial and 0 if not. We can rewrite this as a sum over trial types, grouped by reliability condition *c*, change magnitude Δ, and whether the observer was correct or incorrect:

Display Formula \(\log L\left( {\bf{t}} \right) = \sum\limits_{c,\Delta } {\left[ {n\left( {c,\Delta ,{\rm{correct}}} \right)\log {p_{{\rm{correct}}}}\left( {c,\Delta ;{\bf{t}}} \right) + n\left( {c,\Delta ,{\rm{incorrect}}} \right)\log \left( {1 - {p_{{\rm{correct}}}}\left( {c,\Delta ;{\bf{t}}} \right)} \right)} \right]} \)

where *n*(*c*, Δ, correct) is the number of trials with a particular *c*, Δ, and correctness.
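The grouped form of the log likelihood can be computed directly from trial counts. A minimal sketch (our own illustration; `p_correct_pred` stands in for the model predictions obtained by simulation):

```python
import math
from collections import Counter

def log_likelihood(trials, p_correct_pred):
    """Log likelihood of a parameter vector, aggregated over trial types.

    trials: iterable of (condition, change_magnitude, was_correct) tuples.
    p_correct_pred: dict mapping (condition, change_magnitude) to the model's
        predicted probability of a correct response under that parameter vector.
    """
    counts = Counter((c, delta, correct) for c, delta, correct in trials)
    total = 0.0
    for (c, delta, correct), n in counts.items():
        p = p_correct_pred[(c, delta)]
        total += n * math.log(p if correct else 1.0 - p)  # n(c, Δ, correctness) · log p
    return total
```

Aggregating by trial type gives the same value as summing over individual trials, but with far fewer log evaluations.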


*SEM*), 65.4 ± 0.5 for Monkey 1, and 64.7 ± 0.4 for Monkey 2 (both mean ± *SD* across bootstrapped data sets, which is an estimate of the *SEM*; nonbootstrapped: 65.4 for Monkey 1 and 64.6 for Monkey 2). A logistic regression of proportion correct against species (with the data from all individuals aggregated) and condition (arranged as both HR; mixed reliability with the HR item changing; mixed reliability with the LR item changing; and both LR) shows significant effects of species (*β* = 0.175 ± 0.005; *p* < 10^{−10}) and of condition (*β* = −0.0788 ± 0.0021; *p* < 10^{−10}) (Figure 2A). Yet, proportion correct was above chance even in the both-LR condition (binomial tests: aggregated human data: *p* < 10^{−10}; Monkey 1: *p* = 6.8 × 10^{−9}; Monkey 2: *p* = 6.8 × 10^{−5}). Thus, the low-reliability stimulus was not completely ignored.

*R*^{2} computed from the fit to the psychometric curves for model comparison, but *R*^{2} is an unprincipled measure for binary data (it is different from the log likelihood, which we optimize) and should not be used.

The *best*-fitting model does not necessarily fit the data *well*, but here it does. The VP–FP model, and to a lesser extent the VP–VP model, captures the features of the psychometric curves both qualitatively and quantitatively. In particular, the VP–FP and VP–VP models both correctly predict that in the mixed-reliability condition where the low-reliability stimulus changes, the magnitude of the change does not affect proportion correct. This makes sense: If the observer is confident that the high-reliability stimulus did not change, then the other stimulus must have changed; this type of decision by elimination does not take the magnitude of the change into account.

*N*-stimulus change detection rather than two-stimulus change localization. However, it is not clear to us how these differences could have caused the VP–VP model to win.

*PLoS Computational Biology*, 8 (11), e1002771.

*IEEE Transactions on Automatic Control*, 19 (6), 716–723.

*Psychological Science*, 15, 106–111.

*Journal of Vision*, 14 (4): 7, 1–23, doi:10.1167/14.4.7. [PubMed] [Article]

*Journal of Neuroscience*, 34 (10), 3632–3645.

*Science*, 321 (5890), 851–854.

*Psychological Review*, 120 (1), 85–109.

*Neuron*, 63, 386–396.

*Journal of Neuroscience*, 30 (45), 15241–15253.

*Elements of information theory*. New York: John Wiley & Sons.

*Working memory capacity*. New York: Psychology Press.

*Journal of Vision*, 15 (16): 13, 1–18, doi:10.1167/15.16.13. [PubMed] [Article]

*Psychological Review*, 120 (4), 873–902.

*Neural correlates of dynamic sensory cue re-weighting in macaque area MSTd*. Paper presented at Computational and Systems Neuroscience.

*Nature Communications*, 3, 1229.

*Nature Neuroscience*, 14, 926–932.

*Using a doubly-stochastic model to analyze neuronal activity in the visual cortex*. Paper presented at the Cosyne Abstracts, Salt Lake City.

*Nature Neuroscience*, 11 (10), 1201–1210.

*Journal of Vision*, 11 (3): 11, 1–10, doi:10.1167/11.3.11. [PubMed] [Article]

*PLoS ONE*, 7 (6), e40216, doi:10.1371/journal.pone.0040216.

*PLoS Computational Biology*, 9 (2), e1002927.

*Trends in Neurosciences*, 27 (12), 712–719.

*Vision Research*, 44 (14), 1707–1716.

*Journal of Vision*, 12 (3): 13, 1–12, doi:10.1167/12.3.13. [PubMed] [Article]

*Vision Research*, 50, 2308–2319.

*Trends in Cognitive Sciences*, 16 (10), 511–518.

*Nature Neuroscience*, 17, 347–356.

*Annual Review of Neuroscience*, 37, 205–220.

*Nature Neuroscience*, 14, 783–790, doi:10.1038/nn.2814.

*Science*, 307 (5712), 1121–1124.

*Information theory, inference and learning algorithms*. Cambridge, UK: Cambridge University Press.

*Detection theory: A user's guide* (2nd ed.). Mahwah, New Jersey: Lawrence Erlbaum Associates.

*Visual Neuroscience*, 26 (1), 147–155.

*Journal of Vision*, 12 (6): 10, 1–16, doi:10.1167/12.6.10. [PubMed] [Article]

*Journal of Experimental Psychology: Human Perception & Performance*, 16 (2), 332–350.

*Perception & Psychophysics*, 44 (4), 369–378.

*Perception & Psychophysics*, 16 (2), 283–290.

*Journal of Experimental Psychology: Human Perception & Performance*, 43 (1), 6–17.

*Journal of Vision*, 12 (13): 21, 1–13, doi:10.1167/12.13.21. [PubMed] [Article]

*Nature Reviews Neuroscience*, 4 (3), 203–218.

*Annals of Statistics*, 6 (2), 461–464.

*Proceedings of the National Academy of Sciences, USA*, 90, 10749–10753.

*Sensory cue integration*. New York: Oxford University Press.

*Psychological Review*, 121 (1), 124–149.

*Attention, Perception, & Psychophysics*, 76 (7), 2117–2135.

*Proceedings of the National Academy of Sciences, USA*, 109 (22), 8780–8785.

*Psychological Review*, 124 (2), 197–214.

*Journal of Vision*, 4 (12): 11, 1120–1135, doi:10.1167/4.12.11. [PubMed] [Article]

*Perception as Bayesian inference* (pp. 123–161). New York: Cambridge University Press.

*Nature*, 453 (7192), 233–235.

*L* (1 or 2), the magnitude of the change, Δ, the relevant sample orientations, *θ*_{1} and *θ*_{2} (all other sample stimuli are irrelevant to the decision), their noisy memories, *x*_{1} and *x*_{2}, and the two test orientations, *φ*_{1} and *φ*_{2}. Each variable has an associated probability distribution.

- Since both test locations are equally likely to contain the change, we have *p*(*L*) = 0.5.
- In the experiment, both sample orientations (*θ*_{1} and *θ*_{2}) follow a discrete uniform distribution with 18 possible values. The subject may or may not have learned these values (see next point). However, the computation that follows will also hold if the observer assumes a different discrete uniform distribution or a continuous uniform distribution. Therefore, we simply write Display Formula \(p\left( {{\theta _1},{\theta _2}} \right) = {1 \over k}\), with *k* a constant.
- In the experiment, change magnitude Δ also follows a discrete uniform distribution with 18 possible values, but we approximate it by a continuous uniform distribution, Display Formula \(p\left( \Delta \right) = {1 \over {2\pi }}\). There are three reasons for this choice:
  - We consider it unlikely that an observer learns those exact 18 change magnitudes. Albeit in a different domain, a study that used a discrete stimulus distribution consisting of six values showed that subjects did not learn those values. Specifically, a single Gaussian or a mixture of two Gaussians accounted better for the data than a mixture of six Gaussians (with free standard deviations, allowing for the value 0) centered on the true stimuli (Acerbi, Wolpert, & Vijayakumar, 2012).
  - The “true ideal-observer” model in which the subject does learn the 18 change magnitudes makes very similar trial-to-trial predictions. Specifically, when we use the parameters estimated from the data of any individual human or monkey subject, and simulate 10,000 pairs of *x*_{1} and *x*_{2} per subject, reliability, and change magnitude, the trial-to-trial agreement between the decisions made by the exact ideal observer and our approximated ideal observer was greater than 99.4%. Thus, the models are essentially identical in the relevant range.
  - The choice of a continuous uniform distribution allows for a decision rule that not only has closed form but is also easily interpretable (as we show in the subsection “Models and model fitting—Decision rules”).
- We assume that the noisy memories *x*_{1} and *x*_{2} are conditionally independent given the sample orientations *θ*_{1} and *θ*_{2}. Formally, *p*(*x*_{1}, *x*_{2}|*θ*_{1}, *θ*_{2}) = *p*(*x*_{1}|*θ*_{1}) *p*(*x*_{2}|*θ*_{2}).
- We assume that *p*(*x*_{i}|*θ*_{i}) is a von Mises distribution.
- When the change happens in the first location (*L* = 1), then *φ*_{1} = *θ*_{1} + Δ and *φ*_{2} = *θ*_{2}. When the change happens in the second location (*L* = 2), then *φ*_{1} = *θ*_{1} and *φ*_{2} = *θ*_{2} + Δ.
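The generative model summarized by these assumptions can be sampled directly. A sketch (our own illustration; for simplicity, the concentration parameters are passed in directly rather than derived from precision draws):

```python
import math
import random

def generate_trial(kappa_1, kappa_2, rng=random):
    """Sample one trial from the generative model listed above.

    Orientations live in the doubled space [-pi, pi). Returns the change
    location, the noisy memories, and the test orientations.
    """
    L = rng.choice([1, 2])                      # p(L) = 0.5
    theta_1 = rng.uniform(-math.pi, math.pi)    # sample orientations, ~uniform
    theta_2 = rng.uniform(-math.pi, math.pi)
    delta = rng.uniform(-math.pi, math.pi)      # change magnitude, continuous approximation
    phi_1 = theta_1 + (delta if L == 1 else 0.0)  # test orientations
    phi_2 = theta_2 + (delta if L == 2 else 0.0)
    # Noisy memories: conditionally independent von Mises draws (Equation 1).
    x_1 = rng.vonmisesvariate(theta_1, kappa_1)
    x_2 = rng.vonmisesvariate(theta_2, kappa_2)
    return L, (x_1, x_2), (phi_1, phi_2)
```

Feeding such samples to the decision rules above reproduces the model-prediction procedure described in the Methods.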

*L* based on the noisy memories *x*_{1} and *x*_{2} and the test orientations *φ*_{1} and *φ*_{2}. An ideal observer does this by computing the posterior distribution over *L*, *p*(*L*|*x*_{1}, *x*_{2}, *φ*_{1}, *φ*_{2}). Since *L* is binary, all information about the posterior is contained in the log posterior ratio, which can be rewritten using Bayes' rule:

Display Formula \(\log {{p\left( {L = 1|{x_1},{x_2},{\varphi _1},{\varphi _2}} \right)} \over {p\left( {L = 2|{x_1},{x_2},{\varphi _1},{\varphi _2}} \right)}} = \log {{p\left( {{x_1},{x_2}|L = 1,{\varphi _1},{\varphi _2}} \right)} \over {p\left( {{x_1},{x_2}|L = 2,{\varphi _1},{\varphi _2}} \right)}} + \log {{p\left( {L = 1} \right)} \over {p\left( {L = 2} \right)}}\)

The prior term vanishes because *p*(*L* = 1) = *p*(*L* = 2). We evaluate the likelihood of *L* = 1 (the probability of the memories *x*_{1} and *x*_{2} if the change happened at the first location):

Similarly, the likelihood of *L* = 2 (the probability of the memories if the change happened at the second location) is