**In the absence of external feedback, a decision maker must rely on a subjective estimate of their decision accuracy in order to appropriately guide behavior. Normative models of perceptual decision-making relate subjective estimates of internal signal quality (e.g., confidence) directly to the internal signal quality itself, making it impossible to know whether the subjective estimate or the underlying signal drives behavior. We constructed stimuli that dissociated the human observer's performance on a visual estimation task from their subjective estimates of confidence in their performance, thus violating normative principles. To understand whether confidence influences future decision-making, we examined serial dependence in observers' responses, a phenomenon whereby the estimate of a stimulus on the current trial can be biased toward the stimulus from the previous trial. We found that when decisions were made with high confidence, they conferred stronger biases upon the following trial, suggesting that confidence may enhance serial dependence. Critically, this finding held even when confidence was experimentally dissociated from task performance, indicating that subjective confidence, independent of signal quality, can amplify serial dependence. These findings demonstrate an effect of confidence on future behavior, independent of task performance, and suggest that perceptual decisions incorporate recent history in an uncertainty-weighted manner, but where the uncertainty carried forward is a subjectively estimated and possibly suboptimal readout of objective sensory uncertainty.**

*per se* influences subsequent behavior, or whether the underlying sensory uncertainty on which confidence is based is sufficient to drive future behavior. Indeed, normative models of perceptual decision-making posit a direct relationship between sensory uncertainty and the readout of subjective confidence (Kiani & Shadlen, 2009; Meyniel, Sigman, & Mainen, 2015; Pouget, Drugowitsch, & Kepecs, 2016; Sanders, Hangya, & Kepecs, 2016). Typically, experimenters manipulate stimulus evidence and evaluate the relation between decision accuracy and confidence (Kiani, Corthell, & Shadlen, 2014; van den Berg et al., 2016; Zylberberg, Fetsch, & Shadlen, 2016), or stimulus evidence is held constant and trial-to-trial covariation between confidence and accuracy is examined (Hebart, Schriever, Donner, & Haynes, 2014). Neither approach, however, allows one to separate the influence of confidence from the influence of the quality of evidence. By manipulating evidence externally, as in the former case, or by relying on internal fluctuations of stimulus evidence, as in the latter case, previous paradigms have not teased apart sensory uncertainty and subjective confidence when examining the effects of confidence on subsequent behavior (for critical review, see Samaha, 2015).

*SD* = 2.01, 14 female). All subjects reported normal or corrected-to-normal visual acuity, provided written informed consent, and were compensated monetarily. Sample size was chosen to be on par with recent serial dependence experiments focused on group-level statistical inference (Alais et al., 2017; Bliss et al., 2017; Fritsche et al., 2017), while also being large enough to detect the PEB, as per our prior work (Samaha et al., 2016). Data from this experiment were published previously as part of a multi-experiment study addressing different hypotheses (Samaha & Postle, 2017). This experiment was approved by the University of Wisconsin Institutional Review Board and conducted in accordance with the Declaration of Helsinki. In accordance with the practices of open science and reproducibility, all raw data and code used in the present analyses are freely available through the Open Science Framework (https://osf.io/6uczk/).

*complete guess*) to 4 (*very close*). Event timings are shown in Figure 2A.

*SD* = 2.72), which was held constant throughout the subsequent main task.

^{1} whereas the contrast of the grating and the noise component of the stimuli in the low PE condition were both halved with respect to the high PE values (see Figure 1B). Noise was set to 100% contrast (prior to averaging with the signal) in the high PE condition and was halved (50%) in the low PE condition. This procedure matches the signal-to-noise ratio across both conditions, which we anticipated would lead to no change in estimation accuracy, but to a change in confidence, if confidence over-relies on the magnitude of the grating (signal) contrast, a proxy for the amount of PE represented in the brain.
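The contrast logic can be sketched in a few lines. This is a Python sketch with illustrative parameters (grating frequency, orientation, and patch size are assumptions; the actual stimuli were generated in MATLAB): scaling both the signal and noise components by the same factor changes overall contrast but leaves their ratio intact.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 64

# Oriented grating (the "signal" component); spatial frequency and
# orientation here are illustrative values, not those of the experiment
yy, xx = np.mgrid[:size, :size] / size
theta = np.deg2rad(45)
grating = np.sin(2 * np.pi * 8 * (xx * np.cos(theta) + yy * np.sin(theta)))
noise = rng.uniform(-1, 1, (size, size))  # broadband noise component

def compound(signal_contrast, noise_contrast):
    # Stimulus = average of the contrast-scaled signal and noise components
    return 0.5 * (signal_contrast * grating + noise_contrast * noise)

high_pe = compound(1.0, 1.0)  # high PE: noise at 100% contrast
low_pe  = compound(0.5, 0.5)  # low PE: both components halved

# Scaling both components by the same factor halves the amplitude of the
# whole stimulus, so the signal-to-noise ratio is unchanged
assert np.allclose(low_pe, 0.5 * high_pe)
```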

*κ*, which describes the precision of the Von Mises, and a parameter that describes the height of the uniform distribution, which corresponds to the probability of making a random (“guess”) response. The model was fit to data using an expectation-maximization algorithm implemented in MATLAB code obtained from www.bayslab.com. We did not obtain enough trials at each of the five delay durations to reliably fit mixture models to each combination of delay and PE level separately. Therefore, any analysis of PEB with delay as a factor was conducted on mean and median absolute error. Confidence for high and low PE conditions was quantified as the mean rating across each type of trial. The effect of PE on confidence and each accuracy metric was evaluated statistically using two-tailed paired-sample *t* tests. Improbable trials with responses faster than 200 ms or slower than the 95th percentile of response times across all subjects (5.04 s) were discarded prior to any analysis. Lastly, because we predicted a null effect of our PE manipulation on estimation accuracy, we include Bayes factors whenever interpreting null hypotheses. All Bayes factors (BFs) are reported in terms of evidence for the null hypothesis (BF_{null}), quantified as how many times more likely the data are to be observed under the null hypothesis. In the case of *t* tests, Jeffreys–Zellner–Siow BFs were computed according to Rouder, Speckman, Sun, Morey, and Iverson (2009) using a normal prior (Jarosz & Wiley, 2014). In the case of correlations, BFs were computed according to Wetzels and Wagenmakers (2012).
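The mixture-model fit can be illustrated with a compact expectation-maximization routine. This is a Python sketch of the general approach, not the MATLAB implementation from www.bayslab.com used in the study; the mean of the Von Mises is fixed at zero and the *κ* update uses a standard three-branch approximation to inverting the Bessel-function ratio.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind

def fit_vonmises_uniform(errors, n_iter=200):
    """EM fit of a Von Mises + uniform mixture to response errors (radians,
    on the circle [-pi, pi)). Returns (kappa, guess_rate)."""
    kappa, guess = 5.0, 0.1  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: responsibility that each error arose from the Von Mises
        vm = np.exp(kappa * np.cos(errors)) / (2 * np.pi * i0(kappa))
        uni = 1.0 / (2 * np.pi)
        resp = (1 - guess) * vm / ((1 - guess) * vm + guess * uni)
        # M-step: update guess rate and the weighted resultant length R
        guess = 1 - resp.mean()
        R = np.abs(np.sum(resp * np.exp(1j * errors))) / resp.sum()
        # Approximate inversion of A(kappa) = I1(kappa) / I0(kappa)
        if R < 0.53:
            kappa = 2 * R + R**3 + 5 * R**5 / 6
        elif R < 0.85:
            kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
        else:
            kappa = 1 / (R**3 - 4 * R**2 + 3 * R)
    return kappa, guess

# Parameter recovery on simulated data: 80% Von Mises trials, 20% guesses
rng = np.random.default_rng(0)
errors = np.concatenate([rng.vonmises(0, 10, 800),
                         rng.uniform(-np.pi, np.pi, 200)])
kappa, guess = fit_vonmises_uniform(errors)
```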

*y*-axis) will be pulled toward the same sign as the relative difference (*x*-axis), whereas a repulsive bias would result in response errors of an opposite sign, and no serial dependence would result in a flat line. This profile has been previously parameterized by fitting the data with a derivative-of-Gaussian function (DoG; Fischer & Whitney, 2014; Liberman et al., 2014; Alais et al., 2017; Bliss et al., 2017; Fritsche et al., 2017) of the form:

*y* = *xawc* e^{−(*wx*)^2}

where *x* is the relative orientation of the previous trial, *a* is the amplitude of the curve peaks, *w* is the width of the curve, and *c* is the constant √2/e^{−0.5}, which scales the curve so that *a* corresponds to its peak height. The amplitude parameter *a* and width parameter *w* were free to vary across a wide range of plausible values between [−15°, 15°] and [0.02, 0.2], respectively. Fitting was conducted by minimizing the sum of squared errors using the MATLAB routine *lsqcurvefit.m*. To determine the statistical significance of group-level DoG fits, we used a bootstrapping procedure (DiCiccio & Efron, 1996). On each of 80,000 iterations we sampled subjects with replacement and fit a DoG to the average of the bootstrap-sampled data. We saved the value of the amplitude parameter after each iteration, forming a distribution of the amplitude parameter of our sample. We computed 95% confidence intervals from this distribution, and a *p* value was calculated as the proportion of bootstrapped amplitudes falling below zero (i.e., showing no attractive serial dependence), which was considered significant at *α* = 0.025 (two-tailed bootstrap test). To test whether confidence or PE on the previous trial predicted serial biases on the current trial, we refit DoG functions to data split according to whether the previous trial was high or low confidence (mean split according to each subject's mean confidence rating), or high or low PE. Statistical significance testing was conducted using the same bootstrap procedure as above, but the difference in serial dependence amplitude between conditions was saved on each iteration, and a *p* value was computed as the proportion of the difference-score distribution falling below zero (*α* = 0.025, two-tailed bootstrap test).
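The fitting and bootstrap logic might look as follows. This is a Python sketch on synthetic data, with `scipy.optimize.curve_fit` standing in for MATLAB's *lsqcurvefit.m* and far fewer iterations than the 80,000 used in the study; the subject data and noise level are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def dog(x, a, w):
    # Derivative of Gaussian; the constant c = sqrt(2)/exp(-0.5) scales the
    # curve so that the amplitude parameter a equals its peak height
    c = np.sqrt(2) / np.exp(-0.5)
    return x * a * w * c * np.exp(-(w * x) ** 2)

def fit_amplitude(rel_ori, err):
    # Bounds match the plausible ranges used for a and w in the paper
    (a, w), _ = curve_fit(dog, rel_ori, err, p0=[1.0, 0.05],
                          bounds=([-15, 0.02], [15, 0.2]))
    return a

# Synthetic per-subject serial dependence curves (true a = 2 deg, w = 0.05)
rng = np.random.default_rng(1)
x = np.arange(-60.0, 61.0)
subjects = [dog(x, 2.0, 0.05) + rng.normal(0, 1, x.size) for _ in range(20)]

# Bootstrap: resample subjects with replacement, refit the group average,
# and keep the amplitude parameter from each iteration
boot_a = np.array([
    fit_amplitude(x, np.mean([subjects[i] for i in rng.integers(0, 20, 20)],
                             axis=0))
    for _ in range(500)
])

ci = np.percentile(boot_a, [2.5, 97.5])  # 95% confidence interval
p = np.mean(boot_a <= 0)                 # one tail; test against alpha = 0.025
```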

*t* tests, or by fitting a linear function to each subject's bias-by-confidence or bias-by-delay data and comparing the slope against zero at the group level with a two-tailed paired-samples *t* test (Figure 5A, right panel).

*fft.m*). Serial dependence was quantified as the power at the frequency with the highest power for each condition (the “dominant frequency”). This is akin to finding the amplitude of the best-fitting sinusoid that is allowed to vary in both frequency and phase. Following the statistical analysis of the DoG fit (described above), a bootstrap procedure was performed whereby group-level serial dependence curves were recomputed using a random subset of subjects (sampled with replacement) and decomposed with an FFT. On each of 80,000 bootstrap iterations, the power spectrum for each condition (Figure 5) was saved and the power of the dominant frequency in each condition was recorded. Subtracting the distributions created a distribution of difference scores reflecting the change in sine-wave amplitude across conditions. Using these difference-score distributions, we computed *p* values by taking the proportion of bootstrap samples falling below zero (*α* = 0.025, two-tailed bootstrap test), as well as 95% confidence intervals. Note that the *x*-axis in the FFT plots is normalized to reflect the number of cycles of a particular sine wave across the whole serial dependence plot, rather than Hz (since the data are not a time series).
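The spectral measure can be sketched as follows (a Python sketch, with `numpy.fft` in place of MATLAB's *fft.m*): because the FFT bin index counts cycles across the input, bin *k* directly corresponds to *k* cycles across the serial dependence curve.

```python
import numpy as np

def dominant_power(curve):
    """Power at the dominant (non-DC) frequency of a serial dependence curve.
    FFT bin k corresponds to k cycles across the whole curve, not Hz."""
    spectrum = np.abs(np.fft.rfft(curve - curve.mean())) ** 2 / len(curve)
    return spectrum[1:].max()  # skip the DC component

# Toy curve with exactly two cycles across its span, mimicking the observed
# repulsive-attractive-attractive-repulsive profile
n = 121
curve = np.sin(2 * np.pi * 2 * np.arange(n) / n)
spectrum = np.abs(np.fft.rfft(curve - curve.mean())) ** 2 / n
assert spectrum[1:].argmax() + 1 == 2  # dominant frequency: 2 cycles
```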

*ρ* range: [−0.46, −0.01]). This indicates that subjects' confidence ratings reflected some knowledge of their own performance. A repeated-measures ANOVA including an interaction term between delay duration and PE did not reveal any reliable interaction of PE with delay when predicting accuracy (mean or median absolute error) or confidence (all *p*s > 0.05); therefore, we focus on paired comparisons between high and low PE trials, aggregating over delay duration. As hypothesized, proportionally increasing both signal and noise contrast in a compound grating stimulus led to no discernible change in the accuracy of observers' responses as characterized by median response error, *t*(19) = −0.27, *p* = 0.79, BF_{null} = 5.66, mean response error, *t*(19) = 1.20, *p* = 0.24, BF_{null} = 3.00, the precision of responses (see Materials and Methods; *t*[19] = 0.17, *p* = 0.86, BF_{null} = 5.78), or the probability of making a random response, *t*(19) = 1.10, *p* = 0.28, BF_{null} = 3.33 (see Figure 3A and C). This is in line with previous null effects of the exact same (Samaha et al., 2016) and similar PE manipulations on two-choice discrimination accuracy (Zylberberg et al., 2012; Koizumi et al., 2015; Maniscalco et al., 2016). The BF analysis indicates that, across different metrics, the change in accuracy we observed is between 3 and 5.78 times more likely to be observed under the null.

*t*(19) = −5.06, *p* = 0.00006 (Figure 3B). Analysis of the proportion of responses at each of the four confidence levels (Figure 3B) revealed that increasing PE led to a decrease in the use of “1” ratings (*complete guess*) and an increase in the use of “4” ratings (*very close to the true orientation*; *p* value per level of confidence: *p*_{conf1} = 0.0002, *p*_{conf2} = 0.99, *p*_{conf3} = 0.35, *p*_{conf4} = 0.005). Additionally, we checked whether individual differences in the PE-related change in accuracy and the PE-related change in confidence were correlated. Across all four metrics of accuracy, there was virtually no correlation (Spearman's rho) across subjects (mean error: *ρ* = −0.073, *p* = 0.75, BF_{null} = 5.60; median error: *ρ* = −0.024, *p* = 0.92, BF_{null} = 5.83; precision: *ρ* = −0.066, *p* = 0.78, BF_{null} = 5.64; guess rate: *ρ* = −0.176, *p* = 0.45, BF_{null} = 4.45; Figure 3C). This provides further evidence of independence between confidence and accuracy, indicating that even for an individual whose accuracy benefited from increasing PE, confidence did not increase in kind. Analysis of BFs suggests that the individual-differences correlations are 4.45 to 5.83 times as likely to be observed under the null hypothesis of no correlation. This result suggests that confidence is not simply a more sensitive measure of behavior than estimation error, as individual differences would likely be correlated under that hypothesis.

*a* = 2.3°, 95% CI = [1.30, 3.28], *p* < 0.0001), and a notable repulsive bias at larger relative orientation differences, consistent with recent reports (Bliss et al., 2017; Fritsche et al., 2017). This repulsive-attractive-attractive-repulsive profile produced an oscillation-like curve, which was verified with an FFT analysis showing a clear peak at a frequency of about two cycles (Figure 4, middle panel). Serial dependence was undetectable when trials considered incorrect (see Methods) were included (*a* = 1.01°, CI = [−0.97, 2.60], *p* = 0.11), which is sensible given that an undetected stimulus would not be expected to influence subsequent responses. The presence of serial dependence was also confirmed in the model-free analysis, which revealed a bias of comparable magnitude for trials within ±45° of relative difference (mean bias = 3.1°, CI = [1.26, 5.07], *t*[19] = 3.45, *p* = 0.002). Serial bias showed no reliable linear relationship with delay duration on the previous (mean slope = −0.12, *t*[19] = −0.57, *p* = 0.57, BF_{null} = 5.02) or current trial (mean slope = 0.083, *t*[19] = 0.54, *p* = 0.59, BF_{null} = 5.10), so we focused subsequent analyses on serial dependence averaged across all delays.

*a* = 2.15°, CI = [1.25, 3.04], *p* = 0.0001), but not low confidence trials (*a* = 0.64°, CI = [−0.78, 1.82], *p* = 0.16). The difference distribution formed by subtracting the bootstrapped distributions of amplitude parameters in each condition was predominantly greater than zero (Δ*a* = 1.50, CI = [0.003, 3.21], *p* = 0.023), indicating significantly larger biases following high as compared with low confidence trials. The same conclusion was reached with an FFT analysis comparing power at the dominant frequency of serial dependence curves for each condition (Δpower = 0.73, CI = [0.17, 1.35], *p* = 0.005; Figure 5A, middle panel). This effect was also confirmed in the model-free analysis: Fitting a line to each subject's serial bias magnitude as a function of their confidence on the previous trial revealed a significant positive relationship (mean slope = 2.98°, CI = [1.50, 4.46], *t*[19] = 4.22, *p* = 0.0004). Paired contrasts at each level of confidence revealed that serial dependence was present only at confidence levels 3 (*p* = 0.0073) and 4 (*p* = 0.020), and not at levels 1 (*p* = 0.17) or 2 (*p* = 0.29; Figure 5A, right panel). These results suggest that confidence on the current trial may mediate that trial's attractive influence on the subsequent trial. However, this finding conflates confidence with other factors that may relate to performance. For instance, if subjects were inattentive on the current trial, then confidence and performance could both be reduced, leading to a smaller bias on the subsequent trial. In this explanation, attention would be the primary variable leading to reduced serial bias, not subjective confidence.

*p*s > 0.14, BF_{null}s between 2.03 and 4.34, indicating anecdotal to substantial evidence for the null). In fact, all metrics pointed toward a difference in the opposite direction than the effect on confidence: slightly *higher* mean and median error, and lower *κ*, in the *high* PE condition (guess rate was 0 in all cases since high-error trials were removed). Confidence, on the other hand, remained significantly higher for high PE stimuli (*t*[19] = −4.62, *p* = 0.0002), confirming the PEB for this subset of trials. As was the case for the analysis of all trials, individual differences in the PE-related change in confidence were uncorrelated with PE-related changes in accuracy across all metrics (*ρ* range = [−0.05, 0.19], *p*s > 0.42, BF_{null}s between 4.26 and 5.47).

*a* = 2.52°, CI = [1.54, 3.49], *p* = 0.00001), and an effect following low PE trials (*a* = 1.04°, CI = [0.12, 1.84], *p* = 0.022). Critically, the distribution of amplitude differences from the bootstrap was significantly non-overlapping with zero (Δ*a* = 1.48, CI = [0.21, 2.90], *p* = 0.011; Figure 5B, left panel), indicating that high PE on the previous trial led to larger biases on the current trial than did low PE. The FFT analysis corroborated this effect, with higher power at the peak serial dependence frequency following high as compared with low PE trials (Δpower = 0.82, CI = [0.20, 1.30], *p* = 0.008; Figure 5B, middle panel). The model-free analysis also replicated this result, with a significant bias following high, *t*(19) = 4.57, *p* = 0.0002, and low PE trials, *t*(19) = 2.23, *p* = 0.031, and a significantly greater bias following high as compared with low PE trials, *t*(19) = 2.32, *p* = 0.031 (Figure 5B, right panel). Because high PE was associated with a boost in confidence but no change in accuracy, these results suggest that increasing confidence independently of accuracy is sufficient to amplify serial dependence in orientation judgments.

*per se*, but because attention on that trial was diverted and the stimulus was processed suboptimally. To ascertain whether subjective confidence can modulate dependencies between current and future decisions, we designed an orientation estimation experiment that disentangled confidence ratings from objective task performance. We found that trial-to-trial variation in confidence predicted the magnitude of serial biases, such that when a trial was performed with high confidence it exerted a larger bias on the decision in the subsequent trial. Crucially, this relationship was replicated when we experimentally manipulated confidence levels without affecting task performance, indicating that confidence, divorced from performance, is capable of increasing serial dependence. This finding suggests that a representation of sensory uncertainty is carried forward to subsequent trials to influence decision-making. However, the representation of uncertainty that is carried forward need not be a perfect reflection of the actual stimulus evidence used to perform the task. Our results support a framework in which a suboptimal readout of sensory evidence forms the basis of subjective confidence judgments and gets carried forward to alter future decision behavior.

*Cognition*, 146, 377–386.

*Journal of Neuroscience*, 37, 4381–4390.

*Journal of Vision*, 16 (11): 4, 1–12, https://doi.org/10.1167/16.11.4. [PubMed] [Article]

*Journal of Vision*, 9 (10): 7, 1–11, https://doi.org/10.1167/9.10.7. [PubMed] [Article]

*Neuron*, 60, 1142–1152.

*Scientific Reports*, 7, 14739.

*Vision Research*, 39, 257–269.

*Journal of Neuroscience*, 2189–17.

*Animal Cognition*, 3, 207–220.

*Nature Reviews Neuroscience*, 13, 51–62.

*Journal of Physiology*, 318, 413–427.

*Psychological Science*, https://doi.org/10.1177/0956797617744771.

*Statistical Science*, 11, 189–212.

*Nature*, 415, 429–433.

*Nature Neuroscience*, 17, 738–743.

*Psychological Review*, 124, 91–114.

*Science*, 329, 1541–1543.

*Psychological Science*, 29, 437–446.

*Current Biology*, 27, 590–595.

*Journal of Vision*, 14 (7): 9, 1–16, https://doi.org/10.1167/14.7.9. [PubMed] [Article]

*Nature Neuroscience*, 14, 933–939.

*Cerebral Cortex*, 26 (1), 118–130.

*Visual Neuroscience*, 9, 181–197.

*The Journal of Problem Solving*, 7 (1). Available at https://docs.lib.purdue.edu/jps/vol7/iss1/2.

*Nature*, 455, 227–231.

*Neuron*, 84, 1329–1342.

*Science*, 324, 759–764.

*Trends in Cognitive Sciences*, 21, 493–497.

*Attention, Perception, & Psychophysics*, 77, 1295–1306.

*Current Biology*, 24, 2569–2574.

*Nature Neuroscience*, 9, 1432–1438.

*Attention, Perception, & Psychophysics*, 78 (3), 923–937.

*Neuron*, 88, 78–92.

*Tutorials in Quantitative Methods for Psychology*, 4, 61–64.

*Proceedings of the National Academy of Sciences, USA*, 115 (7), E1588–E1597.

*Nature Human Behaviour*, 1, 0139. Available at http://europepmc.org/abstract/med/29130070

*Nature Reviews Neuroscience*, 1, 125–132.

*Nature Neuroscience*, 19, 366–374.

*Behavioral and Brain Sciences*, 2018, 1–107.

*Attention, Perception, & Psychophysics*, 80 (1), 134–154.

*Psychonomic Bulletin & Review*, 16, 225–237.

*Frontiers in Psychology*, 6, 604. Available at http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00604/abstract

*Frontiers in Psychology*, 7, 851.

*Consciousness and Cognition*, 54, 47–55.

*Proceedings of the Royal Society, B: Biological Sciences*, 284 (1867).

*Neuron*, 90, 499–506.

*Consciousness and Cognition*, 20, 1787–1792.

*Journal of Vision*, 18 (7): 4, 1–24, https://doi.org/10.1167/18.7.4. [PubMed] [Article]

*The Journal of the Acoustical Society of America*, 41, 782–787.

*Current Biology*, 26 (23), 3157–3168. Available at http://www.sciencedirect.com/science/article/pii/

*Journal of Vision*, 17 (10): 590, https://doi.org/10.1167/17.10.590. [Abstract]

*Consciousness and Cognition*, 22, 264–271.

*Psychonomic Bulletin & Review*, 19, 1057–1064.

*Philosophical Transactions of the Royal Society of London, B: Biological Sciences*, 367, 1310–1321.

*Frontiers in Integrative Neuroscience*, 6, 79. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3448113/

*eLife*, 5, e17688.

^{1}In Experiment 1 of Samaha and Postle (2017), a similar stimulus manipulation was used but under slightly different task conditions. In this experiment, halving signal and noise contrast led to a decrease in accuracy as well as confidence. For future work with this paradigm, we recommend individually staircasing both high and low PE stimuli to ensure that accuracy is matched, rather than hoping for matched accuracy after staircasing the high PE condition and halving the signal and noise contrasts to form the low PE condition, as was done here and in Samaha and Postle (2017).