Our ability to estimate the duration of subsecond visual events is prone to distortions, which depend on both sensory and decisional factors. To disambiguate between these two influences, we can look at the alignment between discrimination estimates of duration at the point of subjective equality and confidence estimates when the confidence about decisions is minimal, because observers should be maximally uncertain when two stimuli are perceptually the same. Here, we used this approach to investigate the relationship between the speed of a visual stimulus and its perceived duration. Participants were required to compare two intervals, report which had the longer duration, and then rate their confidence in that judgment. One of the intervals contained a stimulus drifting at a constant speed, whereas the stimulus embedded in the other interval could be stationary, linearly accelerating or decelerating, or drifting at the same speed. Discrimination estimates revealed duration compression for the stationary stimuli and, to a lesser degree, for the accelerating and decelerating stimuli. Confidence showed a similar pattern, but, overall, the confidence estimates were shifted more toward higher durations, pointing to a small contribution of decisional processes. A simple observer model, which assumes that both judgments are based on the same sensory information, captured well inter-individual differences in the criterion used to form a confidence judgment.

*R*^{2} for either the duration PSE or the PMC estimate was lower than 0.15; and if the proportion of low confidence responses for both the shortest and the longest interval was larger than 0.5. Third, we ran an outlier analysis, and participants were excluded if the PSE, the PMC, the just noticeable difference (JND), or the full width at half height (FWHH) differed by more than three scaled median absolute deviations (MADs) from the group median in any given condition. Finally, participants were excluded if the mean difference between the mean duration of the video playback in Gorilla (averaged across all trials) and the actual mean video duration (averaged across all video durations) exceeded ±10% of the actual mean video duration (which was 2300 ms for both Experiments 1 and 2).
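The scaled-MAD exclusion rule can be illustrated with a short sketch. This is our own Python illustration, not the authors' analysis code; the 1.4826 scale factor is the conventional one (and the default in MATLAB's `isoutlier`), and the PSE values below are hypothetical:

```python
import numpy as np

def mad_outliers(values, n_mads=3.0):
    """Flag values more than n_mads scaled MADs from the group median.
    The factor 1.4826 makes the MAD a consistent estimator of the SD
    under normality (the default in MATLAB's isoutlier)."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    scaled_mad = 1.4826 * np.median(np.abs(values - med))
    return np.abs(values - med) > n_mads * scaled_mad

# Hypothetical PSEs (ms), one per participant, for a single condition
pses = np.array([376, 390, 355, 410, 368, 702, 381, 372])
print(np.flatnonzero(mad_outliers(pses)))  # only the 702-ms PSE is flagged
```

The same function applies unchanged to the PMC, JND, and FWHH estimates.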

*SD*, 31.63 ± 12.33 years; mean playback error [video playback duration – actual video duration], −8.61 ± 41.06 ms). Experiment 2 included 53 participants out of 106 initially recruited (35 identifying as female, 18 as male; mean age, 29.92 ± 8.09 years; mean playback error, −19.69 ± 17.72 ms).

*F* = 0.2) with an alpha error probability of 0.05 and a power of 0.8. Our estimate of the correlation between repeated measures was based on what we observed in Experiment 1 (average Pearson's *r* = 0.25).

*t*-tests. The description of the model was not pre-registered, either.

*p* = 0.05/3 = 0.0167) and of the paired-sample *t*-tests between the PSEs and the PMCs for each of the four speed conditions (*p* = 0.05/4 = 0.0125), as well as those between the PSE and PMC changes in the stationary, accelerating, and decelerating conditions relative to the drifting condition (*p* = 0.05/3 = 0.0167). We ran separate one-way ANOVAs to test the effect of stimulus type on our precision measures (i.e., JND and FWHH) and on the peak heights of the confidence functions (i.e., curve height at PMC). The *F* statistic of the Brown–Forsythe test was reported when the assumption of homogeneity of variances was violated.
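The corrected thresholds above follow from dividing the alpha level by the number of comparisons; as a quick arithmetic check (the function name is ours, for illustration only):

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance threshold under Bonferroni
    correction: the family-wise alpha divided by the number of
    comparisons m."""
    return alpha / m

print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167
print(round(bonferroni_alpha(0.05, 4), 4))  # 0.0125
```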

*F*(1.916, 61.328) = 60.01, *p* < 0.0001. As expected, the analysis of the PSEs showed that the largest duration compression (∼25% reduction relative to the actual standard duration, 500 ms) was observed when the standard interval contained a stationary stimulus (mean PSE ± *SD*, 376.18 ± 77.93 ms), whereas the bias was smaller, and of comparable size, for the accelerating (449.38 ± 31.34 ms) and decelerating (466.33 ± 39.62 ms) conditions (Bonferroni-corrected *p* = 0.0167). For planned contrasts: decelerating < drifting, *t*(60.41) = 6.34, *p* < 0.0001; accelerating < decelerating, *t*(60.77) = 1.93, *p* = 0.059; and stationary < accelerating, *t*(42.09) = 5.01, *p* < 0.0001.

*F*(1, 32) = 8.001, *p* = 0.008, indicated that, overall, the mean PMCs were shifted toward higher values relative to the PSEs. The size of this shift did not differ significantly across the four speed profiles (interaction speed condition × judgment type), *F*(1.83, 58.67) = 2.04, *p* = 0.143. However, the mean PMCs were modulated by the speed profile in the same fashion as the mean PSEs: The peaks were associated with the shortest durations in the stationary condition (mean PMC, 406.53 ± 74.21 ms), whereas those in the accelerating condition (465.35 ± 41.47 ms) and decelerating condition (491.13 ± 63.54 ms) were associated with longer durations than in the stationary condition but shorter than in the drifting condition (Bonferroni-corrected *p* = 0.0167). For planned contrasts: decelerating < drifting, *t*(52.93) = 2.67, *p* = 0.01; accelerating < decelerating, *t*(55.07) = 1.95, *p* = 0.056; and stationary < accelerating, *t*(50.21) = 3.97, *p* < 0.0001. Also, none of the comparisons between PSE and PMC across conditions turned out to be statistically significant after correcting for multiple comparisons (Bonferroni-corrected *p* = 0.0125). For paired-samples *t*-tests: drifting, *t*(32) = 0.51, *p* = 0.613; stationary, *t*(32) = 2.57, *p* = 0.015; accelerating, *t*(32) = 2.13, *p* = 0.041; and decelerating, *t*(32) = 2.30, *p* = 0.028.

*F*(1.65, 52.92) = 33.92, *p* < 0.0001, and judgment type, *F*(1, 32) = 8.01, *p* = 0.008, but no significant interaction, *F*(1.5, 47.89) = 0.664, *p* = 0.478. Also, none of the comparisons between PSE and PMC changes relative to the drifting condition reached statistical significance after correcting for multiple comparisons (Bonferroni-corrected *p* = 0.0167). For paired-samples *t*-tests: stationary duration change, *t*(32) = 1.95, *p* = 0.06; accelerating duration change, *t*(32) = 1.65, *p* = 0.108; and decelerating duration change, *t*(32) = 2.52, *p* = 0.017.

*F*(3, 128) = 0.86, *p* = 0.466. Similarly, no difference was detected for the FWHHs for the confidence judgments (Figure 3B, red symbols and box plots), *F*(3, 128) = 1.42, *p* = 0.239, indicating that the different speed profiles of our stimuli had comparable effects on the precision of both the discrimination and confidence judgments. The proportion of “low confidence” responses peaked at values that were substantially smaller than 1 (mean peak heights: drifting, 0.75 ± 2.46; stationary, 0.71 ± 0.21; accelerating, 0.75 ± 0.21; decelerating, 0.74 ± 0.21), implying that participants were generally overconfident even when their performance was at chance. This tendency was not influenced by the speed profile of the test stimuli, *F*(3, 128) = 0.21, *p* = 0.891.

*t*-test against 500, *t*(32) = 4.05, *p* < 0.001, and confidence judgments, with mean PMC = 525.72 ± 38.78, *t*(32) = 3.81, and *p* = 0.001. We thought this unexpected bias might have been due to two factors: first, the blocked presentation, and, second, the narrow duration range. If we look at the panel corresponding to the drifting condition in Figure 2a, we can see that, when standard and comparison intervals had the same duration, participants were at chance, as expected, but they underestimated the duration of the longer intervals (especially 800 ms), and this shifted the PSE toward higher values.

*x* and *y* directions from the fitted line; that is, it minimizes the sum-of-squared orthogonal deviations. It also produces confidence interval estimates for the slope and the intercept of the orthogonal fit, which can be used to test whether the two parameters are significantly different from 1 and 0, respectively, indicating a deviation from a perfect linear correlation between the two measures. In addition, we determined the Bayes factor, which gave us the amount of evidence favoring the reduced model (with slope fixed to 1 and intercept fixed to 0) over the orthogonal model given the data. To calculate the Bayes factor, we used the large sample approximation method (Burnham & Anderson, 2004). A similar application of this method can be found, for example, in Schütz, Kerzel, and Souto (2014). We first determined the Bayesian information criterion (BIC) (Schwarz, 1978) for both methods:

*BIC* = *n* ln(*RSS*/*n*) + *k* ln(*n*), where *n* corresponds to the number of participants, *RSS* is the residual sum of squares, and *k* is the number of free parameters (0 for the reduced model and 2 for the orthogonal model). Then, for each model *i*, we determined the posterior probability *p*_{i}: *p*_{i} = exp(−0.5 × ∆*BIC*_{i}) / [exp(−0.5 × ∆*BIC*_{reduced}) + exp(−0.5 × ∆*BIC*_{orthogonal})], where ∆*BIC* is the difference, for each model, between the *BIC* for that model and the lower *BIC* between the two models (i.e., the ∆*BIC* for the minimum *BIC* model is 0). Finally, the Bayes factor was calculated as the ratio between the two posterior probabilities: BF = *p*_{reduced}/*p*_{orthogonal}.
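The two steps just described (the orthogonal fit and the BIC-based Bayes factor) can be sketched in Python as follows. This is our own illustration under stated assumptions, not the published analysis code: the original analysis used a MATLAB Deming-regression routine, and we assume here that vertical residuals enter the RSS of both models.

```python
import math

import numpy as np

def orthogonal_fit(x, y):
    """Orthogonal (total least squares) regression: find the slope and
    intercept minimizing the sum of squared orthogonal distances, via
    the principal axis of the centered data. Equivalent to Deming
    regression with an error-variance ratio of 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    vx, vy = vt[0]  # direction of the principal axis
    slope = vy / vx
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

def bic(n, rss, k):
    # Large-sample BIC for least-squares fits (Burnham & Anderson, 2004)
    return n * math.log(rss / n) + k * math.log(n)

def bayes_factor_reduced_over_orthogonal(x, y):
    """Posterior-probability ratio favoring the reduced model
    (slope = 1, intercept = 0; k = 0 free parameters) over the
    orthogonal model (free slope and intercept; k = 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    rss_reduced = float(np.sum((y - x) ** 2))
    slope, intercept = orthogonal_fit(x, y)
    rss_orth = float(np.sum((y - (intercept + slope * x)) ** 2))
    bics = [bic(n, rss_reduced, 0), bic(n, rss_orth, 2)]
    deltas = [b - min(bics) for b in bics]
    weights = [math.exp(-0.5 * d) for d in deltas]
    posteriors = [w / sum(weights) for w in weights]
    return posteriors[0] / posteriors[1]

# Hypothetical PSE/PMC pairs scattered around the identity line
pse = [1.0, 2.0, 3.0, 4.0, 5.0]
pmc = [1.1, 1.9, 3.0, 4.1, 4.9]
print(bayes_factor_reduced_over_orthogonal(pse, pmc))  # > 1: equality line favored
```

With data this close to the identity line, the two extra free parameters of the orthogonal model are not worth their BIC penalty, so the ratio favors the reduced model.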

*F*(1.38, 71.71) = 74.34, *p* < 0.0001, with some slight differences in the amount of shift predicted by the two measures. Perceived duration estimates provided by the PSEs revealed a very strong compression (∼35%) for the stationary standard (mean PSE, 315.67 ± 127.94 ms), whereas milder compression was observed for the accelerating standard (462.18 ± 47.1 ms) and the decelerating standard (454.44 ± 42.71 ms; Bonferroni-corrected *p* = 0.0167). For planned contrasts: decelerating < drifting, *t*(95.94) = 8.81, *p* < 0.0001; accelerating < decelerating, *t*(103.02) = −0.89, *p* = 0.377; and stationary < accelerating, *t*(65.84) = 7.82, *p* < 0.0001.

*F*(1, 52) = 38.59, *p* < 0.0001—and the magnitude of this difference varied across conditions (interaction speed condition × judgment type), *F*(3, 16) = 5.26, *p* = 0.002. In fact, as for the PSEs, the amount of duration compression estimated by the PMCs was maximal in the stationary condition (mean PMC, 349.51 ± 124.19 ms) and similarly milder in the accelerating condition (472.4 ± 38.11 ms) and decelerating condition (475.59 ± 47.92 ms; Bonferroni-corrected *p* = 0.0167). For planned contrasts: decelerating < drifting, *t*(96.78) = 5.82, *p* < 0.0001; accelerating < decelerating, *t*(98.99) = 0.38, *p* = 0.706; and stationary < accelerating, *t*(61.71) = 6.89, *p* < 0.0001. Comparisons between the two measures were significant only in the stationary and decelerating conditions after correcting for multiple comparisons (Bonferroni-corrected *p* = 0.0125). For paired-samples *t*-tests: drifting, *t*(52) = 0.99, *p* = 0.327; stationary, *t*(52) = 5.35, *p* < 0.0001; accelerating, *t*(52) = 1.72, *p* = 0.092; and decelerating, *t*(52) = 4.09, *p* < 0.0001.

*F*(1.24, 64.41) = 54.84, *p* < 0.0001, and judgment type, *F*(1, 52) = 9.54, *p* = 0.003, as well as for the interaction between these two factors, *F*(2, 104) = 3.91, *p* = 0.023. Only for the stationary condition did the comparison between PSE and PMC changes reach statistical significance after correcting for multiple comparisons (Bonferroni-corrected *p* = 0.0167). For paired-samples *t*-tests: stationary duration change, *t*(52) = 3.73, *p* < 0.0001; accelerating duration change, *t*(52) = 0.73, *p* = 0.472; and decelerating duration change, *t*(52) = 2.43, *p* = 0.019.

*r* > 0.45, all *p* < 0.0001). The orthogonal fits (Figure 6a, blue lines) showed positive correlations that were not perfect. In fact, the 95% confidence intervals derived from the orthogonal regression (Figure 6b) crossed both the 1 line for the slope and the 0 line for the intercept only for the drifting and decelerating conditions. Furthermore, we determined the Bayes factor, which quantifies the evidence for the null hypothesis (here, that the data are better fitted by a reduced model with fixed slope = 1 and fixed intercept = 0, indicating a perfect correlation between the two estimates) over the alternative hypothesis that an orthogonal model with free-to-vary slope and intercept should be favored. This analysis provided strong and moderate evidence for the null hypothesis for the drifting (BF_{01} = 26.03) and accelerating (BF_{01} = 8.05) conditions, respectively, indicating that in those conditions the equality line was the best-fitting model. However, there was moderate and anecdotal evidence favoring the alternative hypothesis for the stationary (BF_{01} = 0.137) and decelerating (BF_{01} = 0.8857) conditions, respectively, implying that the two estimates were not perfectly correlated.

*F*(3, 208) = 0.41, *p* = 0.742, and the FWHHs for the confidence judgments, *F*(3, 155.4) = 0.17, *p* = 0.918, did not differ across conditions. Also, the feedback during training did not improve the calibration of participants’ confidence, as the peak heights were still substantially smaller than 1 (mean peak heights: drifting, 0.71 ± 1.79; stationary, 0.64 ± 0.19; accelerating, 0.7 ± 0.18; decelerating, 0.67 ± 0.19), indicating that participants were overconfident when their performance was at chance. The amount of overconfidence did not change across speed conditions, *F*(3, 208) = 1.41, *p* = 0.242.

*R*^{2} > 0.9991). For all of the speed conditions, the Bayes factor was 0, indicating extremely large evidence against the reduced model. For the confidence judgments, between 75% and 80% of the variance in the real data was captured by the simulation. Bayes factors revealed anecdotal evidence supporting the reduced model in the stationary condition (BF_{01} = 1.12), whereas they provided strong to very strong evidence in favor of the alternative model in the accelerating (BF_{01} = 0.0545), drifting (BF_{01} = 0), and decelerating (BF_{01} = 0.0004) conditions. The mean confidence criterion estimate ranged from 1.3 to 1.46 across speed conditions, indicating that, on average, the difference between the comparison and standard duration signals had to be almost 1.5 times as big as the JND for our participants to report high confidence in their decision.
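What a criterion of ∼1.5 JNDs means behaviorally can be illustrated with a toy simulation. This is our own sketch, with simplifying assumptions not taken from the original model (in particular, Gaussian trial-by-trial noise with SD equal to the JND, and hypothetical parameter values):

```python
import random

def p_high_confidence(d_comp, d_std, jnd, criterion, n_trials=100_000, seed=1):
    """Proportion of high-confidence reports in a toy observer model:
    each trial draws a noisy duration-difference signal (Gaussian,
    SD = jnd, a simplifying assumption), and confidence is high only
    when the absolute difference, expressed in JND units, exceeds
    the criterion."""
    rng = random.Random(seed)
    high = 0
    for _ in range(n_trials):
        diff = (d_comp - d_std) + rng.gauss(0.0, jnd)
        if abs(diff) / jnd > criterion:
            high += 1
    return high / n_trials

# Hypothetical values: 500-ms standard, 60-ms JND, criterion of 1.5 JNDs
p_equal = p_high_confidence(500, 500, jnd=60, criterion=1.5)
p_far = p_high_confidence(800, 500, jnd=60, criterion=1.5)
print(p_equal, p_far)  # rare at equal durations, near-certain at 800 vs. 500 ms
```

Raising the criterion widens the range of comparison durations that yield low confidence, which is how the model links the criterion to the FWHH of the confidence function.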

*R*^{2} values: drifting, 0.996; stationary, 0.915; accelerating, 0.996; decelerating, 0.986), whereas about 70% of the variance was explained for those with criterion < 1 (*R*^{2} values: drifting, 0.739; stationary, 0.693; accelerating, 0.686; decelerating, 0.671). A criterion of 1 or higher meant that the difference between the standard and comparison duration signals had to be at least as big as the JND for a participant to report high confidence. In other words, those with a criterion estimate > 1 based their confidence judgment almost exclusively on the perceptual discriminability between the two test durations, as assumed by our model. Those with a criterion estimate < 1, though, reported high confidence even when the perceptual difference between the two test durations was smaller than the JND, indicating that their confidence judgment was also influenced by other factors that we did not include in our model.

*F*(1, 52) = 8.172, *p* = 0.006 (data not shown), which might suggest that sensory noise contributed to the overconfidence effect more than the participants’ personality did.

*d*′) and compared against a confidence criterion, modeled as a threshold that must be exceeded to produce a high-confidence judgment. Note that the assumption of a separate criterion for confidence did not entail that the two judgments were based on different types of sensory information, as suggested by some studies (De Martino, Fleming, Garrett, & Dolan, 2013; Fleming, Ryu, Golfinos, & Blackmon, 2014; Li, Hill, & He, 2014). In fact, our model assumed that both types of judgment are based on duration discriminability and are therefore affected by the same sensory noise.

*R*^{2} > 0.915) in the real data for participants with a predicted criterion higher than 1 (Figure 8). This value is not arbitrary, as a criterion of 1 or higher indicates that, to have high confidence, the difference between the two duration signals (which are affected only by sensory noise, according to our model) has to be at least as large as the JND between the two durations. Therefore, participants with a criterion higher than 1 based their confidence judgments on the same sensory information they used for their discrimination judgments, and, in fact, their FWHHs were very well captured by our model. For these participants, the confidence criteria ranged between 1 and 4. This finding indicates that, as shown by Arnold et al. (2021) for tilt perception, high confidence in a perceptual decision requires a different magnitude of the same sensory information (i.e., a larger difference in duration between the two test intervals relative to the JND). Also, it shows that individual participants set their internal thresholds at different distances (in sensory units) from the JND, pointing to a tendency to be more or less conservative in their confidence criterion. It is worth stressing that, even though we did not include a random component to account for this tendency, our model was still able to capture this variability. In fact, it predicted participants’ FWHHs equally well when the estimated confidence criterion was substantially larger than 1.

*d*′, which is the ratio between the signal and a combination of sensory and confidence noise, and *M*_{ratio}, which is the ratio of meta-*d*′ and *d*′ (Maniscalco & Lau, 2012). They suggested that this finding supports the idea that confidence judgments are affected by independent metacognitive noise. If that is the origin of our unexplained variance, it would be interesting to investigate why only some participants are affected by this confidence noise but other participants (i.e., those with a confidence criterion > 1) do not seem to show this influence.

*Behavior Research Methods*, 52(1), 388–407, https://doi.org/10.3758/s13428-019-01237-x. [PubMed]

*Proceedings of the Royal Society B: Biological Sciences*, 288(1956), 20211276, https://doi.org/10.1098/rspb.2021.1276.

*Perception & Psychophysics*, 55(4), 412–428, https://doi.org/10.3758/bf03205299. [PubMed]

*Perception & Psychophysics*, 61(7), 1369–1383, https://doi.org/10.3758/bf03206187. [PubMed]

*Journal of Vision*, 12(7):8, 1–19, https://doi.org/10.1167/12.7.8.

*Personality and Individual Differences*, 38(7), 1701–1713, https://doi.org/10.1016/j.paid.2004.11.004.

*Psychological Research*, 14(1), 233–248, https://doi.org/10.1007/BF00403874.

*Journal of Vision*, 15(6):2, 1–18, https://doi.org/10.1167/15.6.2.

*Journal of Vision*, 21(9), 2530–2530, https://doi.org/10.1167/jov.21.9.2530.

*Current Opinion in Behavioral Sciences*, 8, 131–139, https://doi.org/10.1016/j.cobeha.2016.02.028. [PubMed]

*Sociological Methods & Research*, 33(2), 261–304, https://doi.org/10.1177/0049124104268644.

*Journal of Vision*, 21(12):8, 1–15, https://doi.org/10.1167/jov.21.12.8.

*Journal of Vision*, 9(1):9, 1–13, https://doi.org/10.1167/9.1.9. [PubMed]

*Psychological Science*, 25(6), 1286–1288, https://doi.org/10.1177/0956797614528956. [PubMed]

*Nature Neuroscience*, 16(1), 105–110, https://doi.org/10.1038/nn.3279. [PubMed]

*Statistical Adjustment of Data*. New York: Wiley.

*Behavior Research Methods*, 41(4), 1149–1160, https://doi.org/10.3758/BRM.41.4.1149. [PubMed]

*Brain*, 137(pt 10), 2811–2822, https://doi.org/10.1093/brain/awu221. [PubMed]

*Scientific Reports*, 9(1), 7124, https://doi.org/10.1038/s41598-019-43170-1. [PubMed]

*Attention, Perception, & Psychophysics*, 83(8), 3047–3055, https://doi.org/10.3758/s13414-021-02331-z.

*Perceptual and Motor Skills*, 39(1), 63–82, https://doi.org/10.2466/pms.1974.39.1.63.

*Linear Deming Regression*, MATLAB Central File Exchange. Retrieved from https://www.mathworks.com/matlabcentral/fileexchange/33484-linear-deming-regression.

*Trends in Cognitive Sciences*, 1(2), 78–82, https://doi.org/10.1016/S1364-6613(97)01014-0. [PubMed]

*Attention and time* (pp. 187–200). Oxford, UK: Oxford University Press.

*The new visual neurosciences* (pp. 749–762). Cambridge, MA: MIT Press.

*Current Biology*, 16(5), 472–479, https://doi.org/10.1016/j.cub.2006.01.032.

*Journal of Vision*, 6(12), 1421–1430, https://doi.org/10.1167/6.12.8.

*ETS Research Report Series*, 2020(1), 1–24, https://doi.org/10.1002/ets2.12298.

*Journal of Vision*, 9(7):14, 1–12, https://doi.org/10.1167/9.7.14. [PubMed]

*Perceptual and Motor Skills*, 39(1), 295–307, https://doi.org/10.2466/pms.1974.39.1.295.

*Journal of Neuroscience*, 34(12), 4382–4395, https://doi.org/10.1523/JNEUROSCI.1820-13.2014.

*Scientific Reports*, 10(1), 904, https://doi.org/10.1038/s41598-019-57204-1. [PubMed]

*Nature Human Behaviour*, 5(2), 273–280, https://doi.org/10.1038/s41562-020-00953-1. [PubMed]

*Vision Research*, 184, 58–73, https://doi.org/10.1016/j.visres.2021.03.003. [PubMed]

*Proceedings of the Royal Society B: Biological Sciences*, 287(1927), 20200801, https://doi.org/10.1098/rspb.2020.0801.

*Psychological Review*, 129(5), 976–998, https://doi.org/10.1037/rev0000312. [PubMed]

*Proceedings of the Royal Society B: Biological Sciences*, 279(1730), 854–859, https://doi.org/10.1098/rspb.2011.1598.

*Cognitive Psychology*, 66(3), 259–282, https://doi.org/10.1016/j.cogpsych.2013.01.001. [PubMed]

*Journal of Behavioral and Experimental Finance*, 17, 22–27, https://doi.org/10.1016/j.jbef.2017.12.004.

*PLoS One*, 2(11), e1264, https://doi.org/10.1371/journal.pone.0001264. [PubMed]

*Behavior Research Methods*, 51(1), 195–203, https://doi.org/10.3758/s13428-018-01193-y. [PubMed]

*Acta Psychologica*, 8, 89–128, https://doi.org/10.1016/0001-6918(51)90007-8.

*Nature Communications*, 10(1), 267, https://doi.org/10.1038/s41467-018-08194-7. [PubMed]

*Perception*, 42(2), 198–207, https://doi.org/10.1068/p7241. [PubMed]

*Journal of Vision*, 14(5):4, 1–19, https://doi.org/10.1167/14.5.4.

*The Annals of Statistics*, 6(2), 461–464.

*Psychological Review*, 128(1), 45–70, https://doi.org/10.1037/rev0000249. [PubMed]

*Scientific Reports*, 11(1), 23312, https://doi.org/10.1038/s41598-021-02412-x.

*Perception & Psychophysics*, 66(7), 1171–1189, https://doi.org/10.3758/BF03196844. [PubMed]

*Frontiers in Integrative Neuroscience*, 6, 79, https://doi.org/10.3389/fnint.2012.00079. [PubMed]