Table 2 summarizes the fittings to individual participants’ data (
Supplementary Material 2 also reports fittings to aggregated data). As is usual with individual fittings, there were quite a few cases in which the estimation did not converge. The logistic models showed some advantage over the gaussian models in the number of converged cases. Because the number of converged cases differed across the models, we could not simply sum information criteria over individual cases for model comparison. Thus, across the total of 5160 individual cases, we counted the number of cases in which each model achieved the best AIC/BIC fit. Models that failed to converge were disqualified for a given case, and no winner was declared for cases in which none of the four models converged.
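The winner-counting scheme can be sketched as follows (a minimal illustration in Python; the model names and AIC values are hypothetical placeholders, not the actual fitted results):

```python
# Sketch of the winner-counting scheme: for each individual case, the model
# with the lowest AIC wins; models that failed to converge (NaN) are
# disqualified, and no winner is declared when none of the four converged.
import math
from collections import Counter

MODELS = ["gauss_sdt", "logi_sdt", "gauss_meta", "logi_meta"]  # hypothetical labels

def count_winners(cases):
    """cases: list of dicts mapping model name -> AIC (math.nan if not converged)."""
    wins = Counter()
    for aics in cases:
        converged = {m: a for m, a in aics.items() if not math.isnan(a)}
        if converged:  # skip cases where all four models failed
            wins[min(converged, key=converged.get)] += 1
    return wins

example = [
    {"gauss_sdt": 210.3, "logi_sdt": 205.1, "gauss_meta": math.nan, "logi_meta": 207.8},
    {"gauss_sdt": math.nan, "logi_sdt": math.nan,
     "gauss_meta": math.nan, "logi_meta": math.nan},  # no winner declared
]
print(count_winners(example))
```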
The analysis adjudicated the type 1 logistic model as a clear victor, presumably because it naturally captures curvilinearity in zROCs without incorporating the extra meta-dʹ parameter (
Table 3). The type 1 SDT models were generally favored over the meta-SDT models, suggesting that the occurrence of metacognitive inefficiency should not be taken for granted as readily as previously thought. That is, some previous reports framed in the gaussian perspective, originally constituting supporting evidence for metacognitive inefficiency, could be better explained by the type 1 logistic model, which postulates flawless metacognition (m-ratio = 1). It should be noted, however, that the inclusion of extra parameters is difficult to justify in individual analyses owing to limited statistical power. There was no clear-cut winner in the comparison of the gaussian and logistic meta-SDT models, indicating that the logistic meta-SDT is no less viable than the gaussian meta-SDT as a measurement model of metacognitive accuracy.
Next, we evaluated parameter estimates for those cases in which each model converged and revealed above-chance type 1 and type 2 performance (
Table 2). Paired
t-tests showed that mean meta-dʹ was significantly smaller than mean dʹ under the gaussian meta-SDT (
t (3817) = −16.99,
p < 0.001,
Figure 5A) and the logistic meta-SDT (
t (4126) = −4.03,
p < 0.001,
Figure 5B), indicating metacognitive inefficiency on average. However, caution is advised because the logistic meta-SDT indicated a greater mean m-ratio than the gaussian meta-SDT (
t = 28.09,
p < 0.001,
Figure 5C), and there are even cases where the gaussian meta-SDT showed m-ratio < 1 whereas the logistic meta-SDT demonstrated m-ratio > 1 (399 of 3552 cases in
Figure 5C). These results exemplify the important consequences of the distributional assumptions. In an extreme scenario, the two models may even advocate qualitatively opposite theoretical operations (i.e., contamination by metacognitive noise vs. acquisition of metacognitive evidence).
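The paired comparison above can be sketched as follows, with simulated toy estimates standing in for the actual fitted values:

```python
# Hedged sketch of the paired test reported above: meta-d' is compared
# against d' across individual cases with a paired t-test, and m-ratio
# is computed as meta-d' / d'. All values here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d_prime = rng.normal(1.5, 0.5, size=200)             # toy type 1 sensitivities
meta_d = d_prime + rng.normal(-0.2, 0.3, size=200)   # toy meta-d', slightly lower

t, p = stats.ttest_rel(meta_d, d_prime)  # paired t-test, meta-d' vs. d'
m_ratio = meta_d / d_prime               # m-ratio = meta-d' / d'
```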
We found rather frequent occurrences of hyper-metacognitive sensitivity (m-ratio > 1) in
Figure 5. Fisher's exact test showed that such cases were more frequent under the logistic than under the gaussian meta-SDT (1873 of 4127 vs. 1412 of 3818 cases,
p < 0.001). Despite these differences, however, m-ratios estimated by these models were highly consistent across the individual cases (Pearson's
r = 0.942), supporting the models’ reliability in the assessment of metacognitive accuracy (
Figure 5C).
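The counts reported above suffice to reproduce the test itself; a minimal sketch using scipy, assuming a standard two-sided Fisher's exact test on the 2 × 2 table of m-ratio > 1 vs. m-ratio ≤ 1 cases:

```python
# Fisher's exact test on the reported counts: cases with m-ratio > 1
# out of all included cases, logistic vs. gaussian meta-SDT.
from scipy.stats import fisher_exact

table = [[1873, 4127 - 1873],   # logistic meta-SDT: m-ratio > 1 vs. <= 1
         [1412, 3818 - 1412]]   # gaussian meta-SDT: m-ratio > 1 vs. <= 1
odds_ratio, p = fisher_exact(table)
print(p < 0.001)  # → True: the difference in proportions is highly significant
```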
Lastly, we examined the criterion dependency of metacognitive accuracy by estimating meta-dʹ parameters at different confidence criteria. For this purpose, we converted multilevel confidence rating data into a binary format (i.e., high vs. low confidence) by placing dichotomous cutoffs at different confidence criteria (
Rahnev, 2021;
Shekhar & Rahnev, 2021). To illustrate, consider a response frequency dataset (2, 3, 5, 7, 11, 13), comprising two type 1 response classes and three levels of confidence rating: the ten S1 responses (sum of the first three entries) and thirty-one S2 responses (sum of the last three) are each bounded by the lower and higher confidence criteria, forming a sequence of response frequencies from highest confidence S1 to highest confidence S2. A binary cutoff at the lower confidence criterion turns this dataset into (5, 5, 7, 24), whereas cutting at the higher confidence criterion gives (2, 8, 18, 13). Thus, a dataset with n levels of confidence rating allows for n − 1 different binary cutoffs, and we estimated meta-dʹ for each reformatted dataset; in the current example, meta-dʹ estimated on (5, 5, 7, 24) represents metacognitive accuracy at the lower confidence criterion, whereas that on (2, 8, 18, 13) indicates metacognitive accuracy at the higher confidence criterion.
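The binary reformatting described above can be sketched as follows (a minimal illustration; the function name `binarize` is ours):

```python
# Collapse a multilevel confidence dataset into a binary (high/low) format.
# `freqs` is ordered from highest-confidence S1 to highest-confidence S2;
# `cutoff` indexes the confidence criterion (1 = lowest) at which ratings
# are dichotomized.
def binarize(freqs, cutoff):
    n_levels = len(freqs) // 2            # confidence levels per response class
    s1, s2 = freqs[:n_levels], freqs[n_levels:]
    # S1 entries run from highest to lowest confidence; S2 from lowest to highest
    s1_high = sum(s1[: n_levels - cutoff])
    s1_low = sum(s1[n_levels - cutoff:])
    s2_low = sum(s2[:cutoff])
    s2_high = sum(s2[cutoff:])
    return (s1_high, s1_low, s2_low, s2_high)

data = (2, 3, 5, 7, 11, 13)
print(binarize(data, 1))  # (5, 5, 7, 24): cutoff at the lower criterion
print(binarize(data, 2))  # (2, 8, 18, 13): cutoff at the higher criterion
```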
The binary reformatting yielded a total of 34975 datasets, and both the gaussian and logistic meta-SDTs converged in 30512 cases. Among those, we included in the present analysis the 26927 cases for which both models exhibited above-chance type 1 and type 2 performance. Because different levels of confidence rating were employed across studies, we normalized the confidence criteria for the subsequent analysis. For example, for confidence data with 6 levels, we estimated meta-dʹ for 5 different binary reformatted datasets, and the confidence criteria associated with these meta-dʹ values were numbered [1, 2, 3, 4, 5] from lowest to highest. Next, we normalized these ordinal numbers so that the lowest (and highest) criterion was coded as −1 (and 1), with intermediate criteria evenly spaced between them (i.e., criteria coded as [1, 2, 3, 4, 5] were normalized into [−1, −0.5, 0, 0.5, 1]). Last, metacognitive performances were regressed on the normalized ordinal criteria, which captures the qualitative trend of criterion dependency; the estimated slope is scaled to indicate the performance difference between the middle and extreme levels of confidence.
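The normalization step can be sketched as follows (a minimal illustration; the function name is ours):

```python
# Rescale ordinal confidence criteria 1..k (k = n_levels - 1 for n rating
# levels) so the lowest maps to -1 and the highest to 1, evenly spaced.
def normalize_criteria(n_criteria):
    if n_criteria == 1:
        return [0.0]
    step = 2 / (n_criteria - 1)
    return [-1 + step * i for i in range(n_criteria)]

print(normalize_criteria(5))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```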
We conducted linear regression to explain metacognitive performances from the normalized ordinal criteria and tested whether the slope differed significantly from 0 (26927 data points were aggregated in each panel of
Figure 6). The results showed negative criterion-dependency for gaussian meta-dʹ (
t = −3921.54,
p < 0.001) and gaussian m-ratio (
t = −16.52,
p < 0.001), which is consistent with previous reports of greater metacognitive inefficiency at higher confidence criteria (
Rahnev, 2021;
Shekhar & Rahnev, 2021). Importantly, however, the logistic meta-SDT showed positive criterion-dependency for meta-dʹ (
t = 11.27,
p < 0.001) and m-ratio (
t = 38.20,
p < 0.001), indicating that metacognition is more efficient at higher confidence criteria. This reversal of the criterion dependency again showcases the serious consequences of the auxiliary modeling assumptions.
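The regression procedure can be sketched as follows, with simulated data carrying a small positive slope purely for illustration (the actual analysis used the 26927 reformatted cases):

```python
# Hedged sketch of the criterion-dependency regression: meta-d' (or m-ratio)
# is regressed on the normalized ordinal criterion, and the slope is tested
# against zero. Data are simulated, not the actual estimates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
criteria = np.tile([-1.0, -0.5, 0.0, 0.5, 1.0], 200)           # normalized criteria
meta_d = 1.2 + 0.1 * criteria + rng.normal(0, 0.3, criteria.size)

res = stats.linregress(criteria, meta_d)
# res.slope is scaled to the performance difference between middle and
# extreme confidence criteria; res.pvalue tests whether the slope is 0.
```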