Abstract
The ability of humans to identify and reproduce short time intervals (in the region of a second) may be affected by many factors, ranging from the gender and personality of the individual observer, through the attentional state, to the precise spatiotemporal structure of the stimulus. The relative roles of these very different factors are a challenge to describe and define; several methodological approaches have been used to achieve this with varying degrees of success. Here we describe and model the results of a paradigm affording not only a first-order measurement of the perceived duration of an interval but also a second-order metacognitive judgement of perceived time. This approach, we argue, expands the form of the data generally collected in duration-judgement tasks and allows more detailed comparison of psychophysical behavior to the underlying theory. We also describe a hierarchical Bayesian measurement model that performs a quantitative analysis of the trial-by-trial data, calculating the variability of the temporal estimates and of the metacognitive judgments, and allowing direct comparison between an actual and an ideal observer. We fit the model to data collected for judgements of 750 ms (bisecting 1500 ms) and 1500 ms (bisecting 3000 ms) intervals across three stimulus modalities (visual, audio, and audiovisual). This enhanced form of data on a given interval judgement, and the ability to track its progression on a trial-by-trial basis, offer a way of looking at the different roles that subject-based, task-based, and stimulus-based factors play in the perception of time.
In the purely empirical analysis above, the MCI accounts for the metacognitive decision and bisection data simultaneously by using the metacognitive judgment as a basis for sorting the bisection data. In this section, we take an alternative analytical approach by modeling the metacognitive decision directly while simultaneously modeling the bisection judgments. This model is psychological in the sense that we consider the computational decision problem faced by the observer during the bisection and metacognitive stages of each trial (Marr, 1982), and we model each decision as a probabilistic choice, with the latent parameters of the decision distributions estimated in a hierarchical Bayesian manner (Lee & Wagenmakers, 2014). The benefit of this hierarchical analysis is that each observer informs the group-level distribution, which “shrinks” estimates of subject-level parameters toward the group average (Britten et al., 2021; Davis-Stober, Dana, & Rouder, 2018; Davis-Stober, Dana, & Rouder, 2019; Gelman et al., 2004), allowing for more accurate estimates at the group level. Since we estimate posterior distributions over all quantities associated with the experiment for each participant, we can show that the model predicts the observed interval bisection estimates and the derived MCI data from each condition, while unpacking this latter measure into an alternative demonstration of ideal observer behavior.
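As a standard illustration of this shrinkage (a textbook normal-normal case with known precisions, not a component of the model reported below), the posterior mean for an individual subject is a precision-weighted compromise between that subject's own data and the group mean:
\begin{equation*}{\hat{\mu}_j} = \frac{{{\tau _j}\,{{\bar{y}}_j} + {\tau _0}\,{\mu _0}}}{{{\tau _j} + {\tau _0}}}\end{equation*}
where \(\bar{y}_j\) is subject j's sample mean with precision \(\tau_j\), and \(\mu_0\) and \(\tau_0\) are the group-level mean and precision; noisier subjects (smaller \(\tau_j\)) are pulled more strongly toward \(\mu_0\).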
We modeled the metacognitive decision on each trial, \(r_i\), as a Bernoulli distribution with a parameter, \(p_i\), indicating the probability that interval 1 was chosen as the more accurate interval on trial i. This parameter was derived from a comparison of the perceived error in each interval as follows: let \(\Delta_{1i}\) and \(\Delta_{2i}\) be the error in intervals 1 and 2, respectively, on trial i. That is:
\begin{eqnarray*}
{\Delta _{1i}} = \left| {{I_1} - \varphi } \right|\\
{\Delta _{2i}} = \left| {{I_2} - \varphi } \right|
\end{eqnarray*}
where \(I_1\) and \(I_2\) are the bisections made by the participant on intervals 1 and 2, respectively, and \(\varphi\) indicates the subjective midpoint of the interval. The subjective midpoint of each of the visual, audio, and audiovisual conditions is estimated around the subjective midpoint estimate across all conditions, which is itself assumed, as a reasonable starting assumption, to be normally distributed around the true midpoint. We model the bisection error by assuming that the observed interval estimates \(I_1\) and \(I_2\) are normally distributed with an observer-specific standard deviation that is unique to each condition (modality × duration) and each interval.
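Written out with the standard deviations \(\sigma_{I_1}\) and \(\sigma_{I_2}\) that are estimated below, this assumption amounts to:
\begin{eqnarray*}
{I_{1i}} \sim Normal\left( {\varphi ,\ {\sigma _{{I_1}}}} \right)\\
{I_{2i}} \sim Normal\left( {\varphi ,\ {\sigma _{{I_2}}}} \right)
\end{eqnarray*}
with a separate pair of standard deviations estimated for each modality × duration condition.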
The model assumes that the participant chooses the option with the smaller perceived error. To implement this, we find the difference between the errors in the two intervals: \({d_\Delta } = {\Delta _1} - {\Delta _2}\). (Here, we have suppressed indexing by trial for simplicity.) If the difference is negative, then the error in the first interval is smaller, and the participant should choose the first interval. If the difference is positive, then the error in the second interval is smaller, and the participant should choose the second interval.
We assume that this choice is made only probabilistically: the estimate of the error difference, \(\hat{d}_\Delta\), is normally distributed around the true error difference, with a standard deviation, \(\sigma_d\), representing the fidelity of the error estimation.
\begin{equation*}{\hat{d}_\Delta }\sim Normal\left( {{d_\Delta },\ {\sigma _d}} \right)\end{equation*}
In the ideal observer model, the standard deviation would approach zero and consequently the representation of the error would approach the true error. Hence, the estimate of the standard deviation gives us a direct comparison to the ideal observer.
\(p_i\), which determines the Bernoulli outcome of the metacognitive decision, then equals the integral of the distribution of \(\hat{d}_\Delta\) between \(-\infty\) and 0.
\begin{equation*}{p_i} = \mathop \int \limits_{ - \infty }^0 P\left( {{{\hat{d}}_\Delta }\,|\,{d_\Delta },{\sigma _d}} \right)d{\hat{d}_\Delta }\end{equation*}
When the error in interval 1 is smaller than in interval 2, the distribution will have most of its area in the negative region and hence, when the variance is small, \(p_i\) will be close to 1 (see Figure 9). When the error in interval 2 is smaller than in interval 1, the distribution will have most of its area in the positive region and \(p_i\) will be near 0 (i.e., interval 2 will be preferred). As the standard deviation approaches 0, the model chooses the interval with the smaller error more reliably.
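To make the decision rule concrete, the following Python sketch (an independent illustration with made-up values; the reported analysis was implemented in JAGS and Matlab) computes \(p_i\) as the cumulative probability of the error-difference distribution below zero and draws the corresponding Bernoulli response:
import numpy as np
from scipy.stats import norm

def p_choose_interval1(i1, i2, midpoint, sigma_d):
    """Probability of reporting interval 1 as the more accurate bisection."""
    delta1 = abs(i1 - midpoint)   # perceived error in interval 1
    delta2 = abs(i2 - midpoint)   # perceived error in interval 2
    d_delta = delta1 - delta2     # negative => interval 1 was more accurate
    # p_i = P(d_hat < 0), with d_hat ~ Normal(d_delta, sigma_d)
    return norm.cdf(0.0, loc=d_delta, scale=sigma_d)

# Hypothetical trial: bisections of a 1500 ms interval, subjective midpoint 750 ms
rng = np.random.default_rng(0)
p1 = p_choose_interval1(i1=735.0, i2=810.0, midpoint=750.0, sigma_d=40.0)
r = rng.random() < p1             # Bernoulli metacognitive response
print(round(p1, 3), r)
As \(\sigma_d\) shrinks toward zero the choice becomes effectively deterministic, which is the ideal-observer limit described above.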
We estimated the standard deviations \(\sigma_{I_1}\), \(\sigma_{I_2}\), and \(\sigma_d\) (along with the standard deviations for the midpoint estimates, \(\sigma_\varphi\) and \(\sigma_{Mid}\)) hierarchically. We implemented the model in JAGS (Plummer, 2003) with Matlab and matjags (Steyvers, 2011), using two chains with a burn-in period of 2,000 samples and a sampling period of 5,000 samples, thinning to every twentieth sample. We assumed a group-level distribution over the standard deviations for the visual, audio, and audiovisual conditions. JAGS specifies the spread of a normal distribution in terms of precision (i.e., \(1/\sigma^2\)); the prior distributions for each precision parameter were uniform from 0 to 100. To capture the repeated-measures design, we assumed that the subject-level (log-transformed) precision parameters were sampled from a multivariate normal distribution with a shared variance-covariance parameter and a subject-specific correlation parameter that captured how correlated the variance estimates were across the three conditions. All precision estimates were transformed to standard deviations for ease of interpretation, with a smaller standard deviation indicating greater precision in the judgement.
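For reference, writing \(\tau\) for a precision parameter, the precision-to-standard-deviation conversion applied throughout, and the subject-level sampling scheme implied by the JAGS listing at the end of this section, are:
\begin{eqnarray*}
\sigma = 1\big/\sqrt{\tau}\\
\log {\tau _j} \sim MVN\left( {\log \tau ,\ {\Sigma _j}} \right)
\end{eqnarray*}
where \(\tau\) here collects the group-level precisions for the A, V, and AV conditions, \(\tau_j\) the corresponding precisions for subject j, and \(\Sigma_j\) the covariance matrix built from the shared variance and the subject-specific correlation.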
In a nutshell, the group-level standard deviation estimates for the visual, audio, and audiovisual conditions measure the variability of the interval estimation and the metacognitive accuracy, with values near 0 indicating more accurate (veridical) estimates. At the group level, these parameter estimates allow inference as to the difference between each modality and the combined audiovisual condition. The subject-level parameters indicate the performance of different individuals and allow an assessment of individual differences in bisection and metacognitive judgment accuracy. Because the estimates are posterior distributions of the parameters given the data, we can make inferences about differences in value by directly comparing these distributions and examining their overlap. We can also use the 95% credible interval estimates as an inferential statistic. The credible interval is the region of the posterior distribution containing 95% of the posterior probability. A more flexible version of the credible interval, in that it deals better with skewed distributions, is the highest density interval (HDI), and it is this that we calculate in the model (Kruschke & Liddell, 2018).
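For readers who wish to reproduce this summary from MCMC output, a minimal Python sketch of the HDI calculation over a vector of posterior samples (not the authors' code) is:
import numpy as np

def hdi(samples, cred_mass=0.95):
    """Highest density interval: the narrowest interval containing
    cred_mass of the posterior samples."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    n_in = int(np.ceil(cred_mass * n))          # number of samples inside the interval
    widths = x[n_in - 1:] - x[:n - n_in + 1]    # width of every candidate interval
    lo = int(np.argmin(widths))                 # index of the narrowest one
    return x[lo], x[lo + n_in - 1]

# Example on a right-skewed posterior, such as a standard-deviation parameter
rng = np.random.default_rng(1)
posterior_draws = rng.lognormal(mean=-1.0, sigma=0.5, size=5000)
print(hdi(posterior_draws, 0.95))
Unlike an equal-tailed credible interval, this search returns the narrowest region containing 95% of the samples, which is the property that handles skewed posteriors better.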
Figure 10 shows the graphical model used to estimate parameter posteriors for each duration condition. Of note is that the latent parameter estimates are constrained by all of the data, including the bisection estimates for each interval and the metacognitive choice for each trial and each subject from each condition. All chains showed good convergence (maximum R-hat = 1.0002; Gelman et al., 2004).
Figure 11 shows the posterior density estimates for the group-level parameters. Specifically, the top two panels for each condition show the standard deviation of the subjective interval estimation, while the bottom panel (for each condition) shows the standard deviation of the normal distribution for the metacognitive decision. The metacognitive decision is driven by the standard deviation estimate shown in this lower panel along with the difference in perceived error between interval 1 and interval 2 (i.e., \(d_\Delta\), represented noisily as \(\hat{d}_\Delta\) as described above).
The interval bisection standard deviation estimates have roughly the same posterior regardless of whether the stimulus was presented visually, auditorily, or as a combined audiovisual signal, though in the 1500 msec condition, the audiovisual condition had more posterior density over smaller estimates than in the visual or audio conditions alone. In the 3000 msec condition, the posterior distributions are more similar. In sum, the estimates have roughly equivalent precision across modality and duration conditions (Corcoran et al., 2018; Grondin, 2014; Mitani & Kashino, 2016; Wearden & Lejeune, 2008). It appears the benefit of having a redundant audiovisual signal (i.e., the same information from two sources) is less prominent when the duration to be estimated is longer. Comparing the posterior estimates across the top two panels, there does not appear to be any difference between the estimates for interval 1 and interval 2.
The variance of the metacognitive judgement, which determines the accuracy of the judgement, varied depending on whether the interval was 1500 msec or 3000 msec. In the former, the visual condition posterior had more density over smaller values (i.e., was closer to the ideal observer) than the audio or audiovisual conditions. In the 3000 msec condition, the pattern reversed: the audiovisual condition had more posterior density over smaller variance estimates than either the visual or the audio condition.
To ascertain whether it was necessary to allow for different condition-level parameters, we fit a constrained model in which the bisection standard deviation and metacognitive standard deviation estimates were assumed to be equivalent across the A, V, and AV conditions. We compared the models using the Deviance Information Criterion (DIC; Gelman et al., 2014), which provides a measure of model fit penalized for the complexity of the model. The deviance of the posterior parameters (i.e., \(\theta\)) is computed as \(D(\theta) = -2\log L(y|\theta)\), where \(L(y|\theta)\) is the likelihood of the data given the model parameters. DIC is given as \(DIC = \bar{D}(\theta) + {p_D}\), where \(\bar{D}(\theta)\) is the average of the distribution of the posterior deviance and \({p_D} = 2\,{\rm{var}}[\log L(y|\theta)]\) estimates the effective number of parameters. DIC thus corrects the average deviance by a term that accounts for model complexity; hence, smaller values of DIC are preferred. The DICs for each condition are displayed in Table 4. As shown, the full model, which allowed separate estimates for the A, V, and AV conditions, was preferred in each duration condition.
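As a worked illustration of this calculation (a Python sketch following the definitions above, applied to hypothetical log-likelihood draws rather than the fitted values):
import numpy as np

def dic(log_lik_samples):
    """DIC from MCMC output: mean posterior deviance plus a
    variance-based estimate of the effective number of parameters."""
    ll = np.asarray(log_lik_samples)   # log L(y | theta) at each posterior sample
    d_bar = -2.0 * ll.mean()           # average posterior deviance
    p_d = 2.0 * ll.var()               # complexity penalty
    return d_bar + p_d

# Hypothetical comparison: the model with the smaller DIC is preferred
rng = np.random.default_rng(2)
ll_full = rng.normal(-520.0, 1.5, size=5000)
ll_constrained = rng.normal(-540.0, 1.0, size=5000)
print(dic(ll_full), dic(ll_constrained))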
Table 4. DIC values for the constrained and full models fit to each duration condition.
Table 5 shows the subject-level parameters in each condition; of key interest is the subject-level correlation parameter, which indicates a substantial level of consistency between the precision of the interval estimates and the precision of the metacognitive decision. That is, the average posterior estimate of the correlation between the A, V, and AV estimates is large (> 0.5) for most subjects.
Table 5. Subject level model parameters for each condition.
To link the model estimates back to the empirical data more clearly, Figure 12 plots the posterior subject-level midpoint estimates for each condition as given by the model (the data are shown in each plot as black dots with ±1 standard error bars). These estimates qualitatively follow the same pattern as the empirical averages but are now additionally constrained by the group-level estimates as well as the data from both intervals, not just the subjective “best” interval. In most cases, this additional constraint quantitatively captures the data within the 95% highest-density interval of each bisection estimate.
The group-level midpoint estimates are shown in Figure 13; although the individual bisection estimates show noticeable variability between subjects, the group estimates tend toward the veridical bisection point. In both duration conditions, the overall mean (across modalities) tends slightly toward overestimation of the bisection interval. However, this is more consistent between modalities in the longer duration condition. In the shorter duration condition, the visual modality condition tends toward underestimation and there is greater variability between conditions, much like the empirical data.
Finally, we compared the posterior predictive model estimates of the group MCI values with the mean values from the data in Figure 7; both results are plotted in Figure 14. That is, from the model's posterior predictions for the interval estimates and the metacognitive judgment, we derived the posterior predictions for the MCI values in the same manner as computed for the data.
The figure shows that both the model and the participants have better insight into their own performance in the short duration condition than in the longer one for the audio and audiovisual conditions (i.e., the posterior distribution has higher values in the A and AV conditions of the 1500 msec condition than of the 3000 msec condition; 95% HDI, 1500 msec A condition = [0.33, 0.57] vs. 3000 msec A condition = [0.27, 0.57]; 1500 msec AV condition = [0.53, 0.69] vs. 3000 msec AV condition = [0.34, 0.45]), whereas the visual condition is largely similar across durations (95% HDIs, 1500 msec V condition = [0.22, 0.32] vs. 3000 msec V condition = [0.24, 0.30]). The audiovisual stimulus gives the best insight into bisection performance overall at both durations, though performance is better at the shorter duration. Although these can only be rough estimates with only five participants, they show good agreement between the model and the empirical data, suggesting that the underlying theory of the model is a good account of what the participants are doing when reflecting on their performance in the interval judgement.
model {
  # Data: i1, i2 (bisection estimates), r (metacognitive choices), s and c (subject and
  # condition index for each trial), midpoint and mpprec (true midpoint and its precision),
  # and ncons (number of conditions). The loop bounds nsubs and ntrials and the uniform
  # prior on subr are assumptions made to complete this excerpt.

  # Variance of precision draws
  hyperPrec ~ dunif(0, 100)
  hyperSigma <- 1/sqrt(hyperPrec)

  # Group-level midpoint estimate
  groupMidpoint ~ dnorm(midpoint, mpprec)

  # Group-level parameters for each condition (A, V, AV)
  for (k in 1:ncons) {
    conprec[k] ~ dunif(0, 100)                        # Condition precision
    conmidpoint[k] ~ dnorm(groupMidpoint, conprec[k]) # Midpoint estimate for each condition

    ## Group level precision means
    mprec1[k] ~ dunif(0, 100)
    logmprec1[k] <- log(mprec1[k])     # Log transform
    msigma1[k] <- 1/sqrt(mprec1[k])    # Convert to standard deviation
    mprec2[k] ~ dunif(0, 100)
    logmprec2[k] <- log(mprec2[k])
    msigma2[k] <- 1/sqrt(mprec2[k])
    metaLambda[k] ~ dunif(0, 100)
    logMetaLambda[k] <- log(metaLambda[k])
    metasigma[k] <- 1/sqrt(metaLambda[k])
  }

  for (j in 1:nsubs) {
    # Subject-specific correlation between conditions (prior assumed here)
    subr[j] ~ dunif(0, 1)

    # Build covariance matrix from standard deviation
    Sigma[1,1,j] <- pow(hyperSigma,2) + .1
    Sigma[1,2,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[1,3,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[2,1,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[2,2,j] <- pow(hyperSigma,2) + .1
    Sigma[2,3,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[3,1,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[3,2,j] <- subr[j] * pow(hyperSigma,2)
    Sigma[3,3,j] <- pow(hyperSigma,2) + .1

    # Convert to precision matrix
    Omega[1:ncons,1:ncons,j] <- inverse(Sigma[ , ,j])  # invert matrix for dmnorm

    # Sample log precision estimates for A, V, and AV for each subject
    logsubmprec1[j,1:ncons] ~ dmnorm(logmprec1[1:3], Omega[ , ,j])         # interval 1
    logsubmprec2[j,1:ncons] ~ dmnorm(logmprec2[1:3], Omega[ , ,j])         # interval 2
    logSubMetaLambda[j,1:ncons] ~ dmnorm(logMetaLambda[1:3], Omega[ , ,j]) # metacognitive decision

    for (m in 1:ncons) {
      subprec[j,m] ~ dunif(0, 100)
      submid[j,m] ~ dnorm(conmidpoint[m], subprec[j,m]) # subjective midpoint estimates
      # Log-precision samples can be negative, so exponentiate to recover positive precisions
      submprec1[j,m] <- exp(logsubmprec1[j,m])          # interval 1
      submprec2[j,m] <- exp(logsubmprec2[j,m])          # interval 2
      subMetaLambda[j,m] <- exp(logSubMetaLambda[j,m])  # metacognitive decision
    }
  }

  # Cycle through each trial
  for (i in 1:ntrials) {
    # Likelihood of bisection estimate
    # mprec indicates the variability of the midpoint estimate
    # (smaller indicates a more accurate midpoint estimate)
    i1[i] ~ dnorm(submid[s[i], c[i]], submprec1[s[i], c[i]])
    i2[i] ~ dnorm(submid[s[i], c[i]], submprec2[s[i], c[i]])
    d1[i] <- abs(i1[i] - submid[s[i], c[i]])
    d2[i] <- abs(i2[i] - submid[s[i], c[i]])

    # Estimated error difference
    dhat[i] <- d1[i] - d2[i]

    # Find probability of responding with interval 1
    # Lambda indicates the variability of the error estimate
    # Lower values indicate a more accurate assessment of error
    p[i] <- pnorm(0, dhat[i], subMetaLambda[s[i], c[i]]) # Integrate the difference distribution up to 0

    # Likelihood of metacognitive choice
    r[i] ~ dbern(p[i]) # Each metacognitive choice is a Bernoulli variable with a probability of selecting interval 1 specified by p
  }
}