Sensory cue integration is one of the primary areas in which a normative mathematical framework has been used to define the “optimal” way in which to make decisions based upon ambiguous sensory information and compare these predictions to behavior. The conclusion from such studies is that sensory cues are integrated in a statistically optimal fashion. However, numerous alternative computational frameworks exist by which sensory cues could be integrated, many of which could be described as “optimal” based on different criteria. Existing studies rarely assess the evidence relative to different candidate models, resulting in an inability to conclude that sensory cues are integrated according to the experimenter's preferred framework. The aims of the present paper are to summarize and highlight the implicit assumptions rarely acknowledged in testing models of sensory cue integration, as well as to introduce an unbiased and principled method by which to determine, for a given experimental design, the probability with which a population of observers behaving in accordance with one model of sensory integration can be distinguished from the predictions of a set of alternative models.

\(w_A = r_A/(r_A + r_B)\) and \(w_B = r_B/(r_A + r_B)\), and the standard deviation (sigma) of the Gaussian probability density function representing the integrated cues estimator is given by

\(\sigma_C = \sqrt{\sigma_A^2 \sigma_B^2 / (\sigma_A^2 + \sigma_B^2)}\)

“…*cannot conclude that optimal integration occurred*…” (Rohde et al., 2016, p. 10. Note: equation numbers have been changed to correspond to the equivalent equations in the current paper and italics added.).
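Concretely, the MVUE weights and integrated sigma defined above can be sketched in a few lines of Python; the function name `mvue` and the example values are illustrative, not from the paper.

```python
import math

def mvue(mu_a, sigma_a, mu_b, sigma_b):
    """Minimum-variance unbiased estimator (MVUE) for two Gaussian cues.

    Reliability is the inverse variance, r = 1/sigma**2, and each cue is
    weighted by its relative reliability.
    """
    r_a, r_b = 1.0 / sigma_a**2, 1.0 / sigma_b**2
    w_a = r_a / (r_a + r_b)
    w_b = r_b / (r_a + r_b)
    mu_c = w_a * mu_a + w_b * mu_b                 # integrated cues mean
    sigma_c = math.sqrt((sigma_a**2 * sigma_b**2)
                        / (sigma_a**2 + sigma_b**2))  # integrated cues sigma
    return w_a, w_b, mu_c, sigma_c
```

Note that the integrated sigma is always smaller than the sigma of the more reliable single cue, which is the hallmark prediction of MVUE.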

*p* value” to a result is clearly not the only way in which to make inferences from data (and can be highly problematic; Kruschke, 2010, 2011). It is acknowledged that more rigorous methodologies are required to distinguish between competing cue integration models (Acerbi et al., 2018).

\(p_A\) and \(p_B\) (where \(p_A + p_B = 1\)). The mean and sigma of the integrated cues estimator are given by

\(\mu_C = p_A \mu_A + p_B \mu_B \quad \textrm{and} \quad \sigma_C = \sqrt{p_A \sigma_A^2 + p_B \sigma_B^2 + p_A p_B (\mu_A - \mu_B)^2}\)

When \(p_A = w_A\) and \(p_B = w_B\), the mean of the PCS estimator is identical to that of MVUE. In other words, for the mean of the integrated cues estimator, a model in which cues are not integrated and instead used completely independently can produce identical predictions to MVUE. Throughout the paper where PCS is modeled, we have set \(p_A = w_A\) and \(p_B = w_B\), so as to be consistent with previous research where these parameters are estimated and modelled (e.g. Byrne & Henriques, 2013). However, in reality \(p_A\), \(p_B\), \(w_A\), and \(w_B\) could be determined by more than simply the relative reliability of the cues as measured with the two-alternative forced choice (2AFC) experiments that experimenters typically adopt to estimate these parameters (see Jacobs, 2002 for an extended discussion).
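A minimal sketch of PCS follows, assuming the standard Gaussian-mixture expressions for the mean and sigma of an observer who switches between cues; the function name and example values are illustrative.

```python
import math

def pcs(mu_a, sigma_a, mu_b, sigma_b, p_a):
    """Probabilistic cue switching (PCS): on each trial the observer uses
    cue A with probability p_a, otherwise cue B (p_b = 1 - p_a).

    The mean and sigma are those of the resulting Gaussian mixture.
    """
    p_b = 1.0 - p_a
    mu_c = p_a * mu_a + p_b * mu_b
    var_c = (p_a * sigma_a**2 + p_b * sigma_b**2
             + p_a * p_b * (mu_a - mu_b)**2)
    return mu_c, math.sqrt(var_c)
```

Setting `p_a` equal to the MVUE weight reproduces the MVUE mean exactly, while the PCS sigma remains at least as large as the more reliable single cue, which is what allows the two models to be distinguished via sigma but not via the mean.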

(\(\sigma_A = 4.86\)), which is approximately that of the haptic cue in Ernst and Banks (2002). Cue B had a sigma given by \(\sigma_B = \sigma_A r\), where \(r\) varied between 1 and 4 in 27 linearly spaced steps. Although it has been suggested that, to test for optimal cue integration, the sigma ratio should be no larger than 2 (Rohde et al., 2016, p. 15), it is evident that experimenters go beyond this reliability ratio (see Figure 3). Thus, in the simulations presented we go beyond a ratio of 2 to be consistent with the experimental literature. For each reliability ratio, we simulated experiments with 4 through 30 (in steps of 1) participants. Cue integration experiments normally have few observers per experiment, but a substantial amount of data collected per observer (Rohde et al., 2016). For example, Ernst and Banks (2002) and Hillis, Watt, Landy, and Banks (2004) each used four observers. Our highest observer number therefore represents an upper limit on the number of observers one might reasonably expect to see in a cue integration study.

\(S_A = 55 + \Delta/2\) and \(S_B = 55 - \Delta/2\). Estimated from the data of Ernst and Banks (2002), the non-zero conflicts above represented approximately 0.8 and 0.4 JNDs, which is around the recommended magnitude of cue conflict to use in a perturbation analysis (Rohde et al., 2016). In Ernst and Banks (2002) there were conditions with equal and opposite cue conflicts applied in order to avoid perceptual adaptation. We did not replicate this here, as our simulated observers have no mechanisms of adaptation.

\(p_A = w_A\) and \(p_B = w_B\). Dividing sigmas by \(\sqrt 2 \), as in a typical two-interval forced-choice procedure (Green & Swets, 1974), was not needed, as the functions were parametrically simulated. For each simulated experiment, the MVUE data were entered into a one-sample within-subjects *t*-test and compared to the point predictions of MS and PCS. The mean value of the alternative model prediction across observers was taken as the point prediction for each model.

*p* < 0.05 level for “statistical significance”.
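The testing procedure just described can be sketched as follows. The *t* statistic is computed directly; the value 3.182 is the two-tailed 5% critical value for df = 3 (four observers), and all data values are invented for illustration rather than taken from the simulations.

```python
import math

def one_sample_t(data, point_prediction):
    """One-sample t statistic comparing observers' estimates (PSEs or
    sigmas) to a model's point prediction."""
    n = len(data)
    mean = sum(data) / n
    var = sum((d - mean) ** 2 for d in data) / (n - 1)  # sample variance
    return (mean - point_prediction) / math.sqrt(var / n)

# Illustrative example: four observers' PSEs tested against a hypothetical
# alternative-model point prediction of 56.0.
pses = [54.2, 54.8, 53.9, 54.5]
t = one_sample_t(pses, 56.0)
# Two-tailed 5% critical value for df = 3 is about 3.182.
significant = abs(t) > 3.182
```

A "significant" result here means the simulated MVUE data can be statistically distinguished from the alternative model's point prediction.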

\(S_A = 55 + \Delta/2\) and \(S_B = 55 - \Delta/2\) (in separate experiments, Δ was either 3 or 6 mm). \({\hat S_A}\) always had the same sigma (\(\sigma_A = 4.86\)), which is approximately that of the haptic cue in Ernst and Banks (2002), whereas \({\hat S_B}\) had a randomly determined sigma of \(\sigma_B = \sigma_A r\) where, consistent with the recommendations of Rohde et al. (2016), \(r \in [0.5, 2]\). To select values with equal probability between these limits, for each observer we generated a random number \(x_i \in [-1, 1]\) and set \(r = {2^{{x_i}}}\). Separate simulations were run with 4, 12, and 36 observers per simulated experiment, and for 10, 25, 40, and 55 trials per stimulus level. For each combination of (a) data collection regime, (b) number of observers per experiment, and (c) cue conflict (4 × 3 × 2), we simulated 1000 experiments (i.e. 24,000 experiments with 416,000 observers in total).
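The sampling scheme for the sigma ratio can be sketched as follows (function name illustrative). Drawing the exponent uniformly makes a ratio \(r\) and its reciprocal \(1/r\) equally likely, so neither cue is systematically more reliable across observers.

```python
import random

def sample_ratio():
    """Draw a sigma ratio r in [0.5, 2] uniformly in log space:
    x ~ U[-1, 1], r = 2 ** x, so r and 1/r are equally likely."""
    x = random.uniform(-1.0, 1.0)
    return 2.0 ** x
```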

\(R^2\) of 0.60 (significance not stated) for predicted versus observed integrated cues sensitivity. Knill and Saunders (2003) also examined the perception of slant, but from disparity and texture cues, and reported \(R^2\) values between around 0.15 and 0.46 (*p* < 0.05) for the predicted and observed cue weighting for different base slants. Svarverud et al. (2010) examined “texture-based” and “physical-based” cues to distance and reported \(R^2\) values of about 0.95 (*p* < 0.001) for predicted and observed cue weights. The median \(R^2\) value in these studies is 0.53, and in all instances the authors concluded that observers were combining cues optimally in accordance with MVUE. Following these studies, a regression analysis was adopted here.

*either* MVUE or MS were plotted against the predictions of each of the two candidate models. Data were fit with a first-order polynomial by least squares, and an \(R^2\) value for the fit of each model to the data was calculated. Thus, there were four possible regression comparisons: (1) “MVUE versus MVUE” – predictions of MVUE, plotted against data from a population of observers behaving in accordance with MVUE; (2) “MS versus MS” – predictions of MS, plotted against the behavior of a population of observers behaving in accordance with MS; (3) “MVUE versus MS” – predictions of the MVUE model, plotted against the data of a population of observers behaving in accordance with MS; and (4) “MS versus MVUE” – predictions of the MS model, plotted against the data of a population of observers behaving in accordance with MVUE. We will refer to (1) and (2) as “consistent” predicted and observed data, as the simulated data and predictions are from the same model, and to (3) and (4) as “inconsistent” predicted and observed data, as the simulated data and predictions arise from different models.

\(R^2\) values for both PSEs and sigmas are directly comparable to those found in the literature (and even better), regardless of whether the data from a population of MVUE observers were fitted with a regression against the predictions of either MVUE or MS. Figures 8e and 8f show histograms of the observed \(R^2\) values for the same example, but across all 1000 simulated experiments. The raw histograms are shown overlaid with smooth kernel distributions, given by

\({\hat F_X}(x) = \frac{1}{n}\sum_{i = 1}^{n} K_h(x - x_i)\)

where the \(x_i \in [0, 1]\) are the observed values (i.e. the domain of the \(R^2\) value is 0 to 1), \(K_h\) is a Gaussian kernel with bandwidth \(h\), and \({\hat F_X}\) is the estimate of the unknown probability density function \(F_X\). The key parameter of interest is the extent to which these distributions overlap, as this determines the extent to which an \(R^2\) value from fitting predicted to observer data can be used to distinguish between candidate models of cue integration. The overlap of two smooth kernel distributions \({\hat F_X}\) and \({\hat F_Y}\) can be estimated via numerical integration (Pastore & Calcagni, 2019)

\(\eta = \int \min\left[ {\hat F_X}(z), {\hat F_Y}(z) \right] dz\)

\(R^2\) values, especially so for the predicted and observed PSEs.
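The overlap computation described above can be sketched as follows. This uses a simple Gaussian kernel density estimate and a Riemann sum over [0, 1]; the bandwidth and grid size are illustrative choices, not the paper's.

```python
import numpy as np

def kde(samples, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - np.asarray(samples)[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1)
    return dens / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

def overlap(x, y, n_grid=2001):
    """Overlap of two smooth kernel distributions on [0, 1], estimated by
    numerically integrating the pointwise minimum of the two densities."""
    grid = np.linspace(0.0, 1.0, n_grid)
    fx, fy = kde(x, grid), kde(y, grid)
    step = grid[1] - grid[0]
    return float(np.sum(np.minimum(fx, fy)) * step)
```

An overlap near 1 means the two \(R^2\) distributions are effectively indistinguishable; an overlap near 0 means an observed \(R^2\) value cleanly discriminates the two models.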

\(R^2\) values improve, with a maximal median of approximately 0.7 to 0.8. Problematically, this pattern is present regardless of whether one is plotting consistent predicted and observed data (MVUE versus MVUE and MS versus MS) or inconsistent predicted and observed data (MVUE versus MS and MS versus MVUE). Across all plots, there is a large overlap in the distributions of \(R^2\) values when plotting “consistent” and “inconsistent” predicted and observed data. With fewer observers per experiment (4 and 12 versus 36) the overlap increases greatly, to the extent that with four observers per experiment the data have near-complete overlap.

\(R^2\) to assess the extent to which a set of data are consistent with the predictions of MVUE. The precise amount of quantitative overlap acceptable for an experiment would be a judgment on the part of the experimenter.

\(R^2\) statistic that experimenters report does not measure the deviation of the data from the predictions of a cue integration model (even though it is often stated in this way); rather, the \(R^2\) statistic gives a measure of the fit of the polynomial. The predicted values of a cue integration model could be off by any arbitrary amount, or have the opposite relationship to the data, and experimenters could still obtain an \(R^2\) close to 1. Thus, a regression analysis negates one of the key benefits of MVUE (and other cue integration models), which is the ability to predict the absolute value of the integrated cues percept and its reliability and then compare this to that observed experimentally. Tests do exist to determine whether the intercept and slope differ from predicted model values, but these are rarely reported and are definitively not shown by the \(R^2\) statistic alone.
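This point is easy to demonstrate numerically. In the short illustration below (all numbers invented), the "observed" data are scaled and shifted far away from the model's predictions, yet \(R^2\) remains near 1, because \(R^2\) reflects only the quality of the linear fit, not absolute agreement.

```python
import numpy as np

# Hypothetical model predictions and systematically biased "data":
# observed = 2 * predicted - 40 plus small noise, i.e. far from identity.
rng = np.random.default_rng(1)
predicted = np.linspace(50.0, 60.0, 20)
observed = 2.0 * predicted - 40.0 + rng.normal(0.0, 0.2, 20)

r_squared = np.corrcoef(predicted, observed)[0, 1] ** 2  # close to 1
max_abs_error = np.max(np.abs(observed - predicted))     # large absolute misses
```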

\(S_A\) for judgments of size. On each trial, in one interval, the experimenter presents a “standard” stimulus and in the other interval a “comparison” stimulus, the difference between these being \(\Delta S_A\). The observer must signal in which interval the “larger” stimulus was presented. Next, let's assume that this is done in the presence of a conflicting “nuisance” cue, \(S_N\), which is constant and signals that the stimulus is unchanged across intervals. This means that the “single cue” stimulus is in fact an integrated cues stimulus and can be described as

\(\hat S_c = w_A \hat S_A + w_N \hat S_N\)

For each \(\Delta S_c(i)\), the experimenter measures \(p\left( {``larger"{\rm{|\Delta }}{S_c}\left( i \right)} \right)\) and (with the assumption that the “standard” and “comparison” stimuli can be represented by Gaussian probability density functions) maps out a psychometric function by plotting \(p\left( {``larger"{\rm{|\Delta }}{S_c}\left( i \right)} \right)\) against \(\Delta S_A(i)\), then fits a Cumulative Gaussian to the data (blue data and function in Figure 13). Clearly, the experimenter will incorrectly estimate \(\sigma_A\) from this fitted function. More specifically, they will overestimate \(\sigma_A\), because each stimulus that they present is in fact an attenuated version of that which they intended (i.e., \(\Delta S_c(i) < \Delta S_A(i)\)). The extent to which the experimenter misestimates \(\sigma_A\) will be a function of \(w_N\) (the weight given to the nuisance cue \(S_N\), which is signalling no change across intervals). As \(\sigma_N \to \infty\), the weight given to the nuisance cue will approach zero (\(w_N \to 0\)) and \(\sigma_A\) will be estimated accurately. However, for any non-infinite value of \(\sigma_N\), the experimenter will misestimate \(\sigma_A\).

\(\Delta S_c(i)\) instead of \(\Delta S_A(i)\) (red data and function in Figure 13). To determine this “warping,” we can ask what scale factor, \(k\), we would need to apply to \(\Delta S_A\) such that in all cases \(\Delta S_c = \Delta S_A\). Given \(w_A = 1 - w_N\), we can write this as

\(k = \frac{1}{w_A} = \frac{1}{1 - w_N}\)

When \(w_A = 1\), no scaling is required to combat the attenuation caused by \(S_N\), because it receives zero weight; however, as soon as \(w_A < 1\), scaling is needed (i.e. \(k > 1\)). Next, we can ask, given the true value of \(\sigma_A\), what our estimate, \({\hat \sigma _A}\), of this would be in the presence of the conflicting nuisance cue. To do this, we recognize that for a random variable \(X\) distributed according to \(F_X(x)\), the variable \(Y = g(X)\) is also a random variable. If \(g\) is differentiable and \(g : {\mathbb{R}} \rightarrow {\mathbb{R}}\) is a monotonic function, we can then use a change of variables to transform between probability density functions.

\(f_Y(y) = f_X\left( g^{-1}(y) \right)\left| \frac{d}{dy} g^{-1}(y) \right|\)

where \(x = g^{-1}(y)\) and the support of \(Y\) is \(g(x)\), with the support of \(X\) being \(x\) (Blitzstein & Hwang, 2015). For our example, the Gaussian probability density function representing our cue \(S_A\) (red function in Figure 13) can be written as

\(f_X(x) = \frac{1}{\sigma_A\sqrt{2\pi}}\exp\left( -\frac{(x - \mu_A)^2}{2\sigma_A^2} \right)\)

with standard deviation \(\sigma_A\). From Equation 13, using the transform \(y = g(x) = xk\), a change of variables gives

\(f_Y(y) = \frac{1}{k\sigma_A\sqrt{2\pi}}\exp\left( -\frac{(y - k\mu_A)^2}{2(k\sigma_A)^2} \right)\)

which is a rescaled version of the probability density function representing \(S_A\) (blue function in Figure 13). The standard deviation of \(F_Y(y)\) is given by

\(\hat \sigma_A = k\sigma_A = \frac{\sigma_A}{w_A}\)

Thus, as long as \(w_A < 1\), we overestimate the sigma of the underlying estimator, \({\hat \sigma _A} > \sigma_A\).

\(S_A\) and \(S_B\), with standard deviations of \(\sigma_A\) and \(\sigma_B\), signalling a property of interest, \(S\). We measure “single cue” sensitivity functions for each cue while holding the other cue constant. Because \(1/\sigma _A^2 + 1/\sigma _B^2\) is a constant, \(c\), the weights given to each cue are \({w_A} = \frac{1}{{c\sigma _A^{2}}}\) and \({w_B} = \frac{1}{{c\sigma _B^2}}\), and given Equation 17, our experimental estimates of the true underlying standard deviations are given by \({\hat \sigma _A} = c\sigma _A^3\) and \({\hat \sigma _B} = c\sigma _B^3\). These are each larger than the true underlying values, as they have been measured in the presence of a cue signalling no change (see Figure 13). The ratio of these estimates is given by

\(\frac{\hat \sigma_A}{\hat \sigma_B} = \frac{c\sigma_A^3}{c\sigma_B^3} = \left( \frac{\sigma_A}{\sigma_B} \right)^3\)

Thus, if we experimentally measure \(\hat \sigma_A / \hat \sigma_B = 1/27\), the true sigma ratio is in fact 1/3, and we experimentally misestimate \(\sigma_A / \sigma_B\) by a factor of approximately 9. Studies that have measured the reliability of cues in the presence of a second, constant, conflicting cue (e.g. Murphy et al., 2013 and Svarverud et al., 2010) will therefore have significantly overestimated the true relative cue reliabilities. As such, the data in these studies cannot be used to accurately test MVUE without some form of correction. This analysis shows the critical importance of being able to isolate single cues satisfactorily or, if one is not able to, of correcting for their influence when inferring relative cue reliabilities.
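The cubed-ratio result can be checked numerically (the sigma values below are illustrative):

```python
# When each "single cue" is measured in the presence of a fixed conflicting
# cue, the estimated sigma inflates to sigma_i / w_i = c * sigma_i**3, so
# the measured sigma ratio is the cube of the true ratio.
sigma_a, sigma_b = 1.0, 3.0                    # true ratio 1/3
c = 1.0 / sigma_a**2 + 1.0 / sigma_b**2        # the constant c
w_a = 1.0 / (c * sigma_a**2)                   # cue weights
w_b = 1.0 / (c * sigma_b**2)
est_a, est_b = sigma_a / w_a, sigma_b / w_b    # measured sigmas (= c * sigma**3)
measured_ratio = est_a / est_b                 # cube of the true ratio, 1/27
```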

*PLoS Computational Biology,* 14(7), e1006110.

*Nature Neuroscience,* 4(11), 1063–1064.

*Scientific Reports,* 9(1), 5155.

*Psychological Methods,* 10(4), 389–396.

*Introduction to Probability*. Boca Raton, FL: CRC Press.

*Vision Research,* 40(27), 3725–3734.

*Journal of Neuroscience,* 30(21), 7269–7280.

*Journal of Neuroscience,* 30(22), 7714–7721.

*Journal of Vision,* 5(6), 534–542.

*Neuropsychologia,* 51(1), 26–37.

*Journal of the Royal Statistical Society,* 4, 102–118.

*Journal of Cell Biology,* 177(1), 7–11.

*Scientific Reports,* 8(1), 5483.

*Journal of Vision,* 9(2):25, 1–25.

*Human body perception from the inside out* (pp. 105–131). New York, NY: Oxford University Press.

*Nature,* 415(6870), 429–433.

*Trends in Cognitive Sciences,* 8(4), 162–169.

*Sensory Cue Integration* (pp. 224–250). New York, NY: Oxford University Press.

*Nature Neuroscience,* 17(5), 738–743.

*Journal of Vision,* 11(6), 16.

*Journal of Vision,* 5(11), 1013–1023.

*Journal of Vision,* 9(9):8, 1–20.

*Current Biology,* 16(4), 428–432.

*Signal Detection Theory and Psychophysics*. West Nyack, NY: Cambridge University Press.

*Experimental Brain Research,* 179(4), 595–606.

*Journal of Motor Behavior,* 44(6), 435–444.

*Science,* 298(5598), 1627–1630.

*Journal of Vision,* 4(12), 967–992.

*Journal of Vision,* 6(5), 634–648.

*Trends in Cognitive Sciences,* 6(8), 345–350.

*Vision Research,* 34(17), 2259–2275.

*Vision Research,* 33(5–6), 813–826.

*Journal of Mathematical Psychology,* 73, 117–139.

*Psychophysics: A Practical Introduction* (1st ed.). Salt Lake City, UT: Academic Press.

*Psychophysics: A Practical Introduction* (2nd ed.). Salt Lake City, UT: Academic Press.

*Trends in Cognitive Sciences,* 21(7), 493–497.

*Perception as Bayesian Inference*. West Nyack, NY: Cambridge University Press.

*Vision Research,* 43(24), 2539–2558.

*Perception,* 31(12), 1467–1475.

*Perception & Psychophysics,* 64(3), 380–391.

*Perception,* 29(1), 69–79.

*Vision Research,* 39(16), 2729–2737.

*PLoS One,* 2(9), e943.

*Trends in Cognitive Sciences,* 14(7), 293–300.

*Doing Bayesian Data Analysis*. New York, NY: Elsevier.

*Journal of Vision,* 5(5), 478–492.

*Frontiers in Psychology,* 3, 56.

*Vision Research,* 35(3), 389–412.

*Perception & Psychophysics,* 63(8), 1279–1292.

*Current Biology,* 24(21), 2569–2574.

*Attention, Perception, & Psychophysics,* 80(6), 1461–1473.

*Journal of Vision,* 16(15), 16.

*Journal of Vision,* 12(1), 1.

*Probabilistic Models of the Brain: Perception and Neural Function* (pp. 13–36). Cambridge, MA: MIT Press.

*Attention, Perception, & Psychophysics,* 2(1), 37–44.

*Journal of Neurophysiology,* 110(1), 190–203.

*Journal of Vision,* 10(11), 15.

*Current Biology,* 18(9), 689–693.

*Scientific Reports,* 8(1), 16880.

*Vision Research,* 43(23), 2451–2468.

*Frontiers in Psychology,* 10, 1089.

*Perception & Psychophysics,* 28(4), 377–379.

*Journal of Vision,* 12(6), 25.

*Journal of Vision,* 13(7), 3.

*Palamedes: Matlab routines for analyzing psychophysical data*, http://www.palamedestoolbox.org.

*Frontiers in Psychology,* 9, 1250.

*Multisensory Research,* 29(4–5), 279–317.

*Sensory Cue Integration* (pp. 144–152). New York, NY: Oxford University Press.

*Journal of Vision,* 15(2), 14.

*Journal of Neuroscience,* 34(31), 10394–10401.

*Journal of Vision,* 11(7), 12.

*Vision Research,* 122, 105–123.

*Journal of Vision,* 9(5):28, 1–14.

*Proceedings of the National Academy of Sciences of the United States of America,* 103(49), 18781–18786.

*Journal of Vision,* 10(1):5, 1–13.

*Journal of Vision,* 9(6):3, 1–13.

*Perception,* 37(1), 79–95.

*Journal of Vision,* 15(9), 22.

*Journal of Vision,* 10(2):20, 1–18.

*Journal of Vision,* 10(5), 17.

*Sensory Cue Integration*. New York, NY: Oxford University Press.

*Journal of Vision,* 17(3), 10.

*Perception & Psychophysics,* 33(2), 113–120.

*Journal of Vision,* 5(10), 834–862.

*Perception & Psychophysics,* 54(2), 195–204.

*Perception & Psychophysics,* 63(8), 1293–1313.

*Perception & Psychophysics,* 63(8), 1314–1329.

*Journal of Vision,* 16(15), 28.

*Vision Research,* 33(18), 2685–2696.

*Journal of the Optical Society of America. A, Optics, Image Science, and Vision,* 21(11), 2049–2060.