Sensory cue integration is one of the primary areas in which a normative mathematical framework has been used to define the “optimal” way in which to make decisions based upon ambiguous sensory information and compare these predictions to behavior. The conclusion from such studies is that sensory cues are integrated in a statistically optimal fashion. However, numerous alternative computational frameworks exist by which sensory cues could be integrated, many of which could be described as “optimal” based on different criteria. Existing studies rarely assess the evidence relative to different candidate models, resulting in an inability to conclude that sensory cues are integrated according to the experimenter's preferred framework. The aims of the present paper are to summarize and highlight the implicit assumptions rarely acknowledged in testing models of sensory cue integration, as well as to introduce an unbiased and principled method by which to determine, for a given experimental design, the probability with which a population of observers behaving in accordance with one model of sensory integration can be distinguished from the predictions of a set of alternative models.

\(w_A = r_A/(r_A + r_B)\) and \(w_B = r_B/(r_A + r_B)\), and the standard deviation (sigma) of the Gaussian probability density function representing the integrated cues estimator is given by

\(\sigma_C = \sqrt{\sigma_A^2 \sigma_B^2 / (\sigma_A^2 + \sigma_B^2)}\)

“…*cannot conclude that optimal integration occurred*…” (Rohde et al., 2016, p. 10. Note: equation numbers have been changed to correspond to the equivalent equations in the current paper and italics added.).
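Concretely, the MVUE weights and integrated sigma defined above can be sketched in a few lines of Python; the function name `mvue` and the example values are illustrative, not from the paper.

```python
import math

def mvue(mu_a, sigma_a, mu_b, sigma_b):
    """Minimum-variance unbiased estimator (MVUE) for two Gaussian cues.

    Reliability is the inverse variance, r = 1/sigma**2, and each cue is
    weighted by its relative reliability.
    """
    r_a, r_b = 1.0 / sigma_a**2, 1.0 / sigma_b**2
    w_a = r_a / (r_a + r_b)
    w_b = r_b / (r_a + r_b)
    mu_c = w_a * mu_a + w_b * mu_b                 # integrated cues mean
    sigma_c = math.sqrt((sigma_a**2 * sigma_b**2)
                        / (sigma_a**2 + sigma_b**2))  # integrated cues sigma
    return w_a, w_b, mu_c, sigma_c
```

Note that the integrated sigma is always smaller than the sigma of the more reliable single cue, which is the hallmark prediction of MVUE.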

*p* value” to a result is clearly not the only way in which to make inferences from data (and can be highly problematic; Kruschke, 2010, 2011). It is acknowledged that more rigorous methodologies are required to distinguish between competing cue integration models (Acerbi et al., 2018).

\(p_A\) and \(p_B\) (where \(p_A + p_B = 1\)). The mean and sigma of the integrated cues estimator are given by

\(\mu_C = p_A \mu_A + p_B \mu_B \quad \textrm{and} \quad \sigma_C = \sqrt{p_A \sigma_A^2 + p_B \sigma_B^2 + p_A p_B (\mu_A - \mu_B)^2}\)

When \(p_A = w_A\) and \(p_B = w_B\), the mean of the PCS estimator is identical to that of MVUE. In other words, for the mean of the integrated cues estimator, a model in which cues are not integrated and instead used completely independently can produce identical predictions to MVUE. Throughout the paper where PCS is modeled, we have set \(p_A = w_A\) and \(p_B = w_B\), so as to be consistent with previous research where these parameters are estimated and modelled (e.g. Byrne & Henriques, 2013). However, in reality \(p_A\), \(p_B\), \(w_A\), and \(w_B\) could be determined by more than simply the relative reliability of the cues as measured with the two-alternative forced choice (2AFC) experiments that experimenters typically adopt to estimate these parameters (see Jacobs, 2002 for an extended discussion).
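A minimal sketch of PCS follows, assuming the standard Gaussian-mixture expressions for the mean and sigma of an observer who switches between cues; the function name and example values are illustrative.

```python
import math

def pcs(mu_a, sigma_a, mu_b, sigma_b, p_a):
    """Probabilistic cue switching (PCS): on each trial the observer uses
    cue A with probability p_a, otherwise cue B (p_b = 1 - p_a).

    The mean and sigma are those of the resulting Gaussian mixture.
    """
    p_b = 1.0 - p_a
    mu_c = p_a * mu_a + p_b * mu_b
    var_c = (p_a * sigma_a**2 + p_b * sigma_b**2
             + p_a * p_b * (mu_a - mu_b)**2)
    return mu_c, math.sqrt(var_c)
```

Setting `p_a` equal to the MVUE weight reproduces the MVUE mean exactly, while the PCS sigma remains at least as large as the more reliable single cue, which is what allows the two models to be distinguished via sigma but not via the mean.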

(\(\sigma_A = 4.86\)), which is approximately that of the haptic cue in Ernst and Banks (2002). Cue B had a sigma given by \(\sigma_B = \sigma_A r\), where \(r\) varied between 1 and 4 in 27 linearly spaced steps. Although it has been suggested that, to test for optimal cue integration, the sigma ratio should be no larger than 2 (Rohde et al., 2016, p. 15), it is evident that experimenters go beyond this reliability ratio (see Figure 3). Thus, in the simulations presented we go beyond a ratio of 2 to be consistent with the experimental literature. For each reliability ratio, we simulated experiments with 4 through 30 (in steps of 1) participants. Cue integration experiments normally have few observers per experiment, but a substantial amount of data collected per observer (Rohde et al., 2016). For example, Ernst and Banks (2002) and Hillis, Watt, Landy, and Banks (2004) each used four observers. Our highest observer number therefore represents an upper limit on the number of observers one might reasonably expect to see in a cue integration study.

\(S_A = 55 + \Delta/2\) and \(S_B = 55 - \Delta/2\). Estimated from the data of Ernst and Banks (2002), the non-zero conflicts above represented approximately 0.8 and 0.4 JNDs, which is around the recommended magnitude of cue conflict to use in a perturbation analysis (Rohde et al., 2016). In Ernst and Banks (2002) there were conditions with equal and opposite cue conflicts applied in order to avoid perceptual adaptation. We did not replicate this here, as our simulated observers have no mechanisms of adaptation.

\(p_A = w_A\) and \(p_B = w_B\). Dividing sigmas by \(\sqrt 2 \), as in a typical two-interval forced-choice procedure (Green & Swets, 1974), was not needed, as the functions were parametrically simulated. For each simulated experiment, the MVUE data were entered into a one-sample within-subjects *t*-test and compared to the point predictions of MS and PCS. The mean value of the alternative model prediction across observers was taken as the point prediction for each model.

*p* < 0.05 level for “statistical significance”.
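The testing procedure just described can be sketched as follows. The *t* statistic is computed directly; the value 3.182 is the two-tailed 5% critical value for df = 3 (four observers), and all data values are invented for illustration rather than taken from the simulations.

```python
import math

def one_sample_t(data, point_prediction):
    """One-sample t statistic comparing observers' estimates (PSEs or
    sigmas) to a model's point prediction."""
    n = len(data)
    mean = sum(data) / n
    var = sum((d - mean) ** 2 for d in data) / (n - 1)  # sample variance
    return (mean - point_prediction) / math.sqrt(var / n)

# Illustrative example: four observers' PSEs tested against a hypothetical
# alternative-model point prediction of 56.0.
pses = [54.2, 54.8, 53.9, 54.5]
t = one_sample_t(pses, 56.0)
# Two-tailed 5% critical value for df = 3 is about 3.182.
significant = abs(t) > 3.182
```

A "significant" result here means the simulated MVUE data can be statistically distinguished from the alternative model's point prediction.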

\(S_A = 55 + \Delta/2\) and \(S_B = 55 - \Delta/2\) (in separate experiments, Δ was either 3 or 6 mm). \({\hat S_A}\) always had the same sigma (\(\sigma_A = 4.86\)), which is approximately that of the haptic cue in Ernst and Banks (2002), whereas \({\hat S_B}\) had a randomly determined sigma of \(\sigma_B = \sigma_A r\) where, consistent with the recommendations of Rohde et al. (2016), \(r \in [0.5, 2]\). To select values with equal probability between these limits, for each observer we generated a random number \(x_i \in [-1, 1]\) and set \(r = {2^{{x_i}}}\). Separate simulations were run with 4, 12, and 36 observers per simulated experiment, and for 10, 25, 40, and 55 trials per stimulus level. For each combination of (a) data collection regime, (b) number of observers per experiment, and (c) cue conflict (4 × 3 × 2), we simulated 1000 experiments (i.e. 24,000 experiments with 416,000 observers in total).
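The sampling scheme for the sigma ratio can be sketched as follows (function name illustrative). Drawing the exponent uniformly makes a ratio \(r\) and its reciprocal \(1/r\) equally likely, so neither cue is systematically more reliable across observers.

```python
import random

def sample_ratio():
    """Draw a sigma ratio r in [0.5, 2] uniformly in log space:
    x ~ U[-1, 1], r = 2 ** x, so r and 1/r are equally likely."""
    x = random.uniform(-1.0, 1.0)
    return 2.0 ** x
```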

\(R^2\) of 0.60 (significance not stated) for predicted versus observed integrated cues sensitivity. Knill and Saunders (2003) also examined the perception of slant, but from disparity and texture cues, and reported \(R^2\) values between around 0.15 and 0.46 (*p* < 0.05) for the predicted and observed cue weighting for different base slants. Svarverud et al. (2010) examined “texture-based” and “physical-based” cues to distance and reported \(R^2\) values of about 0.95 (*p* < 0.001) for predicted and observed cue weights. The median \(R^2\) value in these studies is 0.53, and in all instances the authors concluded that observers were combining cues optimally in accordance with MVUE. Following these studies, a regression analysis was adopted here.

*either* MVUE or MS were plotted against the predictions of each of the two candidate models. Data were fit with a first-order polynomial by least squares, and an \(R^2\) value for the fit of each model to the data was calculated. Thus, there were four possible regression comparisons: (1) “MVUE versus MVUE” – predictions of MVUE, plotted against data from a population of observers behaving in accordance with MVUE; (2) “MS versus MS” – predictions of MS, plotted against the behavior of a population of observers behaving in accordance with MS; (3) “MVUE versus MS” – predictions of the MVUE model, plotted against the data of a population of observers behaving in accordance with MS; and (4) “MS versus MVUE” – predictions of the MS model, plotted against the data of a population of observers behaving in accordance with MVUE. We will refer to (1) and (2) as “consistent” predicted and observed data, as the simulated data and predictions are from the same model, and to (3) and (4) as “inconsistent” predicted and observed data, as the simulated data and predictions arise from different models.

\(R^2\) values for both PSEs and sigmas are directly comparable to those found in the literature (and even better), regardless of whether the data from a population of MVUE observers were fitted with a regression against the predictions of either MVUE or MS. Figures 8e and 8f show histograms of the observed \(R^2\) values for the same example, but across all 1000 simulated experiments. The raw histograms are shown overlaid with smooth kernel distributions, given by

\({\hat F_X}(x) = \frac{1}{n}\sum_{i = 1}^{n} K_h(x - x_i)\)

where the \(x_i \in [0, 1]\) are the observed values (i.e. the domain of the \(R^2\) value is 0 to 1), \(K_h\) is a Gaussian kernel with bandwidth \(h\), and \({\hat F_X}\) is the estimate of the unknown probability density function \(F_X\). The key parameter of interest is the extent to which these distributions overlap, as this determines the extent to which an \(R^2\) value from fitting predicted to observer data can be used to distinguish between candidate models of cue integration. The overlap of two smooth kernel distributions \({\hat F_X}\) and \({\hat F_Y}\) can be estimated via numerical integration (Pastore & Calcagni, 2019)

\(\eta = \int \min\left[ {\hat F_X}(z), {\hat F_Y}(z) \right] dz\)

\(R^2\) values, especially so for the predicted and observed PSEs.
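The overlap computation described above can be sketched as follows. This uses a simple Gaussian kernel density estimate and a Riemann sum over [0, 1]; the bandwidth and grid size are illustrative choices, not the paper's.

```python
import numpy as np

def kde(samples, grid, bandwidth=0.05):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - np.asarray(samples)[None, :]) / bandwidth
    dens = np.exp(-0.5 * z**2).sum(axis=1)
    return dens / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

def overlap(x, y, n_grid=2001):
    """Overlap of two smooth kernel distributions on [0, 1], estimated by
    numerically integrating the pointwise minimum of the two densities."""
    grid = np.linspace(0.0, 1.0, n_grid)
    fx, fy = kde(x, grid), kde(y, grid)
    step = grid[1] - grid[0]
    return float(np.sum(np.minimum(fx, fy)) * step)
```

An overlap near 1 means the two \(R^2\) distributions are effectively indistinguishable; an overlap near 0 means an observed \(R^2\) value cleanly discriminates the two models.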

\(R^2\) values improve, with a maximal median of approximately 0.7 to 0.8. Problematically, this pattern is present regardless of whether one is plotting consistent predicted and observed data (MVUE versus MVUE and MS versus MS) or inconsistent predicted and observed data (MVUE versus MS and MS versus MVUE). Across all plots, there is a large overlap in the distributions of \(R^2\) values when plotting “consistent” and “inconsistent” predicted and observed data. With fewer observers per experiment (4 and 12 versus 36) the overlap increases greatly, to the extent that with four observers per experiment the data have near-complete overlap.

\(R^2\) to assess the extent to which a set of data are consistent with the predictions of MVUE. The precise amount of quantitative overlap acceptable for an experiment would be a judgment on the part of the experimenter.

\(R^2\) statistic that experimenters report does not measure the deviation of the data from the predictions of a cue integration model (even though it is often stated in this way); rather, the \(R^2\) statistic gives a measure of the fit of the polynomial. The predicted values of a cue integration model could be off by any arbitrary amount, or have the opposite relationship to the data, and experimenters could still obtain an \(R^2\) close to 1. Thus, a regression analysis negates one of the key benefits of MVUE (and other cue integration models), which is the ability to predict the absolute value of the integrated cues percept and its reliability and then compare this to that observed experimentally. Tests do exist to determine whether the intercept and slope differ from predicted model values, but these are rarely reported and are definitively not shown by the \(R^2\) statistic alone.
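This point is easy to demonstrate numerically. In the short illustration below (all numbers invented), the "observed" data are scaled and shifted far away from the model's predictions, yet \(R^2\) remains near 1, because \(R^2\) reflects only the quality of the linear fit, not absolute agreement.

```python
import numpy as np

# Hypothetical model predictions and systematically biased "data":
# observed = 2 * predicted - 40 plus small noise, i.e. far from identity.
rng = np.random.default_rng(1)
predicted = np.linspace(50.0, 60.0, 20)
observed = 2.0 * predicted - 40.0 + rng.normal(0.0, 0.2, 20)

r_squared = np.corrcoef(predicted, observed)[0, 1] ** 2  # close to 1
max_abs_error = np.max(np.abs(observed - predicted))     # large absolute misses
```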

\(S_A\) for judgments of size. On each trial, in one interval, the experimenter presents a “standard” stimulus and in the other interval a “comparison” stimulus, the difference between these being \(\Delta S_A\). The observer must signal in which interval the “larger” stimulus was presented. Next, let's assume that this is done in the presence of a conflicting “nuisance” cue, \(S_N\), which is constant and signals that the stimulus is unchanged across intervals. This means that the “single cue” stimulus is in fact an integrated cues stimulus and can be described as

\(\hat S_c = w_A \hat S_A + w_N \hat S_N\)

For each \(\Delta S_c(i)\), the experimenter measures \(p\left( {``larger"{\rm{|\Delta }}{S_c}\left( i \right)} \right)\) and (with the assumption that the “standard” and “comparison” stimuli can be represented by Gaussian probability density functions) maps out a psychometric function by plotting \(p\left( {``larger"{\rm{|\Delta }}{S_c}\left( i \right)} \right)\) against \(\Delta S_A(i)\), then fits a Cumulative Gaussian to the data (blue data and function in Figure 13). Clearly, the experimenter will incorrectly estimate \(\sigma_A\) from this fitted function. More specifically, they will overestimate \(\sigma_A\), because each stimulus that they present is in fact an attenuated version of that which they intended (i.e., \(\Delta S_c(i) < \Delta S_A(i)\)). The extent to which the experimenter misestimates \(\sigma_A\) will be a function of \(w_N\) (the weight given to the nuisance cue \(S_N\), which is signalling no change across intervals). As \(\sigma_N \to \infty\), the weight given to the nuisance cue will approach zero (\(w_N \to 0\)) and \(\sigma_A\) will be estimated accurately. However, for any non-infinite value of \(\sigma_N\), the experimenter will misestimate \(\sigma_A\).

\(\Delta S_c(i)\) instead of \(\Delta S_A(i)\) (red data and function in Figure 13). To determine this “warping,” we can ask what scale factor, \(k\), we would need to apply to \(\Delta S_A\) such that in all cases \(\Delta S_c = \Delta S_A\). Given \(w_A = 1 - w_N\), we can write this as

\(k = \frac{1}{w_A} = \frac{1}{1 - w_N}\)

When \(w_A = 1\), no scaling is required to combat the attenuation caused by \(S_N\), because it receives zero weight; however, as soon as \(w_A < 1\), scaling is needed (i.e. \(k > 1\)). Next, we can ask, given the true value of \(\sigma_A\), what our estimate, \({\hat \sigma _A}\), of this would be in the presence of the conflicting nuisance cue. To do this, we recognize that for a random variable \(X\) distributed according to \(F_X(x)\), the variable \(Y = g(X)\) is also a random variable. If \(g\) is differentiable and \(g : {\mathbb{R}} \rightarrow {\mathbb{R}}\) is a monotonic function, we can then use a change of variables to transform between probability density functions.

\(f_Y(y) = f_X\left( g^{-1}(y) \right)\left| \frac{d}{dy} g^{-1}(y) \right|\)

where \(x = g^{-1}(y)\) and the support of \(Y\) is \(g(x)\), with the support of \(X\) being \(x\) (Blitzstein & Hwang, 2015). For our example, the Gaussian probability density function representing our cue \(S_A\) (red function in Figure 13) can be written as

\(f_X(x) = \frac{1}{\sigma_A\sqrt{2\pi}}\exp\left( -\frac{(x - \mu_A)^2}{2\sigma_A^2} \right)\)

with standard deviation \(\sigma_A\). From Equation 13, using the transform \(y = g(x) = xk\), a change of variables gives

\(f_Y(y) = \frac{1}{k\sigma_A\sqrt{2\pi}}\exp\left( -\frac{(y - k\mu_A)^2}{2(k\sigma_A)^2} \right)\)

which is a rescaled version of the probability density function representing \(S_A\) (blue function in Figure 13). The standard deviation of \(F_Y(y)\) is given by

\(\hat \sigma_A = k\sigma_A = \frac{\sigma_A}{w_A}\)

Thus, as long as \(w_A < 1\), we overestimate the sigma of the underlying estimator, \({\hat \sigma _A} > \sigma_A\).

\(S_A\) and \(S_B\), with standard deviations of \(\sigma_A\) and \(\sigma_B\), signalling a property of interest, \(S\). We measure “single cue” sensitivity functions for each cue while holding the other cue constant. Because \(1/\sigma _A^2 + 1/\sigma _B^2\) is a constant, \(c\), the weights given to each cue are \({w_A} = \frac{1}{{c\sigma _A^{2}}}\) and \({w_B} = \frac{1}{{c\sigma _B^2}}\), and given Equation 17, our experimental estimates of the true underlying standard deviations are given by \({\hat \sigma _A} = c\sigma _A^3\) and \({\hat \sigma _B} = c\sigma _B^3\). These are each larger than the true underlying values, as they have been measured in the presence of a cue signalling no change (see Figure 13). The ratio of these estimates is given by

\(\frac{\hat \sigma_A}{\hat \sigma_B} = \frac{c\sigma_A^3}{c\sigma_B^3} = \left( \frac{\sigma_A}{\sigma_B} \right)^3\)

Thus, if we experimentally measure \(\hat \sigma_A / \hat \sigma_B = 1/27\), the true sigma ratio is in fact 1/3, and we experimentally misestimate \(\sigma_A / \sigma_B\) by a factor of approximately 9. Studies that have measured the reliability of cues in the presence of a second, constant, conflicting cue (e.g. Murphy et al., 2013 and Svarverud et al., 2010) will therefore have significantly overestimated the true relative cue reliabilities. As such, the data in these studies cannot be used to accurately test MVUE without some form of correction. This analysis shows the critical importance of being able to isolate single cues satisfactorily or, if one is not able to, of correcting for their influence when inferring relative cue reliabilities.
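The cubed-ratio result can be checked numerically (the sigma values below are illustrative):

```python
# When each "single cue" is measured in the presence of a fixed conflicting
# cue, the estimated sigma inflates to sigma_i / w_i = c * sigma_i**3, so
# the measured sigma ratio is the cube of the true ratio.
sigma_a, sigma_b = 1.0, 3.0                    # true ratio 1/3
c = 1.0 / sigma_a**2 + 1.0 / sigma_b**2        # the constant c
w_a = 1.0 / (c * sigma_a**2)                   # cue weights
w_b = 1.0 / (c * sigma_b**2)
est_a, est_b = sigma_a / w_a, sigma_b / w_b    # measured sigmas (= c * sigma**3)
measured_ratio = est_a / est_b                 # cube of the true ratio, 1/27
```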

*PLoS Computational Biology,* 14(7), e1006110.

*Nature Neuroscience,* 4(11), 1063–1064.

*Scientific Reports,* 9(1), 5155.

*Psychological Methods,* 10(4), 389–396.

*Introduction to Probability*. Boca Raton, FL: CRC Press.

*Vision Research,* 40(27), 3725–3734.

*Journal of Neuroscience,* 30(21), 7269–7280.

*Journal of Neuroscience,* 30(22), 7714–7721.

*Journal of Vision,* 5(6), 534–542.

*Neuropsychologia,* 51(1), 26–37.

*Journal of the Royal Statistical Society,* 4, 102–118.

*Journal of Cell Biology,* 177(1), 7–11.

*Scientific Reports,* 8(1), 5483.

*Journal of Vision,* 9(2):25, 1–25.

*Human body perception from the inside out* (pp. 105–131). New York, NY: Oxford University Press.

*Nature,* 415(6870), 429–433.

*Trends in Cognitive Sciences,* 8(4), 162–169.

*Sensory Cue Integration* (pp. 224–250). New York, NY: Oxford University Press.

*Nature Neuroscience,* 17(5), 738–743.

*Journal of Vision,* 11(6), 16.

*Journal of Vision,* 5(11), 1013–1023.

*Journal of Vision,* 9(9):8, 1–20.

*Current Biology,* 16(4), 428–432.

*Signal Detection Theory and Psychophysics*. West Nyack, NY: Cambridge University Press.

*Experimental Brain Research,* 179(4), 595–606.

*Journal of Motor Behavior,* 44(6), 435–444.

*Science,* 298(5598), 1627–1630.

*Journal of Vision,* 4(12), 967–992.

*Journal of Vision,* 6(5), 634–648.

*Trends in Cognitive Sciences,* 6(8), 345–350.

*Vision Research,* 34(17), 2259–2275.

*Vision Research,* 33(5–6), 813–826.

*Journal of Mathematical Psychology,* 73, 117–139.

*Psychophysics: A Practical Introduction* (1st ed.). Salt Lake City, UT: Academic Press.

*Psychophysics: A Practical Introduction* (2nd ed.). Salt Lake City, UT: Academic Press.

*Trends in Cognitive Sciences,* 21(7), 493–497.

*Perception as Bayesian Inference*. West Nyack, NY: Cambridge University Press.

*Vision Research,* 43(24), 2539–2558.

*Perception,* 31(12), 1467–1475.

*Perception & Psychophysics,* 64(3), 380–391.

*Perception,* 29(1), 69–79.

*Vision Research,* 39(16), 2729–2737.

*PLoS One,* 2(9), e943.

*Trends in Cognitive Sciences,* 14(7), 293–300.

*Doing Bayesian Data Analysis*. New York, NY: Elsevier.

*Journal of Vision,* 5(5), 478–492.

*Frontiers in Psychology,* 3, 56.

*Vision Research,* 35(3), 389–412.

*Perception & Psychophysics,* 63(8), 1279–1292.

*Current Biology,* 24(21), 2569–2574.

*Attention, Perception, & Psychophysics,* 80(6), 1461–1473.

*Journal of Vision,* 16(15), 16.

*Journal of Vision,* 12(1), 1.

*Probabilistic Models of the Brain: Perception and Neural Function* (pp. 13–36). Cambridge, MA: MIT Press.

*Attention, Perception, & Psychophysics,* 2(1), 37–44.

*Journal of Neurophysiology,* 110(1), 190–203.

*Journal of Vision,* 10(11), 15.

*Current Biology,* 18(9), 689–693.

*Scientific Reports,* 8(1), 16880.

*Vision Research,* 43(23), 2451–2468.

*Frontiers in Psychology,* 10, 1089.

*Perception & Psychophysics,* 28(4), 377–379.

*Journal of Vision,* 12(6), 25.

*Journal of Vision,* 13(7), 3.

*Palamedes: Matlab routines for analyzing psychophysical data*, http://www.palamedestoolbox.org.

*Frontiers in Psychology,* 9, 1250.

*Multisensory Research,* 29(4–5), 279–317.

*Sensory Cue Integration* (pp. 144–152). New York, NY: Oxford University Press.

*Journal of Vision,* 15(2), 14.

*Journal of Neuroscience,* 34(31), 10394–10401.

*Journal of Vision,* 11(7), 12.

*Vision Research,* 122, 105–123.

*Journal of Vision,* 9(5):28, 1–14.

*Proceedings of the National Academy of Sciences of the United States of America,* 103(49), 18781–18786.

*Journal of Vision,* 10(1):5, 1–13.

*Journal of Vision,* 9(6):3, 1–13.

*Perception,* 37(1), 79–95.

*Journal of Vision,* 15(9), 22.

*Journal of Vision,* 10(2):20, 1–18.

*Journal of Vision,* 10(5), 17.

*Sensory Cue Integration*. New York, NY: Oxford University Press.

*Journal of Vision,* 17(3), 10.

*Perception & Psychophysics,* 33(2), 113–120.

*Journal of Vision,* 5(10), 834–862.

*Perception & Psychophysics,* 54(2), 195–204.

*Perception & Psychophysics,* 63(8), 1293–1313.

*Perception & Psychophysics,* 63(8), 1314–1329.

*Journal of Vision,* 16(15), 28.

*Vision Research,* 33(18), 2685–2696.

*Journal of the Optical Society of America. A, Optics, Image Science, and Vision,* 21(11), 2049–2060.