We first checked whether subjects took the mean of the dots as their estimate from the sensory cue, rather than a heuristic such as the robust average. In tasks similar to ours (Bejjanki et al., 2016; Chambers, Gaebler-Spira, & Kording, 2018; Vilares et al., 2012), authors assume that observers use the mean of the dots as their best estimate of the true location from the likelihood information. However, we did not explicitly tell our participants how the eight dots that formed the likelihood were generated, or that the best estimate they could take from them was their mean position, leaving open the possibility that observers may have taken a different estimate from the cue than the mean (de Gardelle & Summerfield, 2011; Van Den Berg & Ma, 2012). The mean horizontal position explained the most variance in participants’ responses (\(R^2 = 0.996\)), relative to the robust average (\(R^2 = 0.995\)), the median (\(R^2 = 0.995\)), or the mid-range of the dots (\(R^2 = 0.992\)). This suggests that the mean of the dots is the estimate that participants took from the sensory cue.
We then examined whether the weight participants placed on the likelihood, relative to the prior, varied with trial type (prior/likelihood pairing) for all the trial types present from the beginning of the experiment. Without this basic result, a replication of the pattern found by Bejjanki et al. (2016), we could not expect participants to transfer knowledge of the learned prior distributions to the new high variance likelihood in the later blocks. This was a qualified success: for these trial types (blue and green bars in Figure 2), participants showed the predicted pattern, but placed more weight on the likelihood than is optimal (compare bar heights to dashed lines in Figure 2), in line with previous research (Bejjanki et al., 2016; Tassinari et al., 2006). We conducted a 2 (narrow versus wide variance prior) × 2 (low versus medium variance likelihood) × 5 (block) repeated measures ANOVA with the weight given to the likelihood (the displayed dots) as the dependent variable. These results are shown in Table 1 and summarized here. There was a main effect of prior variance, with less weight placed on the likelihood when the prior was narrower (p < 0.001). There was also a main effect of likelihood variance (p < 0.001), with participants relying less on the medium variance likelihood. However, there was also a significant interaction between likelihood and prior (p = 0.001): when the prior was narrow, the decrease in reliance on the likelihood as likelihood variance increased was smaller (t(25) = 3.57, p = 0.001).
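For reference, the “optimal” weights (the dashed lines in Figure 2) follow from the standard combination of a Gaussian prior with a Gaussian likelihood; we restate the textbook result here, with the final step holding only under the assumption that the dot cloud is the sole source of sensory uncertainty:

\[
\hat{x} = w_L\,\mu_L + (1 - w_L)\,\mu_{\mathrm{prior}},
\qquad
w_L = \frac{\sigma_{\mathrm{prior}}^2}{\sigma_{\mathrm{prior}}^2 + \sigma_L^2},
\]

where \(\mu_L\) is the mean of the eight dots and \(\sigma_L^2\) the variance of that mean (\(\sigma_{\mathrm{dots}}^2/8\) under the stated assumption). A narrower prior or a more variable likelihood both reduce the optimal weight \(w_L\), which is the pattern tested above.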
We found a main effect of block (p < 0.001) and an interaction between block and likelihood (p = 0.014): the weight on the medium variance likelihood differed significantly across blocks (simple main effect of block, \(F(4,100) = 5.84\), \(p < 0.001\), \(\eta_p^2 = 0.189\); weights decreased with increasing exposure), whereas the weight on the low variance likelihood did not (no simple main effect of block, \(F(4,100) = 1.64\), \(p = 0.169\), \(\eta_p^2 = 0.063\)). This suggests that, with practice, participants adjusted their weights on the medium variance likelihood toward optimal.
Examination of the prior-only trials shows successful learning of the priors. On average, subjects’ responses did not differ significantly from the prior mean in the wide prior condition (t(25) = −0.77, p = 0.450). They did differ significantly in the narrow prior condition (t(25) = −2.78, p = 0.010), although the bias was extremely small (95% confidence interval [CI]: 0.06 to 0.41 percent of the screen width to the left). The median SDs of responses across subjects were 1.4% (narrow prior) and 2.5% (wide prior), almost identical to the true prior SDs of 1.3% and 2.5%, respectively.
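This check can be sketched as follows: a minimal Python sketch, assuming `responses` holds one subject per row of prior-only responses in percent of screen width (the function and variable names are ours).

```python
import numpy as np
from scipy import stats

def prior_learning_check(responses, prior_mean):
    """responses: (n_subjects, n_trials) prior-only responses,
    in percent of screen width."""
    # Bias: one-sample t-test of subject means against the true prior mean.
    subject_means = responses.mean(axis=1)
    t, p = stats.ttest_1samp(subject_means, prior_mean)
    # Spread: median across subjects of each subject's response SD,
    # to be compared with the SD of the generative prior.
    median_sd = np.median(responses.std(axis=1, ddof=1))
    return t, p, median_sd
```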
Participants qualitatively followed the predicted optimal pattern of reweighting: like the dashed prediction lines in Figure 2, actual likelihood weights (bars) were higher for the wide prior (right) than the narrow prior (left), and higher for the low variance likelihood (blue) than the medium variance likelihood (green). Quantitatively, however, comparing bar heights with the dashed lines shows that their weights were far from optimal. Participants systematically gave much more weight than is optimal to the likelihood when its variance was medium (see Figure 2, green bars versus lines; p < 0.001 in all blocks for the medium likelihood paired with either prior). This over-reliance on the likelihood is in line with previous studies (e.g. Bejjanki et al., 2016), although it is stronger in the present study. Participants therefore accounted for changes in the probabilities involved in the task (e.g. weighting the likelihood less when it was more variable), but did not perform exactly as predicted by the optimal strategy.
Having found that participants’ performance was in line with the predicted patterns, we could then ask whether they would generalize their knowledge to the new high variance likelihood trials added in blocks four and five (“Bayesian transfer”), as predicted for an observer following Bayesian principles. Transfer should lead immediately to a lower weight on the new high variance likelihood than on the familiar medium variance likelihood. By contrast, the lack of a significant difference in weights between the medium and high likelihood trial types would suggest that the observer is employing an alternative strategy, such as simply learning a look-up table. To test this, we performed a 2 (prior) × 3 (likelihood) × 2 (block) repeated measures ANOVA on blocks four and five only (the blocks in which all likelihoods were present). These results are shown in Table 2 and summarized here. As above, we found a main effect of likelihood, with participants placing less weight on the likelihood as it became more uncertain (p < 0.001). However, post hoc analyses showed that the weight placed on the high variance likelihood was not significantly lower than the weight placed on the medium variance likelihood (p = 0.103); only the comparison between the low and high variance trial types was significant (p < 0.001). Moreover, there was no main effect of block (p = 0.28), nor an interaction between block and likelihood (p = 0.48), suggesting that the weight placed on the newly introduced likelihood variance did not decrease with increasing exposure.
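To make the transfer prediction concrete, the sketch below computes ideal-observer weights for each pairing using the prior SDs reported above (1.3% and 2.5% of screen width) and purely illustrative likelihood SDs (the experimental cloud parameters are given in the Methods). The point is qualitative: a Bayesian observer shows the drop from the medium to the high variance column from the first high variance trial, with no further learning required.

```python
# Optimal weight on the likelihood: w_L = sp^2 / (sp^2 + sl^2),
# where sp is the prior SD and sl the SD of the cue estimate.
PRIOR_SDS = {"narrow": 1.3, "wide": 2.5}   # percent of screen width
LIKELIHOOD_SDS = {"low": 1.0, "medium": 2.0, "high": 3.0}  # illustrative only

def optimal_weight(prior_sd, likelihood_sd):
    return prior_sd**2 / (prior_sd**2 + likelihood_sd**2)

for prior, sp in PRIOR_SDS.items():
    weights = {lik: round(optimal_weight(sp, sl), 2)
               for lik, sl in LIKELIHOOD_SDS.items()}
    print(prior, weights)
# narrow {'low': 0.63, 'medium': 0.3, 'high': 0.16}
# wide   {'low': 0.86, 'medium': 0.61, 'high': 0.41}
```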
Finally, we compared mean weights in block five against the optimal Bayesian values for each prior and likelihood pairing. In the low variance likelihood trials, we did not observe significant deviation from the Bayesian prediction, irrespective of prior (narrow prior: t(25) = 0.784, p = 0.440; wide prior: t(25) = −1.12, p = 0.270). Subjects’ weights differed significantly from optimal in all other conditions (p < 0.001 in all cases).
Overall, our results do not exactly match the predictions of a Bayesian observer, because we find only weak evidence of Bayesian transfer. Specifically, although we find a main effect of likelihood, the weight on the high variance likelihood is not significantly different from that placed on the medium variance likelihood (although the change is in the predicted direction). That said, our results are not simply more consistent with a rote process either, because the weight placed on the high variance likelihood does not decrease with increasing exposure (no interaction between likelihood and block).
Our results point mostly away from a simple variance-weighted Bayesian model as a good model of human behavior in this particular task. The correct pattern of weights was present, but evidence of transfer was weak. Participants were also significantly suboptimal, overweighting the likelihood whenever its variance was medium or high. Previous studies have also found that observers give more weight to the sensory cue than is optimal (e.g. Bejjanki et al., 2016); even so, the level of suboptimality we observe here is markedly higher than in previous reports. However, Sato and Kording (2014) found better, near-optimal performance in participants who were told that the sensory information could have one of two levels of variance, and that the variance would sometimes change, compared to those who were not given this information. We therefore reasoned that if observers are given additional information about the structure and statistics of the task (e.g. that the variances of the prior distributions are different), the weight they give to the sensory cue may move closer to optimal. If weights move closer to optimal, we may also be better able to detect whether transfer has taken place, because the effect size of a change in the likelihood would be larger. Indeed, we suspect that the size of this effect could be an important factor behind the lack of significant differences observed in Experiment 1 (i.e. the effect size of the change from medium to high variance likelihood may have been too small for our statistical analysis to reliably detect). In view of this, we set out to test whether additional instructions would lead to weighting of likelihood and prior information that is closer to optimal.