The relationships between participant accuracy and the predictions of the FLNN, SDT, best-normal, RCref, and temporal-serial models are explicit, so we could compare the experimental results directly with these predictions.
Figure 4 plots the observed accuracy of each participant against the predictions of each model. If a model predicted participant accuracy perfectly, all points would fall exactly on the diagonal line; hence, the closer the points are to the diagonal, the better the model's predictions. Moreover, because the models do not all use the same number of parameters, we used the reduced chi-square measure (χ²/df) and the chi-square test (χ² test) (for details, see 2) to compare their predictive abilities (Taylor, 1982). The reduced chi-square measure allows us to compare models with different numbers of parameters by assigning a better fit score to the model with fewer parameters when both predict the same results. The lower the reduced chi-square value, the more accurate the prediction. The chi-square test determines whether the probability of obtaining a χ² value larger than the one measured, assuming that the model is correct and taking into account the degrees of freedom (df), is higher than 0.05. If it is not, the model is rejected.
In Table 2, we report, for each combination of model and participant, the parameter values that gave the best fit (i.e., resulted in the lowest χ² value), the reduced chi-square measure, and whether the model was rejected on the basis of the chi-square test. As Table 2 shows, the reduced chi-square of the FLNN model is the lowest for 4 of the 5 participants, and the model is rejected by only 1 of the 5 chi-square tests. The best-normal model is rejected by 3 of the 5 tests, the RCref model by 4 of the 5, and the SDT and temporal-serial models by all 5. Thus, of the models tested, the FLNN model's predictions are the closest to human search performance in this experiment.
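
The model comparison described above can be illustrated with a minimal sketch. It assumes the standard textbook definitions: χ² is the sum of squared, uncertainty-normalized residuals, df is the number of data points minus the number of fitted parameters, and a model is rejected when the probability of exceeding the measured χ² falls below 0.05. The function names, the per-point uncertainties `sigma`, and the numbers in the usage example are hypothetical and are not taken from the experiment; the exact procedure used in the paper may differ.

```python
import math


def chi_square(observed, predicted, sigma):
    """Sum of squared residuals, each normalized by its uncertainty."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))


def chi_square_sf(chi2, df):
    """Probability that a chi-square variable with df degrees of freedom
    exceeds chi2, computed as 1 minus the regularized lower incomplete
    gamma function (series expansion; adequate for moderate chi2 values)."""
    if chi2 <= 0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def evaluate_fit(observed, predicted, sigma, n_params, alpha=0.05):
    """Return chi2, df, reduced chi2, p-value, and the rejection decision."""
    chi2 = chi_square(observed, predicted, sigma)
    df = len(observed) - n_params  # assumption: df = data points - fitted parameters
    if df <= 0:
        raise ValueError("need more data points than fitted parameters")
    p_value = chi_square_sf(chi2, df)
    return {
        "chi2": chi2,
        "df": df,
        "reduced_chi2": chi2 / df,
        "p_value": p_value,
        "rejected": p_value < alpha,
    }


# Hypothetical accuracies for one participant and one two-parameter model.
observed = [0.90, 0.80, 0.70, 0.60]
predicted = [0.88, 0.82, 0.69, 0.62]
sigma = [0.02, 0.02, 0.02, 0.02]
result = evaluate_fit(observed, predicted, sigma, n_params=2)
print(result)
```

Dividing by df is what penalizes extra parameters: two models with the same χ² receive different reduced values if one of them needed more fitted parameters to reach it.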