Figure 6 plots measured against predicted results for each task. Data were collapsed across the three density levels because density did not significantly influence observers' PSEs or JNDs.
Figure 6a plots (1) observers' measured performance in the area task against (2) the performance predicted for the area task from their performance in the mean size and numerosity tasks. For each observer, there were three levels of numerosity, three levels of mean size, and five levels of comparison stimuli; thus, each graph in Figure 6 contains 180 (3 × 3 × 5 × 4) data points. If observers inferred the mean size of stimuli from the ratio of total area to numerosity, the slope of the regression line should be close to 1 (predicted performance = measured performance). In contrast, if observers perceived the mean size of stimuli directly, the slope would not necessarily be 1. In the area task, the slope of the regression line was 1.145 (Figure 6a).
To evaluate quantitatively whether the slope differed significantly from 1, we compared the goodness of fit of two regression models: a full model with a free slope parameter and a reduced model with the slope fixed at 1. The models were compared statistically with an F-test for nested models (Wonnacott & Wonnacott, 1981). For two models with $k_{\mathrm{full}}$ and $k_{\mathrm{reduced}}$ parameters, the F statistic is defined as
$$F = \frac{(\mathrm{SS}_{\mathrm{reduced}} - \mathrm{SS}_{\mathrm{full}}) / df_1}{\mathrm{SS}_{\mathrm{full}} / df_2},$$
where $\mathrm{SS}_{\mathrm{full}}$ and $\mathrm{SS}_{\mathrm{reduced}}$ are the residual sums of squares of the full and reduced models, $df_1 = k_{\mathrm{full}} - k_{\mathrm{reduced}}$, $df_2 = N - k_{\mathrm{full}}$, and $N$ is the number of data points.
For the area task, the slope was not significantly different from 1, F(1, 178) = 0.695, p = 0.406 (Figure 6a), suggesting that performance in the area task was well explained by performance in the mean size and numerosity tasks. For the numerosity task, the slope of the full model (0.808) was significantly different from 1, F(1, 178) = 18.869, p < 0.001 (Figure 6b), suggesting that performance in the numerosity task was not well predicted by performance in the mean size and area tasks. For the mean size task, the slope of the full model (1.353) was also significantly different from 1, F(1, 178) = 4.971, p = 0.03 (Figure 6c). These results suggest that performance in the mean size task was not predicted by the ratio of summed area to numerosity.
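The nested-model comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes measured performance was regressed on predicted performance with a free intercept in both models, so that the full model has two parameters and the reduced model one (consistent with the reported F(1, 178) for N = 180). The F-distribution tail probability is computed with a standard continued-fraction evaluation of the regularized incomplete beta function so that only the standard library is needed. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued-fraction part of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0

    def step(aa, c, d):
        # One Lentz update; tiny guards against division by zero.
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        return c, d

    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        c, d = step(m * (b - m) * x / ((qam + m2) * (a + m2)), c, d)
        h *= d * c
        # Odd term; stop once it no longer changes h.
        c, d = step(-(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)), c, d)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(ln_front) * _betacf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - math.exp(ln_front) * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def nested_slope_test(predicted, measured):
    """Compare a free-slope line with a line whose slope is fixed at 1.

    Assumed models (intercept free in both):
      full:    measured = b0 + b1 * predicted   (k_full = 2)
      reduced: measured = b0 + predicted        (k_reduced = 1)
    Returns (slope, F, df1, df2, p).
    """
    n = len(predicted)
    if n != len(measured) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx, my = sum(predicted) / n, sum(measured) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, measured))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse_full = sum((y - intercept - slope * x) ** 2
                   for x, y in zip(predicted, measured))
    # With the slope fixed at 1, the least-squares intercept is the mean of (y - x).
    diffs = [y - x for x, y in zip(predicted, measured)]
    b0_reduced = sum(diffs) / n
    sse_reduced = sum((d - b0_reduced) ** 2 for d in diffs)
    k_full, k_reduced = 2, 1
    df1, df2 = k_full - k_reduced, n - k_full
    f_stat = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return slope, f_stat, df1, df2, f_sf(f_stat, df1, df2)


if __name__ == "__main__":
    # Hypothetical toy data, only to show the call signature.
    print(nested_slope_test([0, 1, 2, 3], [0, 1, 2, 4]))
    # Re-derive p-values from the F statistics reported above (df = 1, 178).
    for f_stat in (0.695, 18.869, 4.971):
        print(f_stat, round(f_sf(f_stat, 1, 178), 4))
```

With real data, `predicted` and `measured` would each hold the 180 values plotted in one panel of Figure 6. Because the reduced model is the full model with one parameter constrained, its residual sum of squares can never be smaller than the full model's, so F is non-negative and grows as the best-fitting slope moves away from 1.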