Overfitting is an important concern for complex models. For the present study, the correlation (r) reached by the optimization algorithm will always improve with a larger number of dimensions, because more dimensions always give more room to explain the variations in the data. However, some of these additional dimensions may provide artificial explanations for random noise rather than capture substantive underlying mechanisms. It is therefore important to assess the issue of overfitting for the SCI shape space. Specifically, are three dimensions too many? Is the third dimension an artificial one?
The central problem of overfitting is the failure to make good predictions for new data. I considered two types of new data: generalization to a new group of observers and generalization to new shapes.
First, to assess generalization to a new group of observers, I used an out-of-sample fitting procedure. In this procedure, I randomly split the data into two parts (i.e., 93 observers each), conducted an optimization on Part 1, and calculated the usual in-sample fitting index on which the optimization was based, r(log-distances Part 1, discriminabilities Part 1), as well as the out-of-sample fitting index, r(log-distances Part 1, discriminabilities Part 2). This was repeated 15 times, and the average results are plotted in Figure 5a. Unsurprisingly, the in-sample fitting was better than the out-of-sample fitting, and the difference grew with more dimensions. However, it is also clear that the difference was relatively trivial (r = 0.972 vs. 0.957 for the 3D model), which suggests that the SCI shape space generalizes fairly well to a new group of observers.
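The split-half logic can be sketched in code. The following is a hedged illustration, not the study's actual pipeline: it uses synthetic stand-in d′ values for the two observer halves and a generic numerical optimizer in place of the study's optimization algorithm; all variable names and settings are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_shapes, n_dims = 14, 3

# Synthetic stand-in data: one mean d' per shape pair (91 pairs) for
# each observer half; in the real procedure these come from the split.
true_pos = rng.normal(size=(n_shapes, n_dims))
true_logd = np.log(pdist(true_pos))
dprime_half1 = true_logd + rng.normal(scale=0.1, size=true_logd.size)
dprime_half2 = true_logd + rng.normal(scale=0.1, size=true_logd.size)

def neg_r(flat_coords, dprimes):
    """Objective: maximize r(log-distances, d'), i.e., minimize -r."""
    coords = flat_coords.reshape(n_shapes, n_dims)
    log_dist = np.log(pdist(coords) + 1e-9)
    return -np.corrcoef(log_dist, dprimes)[0, 1]

# Optimize the shape-space coordinates on Part 1 only
x0 = rng.normal(size=n_shapes * n_dims)
res = minimize(neg_r, x0, args=(dprime_half1,))
log_dist = np.log(pdist(res.x.reshape(n_shapes, n_dims)) + 1e-9)

r_in = np.corrcoef(log_dist, dprime_half1)[0, 1]   # in-sample index
r_out = np.corrcoef(log_dist, dprime_half2)[0, 1]  # out-of-sample index
print(f"in-sample r = {r_in:.3f}, out-of-sample r = {r_out:.3f}")
```

In the actual procedure this split is repeated 15 times and the two indices are averaged; r_in exceeding r_out by only a small margin is the signature of limited overfitting.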
Second, for generalization to new shapes, the set of shapes is too small to be split into two subsets. I therefore used a leave-one-out fitting procedure to assess this issue. In this procedure, I took one "test shape" out and used optimization to fully determine a shape space on the basis of the other 13 shapes. Then I put the test shape back to see how well it fit with this previously optimized shape space (i.e., the fitting of the 13 d′ values relevant to this test shape) and compared it against the usual situation in which all 14 shapes were optimized together. Here, the linear correlation coefficient (r) could not be used as an unbiased index to compare the fitting of a subset with that of the whole set,⁹ so I used the root-mean-square error (RMSE). This procedure was repeated for all 14 shapes, and the average results are plotted in Figure 5b. Unsurprisingly, when a shape was left out, its fitting was worse than the usual fitting, and the difference grew with more dimensions. However, it is also clear that the difference was modest (a 49% increase in RMSE for the 3D model), which suggests that the SCI shape space generalizes reasonably well to new shapes. Of course, this leave-one-out procedure only provides evidence for generalization within this category of shapes, namely, other 2D filled geometric shapes. With this restriction, it seems unlikely that the current diverse set has missed the variations of a whole dimension. In the "additional shape sets" section below, we will consider generalization to other types of shapes, such as 3D shapes and shapes of real-world objects (e.g., animal silhouettes, letters, and characters).
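The leave-one-out comparison can be sketched in the same spirit. Again this is a hedged sketch with synthetic stand-in data and a generic optimizer; because r is scale-free, predicted d′ values are obtained here via a simple linear regression on log-distance before computing the RMSE, which is an assumption about the conversion rather than the study's documented method.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
n_shapes, n_dims = 14, 3

# Synthetic stand-in data: a symmetric 14 x 14 matrix of pairwise d'.
true_pos = rng.normal(size=(n_shapes, n_dims))
noise = rng.normal(scale=0.05, size=n_shapes * (n_shapes - 1) // 2)
dprime = squareform(np.log(pdist(true_pos)) + 2.0 + noise)

def rmse_after_rescaling(log_dist, dp):
    """RMSE of d' predicted from log-distance via a linear map
    (r is scale-free, so a regression supplies the units)."""
    slope, icept = np.polyfit(log_dist, dp, 1)
    return np.sqrt(np.mean((slope * log_dist + icept - dp) ** 2))

def fit_space(dp_mat):
    """Optimize coordinates so log-distances correlate with d'."""
    m = dp_mat.shape[0]
    dp = squareform(dp_mat)
    def neg_r(x):
        d = np.log(pdist(x.reshape(m, n_dims)) + 1e-9)
        return -np.corrcoef(d, dp)[0, 1]
    return minimize(neg_r, rng.normal(size=m * n_dims)).x.reshape(m, n_dims)

test = 0                                   # index of the left-out shape
others = np.delete(np.arange(n_shapes), test)

# Fit the space on the remaining 13 shapes only
space13 = fit_space(dprime[np.ix_(others, others)])

# Put the test shape back: place it to best fit its own 13 d' values
dp_test = dprime[test, others]
def neg_r_test(p):
    d = np.log(np.linalg.norm(space13 - p, axis=1) + 1e-9)
    return -np.corrcoef(d, dp_test)[0, 1]
p_test = minimize(neg_r_test, rng.normal(size=n_dims)).x

log_d_loo = np.log(np.linalg.norm(space13 - p_test, axis=1) + 1e-9)
rmse_loo = rmse_after_rescaling(log_d_loo, dp_test)

# Usual situation: all 14 shapes optimized together
space14 = fit_space(dprime)
log_d_all = np.log(np.linalg.norm(space14[others] - space14[test], axis=1) + 1e-9)
rmse_all = rmse_after_rescaling(log_d_all, dp_test)
print(f"RMSE leave-one-out = {rmse_loo:.3f}, usual = {rmse_all:.3f}")
```

In the actual procedure this is repeated with each of the 14 shapes as the test shape and the RMSEs are averaged; a modest leave-one-out penalty indicates limited overfitting to particular shapes.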
In addition to these quantified assessments, conceptual clarity is essential to preventing overfitting. As mentioned above, the three dimensions of the SCI shape space have fairly clear conceptual interpretations. For comparison, the best-fitting two-dimensional model is plotted in Figure 6, and it is clearly inadequate and overly compressed. Some fundamentally different variations (the ✚ vs. ● difference and the О vs. ● difference) have been forced to reside on almost the same dimension, which is conceptually inappropriate. Therefore, the 3D (i.e., SCI) shape space is conceptually better than the two-dimensional shape space.
It is worth mentioning that another important type of generalization is generalization to new tasks. The present study tried only one task, so we cannot offer a quantified assessment of how well the SCI will generalize to other tasks. Conceptually, I predict that the SCI should generalize very well to other attention-demanding tasks (e.g., visual search), but probably only partly to one-to-one comparison tasks (e.g., pairwise comparison or same-different discrimination), because the latter tasks measure postattentive processing of shapes. It will be interesting to see how much (and what aspects) of the SCI can generalize to these one-to-one comparison tasks.