The experiment was an old-new rating study often used in memory research. There were two blocks, counter-balanced across participants. One block had only basic level objects, and the other block had subordinate level objects (12 cars, 12 dinosaurs, and 12 airplanes). Each block had two phases: study and test (Figure 2).
In the basic level block, the 108 objects were randomly divided into two halves. One half, consisting of 54 objects, was called “old” and was shown in the study phase, with half of them in canonical views (CC) and the other half in non-canonical views (NC). In the test phase, half of the “old” objects were shown exactly as in the study phase. The other half's viewpoints were changed between canonical and non-canonical. The 54 “new” objects were shown only in the test phase, half of them in canonical and the other half in non-canonical views. In the subordinate level block, the same procedure was followed: half of the objects in each category (cars, dinosaurs, and planes) were “old” and half were “new.”
More specifically, in the study phase, an object image was shown for 1 s and was replaced by a scale of (−3 −2 −1 +1 +2 +3). The following words were written below their corresponding numbers in the scale: (least attractive, −3), (below average, −1), (above average, +1), and (most attractive, +3). Participants responded by selecting a number, and the next trial started automatically. The subordinate level block was preceded by a reminder that there was a high degree of resemblance between some objects and that care was needed to do well. Each subordinate level object was shown twice in the study, as compared to the basic level block where each studied object was shown only once.
In the test phase, an object image was shown for 1 s, and was replaced by a scale of (−3 −2 −1 +1 +2 +3). For half of the participants, the following words were written below their corresponding numbers in the scale: (“surely old,” −3), (“guess old,” −1), (“guess new,” +1), and (“surely new,” +3). For the remaining participants, the old-new direction was reversed. Participants responded by selecting a number, and the next trial started automatically. It is important to emphasize that no object was retested.
The old-new assignment of objects, the viewpoint (CC or NC) chosen, and whether an image or its mirror reflection was shown, were all randomized across participants. For any participant, an “old” object's study-test image pairs were either both mirror reflected or both not reflected. In other words, no participant had to consider that an image was mirror reflected.
It took about 20 minutes for a participant to complete the experiment.
The canonical views, in the study phase, were overall rated as more attractive than non-canonical views. In the basic level block, the mean ratings were 0.23 and 0.09, respectively (t(56) = 2.22, p = 0.03). In the subordinate block, no difference could be found (t < 1).
The data in the test phase are here reported as the frequency with which an object was categorized as “old.” For studied objects, this “old” response is the hit rate (Figure 3). For “new” objects, this “old” response is the false alarm rate (Figure 4). Our analysis of d′ data gave rise to similar results as the analysis using hit and false alarm rates.
We categorized the rating data by positioning the decision criterion between −1 and +1, and calculated the hit and false alarm rate per participant, per block (basic vs. subordinate), per study view (canonical vs. non-canonical), and per test view (matched/same as vs. mismatched/different from the study view). First, an overall 2 × 2 × 2 ANOVA was performed on the hit data. The main effect of block was significant, F(1, 56) = 13.86, p < 0.001, where the overall hit rate for the basic level block (89.09%) was higher than for the subordinate level (83.95%) block, not surprisingly. The main effect of test view was also significant, F(1, 56) = 72.26, p < 0.001, meaning that the overall hit rate was higher when study-test views matched than mismatched (92.97% vs. 80.07%). The main effect of study view was not significant, F(1, 56) < 1. There was a significant interaction between block and study view, F(1, 56) = 5.38, p < 0.05, between block and test view, F(1, 56) = 24.09, p < 0.001, and between study and test views, F(1, 56) = 4.17, p < 0.05. The three-way interaction was also significant, F(1, 56) = 8.19, p < 0.01.
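The categorization step described above can be illustrated with a minimal sketch. It assumes that a participant's test-phase ratings for one condition are available as a list of integers from the scale (−3 −2 −1 +1 +2 +3); the function name, the `old_is_negative` flag, and the example ratings are hypothetical and are not taken from the study's analysis code.

```python
def old_response_rate(ratings, old_is_negative=True):
    """Fraction of trials classified as "old" for one participant and condition.

    The decision criterion sits between -1 and +1, so the sign of the rating
    decides the category. `old_is_negative` encodes which end of the scale was
    labeled "surely old" for this participant (the direction was reversed for
    half of the participants).
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    if old_is_negative:
        n_old = sum(1 for r in ratings if r < 0)
    else:
        n_old = sum(1 for r in ratings if r > 0)
    return n_old / len(ratings)


# Hypothetical ratings, for illustration only.
studied_trials = [-3, -1, 2, -2]    # "old" objects: rate = hit rate
unstudied_trials = [3, 1, -1, 2]    # "new" objects: rate = false alarm rate

hit_rate = old_response_rate(studied_trials)
false_alarm_rate = old_response_rate(unstudied_trials)
```

Applied to studied objects the function returns a hit rate, and applied to “new” objects it returns a false alarm rate; in the design above it would be called once per participant, block, study view, and test view.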
In order to better understand the overall effects above, we looked at the data more closely. The hit rates were similar when the study and test views matched, for both the basic and subordinate level objects (study-view_test-view: NC_NC = 92.15%, CC_CC = 93.79%). The similar hit rates show that, interestingly, the presence of within-category distractors did not influence recognition when viewpoints were unchanged from study to test. No significant difference between the NC_NC and CC_CC conditions was found in either block (t(56) = 1.16 and 0.91, p = 0.25 and 0.36, for subordinate and basic levels, respectively).
These results are different from those in Gomez et al. (2008) who found that non-canonical matched views gave rise to a higher hit rate than did canonical matched views (84.80 vs. 78.90%, as compared to 92.15% vs. 93.79% in our study). The false alarm results are also different between the two studies. In Gomez et al. (2008), “new” objects tested in canonical views gave rise to a higher false alarm rate (18.20%) than in non-canonical views (10.80%). In our study, however, the main effect of “new” objects' view (canonical vs. non-canonical) was not significant in either block (basic level: 11.26% vs. 10.54%, t(56) < 1; subordinate level: 30.63% vs. 26.69%, t(56) = 1.40, p = 0.15) (Figure 4). It is likely that these different results between the two studies are due to the different objects used, and the different canonical and non-canonical views selected. In a sense, the comparable performance between canonical and non-canonical matched conditions (when study-test views were identical) in our study made it easier and more interesting to interpret mismatched conditions below (when study-test views were different) for evaluating the predictions made by Equation 1. This is because any asymmetry of view generalization between canonical and non-canonical views is more likely due to the view generalization from one view to another, rather than the views themselves (Figure 3; see Discussion).
In order to test whether there was any such asymmetry, a 2 × 2 ANOVA on study (canonical, non-canonical) × test (matched, mismatched) views was performed in the basic level block. There was a main effect of test view, F(1, 56) = 32.38, p < 0.001. This means that the hit rate was higher when the study-test views matched than mismatched (NC study views: 91.84% (matched) vs. 84.41% (mismatched), CC study views: 92.93% vs. 87.18%). The main effect for study view was not significant, F(1, 56) = 2.96, p = 0.09, indicating that the hit rate was similar for CC (88.67%) and NC (88.50%) study views. The interaction was not significant, F(1, 56) < 1. This means that the performance drop from matched study-test views to mismatched ones was comparable regardless of view canonicality.
In comparison, a similar 2 × 2 ANOVA for the subordinate level block showed the following results. There was a main effect for test views, F(1, 56) = 56.19, p < 0.001. This again means that the hit rate was higher when the study-test views matched than when they mismatched (NC: 92.46% (matched) vs. 78.25% (mismatched), CC: 94.65% vs. 70.44%) (Figure 3). The main effect for study views was not significant, F(1, 56) = 2.56, p = 0.12, indicating that comparable hit rates were obtained regardless of the study view canonicality (NC: 85.35%, CC: 82.54%). Interestingly, there was a significant interaction between study and test views, F(1, 56) = 7.32, p < 0.01. This indicates that performance was better when a non-canonical view was studied and its canonical counterpart tested than the other way around. More specifically, the drop in hit rate from matched to mismatched views was significant for both NC and CC study views, p < 0.001 (CC: 24.21% drop, t(56) = 7.65; NC: 14.21% drop, t(56) = 4.51). This 10% additional drop for the CC study views indicates that, in the presence of within-category objects that required additional shape details of an object to be encoded, view generalization from a non-canonical study view to a canonical test view was more accurate than the other way around.
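The arithmetic behind the performance drops and the 10% asymmetry can be retraced from the hit rates reported above. The following is a minimal sketch that uses only those four reported percentages; the variable and function names are ours and purely illustrative, and the sketch operates on group means rather than on the per-participant data that entered the ANOVA.

```python
# Hit rates (%) reported for the subordinate level block, keyed by study view.
hit_rates = {
    "NC": {"matched": 92.46, "mismatched": 78.25},
    "CC": {"matched": 94.65, "mismatched": 70.44},
}


def performance_drop(rates):
    """Drop in hit rate (percentage points) from matched to mismatched test views."""
    return round(rates["matched"] - rates["mismatched"], 2)


drops = {view: performance_drop(rates) for view, rates in hit_rates.items()}

# Additional drop for canonical study views relative to non-canonical ones.
asymmetry = round(drops["CC"] - drops["NC"], 2)
```

The resulting drops are 14.21 points for NC and 24.21 points for CC study views, which differ by 10 points, the asymmetry referred to in the text.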
The results from the subordinate level block of this experiment, taken in isolation, were consistent with the prediction from Equation 1. However, Equation 1 also predicts a similar pattern of results for basic level objects, as well as when 3D object recognition is switched to 2D image recognition. Poggio & Edelman's model operates by matching incoming 2D images with stored 2D views. By virtue of 2D image matching, the model attempts to find the most likely object and viewpoint that gave rise to the 2D image. Therefore, one would expect Equation 1 to make the same qualitative prediction for an image-based recognition task as explained under Scenario 2 in the Introduction. Experiment 3 tested this prediction.
Twenty fresh students of UCLA participated for partial course credit in undergraduate psychology courses. Thirty-seven students of the Technical University of Kaiserslautern, Germany, also participated and were paid for their time. All participants had normal or corrected-to-normal vision, were naive to the purpose of the experiment, and gave informed consent in accord with the policies of the Committee for the Protection of Human Subjects, which approved the experimental protocol at the respective universities.
The same apparatus as Experiment 1 was used at UCLA. The computer displays used in Germany were a 20″ Sun Microsystems CRT, and a 20″ Mitsubishi Diamond Pro 2070 SB CRT. The resolution was 1280 × 1024. With this resolution and image size of 450 × 450 pixels, the angular size of the image was 13° × 13° at the viewing distance of 57 cm.
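The stated angular size can be checked with the standard visual angle formula. The sketch below assumes that the 450 × 450 pixel image was about 13 cm wide on the screen; this physical size is not reported above and is inferred from the fact that, at 57 cm, 1 cm subtends roughly 1°. The function name is illustrative.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent of size_cm viewed at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


# Assumed on-screen image width of 13 cm (not reported in the text),
# viewed from the reported distance of 57 cm.
angle = visual_angle_deg(13.0, 57.0)
```

Under this assumption the computed angle is within a small fraction of a degree of the reported 13°.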
The data were analyzed similarly as in Experiment 2 (Figure 4). It should be emphasized that an “old” response to an “old” object being tested in a “new” view was now a false alarm.
We first compared the overall hit rates between the basic and subordinate level blocks. The main effect of block was significant, F(1, 56) = 4.28, p < 0.05, where the overall hit rate for the basic level block was 83.91% and that for the subordinate level block was 91.84%. Interestingly, the hit rate for the basic level was lower than that for the subordinate level block. Because the average hit rate for matched views in Experiment 2 was 92.97%, this effect in Experiment 3 must have to do with the task difference between Experiments 2 and 3. In Experiment 3, it seemed difficult for participants to remember the exact studied images in the basic level study phase without similarly shaped objects for explicit comparison. In the absence of within-category distractors with similar shapes and features, view generalization appears to be automatic. This automatic view generalization not only raised the false alarm rate for “old” objects in their “new” views (see below), but also lowered the hit rates for the basic level objects.
We further looked into the decision criterion in order to check whether this lowered hit rate was due to the decision criterion change from Experiment 2 to Experiment 3. It turned out that this change, from the β value of 1.02 to 1.25, did not reach significance (t(56) = 1.83, p = 0.073). In terms of the bias, however, the bias difference between the two experiments was highly significant. This difference, however, was almost entirely due to the shift of the bias-free or optimal decision criterion location from Experiment 2 to Experiment 3. This is because in Experiment 2 the relative frequency of “signal” and “noise” was 1:1, whereas in Experiment 3 it was 1:3. As a result, the optimal decision criterion was shifted to the right from Experiment 2 to Experiment 3. Nevertheless, this bias difference should not affect any of the conclusions in this study, because the hypothesis testing was based on the pattern of results within each experiment.
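The shift of the optimal criterion can be made concrete under textbook signal detection assumptions. The sketch below assumes an equal-variance Gaussian model with symmetric payoffs, in which the optimal likelihood ratio β equals the ratio of noise to signal trials; these assumptions, the function names, and the illustrative d′ value of 2.0 are ours and are not stated in the text.

```python
import math


def optimal_beta(n_signal, n_noise):
    """Optimal likelihood-ratio criterion for symmetric payoffs: P(noise) / P(signal)."""
    return n_noise / n_signal


def optimal_criterion(d_prime, beta):
    """Criterion location on the decision axis (noise mean at 0, signal mean at d_prime)."""
    return d_prime / 2 + math.log(beta) / d_prime


beta_exp2 = optimal_beta(1, 1)   # signal:noise = 1:1
beta_exp3 = optimal_beta(1, 3)   # signal:noise = 1:3

# Illustrative sensitivity only; not a value estimated from the data.
shift = optimal_criterion(2.0, beta_exp3) - optimal_criterion(2.0, beta_exp2)
```

Under these assumptions the optimal β rises from 1 to 3, and the optimal criterion moves toward the signal distribution, that is, to the right, by ln(3)/d′.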
We also calculated d′ per participant, per block, and per study view (canonical or non-canonical). It turned out that in the basic level block, the average d′ in Experiment 3 was significantly lower than in Experiment 2 (2.12 vs. 2.79, p < 0.0001). However, the d′ scores in the subordinate block were comparable between the two experiments (1.83 vs. 1.70, p = 0.14). These results suggest that, from Experiment 2 to Experiment 3, participants could not possibly just shift the decision criterion in order to switch from object recognition to image recognition. Instead, the way in which internal representations were constructed was different between the two experiments. In other words, the independence model proposed by Poggio and Edelman (1990) was unlikely to account for these results.
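For reference, d′ is conventionally computed as the difference between the z-transformed hit and false alarm rates. The following is a minimal sketch of that standard formula; the clipping of extreme rates (the `eps` parameter) is our assumption, since the text does not say how rates of 0 or 1 were handled, and the function name is illustrative.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, fa_rate, eps=0.005):
    """Sensitivity d' = z(hit rate) - z(false alarm rate).

    Rates are clipped to [eps, 1 - eps] so that the z-transform stays finite;
    the clipping rule is an assumption, not the procedure used in the study.
    """
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(fa_rate, eps), 1 - eps)
    return _z(h) - _z(f)


# Illustrative rates only; not values from the experiments.
example = d_prime(0.90, 0.10)
```

Because d′ was calculated per participant and then averaged, applying this function to the group mean hit and false alarm rates would not be expected to reproduce the averages reported above.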
Within each block, the hit rates were similar for non-canonical and canonical matched views (basic level: 85.96% and 81.86%, t(56) = 1.89, p = 0.63; subordinate level: 91.67% and 92.02%, t(56) < 1). The overall false alarm rate for “new” objects was lower for the basic level (4.01%) than for the subordinate level block (15.08%), an effect similar to what was observed for the object-based task in Experiment 2. Within a block, there was no significant difference in the overall false alarm rates for new objects between canonical and non-canonical views (basic level, 3.80% and 4.23%, t(56) < 1; subordinate-level, 13.36% and 16.81%, t(56) = 1.47, p = 0.15).
We further separated the false alarm rates for the “old” and “new” objects. There was a significant difference between these two sets of objects for both non-canonical and canonical test views in each block. In the basic level block, the false alarms for the non-canonical test views were 35.46% (“old” objects) vs. 4.23% (“new” objects), t(56) = 11.86, p < 0.001. The false alarms for the canonical test views were 45.15% (“old” objects) vs. 3.80% (“new” objects), t(56) = 16.68, p < 0.001. In the subordinate level block, the false alarms for the non-canonical views were 42.46% (“old” objects) vs. 16.81% (“new” objects), t(56) = 7.32, p < 0.001. The false alarms for the canonical views were 36.93% (“old” objects) vs. 13.36% (“new” objects), t(56) = 7.20, p < 0.001. These differences further indicate that there was an automatic view generalization from a study view to other views, even though the participants were explicitly instructed not to generalize. For “old” objects, the false alarms for non-canonical views (42.46%) were not significantly different from canonical views (36.93%), t(56) = 1.24, p = 0.22. However, the “old” responses for “old” objects were always lower when the study-test views were mismatched rather than matched. We refer to this difference as the performance drop. The average drop was bigger in Experiment 3 (from 87.88% to 40.00%, F(1, 56) = 72.26, p < 0.001) than in Experiment 2 (from 92.97% to 80.07%), indicating that the participants followed the instructions to perform the image recognition task to some extent, despite the automatic view generalization (Figure 4).
We looked into the performance drop more closely. The main effect of study view was not significant, F(1, 56) < 1. There was a significant interaction between block and study view, F(1, 56) = 5.14, p < 0.05, between block and test view, F(1, 56) = 7.18, p < 0.01, but not between study and test views, F(1, 56) = 1.63, p = 0.20. The three way interaction between block, study view, and test view was also significant, F(1, 56) = 8.00, p < 0.01.
In order to better understand the above interaction effects, we analyzed the “old” response data within each block. A 2 × 2 ANOVA for the basic level block showed that there was a main effect for test view, F(1, 56) = 270.78, p < 0.001. This means that the “old” response rate was higher when the study-test views matched than when mismatched (NC: 85.96% vs. 35.46%, CC: 81.86% vs. 45.15%). The main effect for study view was significant, F(1, 56) = 4.47, p < 0.05. This main effect is complex to interpret, because the comparison involves both hits and false alarms. The interaction between study and test views was highly significant, F(1, 56) = 14.41, p < 0.001. This means that the size of the performance drop from matched to mismatched views differed significantly between study views (50.49% drop for non-canonical study views, and 36.71% for canonical ones). The results indicate that participants made fewer errors in following the instructions for non-canonical objects, implying that either non-canonical images were encoded in memory better or canonical images generalized to other views more involuntarily. This involuntary view generalization, in the absence of within-category distractors, probably led to more false alarms for “new” views of “old” objects.
The similar 2 × 2 ANOVA for the subordinate level block, in comparison, showed the following results. There was a main effect for test view, F(1, 56) = 385.93, p < 0.001, as expected. There was no main effect for study view, F(1, 56) = 1.31, p = 0.25. There was no significant interaction between study and test views, F(1, 56) = 1.14, p = 0.29. This means that the performance drop was comparable from a non-canonical study view to a canonical test view (49.21%) and vice versa (55.09%).
The results above indicated that the effects found in Experiment 2 were reversed when the task of viewpoint irrelevant object recognition was changed to view dependent 2D image recognition of 3D objects.
Finally, the attractiveness rating for this experiment showed a pattern similar to that in Experiment 2. In the basic level block, canonical views were again rated as more attractive than non-canonical views, 0.61 vs. 0.36, t(56) = 3.90, p < 0.001. In the subordinate block, no difference could be found (t < 1). Thus the attractiveness rating cannot explain the difference in results found in the two experiments.
In conclusion, results in Experiment 3 were inconsistent with Equation 1, which never predicts better view generalization from canonical to non-canonical views than the other way around. The fact that a “new” view of an “old” object was involuntarily categorized as “old” substantially more often than chance also argues against the independence model by Poggio and Edelman (1990).
In Experiment 3, object recognition is viewpoint relevant. Therefore, when an object is seen from a certain viewpoint, view propagation is supposed to be minimal. The existing representation is also supposed to be subdued since the major source of information for the viewpoint relevant recognition is primarily the study view. Our results indicated that the effects found in Experiment 2 were reversed when the task of viewpoint irrelevant object recognition was changed to view dependent 2D image recognition of 3D objects. We hypothesize that in the natural task in Experiment 2 information from one previously seen view of an object automatically propagates to nearby viewpoints of the same object. That is, the representation of the object automatically attempts to predict, or characterize, what the object looks like from other viewpoints. As far as we know, this is the first time that unavoidable interference from object-judgment to image-judgment is being explicitly reported and discussed in object recognition literature.
We found in Experiment 2 that for basic level objects, this propagation was symmetric between canonical and non-canonical views. Importantly, we also found that for subordinate level objects, the propagation from non-canonical to canonical views was more accurate than the other way around. In Experiment 3, when participants were instructed to treat a new view of an “old” object as “new” instead of “old,” they were effectively instructed to suppress the process of view propagation that probably led to reconstruction of the internal representation. Apparently, such suppression was not fully effective. Some aspects of this view propagation were involuntary in that “new” views of “old” objects were categorized as “old” more often than “new” objects. Based on our results, it appears that the involuntary view propagation is more likely for objects studied in canonical than in non-canonical views, especially in the absence of within-category distractors. On the other hand, the experimental instruction, or the top-down aspect of the suppression, was partially effective in that for the subordinate level objects, the asymmetric effect of view propagation disappeared. In other words, since the propagation from non-canonical to canonical views was more accurate in Experiment 2, it means that the suppression in Experiment 3 was relatively more effective for non-canonical views. The results, similar to those for the basic level in Experiment 3, indicate that participants made fewer errors in following the instructions for non-canonical views, probably implying that non-canonical images were better encoded into memory. Performance was better for images studied in non-canonical views both in Experiment 2 (object recognition task) and Experiment 3 (image recognition task).
On the other hand, the same results may also imply that the involuntary view generalization from canonical to other views can be suppressed when image details are made more important for the task at hand, thereby reducing the asymmetry of results seen in the subordinate level block for Experiment 3. But the reason for this differential suppression remains unclear based on the results of the experiments being reported here. Nevertheless, it is reasonable to assume that such suppression works similarly for both subordinate and basic level objects. Then one would expect that, for basic level objects, propagation from non-canonical views was also more effectively suppressed. This is consistent with the data in Experiment 3.
In summary, we studied internal shape representations in object recognition with the following manipulations. We manipulated how informative an object appeared by presenting the object either from a canonical or a non-canonical view. We manipulated the task demand by asking a participant to recognize an object with either between-category or within-category distractors. We also manipulated the task demand by asking a participant to recognize an object either regardless of the viewpoint that had been seen or from the exact viewpoint that had been seen. Our results showed that the view-approximation model could not fully account for our experimental data, and suggest that the rotational relationship from one view to another needs to be incorporated in the internal representation.
This research was in part supported by a Marie Curie Career Integration Grant (#293901) from the European Union awarded to Tandra Ghose and a US NSF grant (BCS 0617628) to Zili Liu. Part of this research was presented at the European Conference on Visual Perception, Lausanne, Switzerland, 2010; and at the Vision Sciences Society, Naples, Florida, 2011. We thank Drs. David Bennett and Nestor Matthews, who read the entire manuscript, for helpful comments and suggestions. We also thank the anonymous reviewers and, in particular, Shimon Ullman, the editor, for his insightful comments and suggestions.
Commercial relationships: none.
Corresponding author: Tandra Ghose.
Address: Department of Psychology, Technical University of Kaiserslautern, Germany.