Abstract
Human visual object recognition relies largely on shape information. However, the nature of the shape features that actually underlie this task remains largely unknown, despite a wealth of competing theories that aim to account for the code by which human vision represents shape. Here, we report a series of five object recognition experiments using a spatial sampling paradigm (cf. Bubbles) to calculate classification images (CIs) that reveal the effective features used by human participants. In all experiments, the target was hidden behind an occluding mask and partially revealed for 100 ms through a set of 12 circular Gaussian apertures, each 0.8° in diameter. Participants pressed a keyboard key to indicate the identity of the target. Response accuracy was held at 50% correct by adjusting the degree of degradation of the target image. The experiments differed from one another mainly in the class of stimuli and in whether instances were shown from multiple viewpoints. In all experiments, the mean CIs are fairly representative of those of individual participants. Moreover, when only the significantly effective features from these CIs are visible, the resulting image is easily mapped to the target object by any normal human observer. However, the CIs of human participants correlate poorly with those obtained from an ideal observer carrying out the task under the same conditions. No particular type of feature proposed by major shape perception theories (e.g. concavities, convexities, edge intersections, object parts) dominates the mean CIs, and feature sizes are quite variable. Overall, these features appear most compatible with the ‘image fragment’ theory proposed by Ullman and collaborators. Remarkably, for all objects presented from variable viewpoints, the regions of the object’s surface that constituted the effective features were extremely similar across viewpoints.
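To make the sampling paradigm concrete, the following is a minimal sketch of how a Bubbles-style mask and a classification image might be computed. It is an illustration under stated assumptions, not the procedure used in the experiments: the function names `bubbles_mask` and `classification_image` are hypothetical, the aperture width is given as a Gaussian standard deviation in pixels (the mapping from 0.8° of visual angle to pixels depends on the display and is not specified here), overlapping apertures are combined by taking the maximum, and the CI is taken as the difference between the mean mask on correct and on incorrect trials, which is only one common formulation.

```python
import numpy as np


def bubbles_mask(shape, n_bubbles=12, sigma_px=8.0, rng=None):
    """Return a 2-D mask in [0, 1] made of randomly placed Gaussian apertures.

    Assumptions (not from the source): sigma_px is the Gaussian standard
    deviation in pixels, and overlapping apertures are merged with max().
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=float)
    for _ in range(n_bubbles):
        # Aperture centre drawn uniformly over the image area.
        cy = rng.uniform(0, h)
        cx = rng.uniform(0, w)
        bubble = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma_px ** 2))
        mask = np.maximum(mask, bubble)
    return mask


def classification_image(masks, correct):
    """Mean mask on correct trials minus mean mask on incorrect trials.

    Positive values mark regions whose visibility was associated with
    correct responses. Requires at least one trial of each outcome.
    """
    masks = np.asarray(masks, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return masks[correct].mean(axis=0) - masks[~correct].mean(axis=0)
```

In use, each trial's stimulus would be the target image blended with the occluder according to the mask, and the per-trial masks and accuracies would be passed to `classification_image`. Statistical thresholding of the resulting CI, which is needed to identify significantly effective features, is omitted from this sketch.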