Abstract
Although hierarchical representation is known to make object recognition more robust, there is no direct psychophysical evidence showing how humans acquire hierarchical representations of objects via statistical learning. We hypothesize that if the visual system learns objects in terms of hierarchical dictionaries of parts, humans will be able to infer "hidden" parts, which have not been perceived directly, as the complement of parts that have been observed. Twelve elementary shapes were randomly combined to make objects. The inventory included objects composed of four shapes (termed quadruples) and objects composed of two shapes (termed pairs). One of the quadruples was chosen as the target, and a pair of shapes embedded within it was chosen as the trained embedded part (TEP). During training, observers first learned object parts in a detection task and were then presented with 104 distinct training images in which multiple objects appeared without clear segmentation. Object familiarity was measured in subsequent test trials. For the non-target quadruples, observers failed to recognize their parts (accuracy of 0.53), replicating a previous finding reported by Fiser and Aslin (2005). However, for the target quadruple, after learning the TEP, observers recognized the complementary part with an accuracy of 0.70, even though this complementary part had never been displayed alone. A second experiment further revealed that passive viewing of TEPs was not sufficient to establish the part-based representation that enables the inference of hidden parts. In summary, these two studies suggest that the key to the human ability to form hierarchical representations is the reuse of previously learned knowledge about object parts, which enables a compositional mechanism for forming object representations.
Meeting abstract presented at VSS 2013