Abstract
How object and scene structure is represented is an important problem in vision. Considerations of coding efficiency and of systematicity suggest that representations of structurally related objects should share components (which, as our earlier results show, need not be generic, categorical, or mutually exclusive). Can a useful set of object fragments be acquired in an unsupervised fashion? Statistical interdependence criteria such as pairwise conditional probabilities of the fragments, or Barlow's “suspicious coincidence” ratio (the joint probability of two fragments divided by the product of their marginal probabilities), can provide a basis for unsupervised learning. If humans use such criteria in learning structured objects, they would tend to lump together a pair of highly interdependent fragments, perceiving them as a single shape. We tested this hypothesis in two experiments involving a part verification task, in which subjects are known to detect a unitary probe embedded in a larger target faster than a composite one. Altogether, over 90 subjects were tested. As predicted, mere exposure to a set of statistically controlled, structured stimuli (80 and 100 objects in the first and second experiments, respectively) led to a larger speedup for fragment pairs with higher interdependence (conditional probability). This effect was modulated in a complex manner by fragment coincidence as measured by Barlow's ratio. Single-fragment probes generally exhibited a smaller speedup than composite ones. Our study complements and extends recent results by Aslin and others that demonstrate some of the learning strategies used by the brain in dealing with structured stimuli. To elucidate the mechanisms behind this kind of unsupervised learning, we developed a computational model of visual structure acquisition that accepts the same stimuli seen by the human subjects and exhibits similar patterns of behavior.
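The two interdependence criteria named above can be sketched computationally. The following Python snippet is an illustration only, not code from the study: objects are represented as sets of fragment labels, the function name and example data are hypothetical, and the direction of the conditional probability (P(a | b)) is an arbitrary choice for the sketch.

```python
def interdependence(objects, a, b):
    """Estimate pairwise statistics for fragments a and b over a
    collection of objects, each given as a set of fragment labels.

    Returns (conditional probability P(a | b), Barlow's
    "suspicious coincidence" ratio P(a, b) / (P(a) * P(b))).
    """
    n = len(objects)
    p_a = sum(1 for o in objects if a in o) / n          # marginal P(a)
    p_b = sum(1 for o in objects if b in o) / n          # marginal P(b)
    p_ab = sum(1 for o in objects if a in o and b in o) / n  # joint P(a, b)

    cond = p_ab / p_b if p_b > 0 else 0.0
    barlow = p_ab / (p_a * p_b) if p_a * p_b > 0 else 0.0
    return cond, barlow

# Hypothetical example: fragment "b" never appears without "a",
# so P(a | b) = 1 and the Barlow ratio exceeds 1, flagging the
# pair as a candidate for being lumped into a single shape.
objs = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
cond, ratio = interdependence(objs, "a", "b")
```

A Barlow ratio of 1 corresponds to statistical independence; values above 1 indicate that the fragments co-occur more often than chance, which is the "suspicious coincidence" signal an unsupervised learner could exploit.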