Abstract
Learning new visual features involves the encoding of local spatial correlations from the statistics of images. Previously, we showed that human observers can learn the spatial configuration of shape-pairs embedded within multiple exemplars of complex displays. Passive observation of several dozen exemplars was sufficient for learning both single-shape frequency and higher-order (conditional probability) information about shape-pairs in the displays. In the present study, these same 12 simple shapes were grouped into four 3-element base-triplets, each with a specific I- or V-shaped spatial arrangement. During the learning phase, subjects passively viewed displays (2 sec each) in which two of the four base-triplets were pseudorandomly arranged in a 5 × 5 grid. These two base-triplets created cross-triplet shape-pairs that were superficially indistinguishable from base-triplet shape-pairs. Some of these cross-triplet shape-pairs had the same probability of occurrence as some base-triplet shape-pairs. However, all four of the base-triplets were more predictable (they had higher conditional probabilities) than any non-base-triplets. After the learning phase, subjects were unable to discriminate cross-triplet shape-pairs from base-triplet shape-pairs [t(20)=1.18, n.s.], but they were able to discriminate base-triplets from non-base-triplets [t(20)=4.86, p<.001]. These results replicate our earlier findings by showing that subjects can learn higher-order spatial statistics across multiple images. More importantly, these results demonstrate that the statistically based coherence of scenes can operate on at least triplets of shapes which define a coherent object configuration, and that triplet statistics are not necessarily bootstrapped from pair-wise statistics.
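The distinction between how often a shape-pair occurs (joint probability) and how predictable it is (conditional probability) can be illustrated with a minimal sketch. The displays below are a hand-made toy set, not the stimuli used in the study. The function name `pair_statistics`, the shape labels, and the choice to count each shape or pair at most once per display are all assumptions made for illustration.

```python
from collections import Counter

def pair_statistics(displays, offset=(0, 1)):
    """Estimate shape and shape-pair statistics from a set of displays.

    displays: list of dicts mapping (row, col) -> shape label (hypothetical format).
    offset:   spatial relation defining a pair; (0, 1) means "immediately to the right".
    Returns (shape_freq, joint, conditional), all estimated per display.
    """
    n = len(displays)
    shape_count = Counter()
    pair_count = Counter()
    for grid in displays:
        # Count each shape at most once per display.
        shape_count.update(set(grid.values()))
        # Collect every (a, b) pair where b sits at the given offset from a.
        pairs = set()
        for (r, c), a in grid.items():
            b = grid.get((r + offset[0], c + offset[1]))
            if b is not None:
                pairs.add((a, b))
        pair_count.update(pairs)
    shape_freq = {s: k / n for s, k in shape_count.items()}
    joint = {p: k / n for p, k in pair_count.items()}
    # P(b at offset | a present): pair count divided by count of the first shape.
    conditional = {(a, b): k / shape_count[a] for (a, b), k in pair_count.items()}
    return shape_freq, joint, conditional

# Toy displays: A is always followed by B, whereas C is followed by D only half the time.
displays = [
    {(0, 0): "A", (0, 1): "B", (2, 0): "C", (2, 1): "D"},
    {(0, 0): "A", (0, 1): "B", (2, 0): "C", (2, 1): "E"},
    {(1, 1): "C", (1, 2): "D"},
    {(1, 1): "C", (1, 2): "E"},
]

shape_freq, joint, conditional = pair_statistics(displays)
print(joint[("A", "B")], joint[("C", "D")])              # equal joint probabilities
print(conditional[("A", "B")], conditional[("C", "D")])  # unequal conditional probabilities
```

In this toy set the pairs A-B and C-D occur in the same proportion of displays, yet B is fully predictable from A while D follows C only half the time. This is the sense in which a structure can be matched with another on frequency of occurrence while differing from it in predictability.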
Supported by NSF SBR-9873477 and the Schmitt Foundation.