Abstract
Visual crowding provides a window into object recognition: observers fail to recognize objects in clutter. Here we ask, what do they see instead? We analyze observers' errors to show that crowding necessarily reflects the combination of information across multiple complex objects, rather than the mislocalization (or substitution) of one object for another. First, we presented single letters, randomly chosen, in noise in the periphery and tabulated a confusion matrix based on observers' (n=3) reports. We then tested the same observers in a classic crowding task, in which they viewed a triplet (target and two flankers) of closely spaced letters in the periphery (10 deg) and reported the identity of the middle target. For each observer, we tailored the triplets based on that observer's single-letter confusion matrix. One flanker was chosen to be a letter that was most confused with (most “similar” to) the target, and the other was chosen to be a letter that was least confused with (least similar to) the target. Consistent with the literature, when mistaken, observers tend to report the flankers. The crucial issue, however, is which of the two flankers observers report on these trials. Blind substitution predicts that the two flankers (similar and dissimilar) are equally likely to be reported. Instead, we find that observers are more likely to report the similar flanker (70%) than the dissimilar flanker (30%). The effect of similarity on erroneous responses proves that the response combines information from both the target and the reported flanker. By systematically tailoring the stimuli, we induced a bias in the reports that reveals a pooled, “mongrel-like,” underlying percept. Our method, applicable to any object, generalizes the evidence for “compulsory pooling” from the narrow domain of grating orientation (Parkes et al., 2001) to complex, everyday objects.
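The flanker-tailoring step can be illustrated with a minimal sketch. It is not the authors' code: the function name `pick_flankers`, the toy three-letter alphabet, and the counts are hypothetical, and the sketch assumes that `confusion[i][j]` holds the number of single-letter trials on which letter `i` was shown and letter `j` was reported. The abstract does not say whether confusions were symmetrized or how ties were resolved, so the row-wise lookup and first-match tie-breaking here are assumptions.

```python
def pick_flankers(confusion, letters, target):
    """Return (similar, dissimilar) flanker letters for a given target.

    Assumption: confusion[i][j] counts trials on which letters[i] was
    presented and letters[j] was reported by one observer.
    """
    t = letters.index(target)
    row = confusion[t]
    # Exclude the target itself (the diagonal holds correct reports).
    candidates = [i for i in range(len(letters)) if i != t]
    # Most-confused letter serves as the "similar" flanker,
    # least-confused letter as the "dissimilar" flanker.
    # Ties go to the first candidate in alphabet order (an assumption).
    similar = max(candidates, key=lambda i: row[i])
    dissimilar = min(candidates, key=lambda i: row[i])
    return letters[similar], letters[dissimilar]


# Hypothetical toy data for one observer (illustrative counts only).
letters = ["C", "O", "X"]
confusion = [
    [8, 5, 1],   # "C" shown
    [4, 9, 0],   # "O" shown
    [1, 0, 12],  # "X" shown
]

similar, dissimilar = pick_flankers(confusion, letters, "C")
print(similar, dissimilar)
```

Under blind substitution, errors on such triplets should fall on the two selected flankers equally often, so any asymmetry favoring the similar flanker is the signature of interest.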