Crowding is, arguably, the most distinctive feature of peripheral vision. Although it is most commonly discussed in the context of reading and visual disorders such as amblyopia, crowding is a universal phenomenon with implications for tasks as diverse as object recognition, visual search, and saccadic eye movements.
Crowding has been studied since the early 1960s (Stuart & Burian, 1962), but its physiological basis is still not understood. One of the earliest explanations of crowding was proposed by Estes (1972, 1974). Based on the observation that confusable (similar) letters produce stronger crowding, Estes suggested that crowding happens because similar features of nearby letters inhibit each other. More recently, Chung, Levi, and Legge (2001) argued along the same lines, pointing to similarities between visual masking and crowding. An alternative explanation of crowding was originally proposed by Wolford (1975) and later incorporated into the Krumhansl and Thomas (1977) model. They argued that because peripheral features tend to be mislocalized (toward the fovea), these features are mixed with those on the foveal side, which produces crowding. A conceptually similar “pooling” hypothesis was later introduced by Levi, Hariharan, and Klein (2002), Parkes, Lund, Angelucci, Solomon, and Morgan (2001), and Pelli, Palomares, and Majaj (2004). They suggested that once individual features are detected, the feature information is pooled over a local area of the visual field, causing crowding.
The local inhibition explanation predicts that the feature signal is strongly reduced or even completely lost. The pooling explanations are less specific about what happens to the signal. The common assumption is that the location of individual features is lost. In other words, when experiencing crowding, we retain the “what” part of the signal, but not the “where” part.
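The contrast between these two predictions can be made concrete with a toy readout of a three-element display. The Python sketch below is purely illustrative and is not a model from any of the cited studies. Each element is coded with a hypothetical “L” or “R” tilt label; the pooling readout keeps only how many elements of each tilt are present, while the inhibition readout keeps every element in place but attenuates it by an assumed gain of 0.25 for each neighbor that shares its tilt.

```python
from collections import Counter


def pooled_signal(triplet):
    """Toy pooling readout: keeps which tilts are present (the 'what')
    but discards where each one was (the 'where')."""
    return Counter(triplet)


def inhibited_signal(triplet, gain=0.25):
    """Toy local-inhibition readout: each element keeps its position, but its
    strength drops by a hypothetical `gain` for every adjacent element that
    has the same tilt (similar features inhibit each other)."""
    out = []
    for i, tilt in enumerate(triplet):
        neighbors = [triplet[j] for j in (i - 1, i + 1) if 0 <= j < len(triplet)]
        n_similar = sum(n == tilt for n in neighbors)
        out.append((tilt, max(0.0, 1.0 - gain * n_similar)))
    return out
```

With this coding, `pooled_signal("LRR") == pooled_signal("RRL")` is true, so the pooled readout cannot tell where the odd tilt was. By contrast, `inhibited_signal("LLL")` keeps every position but weakens each signal, the middle element (two similar neighbors) most of all.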
Although these crowding hypotheses make specific predictions, there have been few direct tests of their validity. This may be partly explained by the particular stimuli traditionally used in crowding studies. Crowding was first observed in reading (Ehlers, 1936) and in clinical tests of visual acuity (Ehlers, 1953), where letters or letter-like characters were a natural choice. Despite their practical significance, letters are poor candidates as stimuli for studying crowding. Because it is not well understood how such complex shapes are processed and encoded, it is hard to analyze what happens to them in crowding. Indeed, in one of his most cited papers, Bouma (1970) suggested using less complicated stimuli.
The aim of this study was to determine explicitly what kind of information is lost in crowding. We asked observers to identify the orientations of a triplet of horizontally aligned Gabor patches, each slanted left or right, in all possible combinations. By using simple stimuli in which the orientation and spatial frequency of all components were well controlled, we were able to analyze precisely which features were confused in crowding and how they were confused. Simplified stimuli in the form of bars, circular patches of grating, Gabor patches, or letters made of Gabors have been used in several recent crowding studies (He, Cavanagh, & Intriligator, 1996; Levi et al., 2002; Parkes et al., 2001; Wilkinson, Wilson, & Ellemberg, 1997). Stimuli in the present study were modeled on those used by He et al. (1996).
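For readers who want to reproduce the general layout, the following Python sketch (using NumPy) enumerates the 2³ = 8 left/right combinations of a triplet and renders each one as three Gabor patches placed side by side. Every numeric parameter here (patch size, envelope σ, carrier frequency, the ±45° tilts, and the gap between patches) is a hypothetical placeholder, not a value taken from this study or from He et al. (1996), and mapping negative angles to “left” is an arbitrary convention.

```python
import itertools

import numpy as np

# Hypothetical tilt convention: negative angle = "left", positive = "right".
TILTS = {"L": -45.0, "R": 45.0}


def gabor(size=64, sigma=10.0, cycles_per_pixel=0.08, theta_deg=45.0, phase=0.0):
    """Cosine carrier under a circular Gaussian envelope.
    With phase=0 the value at the center pixel is exactly 1."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    theta = np.deg2rad(theta_deg)
    # Position along the direction in which the carrier is modulated;
    # the stripes themselves run perpendicular to this direction.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * cycles_per_pixel * x_rot + phase)
    return envelope * carrier


def triplet_stimulus(pattern, size=64, gap=16):
    """Place three Gabor patches (e.g. pattern "LRL") in a horizontal row,
    separated by blank gaps of `gap` pixels."""
    patches = [gabor(size=size, theta_deg=TILTS[c]) for c in pattern]
    spacer = np.zeros((size, gap))
    row = [patches[0]]
    for patch in patches[1:]:
        row += [spacer, patch]
    return np.hstack(row)


# All eight left/right combinations: "LLL", "LLR", ..., "RRR".
patterns = ["".join(p) for p in itertools.product("LR", repeat=3)]
stimuli = {p: triplet_stimulus(p) for p in patterns}
```

With these placeholder values, `triplet_stimulus("LRL")` returns a 64 × 224 array (three 64-pixel patches separated by two 16-pixel gaps) that can be passed to any image display routine.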
A study by Parkes et al. (2001) concluded that information about individual target orientations was averaged in crowded stimuli. However, the averaging found in that study may have been due to the particular task used rather than to crowding. In their orientation discrimination experiment, the number of targets was varied and the target positions were randomized, which made subjects uncertain about the locations of the targets. Under such uncertainty, averaging becomes the best strategy, especially when the targets outnumber the distractors, as was true for half the conditions in their study. The same experiment repeated in the fovea also indicated strong averaging, even though crowding is thought to be absent in the fovea.
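A back-of-the-envelope calculation shows why averaging pays off when targets outnumber distractors. The sketch below is a toy illustration under stated assumptions, not the analysis of Parkes et al. (2001): all elements are weighted equally, there is no internal noise, distractors are vertical (0°), and tilts are small enough that a plain arithmetic mean is an adequate stand-in for a circular orientation average.

```python
def pooled_mean_tilt(target_tilt, n_targets, distractor_tilt, n_distractors):
    """Equal-weight average of every element's tilt (degrees from vertical).

    Assumes small tilts, so the linear mean approximates a proper circular
    (180-degree periodic) orientation average.
    """
    total = n_targets * target_tilt + n_distractors * distractor_tilt
    return total / (n_targets + n_distractors)


# Three 10-degree targets among one vertical distractor.
many_targets = pooled_mean_tilt(10.0, 3, 0.0, 1)  # 7.5
# One 10-degree target among three vertical distractors.
few_targets = pooled_mean_tilt(10.0, 1, 0.0, 3)  # 2.5
```

In both cases the pooled tilt has the same sign as the targets, so a left/right judgment based on the average alone is correct without knowing where the targets are; the pooled signal is simply much larger when targets dominate the pool.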
An experiment with a single target and a variable number of distractors would be more informative, because there would be no strong incentive for a subject to pool target and distractor signals. Such an experiment was carried out by Felisberti, Solomon, and Morgan (2005). One of the two subjects tested showed no significant change in orientation discrimination performance when the number of distractors varied from 2 to 5. The second subject showed some effect, but only when the target position was cued; there was no significant effect for the noncued target. Also, there was no significant change in the subjects' ability to detect an odd-orientation target among an array of distractors when the number of distractors changed from 4 to 15. Altogether, this evidence strongly suggests that subjects can adopt averaging as a strategy when the target location is uncertain.
Another potential problem with both the Felisberti et al. (2005) and Parkes et al. (2001) studies is that the target and distractors often had similar orientations, which induced a strong illusory tilt (Solomon, Felisberti, & Morgan, 2004) and a related increase in orientation discrimination thresholds. Because these effects persist in the fovea, they are not related to crowding (Solomon & Morgan, 2006) and may have confounded the crowding measurements in the above studies.
Here, we report that crowding cannot be explained by a simple loss of positional information resulting from averaging. Nor can it be explained by inhibitory lateral interactions similar to surround suppression. Instead, we show that crowding strongly depends on the dissimilarity of neighboring visual elements as well as on their order (inward–outward). We argue that the results can be parsimoniously explained if all information except the local feature contrast, calculated in an anisotropic fashion, is lost in crowding.