The present series of experiments was aimed at testing whether training can reduce crowding.
Experiment 1 revealed that improvement in performance following training was largely restricted to the trained string (specific learning), although a small but unreliable improvement for untrained strings (unspecific learning) was also observed. When letter strings changed from trial to trial with no repetition (Experiment 2), no gain in performance was evident after hours of training, even when stimuli less familiar than letters were used as targets. The absence of learning effects in Experiment 2 suggests that unspecific learning effects are of minor importance.
Experiments 3 and 4 were then designed to identify conditions that promote stimulus-specific learning.
Experiment 3 indicated that after short training, stimulus-specific learning was restricted to the trained retinal location and to the exact interletter spacing used in training. Following longer training, however, learning generalized to untrained eccentricities and spacing configurations, although this transfer was not complete. Performance with trained display settings remained superior to performance with untrained settings and this superiority was still obvious 24 hr following training. The observed learning profile thus suggests that abstract representations of the stimulus (e.g., representations that are invariant to spacing and retinal eccentricity) and specific aspects of the letter strings were acquired through training. Finally,
Experiment 4 showed that when words were used as embedding context for the target letter, performance improved equally for trained and untrained display settings. This latter experiment also showed that for letters presented in isolation, training effects were scarcely observable.
Altogether, the present results show that training improves the ability to identify flanked letters and that this learning effect partly depends on the sensory characteristics of the strings during training (for similar observations regarding font, see Sanocki, 1987). Hence, unlike current accounts that promote only abstract visual representation of familiar orthographic strings (e.g., Dehaene et al., 2005), the knowledge about letter strings acquired in this study includes information about physical aspects of the stimulus, at least at the beginning of learning. With increasing experience, stimulus representations seem to become more tolerant to surface variations, although it is not yet clear whether information about physical aspects of the stimulus ceases to be functional.
Whereas recognition of letters in strings improved after training, isolated letters did not profit from learning. Moreover, recognition of an embedded target letter improved even when participants received feedback about flanker identity only. For participants to report the correct target, training must thus have reduced interference (pooling) between flankers and target. Taken together, the present findings suggest that learning reduces crowding. In fact, when confronted with a chain of letters, observers may actually attempt to process the string as a word, that is, holistically. For that purpose, visual information is spatially integrated. In the case of familiar chains, higher level internal representations of the stimulus already exist. Spatial integration of the chain might thus activate these representations and facilitate identification of the string and its embedded letters by top-down feedback. By contrast, in the absence of a higher level representation, holistic processing of the string results in interference among neighboring letter features, which is observed as the phenomenon of crowding (note that in tasks that do not encourage holistic processing, such as visual search-like localization or detection tasks, interference between letters in unfamiliar strings differs from interference observable in identification tasks; Huckauf, 2006). The above speculation is in line with assumptions that conceptualize crowding as a failure of feature integration (e.g., Bouma, 1970; Pelli et al., 2004; Wolford, 1975). However, it adds that this failure arises from the attempt to process an unfamiliar chain of letters like a word (i.e., holistic processing) without top-down information, because higher level stimulus representations are not available.
It must be stressed, though, that several critical issues constrain the current findings. First, only skilled readers participated in this study. The limitation of using this population is that the feature- and letter-level information contained in the strings has already been extensively trained. Hence, although the perception of unfamiliar visual objects might indeed benefit from training, skilled readers are not suitable for capturing such potential effects for the perception of isolated letters. An important next experimental step is therefore the investigation of training effects on the perception of isolated letters and of unfamiliar visual patterns in beginning readers.
Second, as mentioned above, one problematic issue is the comparison of increments in correct responses across different levels of performance (e.g., learning to identify embedded versus isolated letters). Therefore, a replication of the present findings with an alternative measure of learning is desirable. One solution would be to measure the threshold value of a given stimulus parameter, because thresholds of physical parameters provide a better estimate of the underlying metrics than the proportion of errors (e.g., Farrell & Pelli, 1999). For such a measurement, however, one must take into consideration that each physical parameter is potentially subject to specific learning.
Third, to link the data to visual word recognition, effects of phonological and semantic information have to be taken into consideration. In fact, because in this study orthographic information was presented only in the visual periphery, the role of orthographic information and thus of sensory specificity could have been overestimated. The kind of perceptual learning that is reported here could also turn out to be of little relevance to natural reading because words are exposed at various retinal locations when a reader scans a page of text. This “training” might result in more position-independent internal representations and could explain why learning effects in the word context were more invariant to manipulations of spacing and retinal position. Curiously, however, because of the way eye movements are programmed during reading, most of the time words are perceived/fixated at the same retinal location (i.e., at the “preferred viewing position” slightly left of word center; e.g., Nazir, Ben-Boutayab, Decoppet, Deutsch, & Frost,
2004; Nazir, Heller, & Sussmann,
1992; Rayner,
1979; Vitu, O'Regan, & Mittau,
1990). Moreover, word recognition is indeed best at this preferred viewing position and drops with every letter of deviation from the “trained” retinal position (Nazir,
This viewing position effect in word recognition is already observed after a few months of reading instruction (Aghababian & Nazir, 2000), which suggests that learning processes like those shown in this study may indeed underlie rapid, skilled word recognition. If this latter assumption is correct, the present data would have an important impact on current methods of reading instruction: Instead of focusing only on the training of letters and on the development of higher level abstract word representations, perceptual training of whole words might lead to faster and stronger improvements in visual word recognition.
Finally, it is worth noting that for object recognition in general, there is evidence that internal representations of even familiar objects are not completely devoid of positional information. Objects are processed faster and more accurately when they are presented in a canonical view (e.g., Blanz, Tarr, & Bülthoff,
1999; Tarr,
1995). This canonical view correlates with the standard viewpoint for an object, which seems to mainly depend on observers' experience with this particular view (Blanz et al.,
1999). Assuming similar mechanisms for letter/word recognition, this study is the first attempt to investigate the genesis of such canonical views for letter strings. Learning means that internal representations of letter strings emerge based on sensory experiences. The more frequent the sensory experiences are, the more abstract the internal representation becomes; that is, the more tolerant it will be to surface variations. The results of this study suggest that one important effect of learning is that crowding in letter strings decreases with increasing learning. Conversely, this means that insufficient top-down information might be regarded as one basic source underlying crowding effects.