Review  |   December 2011
Peripheral vision and pattern recognition: A review
Journal of Vision December 2011, Vol.11, 13. doi:https://doi.org/10.1167/11.5.13

      Hans Strasburger, Ingo Rentschler, Martin Jüttner; Peripheral vision and pattern recognition: A review. Journal of Vision 2011;11(5):13. https://doi.org/10.1167/11.5.13.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

We summarize the various strands of research on peripheral vision and relate them to theories of form perception. After a historical overview, we describe quantifications of the cortical magnification hypothesis, including an extension of Schwartz's cortical mapping function. The merits of this concept are considered across a wide range of psychophysical tasks, followed by a discussion of its limitations and the need for non-spatial scaling. We also review the eccentricity dependence of other low-level functions including reaction time, temporal resolution, and spatial summation, as well as perimetric methods. A central topic is then the recognition of characters in peripheral vision, both at low and high levels of contrast, and the impact of surrounding contours known as crowding. We demonstrate how Bouma's law, specifying the critical distance for the onset of crowding, can be stated in terms of the retinocortical mapping. The recognition of more complex stimuli, like textures, faces, and scenes, reveals a substantial impact of mid-level vision and cognitive factors. We further consider eccentricity-dependent limitations of learning, both at the level of perceptual learning and pattern category learning. Generic limitations of extrafoveal vision are observed for the latter in categorization tasks involving multiple stimulus classes. Finally, models of peripheral form vision are discussed. We report that peripheral vision is limited with regard to pattern categorization by a distinctly lower representational complexity and processing speed. Taken together, the limitations of cognitive processing in peripheral vision appear to be as significant as those imposed on low-level functions and by way of crowding.

Chapter 1. Introduction
The driver of a car traveling at high speed, a shy person who avoids looking directly at the object of her or his interest, and a patient suffering from age-related macular degeneration all face the problem of getting the most out of seeing sidelong. It is commonly thought that blurriness of vision is the main characteristic of that condition. Yet Lettvin (1976) picked up the thread where Aubert and Foerster (1857) had left it when he insisted that any theory of peripheral vision exclusively based on the assumption of blurriness is bound to fail: “When I look at something it is as if a pointer extends from my eye to an object. The ‘pointer’ is my gaze, and what it touches I see most clearly. Things are less distinct as they lie farther from my gaze. It is not as if these things go out of focus—but rather it's as if somehow they lose the quality of form” (Lettvin, 1976, p. 10, cf. Figure 1). 
Figure 1.
 
One of Lettvin's demonstrations. “Finally, there are two images that carry an amusing lesson. The first is illustrated by the O composed of small o's as below. It is a quite clearly circular array, not as vivid as the continuous O, but certainly definite. Compare this with the same large O surrounded by only two letters to make the word HOE. I note that the small o's are completely visible still, but that the large O cannot be told at all well. It simply looks like an aggregate of small o's.” (Lettvin, 1976, p. 14) with the permission of the New York Academy of Sciences.
To account for a great number of meticulous observations on peripheral form vision, Lettvin (1976, p. 20) suggested “that texture somehow redefined is the primitive stuff out of which form is constructed.” His proposal can be taken further by noting that texture perception was redefined by Julesz et al. (Caelli & Julesz, 1978; Caelli, Julesz, & Gilbert, 1978; Julesz, 1981; Julesz, Gilbert, Shepp, & Frisch, 1973). These authors succeeded in showing that texture perception ignores relative spatial position, whereas form perception from local scrutiny does not. Julesz (1981, p. 97) concluded that cortical feature analyzers are “not connected directly to each other” in peripheral vision and interact “only in aggregate.” By contrast, research on form vision indicated the existence in the visual cortex of cooperative mechanisms that locally connect feature analyzers (e.g., Carpenter, Grossberg, & Mehanian, 1989; Grossberg & Mingolla, 1985; Lee, Mumford, Romero, & Lamme, 1998; Phillips & Singer, 1997; Shapley, Caelli, Grossberg, Morgan, & Rentschler, 1990). 
Our interest in peripheral vision was aroused by the work of Lettvin (1976). Our principal goal since then has been to better understand form vision in the peripheral visual field. However, the specifics of form vision can only be appreciated in the light of what we know about lower level functions. We therefore proceed from low-level functions to the recognition of characters and more complex patterns. We then turn to the question of how the recognition of form is learned. Finally, we consider models of peripheral form vision. As all of this constitutes a huge field of research, we had to exclude important areas of work. We omitted work on optical aspects, on motion (cf. the paper by Nishida in this issue), on color, and on reading. We also ignored most clinical aspects including the large field of perimetry. We just touch on applied aspects, in particular insights from aviation and road traffic. 
More specifically, Chapter 2: History of research on peripheral vision reviews, from a historical perspective, research on peripheral vision in ophthalmology, optometry, psychology, and the engineering sciences. Chapter 3: Cortical magnification and the M-scaling concept addresses the variation of spatial scale as a major contributor to differences in performance across the visual field. Here, the concept of size scaling inspired by cortical magnification is the main topic. Levi's E2 value is introduced and we summarize E2 values over a wide range of tasks. However, non-spatial stimulus dimensions, in particular pattern contrast, are also important. Single-cell recording and fMRI studies support the concept, for which we present empirical values and a logarithmic retinocortical mapping function that matches the inverse linear law. Further low-level tasks reviewed include the measurement of visual reaction time, apparent brightness, temporal resolution, flicker detection, and spatial summation. These tasks have found application as diagnostic tools for perimetry, both in clinical and non-clinical settings. 
Peripheral letter recognition is a central topic in our review. In Chapter 4: Recognition of single characters, we first consider its dependence on stimulus contrast. We then proceed to crowding, the phenomenon traditionally defined as loss of recognition performance for letter targets appearing in the context of other, distracting letters (Chapter 5: Recognition of patterns in context—Crowding). Crowding occurs when the distracters are closer than a critical distance specified by Bouma's (1970) law. We demonstrate its relationship with size scaling according to cortical magnification and derive the equivalent of Bouma's law in retinotopic cortical visual areas. Furthermore, we discuss how crowding is related to low-level contour interactions, such as lateral masking and surround suppression, and how it is modulated by attentional factors. 
Regarding the recognition of scenes, objects, and faces in peripheral vision, a key question is whether observer performance follows predictions based on cortical magnification and acuity measures (Chapter 6: Complex stimulus configurations: Textures, scenes, and faces). Alternatively, it might be that configural information plays a role in the peripheral recognition of complex stimuli. Such information could result from mid-level processes of perceptual organization integrating local features into contours and contours into parts of objects or scenes. 
Of particular relevance for basic and clinical research is the possibility of improving peripheral form vision by way of learning (Chapter 7: Learning and spatial generalization across the visual field). Perceptual learning may enhance elementary functions such as orientation discrimination, contrast sensitivity, and types of acuity. This entails the question of whether crowding can be ameliorated or even removed by perceptual learning. We shall then proceed to consider possibilities of acquiring pattern categories through learning in indirect view. Of special interest is the extent of shift invariance of learned recognition performance and whether this imposes similar limitations on low-level and cognitive functions in peripheral vision. 
In Chapter 8: Modeling peripheral form vision, we review models of peripheral form vision that employ concepts from computer vision, artificial neural networks, and pattern recognition. The most successful of these approaches are rooted in the above-mentioned work of Lettvin and Julesz et al. That is, they model peripheral form vision by degrading structure within image parts using some sort of summary statistics. An alternative approach, termed the method of classification images, uses techniques of system identification. Finally, cognitive limitations of peripheral form vision are explored using the analysis of category learning by means of psychometric methodologies based on statistical pattern recognition. 
Some remarks on terminology: The transition between the fovea and the region outside the fovea is smooth and there is no well-defined boundary between them. The uncertainty is reflected in a somewhat vague terminology. Speaking of foveal vision, we typically refer to the performance of the foveola having a diameter of 1 deg of arc (Wandell, 1995). The fovea's diameter according to Wandell (1995) is 5.2°. The parafovea (∼5°–9° Ø) and the perifovea (∼9°–17° Ø) extend around the fovea. Together, they make up the macula with a diameter of ∼17°. In perimetry, one might refer to the central visual field with 60° diameter (30° radius). Peripheral vision would then occur within the area from 60° (i.e., ±30°) up to around 214° horizontal diameter. However, as Korte (1923) noted, the functional differences for form recognition already occur at a few degrees eccentricity. He therefore used the term indirect vision. Here, we will refer to the central visual field as roughly that of the fovea and perifovea (<8° radius), to foveal vision below 2° eccentricity, and to peripheral vision for anything outside 2° eccentricity. 
Chapter 2. History of research on peripheral vision
2.1. Aubert and Foerster
The first quantitative measurements of indirect vision were conducted by Hueck (1840). As he measured only closely around the fovea, the first extensive study is the treatise by the physiologist Hermann Rudolph Aubert and the ophthalmologist Carl Friedrich Richard Foerster, in Breslau in 1857. Their perimeter (Figure 2a) allowed presentation of many different stimuli up to 60° eccentricity and used an electric arc for brief presentation to avoid eye movements. Letter acuity measurements were performed in a dark room that just allowed accommodation after 15-min dark adaptation. Using another apparatus, they also measured two-point resolution, i.e., the minimum resolvable distance of two black points (Figure 2b), in analogy (as they explain) to Ernst Heinrich Weber's resolution measurements with compass points on the skin in 1852. 
Figure 2.
 
(a) The perimeter built by Hermann Aubert and Carl Foerster in Breslau in 1855 to measure letter acuity in dark adaptation. “We had digits and letters printed on 2 ft wide and 5 ft long paper at equal distances. That paper sheet could be scrolled by two cylinders, such that new characters could always be brought into the visual field. The frame was adjustable between 0.1 and 1 m viewing distance …” (Aubert & Foerster, 1857). The use of an electric arc (“Riesssche Flasche”) for brief presentation dates back to Volkmann and Ernst Heinrich Weber. (b) Aubert and Foerster's (1857) results for photopic two-point resolution (measured with a different apparatus). The inner circle corresponds to 9° visual angle; measurements go out to 22°. Note the linear increase up to 14.5° radius, and steeper increase further out.
Aubert and Foerster's measurements of letter acuity demonstrated that, up to the blind spot, the minimum discernible size is essentially proportional to the maximum eccentricity angle. Minimum size increases (i.e., acuity decreases) at a steeper rate farther out. They also described the isopters (lines of equal acuity) as being elliptic rather than circular in shape, with the main axis along the horizontal meridian. For a more detailed description of the isopters, they performed a second experiment in which they measured two-point separation with a different apparatus, under photopic conditions and with unlimited viewing time. Here, the subjects were trained to fixate well. The pattern of results was more complex, showing a nasal/temporal anisotropy and considerable interindividual variation, but on the whole, the results of the first experiment were confirmed. 
These results are well known. What is less well known is Aubert and Foerster's insight that peripheral vision seems to be qualitatively different from foveal vision in some rather strange way:
 

“When the two points cease to be distinguished as two, that is when they lie beyond the limiting point, they are not seen as a single point but quite peculiarly undetermined as something black, the form of which cannot be further stated. Also on the skin, in those bluntly sensing areas, two dividers' points never make qualitatively quite the same impression like a single dividers' point. … One either sees something black of indetermined form or one sees two points” (Aubert & Foerster, 1857, p. 30).1

 
The nature of this qualitative difference later became an issue for the Gestalt psychologists and is of particular interest for the present review. 
2.2. A timetable of peripheral vision research
Table 1 provides an overview of important dates in peripheral vision research. A first landmark was the publication of Fechner's book “Elemente der Psychophysik” in Leipzig (1860). Among other things, it presents a systematization of threshold measurement where Fechner coins the term “Weber's law” and develops his well-known logarithmic psychophysical scale. Many consider this book to be the birth of psychophysics. However, we are not certain to what extent it directly influenced threshold measurements. Few of the psychophysical papers reviewed here cite Fechner. Wertheim (1894), for example, whose isopters for square-wave grating acuity are shown in Figure 3, quotes Purkinje, Hueck, Volkmann, Aubert and Foerster, Weber, Landolt, and Helmholtz, but not Fechner. Possibly, Fechner had more influence on the area of psychometric scaling, and it seems that the traditions of psychophysics and psychometrics have stayed quite separate ever since—with a few notable exceptions (Klein & Macmillan, 2003; Macmillan, 2003). The foundations for the psychometric function, for example, were laid in the psychometrics tradition by F. M. Urban in three papers between 1907 and 1910. Urban (1910), in particular, introduced the term psychometric function (in analogy to the then established “biometric function,” p. 230), which is nowadays commonly used in threshold measurement (cf. Klein & Macmillan, 2003). 
Table 1.
 
Landmarks of peripheral vision research.
Figure 3.
 
Square-wave grating acuity results by Theodor Wertheim (1894) in Berlin. The markings on the lines of constant acuity (isopters) are, from the inside outwards: 1; 0.333; 0.2; 0.143; 0.1; 0.074; 0.056; 0.045; 0.04; 0.033; 0.026. These were relative readings where central acuity is set equal to 1. Stimuli were constructed from wire frames.
With regard to peripheral vision, the second half of the 19th century saw a refinement of acuity measurement. We will review this briefly in Chapter 4.1.1: Letter acuity but mention a few milestones here. Wertheim (1894) explained that, while optotypes are important for the practicing ophthalmologist, simple and well-defined stimuli are required to obtain precise visual field topography. He used gratings produced by high-precision wire frames where the thickness and distance of the wires were measured in micrometers under a microscope (Helmholtz, 1867, had used similar objects). With respect to interindividual differences, Wertheim highlighted the importance of perceptual learning (cf. Chapter 7.1: Learning). He further pointed out that acuity depends on stimulus size (cf. our review of spatial summation in Chapter 3.6.4: Spatial summation). 
Two noteworthy papers were published by Basler (1906, 1908). They dealt with the minimum shift at which a movement is seen, in photopic vision and in the dark. For photopic vision, the surprising finding was that the minimum shift is in the range of Vernier acuity, “such that a movement can be seen between two points that would not be resolved on the retina” (p. 587). That minimum distance is 1/3 of a minute of arc in the fovea and steeply increases toward the periphery. The increase is shallower horizontally than vertically. The threshold is lower at higher speed and at higher luminance. In the dark, when there are no comparisons, the threshold increased around 4-fold (Basler, 1908). Despite the key role played by motion perception in peripheral vision, we will not review motion-related work in this paper for reasons of space. 
Concerning the physiological substrate underlying the psychophysical measurements, Fick (1898) and Wertheim (1894) related them to the density of retinal receptor cells. Excellent data on retinal cone and rod receptor densities were provided by Østerberg (1935; Figure 4; note the detail with which these measurements were taken) and still underlie many current textbook figures. Polyak (1932) went one step further and concluded from his anatomical studies that there must be a mathematical function that describes the retinocortical mapping. Talbot and Marshall (1941) studied this in the central part of the visual field and derived a projection factor that could be expressed by a single number. Yet acuity data and receptor densities remained in the center of interest (e.g., Pirenne, 1962). From his extensive overview of acuity and other spatial visual performance measures (Figure 5), as well as of the neurophysiological literature, Weymouth (1958) concluded that receptor densities cannot underlie many of the decline functions. Instead, he proposed retinal ganglion cells as the possible neurophysiological substrate (cf. Curcio & Allen, 1990). 
Figure 4.
 
Cone and rod receptor density results by Østerberg (1935). These data underlie many of the current textbook figures.
Figure 5.
 
MAR functions reviewed by Weymouth (1958). “Comparison of vernier threshold, minimal angle of resolution, motion threshold, and mean variation of the settings of horopter rods” (1958, Figure 13). With permission from Elsevier.
For decades, acuities had been plotted on the ordinate of a typical graph—i.e., the inverse of a spatial threshold—but Weymouth advocated going back to showing the spatial thresholds directly. He called the latter “minimum angle of resolution” (MAR), a term still used today. 
Cowey and Rolls (1974) and Daniel and Whitteridge (1961) were next to study the relationship between the retinal and the primary cortical mapping (Figure 6), a strand of research that had started with the cortical maps provided by Holmes (1945; Holmes & Lister, 1916) and Inouye (1909). We will come back to the cortical magnification concept in Chapter 3.1: The cortical magnification concept.
Figure 6.
 
(a) Retinotopic organization of area V1 by Daniel and Whitteridge (1961). Vertical lines show eccentricity boundaries, horizontal curved lines show radians as in the visual half-field in (b). “This surface is folded along the heavy dotted lines so that F touches E, that D and C touch B, and A folds round so that it touches and overlaps the deep surface of B.” (1961, p. 213). With permission from Wiley.
The history of peripheral vision research is also that of a peculiar neglect of the role of visual spatial attention. In the 19th and the beginning of the 20th century, perceptual scientists were well aware of spatial attention. Johannes Müller in 1825 explained that fixation and attention can be decoupled. Hermann von Helmholtz (1871) showed this experimentally and pointed out that spatial attention is more important than fixation for perceptual performance. The Gestalt psychologists also discussed the role of attention (Korte, 1923; Wagner, 1918). However, at some point, this awareness was lost in the study of “low-level” functions, like acuity or light sensitivity, and the study of spatial attention became confined to the predecessor of cognitive psychology (Eriksen & Rohrbaugh, 1970; Jonides, 1981; Posner, Snyder, & Davidson, 1980; Trevarthen, 1968; Yantis & Jonides, 1984). Nakayama and MacKeben (1989) at last brought the concept of attention back to perception research. They pointed out differences in time constants between slow, consciously controlled “sustained” and fast, reflex-like “transient” attention. Pertinent to peripheral vision, MacKeben (1999) showed that sustained attention is anisotropic with a dominance of the horizontal meridian. Since most, if not all, visual acuity measurements outside the fovea were conducted using paradigms where the location of the next target was known to the subject, the anisotropy will have an impact on the results. The modulating influence of spatial attention on perceptual performance, including tasks considered low level, has since been shown in numerous studies (e.g., Carrasco, Penpeci-Talgar, & Eckstein, 2000; Carrasco, Williams, & Yeshurun, 2002; Poggel, Strasburger, & MacKeben, 2007; Talgar, Pelli, & Carrasco, 2004). We return to the role of spatial attention in peripheral vision in Chapter 5: Recognition of patterns in context—Crowding.
We finish this brief historical overview with three psychophysical papers. Anstis (1974) helped popularize the phenomena of indirect vision by providing demonstration charts that nicely capture some essentials. Figure 7 shows peripheral letter acuity. Compare this chart with his demonstration of crowding from the same paper that is shown in Figure 19 in Chapter 5: Recognition of patterns in context—Crowding. The complementary approach for characterizing the visual field is by measuring luminance increment (or contrast) thresholds. Harvey and Pöppel (1972) presented detailed perimetry data (Figure 8a) and derived a schematic characterization of the visual field with respect to sensitivity (Pöppel & Harvey, 1973). The interesting point is that isopters are isotropic in the center part of the field but elongated horizontally further out. At the transition, there is a performance plateau on the horizontal but not on the vertical meridian (Figure 8b). We will come back to this in Chapter 4.2: Low-contrast characters.
Figure 7.
 
Demonstration of peripheral letter acuity by Anstis (1974) (cut-out). Letter sizes are chosen such that they are at the size threshold (2 subjects, 216 cd/m²) during central fixation. Surprisingly, this is true almost regardless of viewing distance, as eccentricity angle and viewing angle vary proportionally with viewing distance. (To obtain the chart in original size, enlarge it such that the center of the lower “R” is 66 mm from the fixation point). With permission from Elsevier.
Figure 8.
 
Characterization of the visual field by Pöppel and Harvey. (a) Perimetry data by Harvey and Pöppel (1972), i.e., light increment thresholds. Reproduced with permission from The American Academy of Optometry 1972. Note that on the temporal side the visual field extends further out than seen by the outer isopter, to around 107°; it is limited here by the test spot used. (b) Schematic representation of the visual field by Pöppel and Harvey (1973) based on the data in (a). They distinguish five regions: (A) the fovea, which shows highest photopic sensitivity; (B) the perifovea with a radius of around 10° where photopic thresholds increase with eccentricity; (C) a performance plateau extending to around 20° vertically and 35° horizontally where the dashed circle shows the nasal border; (D) peripheral field where thresholds increase up to the border of binocular vision; (E) monocular temporal border region. The two black dots are the blind spots.
Chapter 3. Cortical magnification and the M-scaling concept
3.1. The cortical magnification concept
Most visual functions2 including form vision in the primate are mediated by the primary retinocortical pathway (receptors–ganglion cells–LGN–area V1), and the pathway's retinotopic organization is reflected in the psychophysical results. If in a given neural layer the circuitry is assumed to be similar across the visual field, it makes sense to consider for the processing power just the neural volume or even just the area dedicated to processing of any small region of the visual field. This idea underlies the concept of cortical magnification. The linear cortical magnification factor M was defined by Daniel and Whitteridge (1961) as “the diameter in the primary visual cortex onto which 1 deg of the visual field project.” It can be used as linear or as areal factor, where the latter is the square of the former. M can be considered for every structure that is retinotopically organized, and indeed, there are now good estimates for many areas, obtained by single-cell studies or fMRI (cf. Chapter 3.3: Schwartz's logarithmic mapping onto the cortex; for reviews of cortical magnification and M-scaling, see, e.g., Drasdo, 1991; Pointer, 1986; Slotnick, Klein, Carney, & Sutter, 2001; Strasburger, Rentschler, & Harvey, 1994; van Essen & Anderson, 1995; Virsu, Näsänen, & Osmoviita, 1987; Wässle, Grünert, Röhrenbeck, & Boycott, 1990). 
Even though M describes neuroanatomical properties, it can be well approximated by psychophysical methods involving low-level tasks (Cowey & Rolls, 1974; Daniel & Whitteridge, 1961; Koenderink, Bouman, Bueno de Mesquita, & Slappendel, 1978; Rovamo & Virsu, 1979; Rovamo, Virsu, & Näsänen, 1978). Two estimation approaches can be distinguished: direct and indirect estimation. Direct estimation determines the variation of a size threshold across the visual field. Examples are optotype acuity, grating acuity, and vernier acuity, i.e., tasks where a size threshold can be meaningfully determined (Weymouth, 1958). In the indirect approach, the targets are size-scaled such that performance on some non-spatial measure like contrast sensitivity equals the foveal performance. It is applicable whenever target size and the criterion measure are in some inverse relationship. Particularly popular has been the application to grating contrast sensitivity by Rovamo et al. (1978). In both the direct and the indirect approach, the foveal value M0 remains a free parameter and needs to be obtained in some other way. 
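Why M0 remains a free parameter in direct estimation can be made explicit with a short worked relation. This is only a sketch: it assumes, in the spirit of the M-scaling concept, that the size threshold at eccentricity E is inversely proportional to M(E). The symbols S (size threshold) and k (proportionality constant) are introduced here for illustration and do not appear in the original text.

```latex
% Assumption: size threshold S(E) is inversely proportional to M(E);
% k is an unknown, task-dependent constant.
S(E) = \frac{k}{M(E)}
\quad\Longrightarrow\quad
\frac{S(E)}{S(0)} = \frac{M_0}{M(E)}
\quad\Longrightarrow\quad
M(E) = M_0 \, \frac{S(0)}{S(E)}
```

Under this assumption, the measured thresholds fix only the ratio M(E)/M0, that is, the shape of the function across eccentricity. The absolute scale M0 cancels out of the threshold ratio and therefore has to be taken from anatomical or physiological data.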
Measurements should be taken in polar coordinates, i.e., along iso-eccentric or iso-polar lines in the visual field. M can be determined from anatomical and physiological data (Duncan & Boynton, 2003; Horton & Hoyt, 1991; Larsson & Heeger, 2006; Slotnick et al., 2001; van Essen, Newsome, & Maunsell, 1984) or psychophysically by the minimal angle of resolution (MAR) or the size threshold in low-level psychophysical tasks (Rovamo & Virsu, 1979; Virsu & Rovamo, 1979; Virsu et al., 1987). Figure 9 shows several examples. Weymouth (1958) had proposed plotting MAR on the ordinate instead of its inverse (as was customary before), since the MAR varies as an approximately linear function with eccentricity. In line with that suggestion, Figure 9 shows the inverse of M, which corresponds to visual angle per tissue size. 
Figure 9.
 
Examples of M scaling functions. By definition, only size is considered in the scaling (modified from Strasburger, 2003b). For easy comparison these functions disregard the horizontal/vertical anisotropy. Curve (a): The function used by Rovamo and Virsu (1979), \(M^{-1} = (1 + aE + bE^{3}) \cdot M_0^{-1}\), with the values a = 0.33; b = 0.00007; M0 = 7.99 mm/° (for the nasal horizontal meridian). Curve (b) (dashed line): Power function with exponent 1.1 used by van Essen et al. (1984) for their anatomical results, \(M^{-1} = (1 + aE)^{1.1} \cdot M_0^{-1}\), but with parameters a and M0 as in (a) for a comparison of the curves' shapes. Curve (c): Same function as in (b) but with values given by van Essen et al. (1984) for the macaque, a = 1.282 and M0 = 15.55 mm/°. Curve (d): Same function as in (b) but with values estimated by Tolhurst and Ling (1988) for the human, with M0 estimated as 1.6-fold larger: M0 = 24.88 mm/°. Curve (e) (green, dashed): Inverse linear function with values from Horton and Hoyt (1991): E2 = 0.75 and M0 = 23.07 mm/°. Curve (f) (red, long dashes): Inverse linear function with values from Schira, Wade, and Tyler (2007): E2 = 0.77 and M0 = 24.9 mm/° (root of areal factor). Curve (g) (blue, long dashes): Inverse linear function with own fit to Larsson and Heeger's (2006) area-V1 location data: M0 = 22.5; E2 = 0.785. Curve (h) (purple, dash-dotted): Inverse linear function with values from Duncan and Boynton (2003): M0 = 18.5; E2 = 0.831.
Various analytic functions have been used to describe the relationship shown in Figure 9; they are summarized in Table 2. However, as already apparent from Wertheim's (1894) data (also used by Cowey & Rolls, 1974), an inverse linear function fits those data nicely:  
\begin{eqnarray}\!\!\!\!\!\!\!\!\!\!\!\!\! \begin{array}{@{}l@{}} {M}^{ - 1} = M_0^{ - 1}\cdot\left( {1 + aE} \right) = M_0^{ - 1}\cdot\left( {1 + E/{E}_2} \right) = bE + c,\\ {\rm{with}}\,b = a/{M}_0 = 1/( {{M}_0{E}_2} )\;{\rm{and}}\;c = M_0^{ - 1}. \end{array}\quad \end{eqnarray}
(6)
 
Rovamo and Virsu added a third-order term to capture the slight non-linearity that they observed in their data (Equation 3 in Table 2; Rovamo & Virsu, 1979; Rovamo et al., 1978; Virsu & Rovamo, 1979). They based their estimate on retinal ganglion cell densities on the assumption that the subsequent mapping in the lateral geniculate is 1:1, such that the scale would be the same in the retina and cortex. This assumption has been shown to be incorrect (see below). The third-order term is small and is not needed in central vision. Note, however, that when it is used (i.e., when b ≠ 0) it will affect both the linear coefficient and the foveal value \(M_0^{-1}\) considerably, so that they are not directly comparable to the corresponding values in Equation 1 in Table 2.
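To make the difference between the two forms concrete, the following is a minimal Python sketch of the inverse linear rule (Equation 6) and the third-order form of Rovamo and Virsu (1979). The parameter values are those quoted in the Figure 9 caption; the function and variable names are our own choices for illustration, and the printed "cubic share" is simply the fraction of the polynomial contributed by the term bE³.

```python
def inv_m_linear(ecc_deg, m0, e2):
    """Inverse linear rule (Equation 6): M^-1 = (1 + E/E2) / M0, in deg/mm."""
    return (1.0 + ecc_deg / e2) / m0


def inv_m_cubic(ecc_deg, m0, a, b):
    """Third-order form: M^-1 = (1 + a*E + b*E**3) / M0, in deg/mm."""
    return (1.0 + a * ecc_deg + b * ecc_deg ** 3) / m0


def cubic_share(ecc_deg, a, b):
    """Fraction of the polynomial (1 + a*E + b*E**3) due to the cubic term."""
    poly = 1.0 + a * ecc_deg + b * ecc_deg ** 3
    return b * ecc_deg ** 3 / poly


# Parameter values as quoted in the Figure 9 caption.
HORTON_HOYT = {"m0": 23.07, "e2": 0.75}               # curve (e)
ROVAMO_VIRSU = {"m0": 7.99, "a": 0.33, "b": 0.00007}  # curve (a)

if __name__ == "__main__":
    for ecc in (0.0, 1.0, 5.0, 10.0, 40.0):
        lin = inv_m_linear(ecc, **HORTON_HOYT)
        cub = inv_m_cubic(ecc, **ROVAMO_VIRSU)
        share = cubic_share(ecc, ROVAMO_VIRSU["a"], ROVAMO_VIRSU["b"])
        print(f"E = {ecc:5.1f} deg  linear M^-1 = {lin:.4f}  "
              f"cubic M^-1 = {cub:.4f}  cubic share = {share:.3f}")
```

With these values the cubic term contributes very little within the central 10° and becomes appreciable only at large eccentricities, which is in line with the remark that it is not needed in central vision.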
Table 2.
 
Scaling equations proposed by various authors (modified from Strasburger, 2003b).
van Essen et al. (1984) used an exponent different from 1 to achieve a slight non-linearity (Equation 4 in Table 2). Tolhurst and Ling (1988) extrapolated data from the macaque (reported by van Essen et al.) to the human using the same function. Virsu and Hari (1996) derived, from geometric considerations, a sine function of which only one-eighth of a period is used for describing that relationship (Equation 5 in Table 2). 
Whether M−1(E) is indeed linear at small eccentricities seems still an unresolved question. Drasdo (1989) explicates this (Figure 10). Drasdo's figure refers to retinal ganglion cell density (the ordinate showing the square root of areal ganglion cell density), but the same argument applies to the cortical cell density. The problem arises from the fact that the density of ganglion cells onto which the receptors in the foveola project cannot be determined directly but needs to be inferred from more peripheral measurements. The anatomical reason is that central ganglion cells are displaced laterally in the retina to not obscure the imaging onto the central receptors. Then again, the length of the connecting fibers of Henle is difficult to measure (e.g., Wässle et al., 1990). For the estimation, in the figure, the hatched area under the curve is set equal to the area under the dashed line. Even if the steep increase of the curve toward smallest eccentricity (corresponding to a decreasing ganglion cell density toward the very center) might overstate the issue, there is no guarantee that ganglion cell density keeps increasing toward the center. More recently, Drasdo, Millican, Katholi, and Curcio (2007) have provided a more precise estimate of the length of the Henle fibers (406–675 μm) and, based on that, estimated the ganglion-cell-to-cone ratio in the fovea's center as 2.24:1—not too different from the value of 3–4:1 previously reported by Wässle et al. (Wässle & Boycott, 1991; Wässle et al., 1990). 
Figure 10.
 
Estimation of ganglion cell density by Drasdo (1989). The continuous line shows the inverse of the linear ganglion cell density as a function of eccentricity. According to the model, the hatched area under the curve is equal to the area under the dashed-line (from Strasburger, 2003b, modified from Drasdo, 1989, Figure 1). With permission from Elsevier.
Estimates of cortical magnification that rest on estimates of retinal ganglion cell density are based on the assumption that the mapping scale is more or less preserved in the LGN. However, work from as early as the 1990s suggests that this assumption is highly inaccurate (e.g., Azzopardi & Cowey, 1993, 1996a, 1996b). Furthermore, the mapping scale within the LGN varies with eccentricity and differently for parvo (P) and magno (M) cells: For example, Azzopardi, Jones, and Cowey (1999) reported that the P/M ratio decreases from 35:1 in the fovea (<1°) to 5:1 at 15° eccentricity and showed that this variation does not reflect retinal ganglion cell densities. The high foveal P/M ratio might be an overestimate, since there are data to suggest only little convergence from ganglion cells to LGN relay cells, and the high foveal ratio would imply an unusual degree of divergence from retinal P cells to LGN relay cells (B. B. Lee, personal communication). Even if the ratio is closer to earlier estimates of 10:1 to 16:1 (Grünert, Greferath, Boycott, & Wässle, 1993), the fact remains that the P/M ratio changes with eccentricity. Many perceptual tasks are mediated by both the parvo- and magnocellular pathways, where the relative contribution of the two is governed by stimulus characteristics. Thus, even for elementary perceptual tasks that are believed to rely on pre-cortical processing, different scaling functions would be required, depending upon whether, for that task, pre- or post-geniculate processing dominates and whether the parvo or the magno stream contributes more. Drasdo (1991) thus advocates a multichannel and multilevel modeling for the pre-cortical stream. In this context, it should be noted that current views of the roles of M and P pathways differ from earlier textbook accounts. 
For example, contrary to previous assumptions, the spatial resolution of P and M pathways seems to be comparable, with parasol (P and M) retinal ganglion cells showing a similar size of their receptive field centers and a similar dependency on retinal eccentricity (see review by Lee, Martin, & Grünert, 2010, Figure 5). Lee et al. (2010) further contend that the parvocellular pathway does not support an achromatic spatial channel. In addition, Vernier acuity tasks appear to rely on the magno- rather than the parvocellular pathway (Lee, Wehrhahn, Westheimer, & Kremers, 1995; see the review by Lee, 2011). The conceptual link between afferent peripheral pathways and psychophysical tasks considered here is further complicated by the fact that those pathways can show higher sensitivity than the central mechanisms. For example, parvo cells respond to chromatic modulation at high temporal frequencies (30–40 Hz), whereas chromatic psychophysical sensitivity decreases steeply above 4 Hz. Thus, signals of the parvo pathway do not, in this case, reach conscious perception (Lee, 2011, Figure 2). 
3.2. The M-scaling concept and Levi's E2
It is now well established that for many visual functions the variation of performance across the visual field is based, partly or fully, on the projection properties of the afferent visual pathway. Performance variations with eccentricity can, therefore, be minimized by using appropriately scaled stimuli, i.e., stimuli that are larger in the periphery. However, just which anatomical factor or factors to choose for the scaling for any given task is a matter of debate. Many authors have opted to use size scaling as a predominantly psychophysical rather than a neuroanatomical concept (e.g., Levi & Klein, 1985; Levi, Klein, & Aitsebaomo, 1984; Virsu et al., 1987; Watson, 1987b). Watson (1987b) coined the term local spatial scale effective at a given visual field location to emphasize that an assumption as to which substrate underlies performance for any particular visual task is not required. As Watson (1987b) showed, a valid empirical estimate of local spatial scale can be obtained by equalizing the high-spatial-frequency limb of the contrast sensitivity function. 
To compensate for the influence of M, the inverse of any of the functions given in Table 2 can be used, e.g.,  
\begin{eqnarray} S = {S}_0\cdot\left( {1 + E/{E}_2} \right),\quad \end{eqnarray}
(7)
where S is the stimulus size at eccentricity E, S0 is the threshold size at E = 0, i.e., in the center of the fovea, and E2 is a constant related to the slope b of the function:  
\begin{eqnarray} b = {S}_0/{E}_2.\quad \end{eqnarray}
(8)
 
Stimuli according to Equation 7 are called M-scaled, or simply scaled. With E2 properly chosen, they project onto equal cortical areas independent of eccentricity. For a stimulus of arbitrary size S, its projection size Sc (in mm cortical diameter) is predicted by Equation 9:  
\begin{eqnarray} {S}_{\rm{c}} = S\cdot{M}_0/\left( {1 + E/{E}_2} \right).\quad \end{eqnarray}
(9)
 
The parameter E2 in these equations was introduced by Levi and Klein (Levi et al., 1984; Levi, Klein, & Aitsebaomo, 1985) as a single summary descriptor providing a quick way of comparing the eccentricity dependencies across visual tasks. From Equation 7, it can be seen that it corresponds to the eccentricity at which S is twice the foveal value. Another, graphical interpretation is that −E2 is the function's intercept with the abscissa (the linear function of Equation 7 reaches zero at E = −E2), as shown in Figure 11. Note that the function's slope is not determined by E2 alone and can be inferred from E2 only if the function's foveal value is fixed and known. The intended comparison of slopes on the basis of E2 is thus meaningful, e.g., for fovea-normalized functions. Furthermore, since the empirical functions deviate somewhat from linearity and these deviations are more apparent at larger eccentricities, E2 comparisons are best restricted to central vision. These limitations of using E2 are illustrated in Figure 11 and listed in Table 3. Finally, since E2 can get very small, a ratio of E2 values is not necessarily well defined. Levi et al.'s (1985, Table 1) values vary in a range of 1:40. Mäkelä, Whitaker, and Rovamo (1992) point out that the ratio can get as large as 1:200. 
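Equations 7 to 9 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed procedure: the foveal threshold size S0 = 0.1° is a hypothetical value chosen only for the example, M0 and E2 are the Horton and Hoyt (1991) values quoted in the Figure 9 caption, and all function names are our own.

```python
def m_scaled_size(ecc_deg, s0, e2):
    """Equation 7: stimulus size S (deg) at eccentricity E for foveal size S0."""
    return s0 * (1.0 + ecc_deg / e2)


def slope(s0, e2):
    """Equation 8: slope b of the linear size function, b = S0 / E2."""
    return s0 / e2


def cortical_size(size_deg, ecc_deg, m0, e2):
    """Equation 9: cortical projection Sc (mm) of a stimulus of size S at E."""
    return size_deg * m0 / (1.0 + ecc_deg / e2)


if __name__ == "__main__":
    S0 = 0.1               # hypothetical foveal threshold size in deg
    M0, E2 = 23.07, 0.75   # Horton & Hoyt (1991), as quoted in Figure 9

    for ecc in (0.0, E2, 5.0, 30.0):
        s = m_scaled_size(ecc, S0, E2)
        sc = cortical_size(s, ecc, M0, E2)
        print(f"E = {ecc:5.2f} deg  S = {s:.3f} deg  Sc = {sc:.3f} mm")

    # Same E2, different foveal values: the slopes differ (cf. Table 3).
    print(slope(0.1, E2), slope(0.2, E2))
```

When the same E2 is used in Equations 7 and 9, the cortical projection of an M-scaled stimulus comes out as S0 · M0 at every eccentricity. The last line illustrates the caveat that E2 alone does not fix the slope.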
Figure 11.
 
Schematic illustration of the E2 value. Four functions with same E2 are shown, two linear functions with different foveal values, and two non-linear functions with same foveal value (from Strasburger, 2003b, Chpt. 4).
Table 3.
 
Caveats for using E2.
In summary, caution should be used in interpreting E2 (a) if the foveal value is not measured but is inferred only (e.g., for ganglion cell density) or is unreliable, (b) if the foveal value is not representative for the function, e.g., because the deviation from linearity is substantial, or (c) if a normalization is not meaningful, for example, when the same visual task is compared across subjects (Table 3). 
With these caveats in mind, Tables 4, 5, and 6 show a collection of E2 values taken or inferred from the literature. 
Table 4.
 
E2 values for various visual tasks and anatomical estimates (first three columns). The last column shows the resulting slope b in Equations 1 and 3, with the foveal value M0 or S0 set to 1. (Table extended from Strasburger, 2003b, p. 78; *Asterisks denote values added by Strasburger).
Table 5.
 
E2 and M0 values obtained with non-invasive objective techniques, with psychophysical studies (ΨΦ) added for comparison. Asterisks (*) denote values added by Strasburger.
Table 6.
 
E2 values from Drasdo (1991, Table 19.2 on p. 258) for the horizontal meridian.
3.3. Schwartz's logarithmic mapping onto the cortex
The cortical magnification factor M relates cortical sizes to retinal sizes. It is a local mapping in that a small circular patch in the visual field is mapped onto an elliptical area in one of the early visual areas. From the relationship M(E), one can, under the assumption of retinotopy, derive the global mapping function for that cortical area by integrating the function along a meridian starting from the fovea:  
\begin{eqnarray} \delta = \int\limits_{0}^{E}{{M\left( E \right)dE,}}\quad \end{eqnarray}
(10)
where δ is the distance, in millimeters, on the cortical surface from the cortical representation of the fovea's center along the meridian's projection. Schwartz (1980) set this out in his cybernetic treatise on cortical architecture and noted that, if M−1 is proportional to eccentricity, the cortical distance is proportional to the logarithm of eccentricity, i.e.,  
\begin{eqnarray} \delta \propto \ln E\quad \end{eqnarray}
(11)
with scaling factors that can be chosen differently between meridians. Empirical mapping functions obtained by fMRI are provided in Duncan and Boynton (2003), Engel, Glover, and Wandell (1997), Sereno et al. (1995), Larsson and Heeger (2006), Popovic and Sjostrand (2001), Schira, Wade, and Tyler (2007), and Schira, Tyler, Breakspear, and Spehar (2009). 
Schwartz's proportionality assumption corresponds to c = 0 and E2 = 0 in Equation 6. It is useful for sufficiently large eccentricities that are of primary interest in anatomical and physiological studies. However, the assumption becomes highly inaccurate below about 3°, and in the center of the fovea (i.e., when E = 0), Equations 6–11 are undefined or diverge. To solve this problem, we can use the standard inverse linear cortical magnification rule as stated in Equation 6 above and plotted in Figures 9 and 11. Using Equations 6 and 10, we arrive at  
\begin{eqnarray}\!\!\!\!\!\!\!\!\!\!\!\! \begin{array}{@{}l@{}} \delta = \displaystyle\int\limits_{0}^{E}{{M(E)dE}} = \int\limits_{0}^{E} \frac{{{M}_0}}{{1 + E/{E}_2}}dE \\\\= {M}_0{E}_2\ln \left( 1 + {E/ {{E}_2}}\right),\; i.e.,\;\delta = {M}_0{E}_2\ln \left( 1 + {E /{{E}_2}} \right), \end{array}\!\!\!\!\!\!\nonumber\\ \end{eqnarray}
(12)
 
with notations as before (Strasburger & Malania, in revision). This equation uses the notation established in psychophysics, holds over a large range of eccentricities, and is well defined in the fovea. 
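As a brief check of how Equation 12 relates to Schwartz's mapping, the two limiting cases can be written out. This is a sketch that uses only the definitions above and the standard approximations ln(1 + x) ≈ ln x for large x and ln(1 + x) ≈ x for small x:

\begin{eqnarray} \delta = {M}_0{E}_2\ln \left( {1 + E/{E}_2} \right) \approx \begin{cases} {M}_0{E}_2\left( {\ln E - \ln {E}_2} \right), & E \gg {E}_2,\\ {M}_0 E, & E \ll {E}_2. \end{cases} \end{eqnarray}

For eccentricities well above E2, the mapping is thus logarithmic in E up to an additive constant, as in Equation 11, whereas close to the fovea's center it is approximately linear with slope M0 and remains finite at E = 0.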
In the neuroscience literature, often the inverse function E = E(δ) is used. Engel et al. (1997), for example, use E = exp(aδ + b), i.e., the inverse function to Equation 11. It corresponds to Equation 13, with the constant term “−1” being dismissed, and is undefined in the fovea. With the notations used here, the inverse function to Equation 12 is given by  
\begin{eqnarray} E = {E}_2\left( {{e}^{\frac{\delta }{{{M}_0{E}_2}}} - 1} \right).\quad \end{eqnarray}
(13)
 
Again, this equation uses well-established notation, holds over a large eccentricity range, and is well defined in the fovea. 
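Equations 12 and 13 translate directly into code. The following Python sketch is meant as an illustration only: it uses the Horton and Hoyt (1991) values from the Figure 9 caption as example parameters, the function names are our own, and `math.log1p` and `math.expm1` are used so that the expressions stay numerically accurate near E = 0.

```python
import math


def cortical_distance(ecc_deg, m0, e2):
    """Equation 12: delta (mm) = M0 * E2 * ln(1 + E/E2)."""
    return m0 * e2 * math.log1p(ecc_deg / e2)


def eccentricity(delta_mm, m0, e2):
    """Equation 13: E (deg) = E2 * (exp(delta / (M0 * E2)) - 1)."""
    return e2 * math.expm1(delta_mm / (m0 * e2))


if __name__ == "__main__":
    M0, E2 = 23.07, 0.75   # Horton & Hoyt (1991), as quoted in Figure 9

    for ecc in (0.0, 0.5, 3.0, 10.0, 40.0):
        delta = cortical_distance(ecc, M0, E2)
        back = eccentricity(delta, M0, E2)   # round trip through Equation 13
        print(f"E = {ecc:5.1f} deg  delta = {delta:7.2f} mm  "
              f"E(delta) = {back:5.1f} deg")
```

Both functions are well defined at the fovea's center (E = 0 maps to δ = 0 and back), and applying one after the other returns the starting value.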
3.4. Successes and failures of the cortical magnification concept
The cortical magnification hypothesis has been a story of successes and failures. That in many visual tasks thresholds vary linearly with eccentricity had long been known since Aubert and Foerster's report. It was summarized concisely by Weymouth (1958), who had conjectured that retinal properties are at the basis of this property. The cortical magnification hypothesis, then, brought forward by Cowey and Rolls (1974) and Daniel and Whitteridge (1961), again gave rise to a large number of studies. It culminated in a pointed statement by Rovamo et al. (1978, p. 56) that “a picture can be made equally visible at any eccentricity by scaling its size by the magnification factor, because the contrast sensitivity function represents the spatial modulation transfer function of the visual system for near-threshold contrasts.” By invoking the systems-theoretical concept of the modulation transfer function (MTF, see, e.g., Caelli, 1981), this seemed to provide a causal explanation as to why the first stage of visual processing could be modeled by a signal-processing module, the characteristics of which are captured by a mere change of spatial scale. It was considered a breath of fresh air by visual physiologists since it refuted the prevailing view of separate systems in cognitive psychology (e.g., Trevarthen, 1968) and allowed for a uniform treatment of fovea and periphery. A great many studies were subsequently published in support of the cortical magnification concept. However, not only was the invoking of the MTF inappropriate in this context, but in the prevailing enthusiasm a great number of incompatible empirical findings were also hushed up, as Westheimer (1982, p. 1613) pointedly criticized. Even today, Westheimer's critique appears valid and up to date. 
Exactly what constitutes a success or a failure is less clear-cut than it seems. It will depend on how narrowly the criteria of fulfillment are set by the researcher, and conflicting conclusions may result. The strong, all-embracing hypothesis put forward by Rovamo and Virsu (1979; see above) is hardly, if ever, satisfied. Even in the specific case of the grating contrast sensitivity function (CSF), where it had originally been offered, an unexplained factor of two in the change of this function remains. A more cautious explanation with respect to the generality of the claim was given by Koenderink et al. (1978, p. 854) who propose that “if the just resolvable distance at any eccentricity is taken as a yardstick and (stimuli) are scaled accordingly, then the spatio-temporal contrast detection thresholds become identical over the whole visual field. (…) The just resolvable distance correlates well (…) with the cortical magnification factor.” A third, still weaker claim would be to give up constraints with respect to just what the “correct” M factor is and use size scaling such that it optimally equalizes performance (e.g., Watson, 1987b). In the light of the difficulties pointed out in Chapter 3.2: The M-scaling concept and Levi's E2, this pragmatic approach appears highly useful and the M and E2 values summarized above can still be used as a yardstick. Even though the M(E) function that is then used might differ considerably from the anatomical functions, the term “M-scaling” is still often used as a shortcut. A fourth, again more general concept is that spatial scaling is used together with scaling of further, non-spatial variables (e.g., Virsu et al., 1987). We will return to that case in Chapter 3.5: The need for non-spatial scaling.
A bewildering variety of visual functions have been studied with respect to whether or not they are scalable. They are summarized in Table 7 and organized in terms of direct and indirect estimation (cf. Chapter 3.1: The cortical magnification concept), with a further subdivision into two cases, where size measurement itself is the criterion: D1, where the size threshold is compared to M, and D2, where a suprathreshold size is compared to M. A typical example for D1 is acuity; an example for D2 would be migraine scotoma size as studied by Grüsser (1995). 
Table 7.
 
Summary of literature reports on successes and failures of cortical magnification and M-scaling. In the first column, three approaches are distinguished: direct estimation of the first kind (D1) where a size threshold is compared with M, direct estimation of the second kind (D2), where a (supra-threshold) size is compared with M, and indirect estimation (Ind), where some other measure is equalized by scaling.
Perceptual functions that have been reported as successfully scalable are a variety of acuity and low-level discrimination tasks, as well as various low-level biopsychological measures like the diameter of Panum's fusion area, migraine scotoma size, and phosphenes from cortical stimulation. An often cited success is grating contrast sensitivity as a function of both spatial and temporal frequency. However, for grating contrast sensitivity, García-Pérez and Sierra-Vásquez (1996) vehemently contradict scalability, listing as many as 46 empirical reports that show a steeper than tolerable, if only moderate, decline with eccentricity. 
Then, there are perceptual functions with conflicting evidence. Best known are hyperacuity tasks, where pro-scaling reports include a crowding Vernier acuity task and contra-scaling reports include bisection hyperacuity. The consensus is that these tasks (like acuities) do not form a homogeneous group. However, there is also disagreement about tasks that have traditionally been considered scaling successes (e.g., orientation sensitivity, two-dot separation). For example, two-dot separation discrimination, which seemed to be size-scalable from the graph in Aubert and Foerster's (1857) classical paper, was shown to be a scaling failure in the near periphery by Foster, Gravano, and Tomoszek (1989). Finally, there are the clear failures of M-scaling, which include a wide variety of tasks, as listed in the table. Tyler (1999) even reports reverse eccentricity scaling for symmetry detection. In our own work, we have concentrated on low-contrast character recognition. 
It is difficult to discern a common pattern as to which visual tasks are scalable. In addition, over the years, tasks that were assumed to be prime examples of scalability were dismissed as beset with problems. Perhaps, a common characteristic of the scalable tasks would be that they are mostly considered depending upon low-level processing (up to V1). From the failure of scaling for results on low-contrast character recognition, Strasburger et al. concluded that higher level tasks require additional scaling along non-spatial variables (Strasburger et al., 1994; Strasburger & Rentschler, 1996). This topic is taken up in the next section. 
3.5. The need for non-spatial scaling
For many visual tasks, M-scaling removes perhaps not all but still a large portion of performance variation across the visual field. Virsu et al. (1987) show in their analysis of seven spatial threshold tasks (including two hyperacuity tasks) that between 85% and 97% of the variance was accounted for. In the cases where unexplained variance remains, additional scaling along some other, non-spatial variable may equalize performance. We can therefore distinguish errors of the first kind, which relate to the specific scaling factor chosen, from errors of a second kind that indicate a fundamental inadequacy of spatial scaling per se. In discussions on the cortical magnification concept, the latter errors have often been played down as being exceptions rather than the rule. Rovamo and Raninen (1984), for example, introduced scaling of retinal illumination, which they call “F-scaling,” as part of their concept. The neglect of non-spatial scaling variables led us to call for attention to contrast as a key variable in peripheral pattern recognition (Strasburger, 1997a, 2001b, 2003a; Strasburger, Harvey, & Rentschler, 1991; Strasburger & Pöppel, 1997; Strasburger & Rentschler, 1996; Strasburger et al., 1994; cf. Chapter 4: Recognition of single characters). 
The need for scaling non-spatial variables and the crucial role played by contrast are now well accepted. Mäkelä, Näsänen, Rovamo, and Melmoth (2001) contend that, for the identification of facial images in peripheral vision, spatial scaling alone is not sufficient, but that additional contrast scaling does equalize performance. Melmoth and Rovamo (2003) confirm that scaling of letter size and contrast equalizes perception across eccentricities and set size, where set size is the number of alternatives for the letters. 
3.6. Further low-level tasks
We wish to finish the chapter with a brief review of visual functions that have not been considered in the above discussion. 
3.6.1. Reaction time
Reaction time shows large intra- and intersubject variability. Nevertheless, there are some factors that have small but systematic effects, like age, eccentricity, luminance, size, duration, monocular/binocular viewing, and (temporal vs. nasal) side (for reviews, see Schiefer et al., 2001; Teichner & Krebs, 1972). While reaction time is, on the whole, probably the best studied human performance indicator, information on its dependency on retinal eccentricity is relatively scarce. Poffenberger (1912) found an increase of 0.53 ms/deg in the temporal visual field and 0.33 ms/deg in the nasal visual field. Rains (1963) observed an increase of 5 ms/deg in the nasal perifovea and a further shallow increase of 0.4 ms/deg up to 30° nasally but no RT increase in the temporal visual field. Osaka (1976, 1978) studied visual reaction time on the nasal and temporal horizontal meridians from the fovea up to 50° eccentricity in six steps, using four target sizes between 0.3° and 1.9° (luminance 8.5 cd/m2). The studies confirmed the superiority of nasal over temporal RT at any retinal eccentricity and found a steady increase with eccentricity, at a rate between 1.08 ms/deg and 1.56 ms/deg temporally and 0.84 ms/deg and 1.42 ms/deg nasally. 
More recently, Schiefer et al. (2001) observed for an age-homogeneous group of twelve young adults a slope of 1.8 ms/deg in the mean up to 30° eccentricity (0°–15° eccentricity: 0.5 ms/deg; 15°–20° eccentricity: 3.6 ms/deg; 20°–30° eccentricity: 1.6 ms/deg). Interestingly, eccentricity accounted for 6% of the total variance, ranking second after the factor subject (accounting for 13% of the variance). In another study, Poggel, Calmanti, Treutwein, and Strasburger (in press) tested 95 subjects in the age range of 10 to 90 years (mean age: 47.8 years) at 474 locations in the central visual field up to ±27° horizontally and ±22.5° vertically. Again, simple visual reaction times (RTs) showed a steady increase with increasing eccentricity in the visual field of 1.66 ms/deg on average, which concurs with the earlier findings. 
It seems likely that part of the RT increase with eccentricity is linked to retinal properties and stems from reduced spatial summation. An indirect indicator is that RT both in the fovea and the periphery depends systematically on target luminance but is largely independent of target brightness (Osaka, 1982). More direct evidence comes from spatial summation, which is closely linked to retinal receptive field sizes (cf. Chapter 3.6.4: Spatial summation). Receptive field center sizes of broadband cells increase about 13-fold from the fovea (0.1°) to 30-deg eccentricity (1.2°; Equation 18 below; data: De Monasterio & Gouras, 1975). Stimuli in Schiefer et al. (2001) had a diameter of 0.43° and were thus much larger than foveal receptive fields but only about a quarter of the average receptive field size at 30° eccentricity. Targets in Osaka (1978) had 1° diameter, leading to the same effect. Osaka (1976) reported summation up to 1.15° in the fovea but more than their maximum target size of 1.9° at 50°. Indeed, Carrasco and Frieder (1997) showed that a reaction time increase of 0.15 ms/deg between 1.5° and 7° eccentricities was fully neutralized when using stimuli that are scaled according to Rovamo and Virsu's (1979) equation (cf. Equation 3 in Table 2 above). 
3.6.2. Apparent brightness
From the 1960s up to the early 1980s, a line of research sprang up, following Stevens (e.g., Stevens, 1966; Stevens & Galanter, 1957), to study the perceptual counterpart of luminance: brightness. The newly established method of magnitude estimation was used to assess suprathreshold perceptual properties of the most basic of the visual senses, that of light and dark. In the present context, we are only interested in studies on brightness in the visual periphery (Marks, 1966; Osaka, 1977, 1980, 1981; Pöppel & Harvey, 1973; Zihl, Lissy, & Pöppel, 1980). 
Brightness of a patch of light in the visual field is not to be confused with lightness, the perceived reflectance of an object (Gilchrist, 2006), even though under restricted conditions the two are indistinguishable. Another separate concept is that of the intensity of the illumination of an object or a scene. Illumination and reflectance together determine the luminance of a surface, which is the proximal (i.e., retinal) stimulus for both the surface's lightness and the corresponding visual area's brightness. To emphasize that brightness is a perceptual rather than a physical measure, the older literature speaks of apparent or subjective brightness. 
For the peripheral visual field, the amazing overall finding is that the brightness vs. luminance function for small patches of light in scotopic, mesopic, and photopic vision at all retinal loci closely follows a power function as described in Stevens' law (Marks, 1966; Osaka, 1977, 1980, 1981; Pöppel & Harvey, 1973; Zihl et al., 1980). However, the exponent of the power function varies substantially. Osaka (1977) studied scotopic brightness summation over time in the range of 1–1000 ms for target sizes of 0.27°–1.9°, at 0°–60° eccentricity with target luminances of 0.86–8.6 cd/m2. With increasing stimulus duration, brightness increased up to 100 ms (consistent with Bloch's law) and then stayed mostly constant at all retinal loci (with a slight overshoot at certain durations, dependent on locus and duration, known as the Broca–Sulzer effect). Brightness increased a little less than 2-fold with a 7-fold increase of stimulus size. Osaka (1980) followed up on these findings and looked more closely at the brightness exponent (Stevens' constant) as a function of retinal eccentricity (10°, 20°, 30°, 40°, and 60°) under dark- and light-adapted conditions. Stimulus duration was kept fixed at 1 s to be in the constant range observed in Osaka (1977). The exponent was 0.33 foveally, in both adaptation conditions, and increased slightly with eccentricity, to about 0.35 in light-adapted and to 0.38 in dark-adapted conditions. Finally, Osaka (1981) extended the range of stimulus durations tested. The brightness exponent was found to be constant at 0.33 between 100 ms and 3 s (cube root power function) but increased to much higher values, up to 0.9, for shorter and longer durations. 
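To make the size of these exponent differences concrete, the following minimal sketch evaluates Stevens' power law for the exponents quoted above. The function name and the dictionary labels are illustrative assumptions, not taken from the cited studies; the tenfold luminance step simply mirrors the 0.86–8.6 cd/m2 range mentioned for Osaka (1977).

```python
def brightness_ratio(luminance_ratio: float, exponent: float) -> float:
    """Factor by which brightness grows when luminance grows by luminance_ratio.

    Follows from Stevens' power law, brightness = k * luminance**exponent;
    the scale constant k cancels in the ratio.
    """
    if luminance_ratio <= 0:
        raise ValueError("luminance ratio must be positive")
    return luminance_ratio ** exponent


# Exponents as quoted in the text for Osaka (1980); the labels are ours.
EXPONENTS = {
    "fovea": 0.33,
    "far periphery, light-adapted": 0.35,
    "far periphery, dark-adapted": 0.38,
}

for label, exponent in EXPONENTS.items():
    gain = brightness_ratio(10.0, exponent)
    print(f"{label}: tenfold luminance -> {gain:.2f}-fold brightness")
```

With a cube root exponent, an eightfold luminance increase is needed to double brightness, so the small eccentricity-related changes of the exponent translate into only modest differences in suprathreshold brightness growth.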
As an effect of the described relationships, brightness varies across the visual field in a manner different from that of the luminance threshold, i.e., of standard perimetric measurements. Marks (1966) stated that with dark adaptation a stimulus of fixed luminance appears brighter in the periphery than in the fovea and found it to be maximal at 20° eccentricity. Pöppel and Harvey (1973, p. 145), by contrast, reported subjective brightness of a suprathreshold target to be independent of its position in the visual field, for both photopic and scotopic conditions: “A target with a given luminance will elicit the same brightness sensation at all retinal positions. As a consequence of this brightness constancy throughout the visual field, peripheral targets at threshold appear brighter than foveal targets at threshold because a peripheral target at threshold has more luminance than a foveal target at threshold.” Zihl et al. (1980) confirmed this finding for photopic and mesopic adaptation; yet for scotopic adaptation, brightness of constant luminance stimuli decreased beyond 20° eccentricity. 
Astonishingly, this research on peripheral apparent brightness was never taken up again. The results are highly robust and impressively systematic. Stevens' power law is treated in every psychology textbook. Perimetry, out of which the questions partly arose, is the standard tool for assessing peripheral vision. Perhaps the brightness concept just adds less to perceptual theorizing than was once hoped. Gilchrist (2006, e.g., p. 338), in his extensive treatment of light and dark, made the point that brightness is, by and large, irrelevant for gathering information on the really important object property of lightness, i.e., an object's achromatic color that is physically determined by its reflectance. On the other hand, computational models like that of Watt and Morgan (1985) that include a non-linear first stage and thus incorporate an analogue of the brightness concept (collectively termed brightness models by Gilchrist, 2006, p. 205) do not as yet cover peripheral vision. So the role of brightness for understanding peripheral vision is still open. 
3.6.3. Temporal resolution and flicker detection
Temporal resolution is a performance indicator that has found widespread application in applied psychodiagnostics where it is considered to validly operationalize activation of the central nervous system underlying wakefulness and alertness (cf. Smith & Misiak, 1976). It is typically measured by the critical flicker frequency (CFF; also flicker fusion frequency) or, less frequently, by double-pulse resolution or temporal grating contrast sensitivity. 
The CFF is usually determined in foveal vision. The few early investigations that compared temporal sensitivity in the center with that in the periphery typically emphasized a pronounced performance decrease beyond 2° eccentricity (Alpern & Spencer, 1953; Creed & Ruch, 1932; Monnier & Babel, 1952; Otto, 1987; Ross, 1936). Other authors (Hylkema, 1942; Mayer & Sherman, 1938; Miles, 1950; Phillips, 1933; Riddell, 1936) showed an increase of CFF toward the periphery (see Hartmann, Lachenmayr, & Brettel, 1979; Landis, 1953, for a review of the older literature). In a parametric study employing adaptive threshold measurement with constant-size stimuli, Hartmann et al. (1979) obtained a pronounced increase of CFF from the fovea to the periphery up to approximately 30–60° eccentricity, and, beyond a certain, individually variable boundary, a decrease toward the far periphery on the horizontal meridian. Tyler (1987), on the other hand, used stimuli that were scaled according to retinal cone receptor density and, mapping the full visual field, found an overall pronounced increase of CFF up to 60° eccentricity, with local variations. In the 19th century, Exner (1875) had already proposed that the visual periphery is specialized with regard to temporal sensitivity, and Porter (1902) observed that the CFF increases with retinal eccentricity. This is in accord with the empirical findings, if temporal sensitivity in the periphery is compared with other visual functions that show a faster decline. The notion of a periphery that is more sensitive to flicker and motion also concurs with subjective experience, e.g., with the (former) everyday observation that a 50-Hz TV screen appears constantly illuminated in direct view but is perceived as flickering when viewed peripherally (Welde & Cream, 1972). The physiological basis for flicker detection is thought to be the magnocellular pathway (Lee, Pokorny, Smith, Martin, & Valberg, 1990; Solomon, Martin, White, Lukas, & Lee, 2002). 
However, the CFF of both magno and parvo cells increases with eccentricity, with the sensitivity of parvo cells to high-frequency modulation coming close to that of magno cells in the far periphery. This suggests an outer retinal origin of high temporal sensitivity in the periphery (Lee et al., 1990). 
CFF performance depends highly systematically on target size (Granit–Harper law) and on luminance (Ferry–Porter law). Across area, the CFF shows spatial summation that is classically described by the Granit–Harper law (CFF = k × log area; Granit & Harper, 1930), where k is a constant that is independent of eccentricity (Raninen & Rovamo, 1986). However, Tyler and Hamer (1990, 1993) showed that the slope of the Ferry–Porter law [CFF = k(log L − log L0), where L and L0 are target and threshold luminance, respectively] increases with retinal eccentricity (thus contradicting Rovamo & Raninen, 1988; Raninen, Franssila, & Rovamo, 1991). This implies a superiority of peripheral temporal processing over that of the fovea, and Tyler and Hamer conclude from it that the slope constant in the Granit–Harper law also depends on eccentricity. Based on Tyler and Hamer's (1990) data and analyses, Poggel, Treutwein, Calmanti, and Strasburger (2006) remodeled spatial summation for the CFF and provided further slope coefficients that increase with eccentricity. 
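The two laws can be written down as a minimal sketch, using exactly the forms given in the text. The base-10 logarithm, the function names, and all numerical slope values below are assumptions for illustration only; they are not parameters reported by Tyler and Hamer or by Poggel et al.

```python
import math


def cff_ferry_porter(luminance: float, threshold_luminance: float, slope: float) -> float:
    """Ferry-Porter law as stated in the text: CFF = k * (log L - log L0)."""
    return slope * (math.log10(luminance) - math.log10(threshold_luminance))


def cff_granit_harper(area: float, slope: float) -> float:
    """Granit-Harper law as stated in the text: CFF = k * log(area)."""
    return slope * math.log10(area)


# Hypothetical slopes (Hz per log unit), chosen only to show the effect of an
# eccentricity-dependent Ferry-Porter slope; they are not measured values.
hypothetical_slopes = {"fovea": 9.0, "periphery": 12.0}

for locus, k in hypothetical_slopes.items():
    cff = cff_ferry_porter(luminance=100.0, threshold_luminance=1.0, slope=k)
    print(f"{locus}: CFF at 2 log units above threshold = {cff:.1f} Hz")
```

In this form, the slope k is the CFF gain per tenfold increase in luminance (or area), which is why a slope that grows with eccentricity favors the periphery at high luminance.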
The CFF refers to unstructured stimuli. If the interaction with spatial characteristics is of interest, one uses the temporal contrast sensitivity function (CSF) that reflects the minimum contrast for detection of a temporally modulated or moving sine-wave grating (see Watson, 1986, for a review). To study the temporal CSF's change with eccentricity, Virsu, Rovamo, Laurinen, and Näsänen (1982) presented grating targets that were M-scaled with respect to size, spatial frequency, and drift rate. They found the temporal CSF to be independent of eccentricity up to 30 deg on the nasal horizontal meridian. 
In order to circumvent adaptation to the continuous flicker in CFF measurements, transient measurement is useful. Rashbass (1970) studied the interaction of luminance difference thresholds and timing with double pulses of light or dark spots (see Watson, 1986). The minimum perceivable gap between two light pulses was first investigated by Mahneke (1958); Stelmach, Drance, and Di Lollo (1986) compared foveal and peripheral gap durations (see Treutwein & Rentschler, 1992, for a review). Treutwein advanced that method to arrive at a technique of simultaneous double-pulse resolution (DPR) measurement at nine locations with stable results (Treutwein, 1989; Treutwein & Rentschler, 1992). DPR thresholds in the central fovea were found to be better (i.e., lower) than off-center (up to 3.4° visual angle and up to 6° in a related study by Sachs, 1995). 
Poggel and Strasburger (2004) and Poggel et al. (in press) used Treutwein's technique for a systematic cross-sectional study of temporal resolution and other visual performance indicators at 41 locations in the whole central visual field up to 20° eccentricity (95 subjects in a range of 10 to 90 years of age; mean age: 47.8 years). Stimuli had a constant size of 1.15° and a luminance of 215 cd/m2 on a 0.01 cd/m2 background. Thresholds increased (i.e., performance decreased) systematically with eccentricity, from 32.0 ms in the fovea to 51.5 ms at 20° eccentricity. The increase was steep (4.96 ms/deg) up to 2.5° eccentricity and shallow (0.5 ms/deg) beyond 5°, with an average rate of 1.16 ms/deg. The increase was fairly isotropic. There was an interaction with age, such that the periphery showed a slightly higher age-related increase than the center. Interestingly, temporal resolution and RT at any visual field position were statistically fully independent. A marginal correlation between temporal resolution and RT was mediated by subject age, i.e., very young and very old subjects had both increased double-pulse resolution thresholds and increased RTs. 
So, does double-pulse resolution increase or decrease with eccentricity? Like many other visual functions, performance in double-pulse resolution is enhanced by focal spatial attention (Poggel et al., 2006). The use of constant-size stimuli in Poggel et al.'s studies is likely to have put the periphery at a disadvantage. Based on the model calculations in Poggel et al. (2006), and taking into account the influence of attention and summation, Poggel et al. (in press) argue that performance of temporal resolution for targets of constant size decreases with eccentricity but effectively increases with scaled stimuli. 
3.6.4. Spatial summation
Target size is of prime importance for visibility, but the dependency of visual performance on size is often complex: bigger is not necessarily better. However, for the simple task of detecting a homogeneous spot of light on a homogeneous background in the visual field (e.g., in a perimeter), the relationship is surprisingly systematic. Across a certain range of sizes, Riccò's (1877) classical law of spatial summation applies.⁴ It states that the light increment threshold ΔL is inversely proportional to the area A of the light spot, i.e., that their product is constant: 
\begin{eqnarray}\Delta L \cdot A = {\rm{const}}.\quad \end{eqnarray}
(14)
 
Since Weber's law states that ΔL/L = const. over a wide range of luminances, Riccò's law can be restated as  
\begin{eqnarray} \left( {\Delta L/L} \right) \cdot A = {\rm{const}}.\quad \end{eqnarray}
(15)
 
The area of a light spot is proportional to the square of the diameter d. In a double-logarithmic plot, the dependency of ΔL/L on the diameter is therefore given by a straight line of slope −2. This is how Riccò's law is typically plotted. Figure 12a illustrates this schematically and Figure 12b shows Graham and Bartlett's (1939) classical data (modified from Hood & Finkelstein, 1986, Figure 5.20). Outside the range where Riccò's law applies, there is a gradual flattening of the curve until at a certain size the light increment threshold stays constant, i.e., there is no more summation. The intermediate range where the slope is approximately −1, i.e., where the increment threshold is inversely proportional to the linear diameter, was described by Piper in 1903. This relationship is sometimes referred to as Piper's law. 
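The two slopes follow directly from the respective laws. As a short worked derivation (a sketch; the square-root form of Piper's law is the usual textbook statement and is not written out in the text above, and the primed constants merely absorb numerical factors):

```latex
\begin{eqnarray}
\left( \Delta L/L \right) \cdot A = {\rm const},\quad A = \pi d^{2}/4
  &\;\Rightarrow\;& \log \left( \Delta L/L \right) = -2 \log d + {\rm const}' \\
\left( \Delta L/L \right) \cdot \sqrt{A} = {\rm const}
  &\;\Rightarrow\;& \log \left( \Delta L/L \right) = -\log d + {\rm const}''
\end{eqnarray}
```

The first line corresponds to Riccò's law (slope −2) and the second to Piper's law (slope −1); where summation has ceased, the threshold no longer depends on d and the slope is 0.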
Figure 12.
 
Spatial summation for the detection of a homogeneous spot of light in central and peripheral vision. (a) Schematic illustration of Riccò's and Piper's law of spatial summation. (b) Spatial summation in peripheral view for two observers (monocular, 15° nasal, dark adapted, 12.8 ms). Data by Graham and Bartlett (1939, Table 2). (c) Diameter of receptive and perceptive fields for the human, monkey, and cat. Open squares: Human perceptive fields, mean of temporal and nasal data provided by Oehler (1985, Figure 4). Open circles: Monkey perceptive fields, obtained by using the Westheimer paradigm (Oehler, 1985, Figure 8). Filled circles: Monkey receptive field (De Monasterio & Gouras, 1975, Figure 16, broad-band cells). Crosses and filled triangles: receptive fields of the cat (Fischer & May, 1970, Figure 2). Analyses by Strasburger (2003b), figures modified from Strasburger (2003).
There are generalizations of Riccò's law that apply to a larger luminance range but we will not go into detail (see Hood & Finkelstein, 1986; Strasburger, 2003a, 2003b; Chapter 5.4.3: Binding and letter source confusion and Chapter 5.4.4: Spatial attention; equations with empirical parameters are provided in the latter). Here, we are interested in the dependency on eccentricity only (Figure 12c). 
Since we will argue below that the psychophysical results closely match those in receptive field neurophysiology, we start off with a neurophysiological counterpart to Riccò's law formulated by Fischer and May (1970) for the cat retina. Summation in the retina occurs when photons are received within the same receptive field, so it is intuitive that Equation 15 can be expanded, as Fischer and May (1970, Equation 4a, p. 452) did, to yield 
\begin{eqnarray} \left( {\Delta L/L} \right) \cdot A = {c}_0 \cdot {A}_{\rm R},\quad \end{eqnarray}
(16)
where AR is the area of the receptive field and c0 is a system constant (Fischer and May modeled the receptive field by a two-dimensional Gaussian, and AR is the area where sensitivity drops to 1/e). Mean receptive field sizes were shown to depend linearly on eccentricity; these are shown separately for on-center and off-center fields by the triangles and plus signs in Figure 12c (Fischer & May, 1970, Figure 2; comparable results with a flatter increase were obtained by Peichl & Wässle, 1979, Figure 7, for cat Y cells: 0.8° at fovea; 2.3° at 24°). In modern writing (cf. Equation 7), a generalized version of Riccò's law is thus 
\begin{eqnarray} \left( {\Delta L/L} \right) \cdot A = {c}_0\left( {1 + E/{E}_2} \right),\quad \end{eqnarray}
(17)
with the same notation as before. 
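Equation 17 can be illustrated with a minimal sketch that returns the Weber contrast threshold for a spot within the summation range. The values chosen for c0 and E2 are purely hypothetical placeholders and not estimates from Fischer and May (1970) or any other cited study; the function name is ours.

```python
def weber_threshold(area_deg2: float, eccentricity_deg: float,
                    c0: float, e2_deg: float) -> float:
    """Threshold (Delta L / L) from Equation 17: (Delta L / L) * A = c0 * (1 + E / E2).

    Only meaningful for spot areas within the range where Ricco's law holds.
    """
    if area_deg2 <= 0 or e2_deg <= 0:
        raise ValueError("area and E2 must be positive")
    return c0 * (1.0 + eccentricity_deg / e2_deg) / area_deg2


C0 = 0.01   # hypothetical system constant
E2 = 5.0    # hypothetical E2 value in degrees

for eccentricity in (0.0, 5.0, 20.0):
    threshold = weber_threshold(1.0, eccentricity, C0, E2)
    print(f"E = {eccentricity:4.1f} deg: threshold = {threshold:.3f}")
```

By construction, the threshold for a fixed spot area doubles at E = E2 relative to the fovea, and doubling the area halves the threshold at any eccentricity.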
Receptive field sizes in the monkey and human are different from those in the cat. De Monasterio and Gouras (1975, Figure 16) describe the receptive field sizes of macaque retinal ganglion cells, of which the broadband cells are of interest here. Although these sizes vary widely, their variation with eccentricity is quite regular on average (filled circles in Figure 12c). These cells could represent a physiological substrate for mediating Riccò's law, as shown quantitatively by Oehler (1985). 
For that analysis, Oehler used Westheimer's paradigm as a psychophysical estimate of receptive field size. Since light energy of a homogeneous patch of light is proportional to its area, the limit up to which threshold is inversely proportional to area (i.e., up to which Riccò's law applies) provides an estimate of receptive field size. However, as seen in Figures 12a and 12b, the borders of the spatial summation area are not well defined. To achieve a more precise estimate, Westheimer's paradigm interchanges the roles of the variables: The size of the stimulus, whose increment threshold is sought, is kept constant, and the size of a background annulus is varied instead. With increasing size of the latter, the threshold increases to a maximum and then decreases to a plateau further out. This so-called Westheimer function (Westheimer, 1965, first described by Crawford, 1940) is interpreted as showing that, as long as the annulus fits into the mean size of a receptive field, the threshold increases because of an increased adaptation level. With a larger background, then, surrounding inhibitory areas slightly decrease the adaptation level. Consequently, the diameter at which the function's maximum is reached is taken as an estimate of the (mean of the) inner, summating part of the receptive field. The beginning of the plateau region is regarded as a psychophysically obtained estimate of the mean total receptive field size including the inhibitory surround. Such estimates were called perceptive fields by Jung and Spillmann (1970; cf. also Spillmann, 1964). 
Psychophysical data from Westheimer's paradigm had previously only been available for the human, whereas receptive field data existed only for cat and monkey. Oehler (1985) provided the missing link, namely, psychophysical data for the monkey, which could then be compared with neurophysiology. The open and filled circles in Figure 12c show the decisive result. The open circles refer to Oehler's perceptive field sizes in the monkey and the filled circles depict De Monasterio and Gouras' receptive field sizes for the broadband cells. It is striking how well the two functions superimpose. Moreover, perceptive field sizes for man and monkey are very similar across an eccentricity range from 5° to 40°. To allow a direct comparison to the aforementioned data, we calculated the mean between temporal and nasal human perceptive field sizes from Oehler's data. The results are shown as open squares in Figure 12c. Again, these data superimpose surprisingly well. The curves are described by the following equations:  
\begin{eqnarray} \begin{array}{@{}r@{\;}c@{\;}l@{}} {D}_{\rm{m}} &=& 0.0761 + 0.0356E\\ {D}_{\rm{h}}&=& 0.1773 + 0.0342E, \end{array}\quad \end{eqnarray}
(18)
where Dm and Dh are the perceptive field diameters for the monkey and the human, respectively. The corresponding E2 values are 2.09 and 5.18; note that the similarity of Dm and Dh is not reflected in these values. Kunken, Sun, and Lee (2005) confirmed Oehler's basic result but concluded from differences in the surround part of Westheimer's curve that both retinal and cortical mechanisms contributed to that curve. In summary, Westheimer's paradigm is rather useful for estimating receptive field sizes psychophysically (Westheimer, 2004). 
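As a minimal sketch, the two linear functions of Equation 18 can be evaluated directly, and E2 can be obtained as the ratio of intercept to slope (the eccentricity at which the foveal diameter doubles). The function and variable names are ours. Note that with the coefficients as printed in Equation 18 the monkey value comes out near 2.14 rather than the 2.09 quoted above; the small difference may reflect rounding of the printed coefficients.

```python
# (intercept in deg, slope in deg per deg eccentricity), from Equation 18
MONKEY = (0.0761, 0.0356)
HUMAN = (0.1773, 0.0342)


def perceptive_field_diameter(eccentricity_deg: float, coeffs: tuple) -> float:
    """Perceptive field diameter D = intercept + slope * E (Equation 18)."""
    intercept, slope = coeffs
    return intercept + slope * eccentricity_deg


def e2_value(coeffs: tuple) -> float:
    """E2: eccentricity at which the diameter is twice its foveal value."""
    intercept, slope = coeffs
    return intercept / slope


for name, coeffs in (("monkey", MONKEY), ("human", HUMAN)):
    d30 = perceptive_field_diameter(30.0, coeffs)
    print(f"{name}: D(30 deg) = {d30:.2f} deg, E2 = {e2_value(coeffs):.2f} deg")
```

The output also illustrates the remark above: the diameters at 30° are nearly identical for monkey and human, whereas the E2 values differ by more than a factor of two because E2 is dominated by the small foveal intercept.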
3.6.5. Perimetry
Any review of peripheral vision would be incomplete without at least mentioning perimetry, the diagnostic assessment of visual field functions in healthy and impaired subjects. Perimetry grew from the same roots as the study of peripheral visual function (Chapter 2: History of research on peripheral vision and Chapter 4.1: High-contrast characters) but developed into a separate specialism in the 1940s and 1950s. We distinguish between two routes: the clinical route in (neuro-) ophthalmology, neurology, and neuropsychology for diagnosis of disorders of the eye, visual pathway, and brain with the intention of therapy and a different route in optometry, ophthalmology, and psychological diagnostics that does not aim at therapeutic intervention, like the assessment of driver or pilot fitness, cockpit design, and driving safety. 
The different needs of the two branches have led to differing technologies. The light sensitivity perimeters that are still used today are based on the technique introduced by Goldmann (1945a, 1945b) or Harms (1952). For reviews of the classical techniques and their applications, see, e.g., Aulhorn and Harms (1972), Lachenmayr and Vivell (1993), Sloan (1961), and Thompson and Wall (2008). However, there are now numerous alternative perimetric techniques for mapping various visual functions. These include high-pass resolution perimetry (Frisén, 1993, 1995), component perimetry (Bachmann & Fahle, 2000), frequency doubling perimetry (e.g., Chauhan & Johnson, 1999; Spry, Johnson, McKendrick, & Turpin, 2001; Wall, Neahring, & Woodward, 2002), flicker perimetric methods (cf. Rota-Bartelink, 1999 for review), methods that include a recognition task like MacKeben's Macular Mapping Test (Hahn et al., 2009), microperimetry, and the scanning laser ophthalmoscope (SLO; Mainster, Timberlake, Webb, & Hughes, 1982; Rohrschneider, Springer, Bültmann, & Völcker, 2005), as well as objective techniques like the multifocal electroretinogram (Sutter & Tran, 1992). Some of the techniques and implemented test algorithms are briefly reviewed by McKendrick (2005). Further information can be found at the Imaging and Perimetry Society's site (http://www.perimetry.org; for example, Thompson & Wall, 2008). Eisenbarth, MacKeben, Poggel, and Strasburger (2008) explored the potential of double-pulse perimetry (cf. Chapter 3.6.3: Temporal resolution and flicker detection) and showed that in age-related macular degeneration temporal thresholds are severely impaired far outside the macula, up to 20° eccentricity. 
A very different route has been taken for driving and pilot fitness assessment. An unpublished study of ours (Strasburger, Grundler, & Burgard, 2006) demonstrated that standard perimetry may not be the best indicator for safe driving as it does not assess temporal sensitivity and attention in the visual periphery. In the US, the Useful Field of View (UFOV) test, which mixes sensory and attentional testing, has been shown to be a good predictor of driving fitness (cf. Ball, Owsley, Sloane, Roenker, & Bruni, 1993). In Europe, a test of peripheral temporal sensitivity named PP in the Vienna Test System has been found particularly predictive of driving fitness (Burgard, 2005; Strasburger et al., 2006). 
3.6.6. Other functions
Many more visual functions have been studied with respect to whether and how they change with eccentricity in the visual field. A few are listed in Table 8, together with key references for further information. 
Table 8.
 
Other functions studied in research on peripheral vision.
Chapter 4. Recognition of single characters
In the previous section, we have reviewed visual tasks that involved unstructured or very simply structured stimuli. Characters can be considered one step further in terms of complexity and might thus be more representative for capturing what is special about form vision. Surprisingly, however, it turns out that the prototypical situation of recognizing single characters at high contrast shares many characteristics with discriminating simpler forms, and it is only at lower contrast or with multiple characters that differences emerge. In the present section, we look at the recognition of individual characters. We start with characters at high contrast where we review letter acuity and issues of recognition proper. From there, we proceed to character recognition at lower contrast, reviewing technical questions of stimulus presentation at low-contrast levels, studies using band-pass filtered letters, and work on contrast thresholds for character recognition, as well as a descriptive model for the latter. 
4.1. High-contrast characters
4.1.1. Letter acuity
Traditionally, the study of single-character recognition in the fovea and the periphery was mostly the study of visual acuity. Purkinje, Aubert and Foerster, Herman Snellen, and Edmund Landolt in the 19th century laid the foundations (Aubert & Foerster, 1857; Snellen, 1862; Snellen & Landolt, 1874a, 1874b). Following Snellen's lead, however, the letters that were tested were typically taken from a designedly limited set. Snellen (1862) introduced his eye chart with stylized letters in a 5 × 5 grid with black and white evenly distributed (he also provided reading charts with various standard fonts). Fick (1898) in his study on peripheral acuity used impoverished Snellen E optotypes (the Snellen E is unlike that in the Snellen chart and consists of three bars with a connecting bar) where the middle bar was removed. Korte (1923) measured eccentricity thresholds for constant-size letters in six observers. He is one of the few who used the whole alphabet, in upper and lower cases and in two fonts, Roman and Gothic. Ludvigh (1941) used the Snellen E in his study of peripheral acuity as did Virsu et al. (1987; for reviews of the early literature, see Aulhorn, 1964; Low, 1951; Sloan, 1951; Westheimer, 1965; Weymouth, 1958). Millodot and Lamont (1974) were the first to measure acuity on the full vertical meridian and employed the Landolt ring. Aulhorn (1964; cf. her Figure 31) used white diamonds vs. circles for testing form vision (“Type a,” introduced by Aulhorn, 1960) and found her data to closely match the grating data of Wertheim (1894). Only in 1959 did Louise Sloan introduce the Sloan letter set (stylized C, D, H, K, N, O, R, S, V, Z) which is used in today's “Snellen” charts (Sloan, 1959). In most of Europe, the Landolt C is the recommended optotype for acuity measurement. It was standardized by the German DIN (industrial norm). When letters are compared to other optotypes, the minimum separable is used, i.e., the gap in the Landolt ring (cf. Schober, 1938) is compared to the gap between the bars in the Snellen E. 
When one speaks of the change of visual acuity with retinal locus, one refers to mean results only. Low (1951, Charts 1 and 2), in his thorough review, plotted the peripheral acuity data of twenty-two studies ranging from Hueck (1840) to Sloan (Mandelbaum & Sloan, 1947). After excluding meridian, age, sex, pupillary size, target color, refractive conditions, movement, psychological factors, adaptation, and training as unimportant sources of variation in these data, there remained stimulus type and, most importantly, “interindividual variability among a group of subjects … as the most likely source of discrepancy” (Low, 1951, p. 95). In modern terms, the interindividual variance exceeds the systematic variance. Anybody comparing optotypes should bear this in mind. Variability in acuity measurements was further studied by Randall, Brown, and Sloan (1966). 
High-contrast character recognition and acuity in eccentric vision are crucially affected by the deployment of spatial attention (Carrasco et al., 2002; MacKeben, 1999; Nakayama & MacKeben, 1989; Talgar et al., 2004). Results depend on whether the subject knows where to expect the stimulus and whether and when there are spatial cues marking the target location. These dependencies have long been known (cf. Chapter 2: History of research on peripheral vision); to eliminate that influence in acuity measurement, researchers typically have chosen paradigms where the subject knows the eccentric location. However, sustained attention has been shown to be anisotropic with a dominance of the horizontal meridian in the macula (MacKeben, 1999). Performance at disfavored locations was found to be limited by deploying attention, not by holding it there. Attentional anisotropies thus need to be distinguished from anisotropies on the input side (receptors, ganglion cells, LGN, V1). We return to the role of spatial attention in the context of crowding (Chapter 5: Recognition of patterns in context—Crowding). 
4.1.2. Character recognition at high contrast
Character recognition is a task with requirements very different from those of detection and discrimination (cf. Chapter 8: Modeling peripheral form vision). The interest in these aspects arises in reading and dyslexia research, which we will touch only briefly. Korte (1923) studied confusions and misreadings of letters in peripheral vision in the tradition of the Gestalt school. Since in his study letters were presented in the context of syllables, we will come back to his account in the section on crowding (cf. Chapter 5: Recognition of patterns in context—Crowding and Appendix). Geiger and Lettvin (1987) introduced the form-resolving visual field (FRF). It differs from acuity measurement in that (1) simple and more complex forms are used (Zegarra-Moran & Geiger, 1993), (2) attention is divided between a foveal and the peripheral form, and (3) size is kept constant; the dependent measure is percent correct. Gervais, Harvey, and Roberts (1984) also addressed human letter recognition psychophysically outside the tradition of acuity research. The authors compared 26 × 26 confusion matrices for the full 26-letter alphabet with predictions from a template model, a geometric feature model using unstructured feature lists, and a model based on 2D Fourier descriptors weighted by the human contrast sensitivity function (CSF). Letters were above the size threshold but were quite small (0.1°) and were briefly presented so as to produce 50% correct performance. Results were based on 3,900 trials. The highest correlation (0.70) between actual and predicted confusions was attained by the model where letters were filtered by the human CSF, using both letter amplitude and phase spectra, although the contribution of phase was moderate. The template model ranked second and the geometric feature model third. This suggests that peripheral letter recognition depends largely on contrast sensitivity. 
To our knowledge, later studies of single-letter confusions have not again looked at the full alphabet. 
4.2. Low-contrast characters
4.2.1. Introducing contrast to the study of character recognition
Surprisingly, for over a hundred years, the study of human single-character recognition has been mostly synonymous with determining the size threshold for recognition or discrimination, as discussed in the preceding section. Varying stimulus presentation time as the thresholding parameter was mostly confined to psychological research in the context of reading, and stimulus contrast was neglected despite the availability of techniques for presenting low-contrast patterns. All of this changed with the advent of systems and communication theory during the war and of electronic equipment in the perceptual laboratory in the 1950s and 1960s. In the footsteps of Campbell and Robson (1968), Denis Pelli built equipment for high-resolution contrast control (12 bit) for the PDP-11 in the 1980s, and David Regan and Denis Pelli made hard-copy low-contrast letter charts for improved diagnosis of ophthalmic diseases available to a wide audience (Pelli, Robson, & Wilkins, 1988; Regan, 1988a, 1988b, 1991; Regan & Neima, 1983). The charts have found application in ophthalmic diagnostics, for example, of cataract, glaucoma, retinopathies, and multiple sclerosis. More recently, Arditi (2005) presented a further improved low-contrast letter chart, as did Colenbrander and Fletcher (2004, 2006). 
For basic research, computer-based implementations are more flexible than other approaches (Bach, Meigen, & Strasburger, 1997; Strasburger, 1997b). In our laboratory, we have developed software for measuring character contrast thresholds in peripheral viewing, based on Harvey's (1986, 1997) ML-Pest package and on Pelli's PDP-11 hardware, which we later ported to the PC (Jüttner & Strasburger, 1997; Strasburger, 1997a). Bach implemented a Landolt C contrast threshold measurement in his popular FrACT (Bach, 1996). For contrast threshold measurement, it is essential that more than 8-bit grayscale resolution is available. To this day, however, except for specialized hardware like the VSG system, all monitors (CRT and LCD alike) and all standard computers (PC and Mac alike) offer 8-bit grayscale only. Work-around solutions are dithering (Bach et al., 1997), which we used in our technique, bit stealing (Tyler, 1997), and Pelli's attenuator (Pelli & Zhang, 1991; see Strasburger, 1995–2011, for an overview of the technology). 
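Why 8 bits are too coarse can be illustrated with a short calculation. The following is a minimal sketch, not part of any of the cited packages; it assumes a perfectly linearized display (luminance proportional to the digital level) and a mid-gray background, and the function name is hypothetical.

```python
def smallest_michelson_step(bits: int) -> float:
    """Smallest non-zero Michelson contrast on a mid-gray background.

    Assumption: luminance is proportional to the digital level
    (a linearized display); real displays need gamma correction first.
    """
    max_level = 2 ** bits - 1            # highest digital level
    background = round(max_level / 2)    # mid-gray background level
    target = background + 1              # one digital step brighter
    # Michelson contrast = (Lmax - Lmin) / (Lmax + Lmin)
    return (target - background) / (target + background)


for bits in (8, 10, 12):
    step = smallest_michelson_step(bits)
    print(f"{bits:2d} bit: smallest contrast step = {100 * step:.3f}%")
```

Under these assumptions, an 8-bit display moves in contrast steps of roughly 0.4%, which leaves only a handful of levels in the range of the lowest recognition thresholds, whereas 12 bits give steps that are finer by a factor of about 16.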
Early measurements of foveal contrast sensitivity for letters were conducted by Ginsburg (1978), Legge, Rubin, and Luebker (1987), and van Nes and Jacobs (1981). Ginsburg (1978) found that contrast sensitivity increased with letter sizes increasing from 0.07° to 0.8° and that more contrast was required for identification than for detection of the letters. Legge et al. (1987, their Figure 8), using Pelli's contrast attenuator, measured single-letter contrast sensitivity within a reading study for black-on-white Sloan letters ranging in size from 0.13° to 24° in three observers. They reported a rapid increase of contrast sensitivity with increasing letter width and a gradual falloff at a width of 2° and larger values, similar to the foveal curve shown in Figure 13.
Figure 13.
 
(a) 3D representation of the contrast-size trade-off functions for one subject (WB) (from Strasburger, 2003b; like Strasburger et al., 1994, Figure 1, but interpolated in the blind spot). (b) Full set of contrast-size functions for the same subject (from Strasburger et al., 1994).
4.2.2. Spatial frequency characteristics of letter identification
To understand mechanisms underlying letter recognition, the concept of perception as a noise-limited process (Barlow, 1977; Legge & Foley, 1980; Pelli, 1981) has been applied to letter recognition (Majaj, Pelli, Kurshan, & Palomares, 2002; Parish & Sperling, 1991; Solomon & Pelli, 1994; Sperling, 1989). Parish and Sperling (1991) embedded band-pass filtered versions of the 26 letters of the alphabet in identically filtered Gaussian noise and averaged performance over these letters. Observers made the most efficient use (42% efficiency) of spatial frequencies around 1.5 cycles per letter height, over a 32:1 range of viewing distances. Solomon and Pelli (1994) presented the 26 letters unfiltered but masked by high- or low-pass noise. Unlike Parish and Sperling, they obtained filters of about 3 cycles per letter from both high- and low-pass data and an observer efficiency of about 10%. Object spatial frequencies are now often used to characterize filtered letters. However, Petkov and Westenberg (2003) showed that specifying the spectrum in cycles per letter rather than cycles per degree was misleading in Solomon and Pelli's study, because letter stroke width had covaried with letter size there. Conventional spatial frequency in cycles per degree therefore may still be the most appropriate measure for the recognition of letters as well as of the non-symbolic patterns to which Petkov and Westenberg had extended their study. 
Performance levels for letter identification in central and peripheral vision were directly compared by Chung, Legge, and Tjan (2002). They found spatial frequency characteristics of letter recognition to be the same in the two viewing conditions. Chung and Tjan (2009) used similar techniques to study the influence of spatial frequency and contrast on reading speed, in the fovea and at 10° eccentricity. At low contrast, speed showed tuning effects, i.e., there was an optimum spatial frequency for reading. The spatial frequency tuning and scaling properties for reading were rather similar between fovea and periphery and closely matched those for identifying letters, particularly when crowded. 
4.2.3. Contrast thresholds for character recognition
First measurements of contrast thresholds for peripheral form recognition were performed with the Tübinger perimeter using a diamond vs. circle discrimination task (Aulhorn, 1960, 1964; Aulhorn & Harms, 1972; Johnson, Keltner, & Balestrery, 1978; Lie, 1980) and by Fleck (1987) for characters displayed on a computer terminal. Herse and Bedell (1989) compared letter contrast sensitivity to grating contrast sensitivity at 0°, 5°, 10°, and 15° in two subjects on the nasal meridian. Eccentric viewing resulted in a larger sensitivity loss for letters than for gratings. In their Figure 6, they plotted log contrast sensitivity versus hypothetical spatial frequency, using the rule-of-thumb relation cpd = 30/MAR, and obtained a linear dependency. If the abscissa is converted back to the actual data, hyperbolic functions similar to those in Figure 13 result. 
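The rule-of-thumb conversion and its consequence for the shape of the curve can be sketched in a few lines. This is an illustration only: the intercept and slope passed to `log_sensitivity_from_mar` are hypothetical placeholders, not values from Herse and Bedell (1989), and MAR is assumed to be given in minutes of arc.

```python
def mar_to_cpd(mar_arcmin: float) -> float:
    """Rule of thumb: nominal spatial frequency (cycles/deg) = 30 / MAR (arcmin)."""
    return 30.0 / mar_arcmin


def cpd_to_mar(cpd: float) -> float:
    """Inverse of the rule of thumb: MAR (arcmin) = 30 / (cycles/deg)."""
    return 30.0 / cpd


def log_sensitivity_from_mar(mar_arcmin: float, intercept: float, slope: float) -> float:
    """A line in cycles/deg, re-expressed as a function of MAR.

    log CS = intercept - slope * cpd  becomes  intercept - 30 * slope / MAR,
    i.e., a hyperbolic function of size. Intercept and slope are hypothetical.
    """
    return intercept - slope * mar_to_cpd(mar_arcmin)


# A MAR of 1 arcmin corresponds to a nominal 30 cycles/deg.
print(mar_to_cpd(1.0))
```

The point of the sketch is the change of variable: a dependency that is linear on the nominal frequency axis turns into a hyperbola once the abscissa is expressed as size again.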
Strasburger et al. (1991) reported the first extensive contrast threshold measurements for characters where retinal eccentricity and stimulus size were varied independently so as to separate the influence of these two factors. Stimuli were the ten digits (0–9) in a roman serif font, presented as light patterns on a 62 cd/m2 mean gray background at nine positions from 0° to 16° eccentricity on the left horizontal meridian. Thresholds were determined in a 10-afc task using Harvey's (1986) maximum likelihood algorithm of threshold measurement. The main findings were that: (1) at each retinal position, there is a highly systematic trade-off between (log Michelson) contrast and character size and (2) both threshold size and threshold contrast increase independently in peripheral viewing (Figure 13). The latter result is incompatible with the plain cortical magnification concept and calls for its extension with independently scaled stimulus attributes (cf. Chapter 3.5: The need for non-spatial scaling). Since measurements had been carried out only up to the blind spot (16°), Strasburger et al. (1994) and Strasburger and Rentschler (1996) extended these experiments to cover the full eccentricity range where recognition of characters was possible. 
Strasburger and Rentschler (1996) included further measurements of letter contrast thresholds on the vertical meridian and standard static perimetry in the same subjects to compare visual fields defined by letter contrast sensitivity, on the one hand, with those defined by light spot detection, on the other hand. The results showed that at any given threshold contrast the visual field of recognition is much smaller than the perimetric field of detection (Figure 14). Interestingly, the performance plateau on the horizontal meridian between about 10° and 25°, which is often seen in standard perimetry (Harvey & Pöppel, 1972; Pöppel & Harvey, 1973), also manifests itself in the letter recognition thresholds. 
Figure 14.
 
Visual fields of recognition and detection for one subject (CH). Recognition fields (heavy lines) are obtained from threshold-contrast-vs.-size trade-off functions as shown in Figure 13. The form of the field is approximated by ellipses. Each ellipse shows the border of recognition at a given level of contrast, at the values 1.2%, 2%, 3%, 4%, 6%, 10%, 30% starting from the inner circle (contrast in Michelson units). Note the performance plateau on the horizontal meridian between 10° and 25° (between the 3% and 4% line), similar to the one found in perimetry (Harvey & Pöppel, 1972; Pöppel & Harvey, 1973). The 100%-contrast ellipse represents a maximum field of recognition obtained by extrapolation; its diameter is 46° × 32°. Also indicated in dashed lines are the fields of light-spot detection in standard static perimetry for the same subject. (From Strasburger & Rentschler, 1996, Figure 4.) Note that the dashed line does not represent the full visual field of detection since a small test spot is used for the perimetric data; the full field would extend to around ±107°.
In an even more extensive study involving twenty healthy young observers, Strasburger, Gothe, and Lutz (2001) compared contrast sensitivity for recognition to that for detection in a finely spaced raster covering the full central field with a 30-deg radius. Detection stimuli were Gabor patterns (1 cycle/deg, sigma = 1.5 deg; discrimination of vertical vs. horizontal orientation was taken as a measure of detection); recognition stimuli were, as before, the digits 0–9 at a height of 2.4 deg, the contrast thresholds of which were determined at 65 positions in a polar raster. Overall, close to 100,000 observer responses were collected. Results are shown in Figures 15 and 16. All subjects showed stable but interindividually somewhat different sensitivity surfaces. On average, contrast thresholds for detection and recognition increased linearly with eccentricity out to 30° eccentricity, by 0.029 log C/deg for Gabor patch detection and by 0.036 log C/deg for character recognition. Recognition contrast thresholds were 0.25 to 0.50 log units higher than those for detection. There was some variation between subjects but less than between conditions. No difference was observed between the left and right visual fields. Again, there was a performance plateau on the horizontal meridian between 15° and 20° (Figure 15b; Strasburger, 2003b; Strasburger et al., 2001). 
Figure 15.
 
Contrast thresholds for the recognition of characters (lower curves) compared to the detection of Gabor gratings (upper curves); (a) mean over all meridians, (b) horizontal meridian. Character height 2.4°, Gabor patches: 1 cpd, σ = 1.5°. From Strasburger (2003b) and Strasburger et al. (2001).
Figure 16.
 
Contrast thresholds for the recognition of characters (a) compared to the detection of Gabor gratings (b) in the full field up to 30°; same conditions as in Figure 15. Error bars show standard deviations. From Strasburger, 2003b.
4.2.4. Model description for single characters
In Chapter 3.1: The cortical magnification concept, we summarized descriptive relationships of how performance for a number of visual tasks, including high-contrast character recognition, depends on eccentricity in the visual field. In light of the violations of M-scaling discussed above (Chapter 3.4: Successes and failures of the cortical magnification concept, Chapter 3.5: The need for non-spatial scaling, Chapter 4.2.1: Introducing contrast to the study of character recognition), what is the corresponding relationship for single-character recognition at high and low contrasts, and how is this reconciled with previous findings? We have addressed this question (Strasburger et al., 1991, Figure 9; Strasburger et al., 1994; Strasburger, 2001a, 2001b, 2003a, 2003b) and give our answer as a set of descriptive, linear and non-linear equations summarized in Figure 17. The parameters therein are based on the data in Strasburger et al. (2001) (see Figures 15 and 16) and previous data, i.e., about 1/4 million subject responses in >20 young subjects. The model starts with a trade-off function between character size and log recognition contrast threshold, approximated by a hyperbola (first row in Figure 17). Its asymptotes, log Coff and Soff, are both shifted with increasing eccentricity, each by a linear function (left graph). The equations can be solved for character size S as a function of eccentricity (second row in the figure), where log C appears as a parameter in the denominator. At high contrast (C close to 1, so that log C is close to 0), the denominator in that equation becomes nearly constant except at high eccentricity, and the equation thus reduces to conventional M-scaling (black straight line at 100%). At lower contrast, the graphs are bent upward and, at low contrast, quickly approach infinity (colored lines). That equation therefore represents a generalization of M-scaling. 
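The structure of this trade-off model can be sketched as follows. This is a minimal illustration under stated assumptions, not the equations of Figure 17: the hyperbola is assumed to take the form (log C − log Coff)·(S − Soff) = k, and all parameter values below are hypothetical placeholders chosen only to show the qualitative behavior.

```python
import math

# Hypothetical parameters (NOT the fitted values from Figure 17).
K = 0.2                      # hyperbola constant
S_OFF = (0.1, 0.03)          # size asymptote: intercept (deg), slope (deg/deg)
LOG_C_OFF = (-2.0, 0.04)     # contrast asymptote: intercept, slope (log units/deg)


def s_off(ecc: float) -> float:
    """Size asymptote, shifted linearly with eccentricity (deg)."""
    return S_OFF[0] + S_OFF[1] * ecc


def log_c_off(ecc: float) -> float:
    """Log-contrast asymptote, shifted linearly with eccentricity."""
    return LOG_C_OFF[0] + LOG_C_OFF[1] * ecc


def size_threshold(contrast: float, ecc: float) -> float:
    """Threshold size S for a given Michelson contrast (0 < contrast <= 1).

    Solves the assumed hyperbola for S; log C appears in the denominator.
    Returns infinity where the contrast is too low for recognition at any size.
    """
    denom = math.log10(contrast) - log_c_off(ecc)
    if denom <= 0:
        return math.inf
    return s_off(ecc) + K / denom


def log_contrast_threshold(size: float, ecc: float) -> float:
    """Log threshold contrast for a given size; values above 0 are unattainable."""
    excess = size - s_off(ecc)
    if excess <= 0:
        return math.inf
    return log_c_off(ecc) + K / excess


for ecc in (0, 10, 20):
    print(ecc, size_threshold(1.0, ecc), size_threshold(0.03, ecc))
```

With these placeholder values, the 100%-contrast size threshold grows gently with eccentricity, whereas the 3%-contrast threshold becomes infinite well before 20°, which mirrors the upward bend of the low-contrast curves described above.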
Finally, correct performance can be predicted by the psychometric function (Figure 17, third row), which shows the percentage of correct answers Pc as a function of log normalized contrast (c/C). Threshold contrast C (from the trade-off function) acts as position parameter and shifts the psychometric function horizontally. The slope has been shown to be largely independent of stimulus size and position (Strasburger, 2001b). The lower asymptote is given by the rate of guessing γ, i.e., by the inverse of the number of alternatives. The equations predict contrast and size thresholds for recognition, and the proportion of correct recognition, for singly presented characters of arbitrary contrast, size, and position in the visual field (the anisotropy is not incorporated since it is comparably small). 
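How threshold contrast and guessing rate enter the psychometric function can be sketched as below. The functional form is an assumption: the text does not spell out the equation in Figure 17, so a logistic function of log normalized contrast is used here, with a hypothetical slope value. The function returns a proportion between 0 and 1 rather than a percentage.

```python
import math


def proportion_correct(c: float, threshold_c: float,
                       n_alternatives: int = 10, beta: float = 4.0) -> float:
    """Proportion correct at contrast c (c > 0) for threshold contrast threshold_c.

    Assumed form: logistic in ln(c / C), rising from the guessing rate
    gamma = 1 / n_alternatives to 1. The value of beta is hypothetical.
    """
    gamma = 1.0 / n_alternatives              # lower asymptote (guessing rate)
    x = math.log(c / threshold_c)             # log normalized contrast
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * x))


# Changing the threshold only shifts the curve along the log-contrast axis.
for c in (0.01, 0.05, 0.25):
    print(c, round(proportion_correct(c, threshold_c=0.05), 3))
```

In this parameterization, the threshold contrast C is the contrast at which performance lies halfway between the guessing rate and perfect performance (0.55 in a 10-alternative task), and a change in C shifts the curve horizontally without altering its shape.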
Figure 17.
 
Prediction of the threshold contrast for character recognition in the central 30° radius visual field. (C: Michelson threshold contrast, E: eccentricity (°), S: size threshold, Pc: percent correct, c: supra-threshold contrast, ln: natural log, β: slope measure). Adapted from Strasburger, 2003a, 2003b; Strasburger & Rentschler, 1996. For the psychometric function and its slope measure see Strasburger (2001a, 2001b).
4.2.5. Spatial summation: Does Riccò's law hold for character recognition?
In Chapter 3.6.4: Spatial summation, we have briefly summarized the laws of areal summation for light spot detection in peripheral vision (see Hood & Finkelstein, 1986; Strasburger, 2003b, for more detailed summaries). Of particular interest is the size range within which the increment threshold is proportional to the light spot's area, i.e., the range where Riccò's law holds (Riccò, 1877) and where the underlying mechanism can be assumed to be light energy summation. The range diameters can be shown to correspond to mean receptive field sizes and thus increase with retinal eccentricity. Character recognition depends on the detection and discrimination of features (in a broad sense), and it is natural to assume that the size dependencies for light detection are somehow reflected in the size dependencies of character recognition. The question thus is whether Riccò's law holds for character recognition, in the fovea and in the periphery. 
The question can be addressed empirically from the data shown in Figure 13. The contrast scale in Figure 13 needs to be converted to Weber contrast (ΔL/L) since that is proportional to the light increment threshold ΔL, and the data need to be shown on double logarithmic axes, so that Riccò's law manifests itself as a straight line with a slope of −2. An example for foveal vision is given in Figure 18a; the corresponding functions for twelve peripheral locations are provided in Strasburger (2003b). A line with slope −1 that corresponds to Piper's (1903) law is also shown. 
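The two steps described here, conversion to Weber contrast and reading off the slope on double logarithmic axes, can be sketched as follows. The conversion assumes a luminance increment on a uniform background (so that the Michelson minimum is the background luminance); that assumption and the function names are ours, not taken from the cited work.

```python
import math


def michelson_to_weber(c_m: float) -> float:
    """Convert Michelson to Weber contrast for an increment on a uniform background.

    With Lmin = L and Lmax = L + dL:  C_M = dL / (2L + dL),  C_W = dL / L,
    hence C_W = 2 * C_M / (1 - C_M). Requires 0 <= c_m < 1.
    """
    return 2.0 * c_m / (1.0 - c_m)


def loglog_slope(size1: float, weber1: float, size2: float, weber2: float) -> float:
    """Slope of log Weber contrast vs. log size between two points."""
    return ((math.log10(weber2) - math.log10(weber1))
            / (math.log10(size2) - math.log10(size1)))


# Riccò's law: threshold x area = const, so doubling size quarters the threshold.
print(loglog_slope(1.0, 0.4, 2.0, 0.1))   # close to -2
# Piper's law: threshold x sqrt(area) = const, so doubling size halves it.
print(loglog_slope(1.0, 0.4, 2.0, 0.2))   # close to -1
```

Because area grows with the square of linear size, full energy summation appears as a slope of −2 against log size, and any slope steeper than that indicates a stronger size dependency than Riccò's law allows.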
Figure 18.
 
(a) Example of a contrast-size trade-off function in the fovea. Plotted is log Weber contrast vs. log size, so as to allow comparison with Riccò's law. (b) Maximum slope in the contrast-size trade-off function as in the figure on the left, at a range of eccentricities on the horizontal meridian (modified from Strasburger, 2003b, Figures 5.4-13 and 5.4-14).
When Figure 18a is compared to the corresponding areal summation functions in Chapter 3.6.4: Spatial summation, it is obvious that Riccò's law is violated. The steepest slope should be −2 and it should be attained at small target sizes. The maximum slope for the foveal curve is around −3 and is therefore considerably steeper. Figure 18b summarizes the maximum slope values extracted from the twelve peripheral trade-off functions. The values vary quite a bit (since the steep function part is comparably short), but it is clear that they are even steeper than the foveal slope. The mean of these maximum slopes, between 2° and 36° eccentricities, is −5.75 ± 0.98. Since area grows with the square of size, a slope of −5.75 against log size corresponds to an exponent of about −2.9 against area. Thus, the increment threshold ΔL for character recognition in peripheral vision (at a given luminance L) decreases with roughly the third power of area instead of linearly, i.e., much more steeply. In short, small letters need much more contrast for recognition relative to large characters, and even more so in the periphery. This is further evidence that recognition performance is only loosely coupled to lower level task characteristics. 
Chapter 5. Recognition of patterns in context—Crowding
In peripheral vision, the recognition of detail is radically impeded by patterns or contours that are nearby. This phenomenon is known (or has been studied) under a number of terms: crowding (Ehlers, 1953; Stuart & Burian, 1962), contour interaction (Flom, Heath, & Takahashi, 1963; Flom, Weymouth, & Kahneman, 1963), interaction effects (Bouma, 1970), lateral inhibition (Townsend, Taylor, & Brown, 1971), lateral interference (Chastain, 1982; Estes, Allmeyer, & Reder, 1976; Estes & Wolford, 1971; Wolford, 1975), lateral masking (Geiger & Lettvin, 1986; Monti, 1973; Taylor & Brown, 1972; Wolford & Chambers, 1983), masking (Anstis, 1974), and surround suppression (following V1 neurophysiology; Petrov, Carandini, & McKee, 2005). These terms mean slightly different things, and some imply an underlying mechanism, whereas others do not. The term crowding has recently become the most widely used one, so we will use it here (cf. Strasburger et al., 1991; Strasburger & Rentschler, 1995; the thorough treatment by Pelli, Palomares, & Majaj, 2004; and the special issue in the Journal of Vision by Pelli, Cavanagh, Desimone, Tjan, & Treisman, 2007). 
The susceptibility to crowding may be one of the most characteristic traits of peripheral vision, although it was treated as a niche interest for many decades. More recently, this has radically changed and there is now much more research on the subject than we can discuss here. Fortunately, there are recent reviews (Levi, 2008; Pelli & Tillman, 2008; Strasburger, 2005) and a critical comment on the matter (Tyler & Likova, 2007), so we can concentrate on special aspects and recent developments. In the following, we will first provide a brief historic account of crowding research (Chapter 5.1: The origin of crowding research). We will then review work on letter crowding at low contrast (Chapter 5.2: Letter crowding at low contrast), present an extension of Bouma's rule (Chapter 5.3: Bouma's law revisited—and extended), and finally discuss potential mechanisms that may underlie crowding (Chapter 5.4: Mechanisms underlying crowding). 
5.1. The origin of crowding research
The first elaborate experimental study on letter and word recognition in eccentric vision was conducted by Korte (1923), and his 66-page treatment from the era of Gestalt psychology remained the most extensive for a long time. Along with eccentricity thresholds, it presented a phenomenological description of the perceptual process based on extensive data from eight observers. In addition to letter stimuli, Korte used both meaningful and meaningless words to exclude cognitive factors. As stimuli, he used the lower and upper case letters of the Roman and Gothic fonts. The paper starts off with the observation that, in normal reading, most letters are only seen extrafoveally, making indirect vision of fundamental significance for reading (and vision in general; Korte, 1923, p. 18), an insight that has nicely been verified by Pelli, Tillman et al. (2007; cf. McConkie & Rayner, 1975) in a recent paper on the perceptual span. 
Because of its seminal role and because that tradition was tragically discontinued, we give a brief summary of Korte's account in the Appendix. In short, Korte extracted seven phenomena from his data: (a) Absorption and false amendment, where “a feature of a letter or a whole letter is added to another letter”; (b) false localization of details both of features (b1) and whole letters (b2); (c) puzzling intermediate perceptual states (Korte pointed out the processual quality of perception, reminiscent of the settling of a neural network; e.g., McClelland & Rumelhart, 1981); (d) prothesis and metathesis, adding non-existent letters to a word on the left or right (rare); (e) shortening of the perceptual image in a certain area in the visual field (pp. 65–70); (f) assimilation of details to the perceived whole; (g) false cognitive set, e.g., the impact of prior knowledge of font and letter case and whether the syllables are meaningful or not. Four of these phenomena (a, b, e, f) are related to or underlie the crowding effect as we conceive it today, namely as the impairment of discriminating detail or recognizing a pattern in the presence of other details or patterns. Some are reflected in formal theories of pattern recognition (a, b1, c, g). The others are still awaiting integration into future theories. 
The phenomenon of crowding was probably familiar to ophthalmologists soon after the introduction of acuity measurements but was first explicitly described by the Danish ophthalmologist Ehlers (1936, p. 62; Ehlers, 1953, p. 4325). Ehlers noted, in the context of normal reading and use of letter acuity charts, that there are visual, non-cognitive difficulties of recognizing letters among other letters in eccentric vision. He also observed that the number of letters recognized is independent of angular letter size at varying viewing distance (p. 62). Stuart and Burian (1962) later referred to the phenomenon described by Ehlers as the “crowding effect.” 
Further early work on the crowding effect was carried out by Davage and Sumner (1950) on the effect of line spacing on reading. Müller (1951) used a matrix of 15 × 15 Snellen Es, and Prince (1957, p. 593) somewhat airily mused that “there is a psychological element which obviates the known laws of optics in the recognition of patterns.” 
Averbach and Coriell (1961) started modern research on both crowding (which they called lateral masking) and spatial visual attention. They used Sperling's (1960) iconic memory paradigm but controlled visual attention within a row of letters by marking one with an enclosing circle (Figure 19a), a spatial cue or probe in modern terms (they called it a circle indicator). They also used a pointing line that they referred to as bar marker and that later became known as a symbolic cue. Both markers had the desired attention-attracting effect. However, the circle, unlike the bar, also had the effect of decreasing perceptual performance. Averbach and Coriell thus discovered contour interaction and motivated the well-known work by Flom et al. (Flom, Heath et al., 1963; Flom, Weymouth et al., 1963) that was published shortly thereafter (Figure 19b). 
Figure 19.
 
Stimulus configurations in letter crowding studies. (a) Averbach and Coriell (1961); (b) Flom et al. (1963); (c) Eriksen and Rohrbaugh (1970); (d) Wolford and Chambers (1983); (e) Strasburger et al. (1991), with permission from Springer Science+Business Media; (f) Toet and Levi (1992), with permission from Elsevier; (g) Anstis' (1974) crowding demonstration chart, with permission from Elsevier. Bouma's (1970) stimuli are not shown; he used twenty-five lower case letters in Courier-10 font of 0.22° height. (Graphics modified from Strasburger, 2003b).
The crowding effect is highly important for the understanding of amblyopia and eccentric vision, where it is particularly pronounced, whereas it is small and often seems to be absent in normal foveal vision. It is therefore surprising that it was first quantitatively described in normal foveal vision by Thomas-Decortis (1959, p. 491). She reported a reduction of acuity by a factor of 1.3 for normally sighted subjects. Shortly thereafter, the works of Flom, Heath et al. (1963) and Flom, Weymouth et al. (1963) gave a quantitative and detailed description of the foveal effect under the label contour interaction. Unlike Averbach and Coriell, who came from experimental psychology, Flom et al. used the Landolt ring, in the tradition of optometry/ophthalmology (Figure 19b). 
The first detailed quantitative study on the peripheral crowding effect was published by Bouma (1970, 1973). This occurred at a time when the dependency of visual performance on eccentricity had been thoroughly reviewed for many visual functions (Weymouth, 1958), and the use of a perimeter had been part of ophthalmological routine for a decade (Aulhorn, 1960). Bouma (1970) also suggested the (now widely cited) rule of thumb that the critical free space between flanking letters and target in the standard letter crowding paradigm, below which crowding sets in, is about half the eccentricity of the target. The rule was thoroughly reviewed by Pelli et al. (2004, Table 4) and, except for variations of the coefficient, was found to be valid over a wide range of visual tasks. Note that Bouma's original rule is well defined and gives better fits than the form in which it is currently cited; we return to this in Chapter 5.2: Letter crowding at low contrast and Chapter 5.3: Bouma's law revisited—and extended.
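Bouma's rule of thumb as stated above amounts to a one-line check. The sketch below is an illustration only: it takes the rule at face value, with distance measured as free space between target and flanker, and the coefficient is left adjustable since it varies between tasks; the function names are hypothetical.

```python
def critical_free_space(eccentricity_deg: float, coefficient: float = 0.5) -> float:
    """Critical free space (deg) between target and flanker, per Bouma's rule of thumb."""
    return coefficient * eccentricity_deg


def is_crowded(eccentricity_deg: float, free_space_deg: float,
               coefficient: float = 0.5) -> bool:
    """True if the flanker is closer than the critical free space."""
    return free_space_deg < critical_free_space(eccentricity_deg, coefficient)


# At 10 deg eccentricity the critical free space is about 5 deg.
print(critical_free_space(10.0))
print(is_crowded(10.0, free_space_deg=3.0))   # flanker inside the critical zone
print(is_crowded(10.0, free_space_deg=6.0))   # flanker outside the critical zone
```

Because the critical distance grows in proportion to eccentricity, a spacing that is harmless near the fovea can fall well inside the critical zone a few degrees out.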
At the same time, Averbach and Coriell's (1961) study was followed up by Eriksen and his group regarding its “cognitive” implications. Eriksen and Collins (1969) explored the time course for the cueing effect and found ∼100 ms to be an optimum precueing time (cf. Nakayama & MacKeben, 1989). Eriksen and Rohrbaugh (1970) discovered that focusing attention by a spatial cue worked but did so only partially and that an important source of remaining perceptual errors were confusions with a neighboring, and only a neighboring, pattern. This confirmed Korte's phenomenon b2 (see above). Eriksen and Rohrbaugh's idea of analyzing not only the correct but also the incorrect responses was rediscovered by us (Strasburger et al., 1991; Strasburger & Rentschler, 1995) without knowing about their work. Eriksen's stimulus configuration is shown in Figure 19c; the central bar constitutes what is sometimes called a symbolic cue. Interestingly, Eriksen and Rohrbaugh (p. 337) discussed an influence of lateral masking based on Flom, Heath et al. (1963) and Flom, Weymouth et al. (1963)—and erroneously dismissed it. They argued that the range of interaction reported by Flom et al. was too small to explain their results. However, they overlooked that Flom et al.'s results were obtained for the fovea, whereas their own measurements were obtained at 2.2° eccentricity where the crowding effect is much larger—a missed opportunity for an early convergence of cognitive and perceptual research. 
Instead, the mutual neglect persisted through the 1970s. Six pertinent papers in perception journals ignored Bouma's work: One of them is the paper by Townsend et al. (1971), which otherwise includes a comprehensive literature review. Another one is the follow-up study by Taylor and Brown (1972), who showed that crowding is probably of cortical origin. It disregarded Bouma (1970), Flom, Heath et al. (1963), and Flom, Weymouth et al. (1963) even though the latter had already convincingly demonstrated the cortical origin. Monti (1973) is another example. The neglect of Bouma's work by Eriksen and Eriksen (1974) is unfortunate, since their publication was seen as a milestone paper in experimental psychology. The stimulus configuration in that paper was rather similar to Bouma's—a target letter flanked on the left and right by another letter at variable distances. Wolford (1975) presented his seminal model on feature perturbations in lateral masking ignoring both the work by Bouma and Flom et al. Estes et al. (1976) isolated a loss of positional information in peripherally seen four-letter strings and presented the important concept of positional uncertainty. Comparisons are further complicated by differences in terminology, with the flankers being called noise letters in the experimental psychology literature, the task a non-search task (cf. also Eriksen & Hoffman, 1974), and the phenomenon being referred to as lateral masking or lateral interference. The same applies to Mewhort, Campbell, Marchetti, and Campbell (1981), who followed up on Eriksen and Rohrbaugh's (1970) error analysis mentioned above. 
Further important work of that time is Shaw's (1969) study on the interaction of letters in words. It stressed the decisive role of spaces in rows of letters. In his famous paper, in which he coined the critical distance rule (cf. Chapter 5.2: Letter crowding at low contrast and Chapter 5.3: Bouma's law revisited—and extended), Bouma (1970) also demonstrated an inward–outward asymmetry in recognizing border letters in a word; the follow-up paper by Bouwhuis and Bouma (1979) presented a model for recognizing three-letter words based on single-letter recognition. Anstis (1974) popularized the crowding effect with his demonstration chart, shown in Figure 19g. Lettvin (1976) wrote a beautiful paper “On seeing sidelong,” demonstrating the crowding effect and related phenomena (under the heading “Texture”), along with puzzling phenomena in the blind spot. 
To our knowledge, Wolford and Chambers (1983) were the first to reunite, at least temporarily, cognitive and perceptual research in peripheral vision after a long period of separation. They argued that they could isolate the contribution of spatial attention from that of contour or feature interaction in peripheral vision. The distribution of spatial attention in their paradigm was varied indirectly by adding further characters above and below a masking flanker. A sample stimulus is shown in Figure 19d. Whereas a simple masking concept would predict that more flankers produce more masking, it turned out that the maskers could be more easily separated from the target by grouping them. The authors interpreted their findings as showing that contour interaction is the dominant factor at low lateral distance and spatial attention is dominant at greater lateral distance. Note that greater distances have been preferred in many cognitive studies. Investigating the influence of grouping on crowding has recently attracted new interest (e.g., Levi & Carney, 2009; Livne & Sagi, 2007, 2010; Malania, Herzog, & Westheimer, 2007; May & Hess, 2007). 
All the work on crowding discussed so far has used letters as stimuli (if we count the Landolt C as a letter). However, the phenomenon of decreased performance with nearby contours also occurs with less structured stimuli (cf. Greenwood, Bex, & Dakin, 2010; Levi & Carney, 2009; Livne & Sagi, 2010; Parth & Rentschler, 1984; van den Berg, Roerdink, & Cornelissen, 2007). Levi et al. (1984) studied the effect with Vernier targets, in the fovea and periphery. This detailed study was the first to present a perceptive field (cf. Chapter 3.6.4: Spatial summation) for the foveal crowding effect (Levi et al., 1984, Figure 6; see also Levi, 1999, for a review). Both Vernier acuity and critical crowding distance were found to scale with cortical magnification. By contrast, Toet and Levi (1992) reported a much steeper dependency for the interaction range with letter T targets, which was incompatible with cortical magnification. Toet and Levi's study was the first to determine these fields of interaction in two dimensions (Figure 20). The interaction fields turned out to be of roughly elliptic shape, with the main axis oriented radially away from the fovea. Similar interaction fields for letters were observed by Pelli, Tillman et al. (2007) and further discussed in Pelli (2008). 
Figure 20.
 
Sample crowding interaction ranges (enlarged for better visibility by a factor of two) at three eccentricities for one subject, given by Toet and Levi (1992, Figure 6). Toet & Levi's stimulus configuration (for closest lateral distance) is shown in Figure 19f. With permission from Pion Ltd, London.
Figure 19f shows Toet and Levi's (1992) stimulus configuration depicting the special case where the flankers are so close that, unlike in many other studies, a crowding effect is found even in the fovea. The required flanker distance for foveal crowding was 0.07° (p. 1355) or even 0.04° (from their Figure 5). It thus seems that the patterns must be shaped so that they can overlap to some degree for achieving a foveal effect. 
Crowding is of particular significance in two groups of disorders, amblyopia and dyslexia (cf. in particular, Levi, Sireteanu, Hess, Geiger, and Lettvin; see Strasburger, 2003b). Furthermore, the absence of foveal crowding in the adult seems to be a result of development. Atkinson, Pimm-Smith, Evans, Harding, and Braddick (1986), for example, reported that while 6-year-old children have fully developed acuity they do show a pronounced foveal crowding effect. 
5.2. Letter crowding at low contrast
Our own research on crowding started with a parametric study (Strasburger et al., 1991), where we introduced a new paradigm by measuring the contrast threshold for recognition of a character in the presence of flankers with the same contrast (Figure 19e). Thresholds were determined by an adaptive (maximum likelihood) algorithm. We measured contrast vs. target size trade-off functions at 2°, 4°, 6°, 8°, 10°, and 12° eccentricities and varied flanking distance at two fixed locations (0° and 4° eccentricities) from the minimum possible up to 2°. Furthermore, we employed an error analysis (expanded upon in later studies) where the incorrect answers were classified into confusions with the left or right flanker and random errors. There were five main results: (1) As in unflanked character recognition, there was a trade-off between contrast and size (similar to Figure 13). However, the trade-off functions differed from those in the unflanked condition in a complex way (i.e., greater differences occurred with small rather than with large letter sizes). Thus, crowding at high contrast or small size is just a special case that cannot be generalized to crowding at low contrast or large size (Strasburger et al., 1991, Figures 4 and 5). (2) There was no reliable crowding effect in the fovea but a strong effect emerged already at 2° eccentricity. (3) Bouma's rule of thumb was confirmed, i.e., critical flanker distance was proportional to eccentricity (Strasburger et al., 1991, Table 1 and Figure 6). (4) Critical distance depends mostly on visual field position (target eccentricity) but hardly on target size (Strasburger et al., 1991, Figure 6B). This finding was later confirmed by Pelli et al. (2004), who considered it to be the key characteristic distinguishing crowding from what they referred to as ordinary masking (Table 2 on p. 1143, line “f”). (5) Many incorrect responses turned out to be confusions with a flanker (Strasburger et al., 1991, Table 2). 
This confirmed Eriksen and Rohrbaugh's (1970) result, Estes et al.'s (1976) concept of positional uncertainty, and Korte's mechanism b2. We proposed that part of the crowding effect is caused by imprecise focusing of attention. The importance of spatial attention in crowding has also been stressed by He, Cavanagh, and Intriligator (1996), He and Tjan (2004), and Fang and He (2008). 
We followed up the attention hypothesis in three later papers (Strasburger, 2005; Strasburger & Malania, in revision; Strasburger & Rentschler, 1995). To explicitly steer spatial attention, we used a ring cue around the target that was large enough to avoid possible masking and was presented at an optimal SOA of 150 ms before the target to maximize the transient attention effect (Eriksen & Collins, 1969; Nakayama & MacKeben, 1989). Our main findings in these studies were: 
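The error analysis described above sorts each incorrect answer into a confusion with the left flanker, a confusion with the right flanker, or a random error. A minimal sketch of such a classification is given here; the function name, the category labels, and the example trials are hypothetical and are not taken from the original studies.

```python
from collections import Counter

def classify_response(response, target, left_flanker, right_flanker):
    """Classify one response in a flanked character recognition trial."""
    if response == target:
        return "correct"
    if response == left_flanker:
        return "left_flanker_confusion"
    if response == right_flanker:
        return "right_flanker_confusion"
    return "random_error"

# Hypothetical trials: (response, target, left flanker, right flanker)
trials = [
    ("3", "3", "7", "5"),
    ("7", "3", "7", "5"),
    ("5", "3", "7", "5"),
    ("9", "3", "7", "5"),
]
tally = Counter(classify_response(*trial) for trial in trials)
print(dict(tally))
```

If both flankers happen to be the same character, this sketch counts the confusion as a left-flanker confusion; a real analysis would have to treat that case explicitly.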
  • 1. The crowding effect, as measured by a changed target contrast threshold, stems partly from whole-letter confusions with a flanker and partly from other sources (possibly feature misallocation; Strasburger et al., 1991, Table 2; Strasburger, 2005, Figure 3).
  • 2. The cue has a gain control effect on contrast thresholds (Strasburger, 2005, Figure 3; Strasburger and Malania, in revision, Figure 4), but the cue has no effect on positional errors (Strasburger, 2005, Table 4; Strasburger and Malania, in revision, Figure 5).
  • 3. The gain control effect is highest with flankers at a relatively close distance. These functions scale with eccentricity, i.e., are similar in shape but are shifted to larger flanker distances with increasing eccentricity (Figure 21a).
  • 4. The cueing effect on target threshold contrast is independent of cue size.
  • 5. Positional errors are highest with relatively close flankers; these functions also scale with eccentricity (Figure 21b).
Figure 21.
 
Cue effects in low-contrast letter crowding vs. flanker distance (from Strasburger & Malania, 2011). (a) Cue gain-control effect on contrast thresholds; (b) positional errors; (c) “Doughnut model”: The transparent gray mask visualizes log-contrast gain control from transient attention taken from (a). On the left is the fixation point. Note the (bright) excitatory spotlight on the target and the (dark) inhibitory surround.
Bouma's rule can be extended to describe where the maximum of these functions occurs and where the effect completely disappears, i.e., at the critical distance. We return to this in Chapter 5.3: Bouma's law revisited—and extended.
Particularly influential work on letter crowding at low contrast is the large-scale parametric study by Pelli et al. (2004), which used a contrast threshold paradigm. It quantitatively explored the effects of spacing, eccentricity, target size, flanker size, font, number of flankers, flanker contrast, task type (identification vs. detection), and target type (letter vs. grating). Unlike in the work by Strasburger et al., flanker contrast and size were varied independently of target contrast and size. Pelli et al. proposed a taxonomy of crowding, including seven characteristics that set it apart from lower level interaction effects (which the authors referred to as ordinary masking; Pelli et al., 2004, Table 3). Some key properties are: (a) In crowding, critical spacing is proportional to eccentricity (Bouma, 1970) and independent of size (Levi, Hariharan, & Klein, 2002; Strasburger et al., 1991), whereas in ordinary masking critical spacing is proportional to size and independent of eccentricity. (b) Crowding is specific to tasks that cannot be performed based on single feature detection (cf. Chapter 8.1: Parts, structure, and form). (c) Distinct feature detectors mediate the effects of mask and signal. (d) Crowding occurs because small feature integration fields are absent in the periphery, whereas eccentricity has no effect on ordinary masking. Property (a) is considered the hallmark of crowding. Refer to Pelli's full table for complete references and findings from which these statements were distilled. 
5.3. Bouma's law revisited—and extended
Bouma (1970) paved the way for a surprisingly simple insight into the crowding effect: The spatial range for lateral interactions between a flanker and a target pattern does not much depend on the content (i.e., the what) but on the eccentricity (i.e., the where) of the target in the visual field. Bouma formulated a rule of thumb stating that the critical flanker distance d, below which crowding sets in, when expressed as free space between the letters, is about 50% of the target's eccentricity. Pelli et al. (2004, Table 4) presented a review of critical spacing values reported in the relevant literature. Values range between 0.1 and 2.7 in the reviewed publications, with a median of 0.5 and an interquartile range from 0.3 to 0.7. This confirms Bouma's rule nicely. Further examples were given by Levi, Song, and Pelli (2007), Scolari, Kohnen, Barton, and Awh (2007), Strasburger (2005), Strasburger and Malania (in revision), van den Berg et al. (2007), and Yeshurun and Rashal (2010). 
Bouma's rule is often stated as  
\begin{eqnarray} d = bE,\quad \end{eqnarray}
(19)
(where b is 0.5). However, nowadays flanker spacing d is typically measured not as free space but as center-to-center distance. Bouma's original rule then translates into  
\begin{eqnarray} d = bE + w,\quad \end{eqnarray}
(20)
where w is the width of the letters (cf. Strasburger, 2005 for a discussion). Interestingly, the relationship is not proportionality, as is commonly quoted, but is linear with a positive y-axis intercept. The intercept on the ordinate is equal to letter size w. The non-zero intercept is important for consistency: Proportionality would be ill-behaved in and around the fovea since flankers would then need to superimpose with the target before they can crowd. Bouma's equation, in contrast, is well behaved. Note that for tasks with foveal targets, where crowding does not occur, Equation 20 is still the better description compared to proportionality (Equation 19). This is because it does imply the vanishing of crowding at the closest possible spacing. The slope b is Bouma's factor and the y-intercept w is a prediction of the critical crowding distance in the fovea, measured center to center. 
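The difference between Equations 19 and 20 can be made concrete with a short sketch. The function names are ours, and the letter width of 0.5° is an arbitrary illustrative value, not one taken from the studies cited here.

```python
def critical_free_space(eccentricity_deg, b=0.5):
    """Equation 19: critical free space between letters, in degrees."""
    return b * eccentricity_deg

def critical_center_spacing(eccentricity_deg, letter_width_deg, b=0.5):
    """Equation 20: critical center-to-center spacing, in degrees."""
    return b * eccentricity_deg + letter_width_deg

LETTER_WIDTH_DEG = 0.5  # hypothetical letter width

for ecc in (0.0, 2.0, 4.0, 8.0):
    free = critical_free_space(ecc)
    center = critical_center_spacing(ecc, LETTER_WIDTH_DEG)
    print(f"E = {ecc:4.1f} deg: free space {free:.2f} deg, "
          f"center-to-center {center:.2f} deg")
```

At E = 0 the center-to-center spacing equals the letter width, which is the closest spacing at which target and flanker do not overlap.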
Critical spacing is often loosely defined as the minimum spacing where crowding disappears. However, Strasburger and Malania (in revision) showed that with suitably chosen axes one can obtain highly reliable estimates of the minimum spacing by way of linear regression. In their contrast threshold crowding paradigm, a log-linear scale was used for the contrast thresholds and a linear scale for the confusions of flanker and target (see Figures 21a and 21b). The resulting values of d at three eccentricities (2°, 4°, and 6°) were then fitted by the Bouma equation:  
\begin{eqnarray} {d}_{{\rm{contr}}} = 0.7E + 0.3^\circ ,\quad \end{eqnarray}
(21)
 
\begin{eqnarray} {d}_{{\rm{conf}}} = 0.625E + 0.48^\circ.\quad \end{eqnarray}
(22)
 
The resulting Bouma factors b of 0.7 and 0.625 are comparable to Bouma's (1970) original estimate of 0.5. 
In the same paper, Strasburger and Malania found that crowding does not increase monotonically with decreasing flanker distance. Instead, there is a maximum of interaction when flankers are very close (Figure 21), similar to what Flom et al. reported in 1963. This flanker distance, dmax, where a maximum of interaction occurs, scales with eccentricity in a similar way as the critical distances, i.e., it obeys Equation 20. The fitted equations are  
\begin{eqnarray} d_{{\rm{contr}}}^{{\rm{max}}} = 0.125E + 0.25^\circ ,\quad \end{eqnarray}
(23)
 
\begin{eqnarray} d_{{\rm{conf}}}^{{\rm{max}}} = 0.188E + 0.07^\circ ,\quad \end{eqnarray}
(24)
for the contrast threshold and confusion graphs, respectively, where the respective slope values are b = 0.125 and b = 0.188. 
Bouma's rule thus seems to apply to an annulus-like zone around the target, the size and shape of which depend on visual field location and scale in analogy to M-scaling (Equation 1 or 2 in Table 2) or Watson's (1987b) concept of the local spatial scale. Bouma values (in the original meaning, i.e., not as a fraction but as the slope in Equation 20) were 70% for the cue's effect on contrast thresholds and 63% for its effect on flanker confusions, and 12.5% and 19% for the respective maxima. Y-intercepts in Equations 21 to 24 are all positive and on the order of 20 arcmin of visual angle (0.07° to 0.48°, i.e., about 4 to 29 arcmin). 
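As an illustration of how slope and intercept in Equations 21 to 24 are obtained, the following sketch evaluates the four fitted lines at the three eccentricities used (2°, 4°, and 6°) and recovers the parameters by ordinary least squares. The input values are generated from the published fits, not from the original data, and the helper names are ours.

```python
# (slope b, intercept in deg) from Equations 21 to 24
FITS = {
    "d_contr": (0.7, 0.3),
    "d_conf": (0.625, 0.48),
    "dmax_contr": (0.125, 0.25),
    "dmax_conf": (0.188, 0.07),
}
ECC = (2.0, 4.0, 6.0)  # eccentricities in deg

def bouma(ecc, slope, intercept):
    """Linear Bouma equation d = b * E + intercept."""
    return slope * ecc + intercept

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

for name, (b, w) in FITS.items():
    ds = [bouma(e, b, w) for e in ECC]
    slope, intercept = fit_line(ECC, ds)
    spacings = ", ".join(f"{d:.2f}" for d in ds)
    print(f"{name}: d = [{spacings}] deg, slope {slope:.3f}, "
          f"intercept {intercept:.2f} deg ({intercept * 60:.0f} arcmin)")
```

With real threshold data the fit would of course not be exact; the sketch only shows the mechanics of the regression and the conversion of the intercepts from degrees to arcmin.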
Interestingly, Petrov and McKee (2006) found very similar relationships for surround suppression with Gabor gratings. The log-contrast threshold-elevation functions are highly linear with target–surround separation for a range of conditions so that critical spacings can be reliably determined by linear regression. Furthermore, these critical spacings nicely followed a linear relationship with eccentricity, with a positive y-intercept of 0.41° (Petrov and McKee, 2006, Figure 8). From their figure, the relationship for the data of Petrov and McKee is 
\begin{eqnarray}r = 0.1E + 0.41^\circ .\quad \end{eqnarray}
(25)
 
This is quantitatively comparable with the scaling behavior of the maximum cue effect on log contrast thresholds shown in Figure 21 and described in Equation 23.
5.3.1. Bouma's rule mapped onto the cortex
Formally, Equation 20 is equal to M-scaling (Equation 1 or 2 in Table 2), so it might be regarded as reflecting scaling properties of the visual pathway. Following the idea of Pelli (2008) to consider the mapping of critical spacing onto cortical areas, the analogy between M-scaling and Bouma's rule can be taken one step further: In Chapter 3.3: Schwartz's logarithmic mapping onto the cortex, we discussed Schwartz's (1980) logarithmic mapping of visual field positions onto early visual areas. Based on this, Pelli (2008, Equation 3) asserts that, if the locations of target and flankers in a crowding task are mapped onto the cortex, and if critical spacing in the visual field is proportional to eccentricity (Equation 19), then the critical spacing on the cortex is independent of eccentricity. Now, at small eccentricities, Schwartz's proportionality assumption and the resulting logarithmic mapping are not valid. In the fovea in particular, the mapping is ill-defined (since the logarithm approaches minus infinity as eccentricity approaches zero). However, the same reasoning can be generalized to use, instead of proportionality, the standard inverse linear cortical magnification rule [Equation 2 in Table 2, where M0 is the foveal cortical magnification factor and E2 is Levi's value (the eccentricity at which \(M^{-1}\) doubles)] discussed in Chapter 3: Cortical magnification and the M-scaling concept. By integration and variable substitution, we obtain that, in analogy to Schwartz's mapping, cortical distance δ from the fovea is given by  
\begin{eqnarray} \delta = \int\limits_{0}^{E}{M(E)\,dE} = \int\limits_{0}^{E}{{M}_0{\left( 1 + {E /{{E}_2}}\right)}^{ - 1}\,dE} = {M}_0{E}_2\ln \left( 1 + {E /{{E}_2}}\right),\quad \end{eqnarray}
(26)
with notations as before. We refer to this as a generalized logarithmic cortical mapping rule. Unlike the original logarithmic mapping, this rule is well defined in the fovea. 
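The closed form in Equation 26 can be checked numerically against the integral of the inverse linear magnification rule. In the following sketch, the values chosen for M0 and E2 are placeholders for illustration only and are not estimates taken from this review; the function names are ours.

```python
import math

def magnification(ecc, m0, e2):
    """Inverse linear cortical magnification M(E) = M0 / (1 + E / E2)."""
    return m0 / (1.0 + ecc / e2)

def cortical_distance(ecc, m0, e2):
    """Equation 26: cortical distance from the fovea (closed form)."""
    return m0 * e2 * math.log1p(ecc / e2)

def cortical_distance_numeric(ecc, m0, e2, steps=10000):
    """Midpoint-rule integration of M(E) from 0 to ecc, for comparison."""
    h = ecc / steps
    return h * sum(magnification((i + 0.5) * h, m0, e2) for i in range(steps))

M0 = 20.0  # hypothetical foveal magnification, mm/deg
E2 = 1.0   # hypothetical E2, deg

for ecc in (0.0, 1.0, 5.0, 10.0):
    closed = cortical_distance(ecc, M0, E2)
    numeric = cortical_distance_numeric(ecc, M0, E2)
    print(f"E = {ecc:5.1f} deg: closed form {closed:.4f} mm, "
          f"numeric {numeric:.4f} mm")
```

The closed form returns zero at E = 0, which illustrates why the generalized rule is well defined in the fovea.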
Bouma's rule (Equation 20), in turn, can be written in analogy to Equation 2 in Table 2 as  
\begin{eqnarray} d = {d}_0\left({1 + {E /{{{\bar{E}}}_2}}} \right),\quad \end{eqnarray}
(27)
where d0 is the foveal critical spacing, and \(\bar {E}_2\) (which is not necessarily equal to E2) is the value where the foveal critical spacing d0 doubles. With Equation 2 in Table 2 and Equation 27, we can then derive how critical spacing on the cortex, Δδ, varies with eccentricity E:  
\begin{eqnarray} \Delta \delta = {M}_0{E}_2\ln \left( 1 + \frac{{d}_0}{{E}_2}\,\frac{1 + {E /{{{\bar{E}}}_2}}}{1 + {E /{{E}_2}}}\right).\quad \end{eqnarray}
(28)
This follows from evaluating Equation 26 at eccentricities E + d and E, with d taken from Equation 27, and taking the difference. The behavior of this equation depends on the ratio E2/\(\bar {E}_2\): Cortical critical spacing Δδ takes the value  
\begin{eqnarray} \Delta {\delta }_0 = {M}_0{E}_2\ln \left( {1 + {d}_0 / {E}_2} \right),\quad \end{eqnarray}
(29)
in the fovea and from there increases or decreases with eccentricity depending on that ratio. For an eccentricity E much larger than the maximum of E2 and \(\bar {E}_2\), Equation 28 converges to the constant expression M0E2 ln(1 + d0/\(\bar {E}_2\)). 
In summary, under the generalized logarithmic mapping rule (Equation 26), cortical critical spacing remains constant beyond a certain eccentricity (we expect beyond 3°), as Pelli (2008) has shown. In the fovea, however, it may be smaller or larger than that value, depending on the parameters describing cortical magnification (E2) and Bouma's rule (\(\bar {E}_2\)). 
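The leveling-off of cortical critical spacing can be illustrated numerically. The sketch below does not rely on a closed form; it computes Δδ directly as the difference of Equation 26 evaluated at E + d and at E, with d from Equation 27, assuming the flanker lies on the peripheral side of the target along the same meridian. The values d0 = 0.3° and \(\bar {E}_2\) = 0.3°/0.7 correspond to Equation 21; M0 and E2 are hypothetical placeholders.

```python
import math

def cortical_distance(ecc, m0, e2):
    """Equation 26: cortical distance from the fovea."""
    return m0 * e2 * math.log1p(ecc / e2)

def critical_spacing(ecc, d0, e2_bar):
    """Equation 27: critical spacing in the visual field, in degrees."""
    return d0 * (1.0 + ecc / e2_bar)

def cortical_critical_spacing(ecc, m0, e2, d0, e2_bar):
    """Cortical distance between target at ecc and flanker at ecc + d."""
    d = critical_spacing(ecc, d0, e2_bar)
    return cortical_distance(ecc + d, m0, e2) - cortical_distance(ecc, m0, e2)

M0, E2 = 20.0, 1.0           # hypothetical magnification parameters
D0, E2_BAR = 0.3, 0.3 / 0.7  # from Equation 21: d = 0.7 E + 0.3 deg

for ecc in (0.0, 1.0, 3.0, 10.0, 30.0, 100.0):
    dd = cortical_critical_spacing(ecc, M0, E2, D0, E2_BAR)
    print(f"E = {ecc:6.1f} deg: cortical critical spacing {dd:.3f} mm")
```

With these placeholder values the foveal spacing is smaller than the peripheral asymptote because E2 exceeds \(\bar {E}_2\); swapping the two magnitudes reverses the trend.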
5.4. Mechanisms underlying crowding
5.4.1. Classification of concepts
Crowding is one of the key characteristics that distinguish peripheral from foveal vision. Aubert and Foerster (1857) already asked themselves how to grasp that “strangely nondescript”6 percept. The question of what underlies crowding is intriguing because it goes beyond simple pattern recognition concepts and most neurocomputational modeling except for the most recent models reviewed in Chapter 8: Modeling peripheral form vision.
Theories on crowding are abundant, mostly informal, and not necessarily distinct. There have been a number of attempts at their classification (Levi, 2008; Strasburger et al., 1991; Tyler & Likova, 2007). Tyler and Likova (2007) list six theoretical accounts of the neural basis of crowding in a context of theories of letter and pattern recognition: template matching, feature integrator, attentional feature conjunction, propositional enumeration, attentional tracking, and relaxation network. They warn that most accounts are far from being linked to explicit neural processes. Levi (2008) distinguishes optical, neural, and computational proposals: He lists spatial scale shift, perceptive hypercolumns, long-range horizontal connections, and contrast masking under the neural proposals. The computational proposals are organized into abnormal feature integration, loss of position information, crowding as texture perception, configural grouping, and several attentional proposals. Greenwood et al. (2010) classify the explanatory accounts into those that rely on information loss, with crowded items being either suppressed (e.g., Chastain, 1982; Krumhansl & Thomas, 1977) or lost (He et al., 1996; Petrov & Popple, 2007), and what they call change-based models such as averaging (Parkes, Lund, Angelucci, Solomon, & Morgan, 2001) and flanker substitution (Strasburger et al., 1991; Wolford, 1975). 
Many (formal and informal) theories assume processing in two or more consecutive stages, where the first stage involves the detection of simple features and a second stage performs the combination or interpretation of the features as an object (e.g., Pelli et al., 2004; Strasburger & Rentschler, 1996). That core is then expanded. Levi and Carney (2009), for example, add a grouping mechanism acting on certain features. Pelli et al. (2004) put forward the idea of a so-called integration field, within which feature integration takes place and which is synonymous with the area circumscribed by the measured critical spacing around the signal. The concept is seen as an alternative to spatial attention, and the authors attribute crowding to the peripheral “absence of small integration fields rather than a lack of focal attention” (p. 1155). 
With a view to computational implementations, we will discuss three issues below: contrast processing including processing of content and featural errors, confusions of letter position, and the role of spatial attention. At an early stage, pattern contrast is encoded as signal intensity by the neural code, so we subsume feature detection and correct or faulty feature integration under contrast processing. Second, while we agree that faulty feature integration (as already proposed by Wolford, 1975) is a viable concept for crowding, we will argue that on its own it might not suffice for explaining the important phenomena in crowding. The whole-letter confusions observed in Strasburger et al. (1991), Strasburger (2005), or Chung and Legge (2009) seem to require a further mechanism that binds together pattern parts and assigns a position code to the assembly. Like the features' position code, this assembly position code might be processed separately and could also get lost. Separate processing of object positions is also implicit in the concept of separate processing of what and where in the ventral and dorsal streams (Ungerleider & Haxby, 1994; Ungerleider & Mishkin, 1982). Third, transient and sustained spatial attention act on or interact with those bottom-up mechanisms. Further mechanisms that we address below are surround suppression and supercrowding.
5.4.2. Contrast processing and erroneous feature combinations
Wolford (1975) was the first to present a quantitative model of lateral masking, in which he introduced the concept of “feature perturbations.” Features from nearby letters intrude into the target's percept (cf. also Krumhansl & Thomas, 1977), a process termed source confusion. Pelli et al. (2004, p. 1137) called it jumbling of features. The feature space in Wolford's model was taken from Lindsay and Norman (1972); there were seven types of features including vertical lines, acute angles, and continuous curves. A letter is characterized by a certain number of each type of feature. There is a sensory store, the information of which “is processed in serial in order to identify the letters. The first task of the processor is to parse the various features into groups. … The perturbation process then becomes a random walk, where the states are represented by the various feature groups” (Wolford, 1975, pp. 191–192). 
Judging from the terminology, the concept of feature perturbations is typical for modeling in the symbolic, non-connectionist tradition (cf. Hinton, McClelland, & Rumelhart, 1986; Smolensky, Legendre, & Miyata, 1992): There is a “processor,” which “parses” an array of “information”; features are essentially letter parts that are extracted at an earlier stage and are then entities that can or cannot move. By contrast, Strasburger and Rentschler (1996) advocated a neurally inspired two-stage theory in which features, once detected, need surplus contrast to be combined for character recognition in a subsequent neural “feature combination” stage. The difference is that in the latter, connectionist view, features are not symbols that result from parsing but are emergent properties from feedforward (and feedback) connections. The neural code at an early stage in the system is proportional to stimulus contrast. The detection of features and their combination for pattern recognition should thus be conceptualized in a stream of contrast processing. In the macaque, a possible site for feature extraction is the inferotemporal cortex (Tanaka, 1996); in humans, a candidate region is the fusiform gyrus. 
In Chapter 8: Modeling peripheral form vision, we will discuss models that are rigorous formalizations within such a framework. In particular, a new class of models has emerged, which build upon ensemble properties of the input patterns (Balas, Nakano, & Rosenholtz, 2009; Parkes et al., 2001; van den Berg, Roerdink, & Cornelissen, 2010). Compared to that of Parkes et al., the model by Balas et al. (2009) encompasses a much wider range of input patterns, including letter stimuli, and is of particular interest here. A further, neurocomputational model for crowding is based on (feedforward and feedback) neural networks (Jehee, Roelfsema, Deco, Murre, & Lamme, 2007). Finally, Nandy and Tjan (2007) model crowding based on reverse correlation (Ahumada, 2002) and extract features that are actually used by an observer. This can be done independently of the feature's position, which makes it possible to separate quantitatively the degraded contrast processing of pattern content from the intrinsic positional uncertainty of features. Their approach therefore covers feature mislocalization or feature source confusion. 
5.4.3. Binding and letter source confusion
With regard to the necessity of a binding mechanism, we ought to address the following question: “What is the difference between the floating of individual features and that of a whole letter?” In a featural approach, like that of Wolford (1975) discussed in Chapter 5.4.2: Contrast processing and erroneous feature combinations, the percept of a letter arises when, for example, a majority of the detected features (or letter parts) are characteristic of that letter. So, if most of the constituting features float in synchrony, the entire letter will float as a result. If the individual features moved independently, the combined likelihood for such a synchrony would be rather low, much lower than the high frequency of letter confusions observed, e.g., by Strasburger (2005). The feature movements therefore must be constrained, i.e., the features (or letter parts) have to be bound in some way. We therefore propose two kinds of source confusion: In feature source confusion, individual features lose their position code, i.e., they lose the marking denoting which character they belong to (Korte's mechanism b1, cf. Appendix; Krumhansl & Thomas, 1977; Pelli et al., 2004, p. 1137; Saarinen, 1987, 1988; Tyler & Likova, 2007, Figure 2a; Wolford, 1975). By contrast, in letter source confusion, the features keep their marking denoting which character they belong to and how they are related to each other (i.e., they remain bound), but the entire character loses its position code. This is the phenomenon Korte (1923) originally described when he spoke of a “dance” of letters (Korte's mechanism b2). The required whole–part relationship can be made neurocomputationally explicit as shown by Hinton (1981), who proposed a distributed implementation of the relationship between wholes and parts by what he calls identity/role combinations (Hinton et al., 1986). 
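The independence argument above can be illustrated with a small calculation. As a deliberately simplified sketch, assume that a letter consists of n features, that each feature is displaced toward the neighboring position with the same probability p, independently of the others, and that the letter is perceived at the new position when more than half of its features arrive there. The values n = 7 and p = 0.2 are hypothetical and serve only to show the order of magnitude.

```python
from math import comb

def prob_majority_displaced(n_features, p_move):
    """Probability that more than half of n independent features move."""
    k_min = n_features // 2 + 1
    return sum(
        comb(n_features, k) * p_move ** k * (1.0 - p_move) ** (n_features - k)
        for k in range(k_min, n_features + 1)
    )

N_FEATURES = 7  # hypothetical number of features per letter
P_MOVE = 0.2    # hypothetical displacement probability per feature

p_letter = prob_majority_displaced(N_FEATURES, P_MOVE)
print(f"P(majority of {N_FEATURES} features displaced) = {p_letter:.4f}")
```

Under these assumptions a whole-letter displacement is much rarer than the displacement of any single feature, which is the reason for postulating that the features are bound.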
Grouping accounts, like those of Livne and Sagi (2007, 2010) or May and Hess' (2007) “snakes & ladders”, fit into that framework as they provide the glue by which features are connected. The recent computational model by Balas et al. (2009, p. 13; cf. Chapter 8: Modeling peripheral form vision) has emerging features that “piece together simple structures”. The Gestalt concept of closure refers to the same phenomenon. Confusion of letter position has recently been confirmed in the context of a typical crowding paradigm by Chung and Legge (2009) who also present a quantitative model to predict the extent of the effect with varying eccentricity. 
5.4.4. Spatial attention
Spatial attention has attracted considerable research in the context of crowding. Generally speaking, spatial covert attention (i.e., allocating attention without eye movements), which is of particular interest here, represents just one aspect of visual selective attention (for reviews, see, e.g., Bundesen, 1990, 1998; Chalupa & Werner, 2004; Gazzaniga, 1995; LaBerge, 1995; Pelli, Cavanagh et al., 2007; Schneider, 1993; van der Heijden, 1992; for computational models of overt attention and saliency, see, e.g., Bruce & Tsotsos, 2009; Chalupa & Werner, 2004; Dashan, 2009; Kanan, Tong, Zhang, & Cottrell, 2009; Rosenholtz, 1999). For letter crowding, Wolford and Chambers (1983) were the first to quantitatively separate the effects of spatial attention and feature interaction. Strasburger et al. (1991) followed this up by proposing that the limited resolution of spatial attention underlies uncertainty about letter position in crowding. Similarly, He et al. (1996, followed up by Cavanagh & Holcombe, 2007; Fang & He, 2008) argued that peripheral crowding results from limitations set by attentional resolution. Vul, Hanus, and Kanwisher (2009, Figures 12–14) measured the shape of spatial uncertainty underlying flanker confusions (in a stimulus arrangement similar to that in Figure 19c) and predicted their data within a framework of Bayesian cognitive inference. Petrov and Meleshkevich (2011) link the inward–outward anisotropy often found in crowding to the spatial resolution of attention. 
Covert spatial attention is often operationally defined as the influence of a spatial cue (Eriksen & Collins, 1969; Eriksen & Hoffman, 1974; Eriksen & Rohrbaugh, 1970). It is commonly divided into two types, sustained versus transient (Nakayama & MacKeben, 1989), or similarly voluntary versus automatic attention (Jonides, 1981; Yantis & Jonides, 1984). Sustained attention has been shown to be anisotropic with a dominance of the horizontal meridian (MacKeben, 1999). Strasburger (2005) used an attention-attracting ring cue that produced an interesting differential effect: while the cue substantially improved recognition performance, it left confusions with flankers unchanged. This improvement in recognition provides evidence that spatial attention is concentrated at the target, either by enhancing neural activity at the target position or by suppressing activity at neighboring positions. In terms of types of attention, the standard crowding task involves sustained (voluntary) attention since subjects are aware in advance of where the stimulus will appear. In contrast, a preceding positional cue increases transient attention, providing a “brighter spotlight” while leaving position coding of the flankers unaffected. One way of implementing enhanced processing is recurrent coupling, which we return to in Chapter 8.4.2: A feedforward–feedback model of crowding.
A surprising aspect of overt attention (that might or might not also hold up for covert attention) has been highlighted by Mounts (2000; followed up by McCarley & Mounts, 2007, 2008, and modeled by Cutzu & Tsotsos, 2003): Whereas the “spotlight of attention” is typically assumed to decay monotonically around its center, there may be an inhibitory surround. Mounts' results show that the inhibitory annulus is of limited extent, showing an inversion further out (reminiscent of a “Mexican hat”). Crowding data, on the other hand, only show a decay of the flankers' effects with increasing distance. How this apparent difference can be resolved is an issue of future research. 
5.4.5. Surround suppression
A further mechanism discussed in the context of crowding is that of surround suppression (Petrov et al., 2005; Petrov & McKee, 2006; Petrov & Popple, 2007; Petrov, Popple, & McKee, 2007). Based on receptive field neurophysiology, Petrov et al. (2005) defined surround suppression as impaired identification of a Gabor patch by the presence of a surrounding grating. Surround suppression has been shown to be tightly tuned to the orientation and spatial frequency of the test stimulus. Petrov and McKee (2006) compiled similarities and differences to crowding. Surround suppression and crowding share a peripheral locus, a radial–tangential anisotropy, and a tuning to orientation and spatial frequency. Furthermore, the effect in both cases depends on eccentricity rather than on stimulus size or spatial frequency. A difference is that crowding is commonly observed when target and flankers have the same contrast, whereas surround suppression occurs only when the surround contrast is higher than that of the target. Petrov et al. (2007) noted that crowding, unlike surround suppression, shows outward–inward anisotropy. However, the evidence here is mixed (Strasburger & Malania, in revision; van den Berg et al., 2007, Supplementary Figure 7). Some of the discrepancies on the anisotropy issue may be explained by the focusing of sustained attention (Petrov & Meleshkevich, 2011). 
Petrov et al. (2007) suspected that many of the similarities between crowding and surround suppression only arise because the effects in the contrast threshold crowding paradigm are confounded with surround suppression (Chapter 5.2: Letter crowding at low contrast). However, this criticism rests on the assumption that flanker contrast is considerably higher than the contrast of the test target. While this is the case in certain of the conditions described by Pelli et al. (2004), it does not apply to the paradigm used by Strasburger et al. (Strasburger, 2005; Strasburger et al., 1991; Strasburger & Malania, in revision; Strasburger & Rentschler, 1995), where target and flankers had the same contrast. So all the characteristics of crowding reported in that work remain valid, i.e., scaling with eccentricity, the relationship with confusion, and, in particular, the dependence on visual field location independent of target size, which was chosen by Pelli et al. (2004) as the main distinguishing feature of crowding. 
From the results in Petrov et al. (2005) and Petrov and McKee (2006), we have estimated the extent to which surround suppression could contribute to the results when target and flankers are of the same contrast (Strasburger & Malania, in revision). We found the contribution to be rather small in a typical letter crowding paradigm, around 2.5% (see Footnote 7). So, while the similarities of surround suppression and crowding (summarized in Petrov et al., 2007, Table 1) are intriguing, the role played by surround suppression in letter crowding seems insignificant. 
5.4.6. Further mechanisms: Supercrowding
An interesting question has been brought up by Vickery, Shim, Chakravarthi, Jiang, and Luedeman (2009): What is the relative importance of the various mechanisms, and do their contributions act over- or under-additively? They reported an intriguing example of dramatic over-additivity, which they termed supercrowding. The authors showed that a white rectangular box around a letter T target in a crowding task vastly increased the flankers' masking effect, reducing accuracy by almost 50%, particularly at flanker distances larger than Bouma's 0.5 × eccentricity limit, where crowding is normally weak. Note that the box was presented simultaneously with the target and thus exerted a (weak) masking effect. In contrast, attention-drawing spatial cues, like those in Strasburger (2005), need to be presented at a certain SOA before the target. 
Chapter 6. Complex stimulus configurations: Textures, scenes, and faces
The majority of studies on extrafoveal pattern vision, which we reviewed in the preceding sections, have used letter-like stimuli. Here, we turn to research that employed other types of stimuli in order to explore object recognition in the peripheral visual field or mechanisms of perceptual organization that subserve this process. We will begin our overview with the latter by considering issues of texture segregation and contour integration. This proceeds on to studies involving the memorization of natural scenes, as well as their categorization, both with regard to their gist and the presence of certain classes of target objects. Finally, we will discuss some recent results on the recognition of faces and facial expressions of emotions. 
6.1. Texture segregation and contour integration
The segmentation of visual input into texture-defined regions and the extraction of contours constitute important stages of pre-processing in pattern and object recognition. Texture segmentation is generally assumed to be automatic and to proceed in parallel across the visual field (e.g., Julesz, 1981, 1986; Nothdurft, 1992; Sagi & Julesz, 1985). There is converging evidence—consistent with related studies on feature search (e.g., Fiorentini, 1989; Meinecke, 1989; Meinecke & Donk, 2002)—that optimal texture segregation does not peak in foveal vision but in the near periphery. Kehrer (1987, 1989) presented observers with brief, backward-masked texture targets composed of uniformly oriented lines that were embedded in orthogonally oriented background elements. Performance—both in terms of accuracy and reaction time—was found to be optimal in the near periphery. Decreasing the fundamental frequency of the texture display by reducing the spacing of the texture elements led to shifts in maximal performance to more eccentric locations. Meinecke and Kehrer (1994) extended these findings by showing that the eccentricity of peak performance also depends on shape properties of the local texture elements. Saarinen, Rovamo, and Virsu (1987) found a slight parafoveal advantage in texture segmentation using M-scaled dot stereograms. 
Joffe and Scialfa (1995) replicated and extended Kehrer's results by manipulating element distance and element size separately, thereby disentangling effects of spatial frequency and texture gradient. Similar to Kehrer, they found an inverse relationship between spatial frequency and the eccentricity optimal for performance, with a maximum of sensitivity at 4.7° eccentricity for low-frequency displays and at 2.6° eccentricity for high-frequency displays. Joffe and Scialfa attribute the decline of texture segmentation in foveal vision to the preponderance of smaller cells exhibiting slower response latencies, conduction velocities, and a preference for higher spatial frequencies (e.g., Shapley & Perry, 1986), as well as to the increasing number of magno cells outside the fovea (e.g., LeVay, Connolly, Houde, & van Essen, 1985). With larger eccentricities, the decreasing spatial resolution becomes the limiting factor eventually leading to a rapid drop in segmentation performance. 
In a related study, Gurnsey, Pearson, and Day (1996) observed a shift in peak performance for texture segmentation to larger eccentricities by reducing viewing distance. They attribute the drop found for more central and more peripheral test locations to a mismatch between the scale of the texture and the average size of the filters governing spatial resolution in the visual system. Accordingly, spatial filter size may be too small (i.e., resolution too high) in foveal vision; with increasing eccentricity, filter size increases and eventually reaches an optimal value before becoming too big (i.e., resolution too low) in the far periphery. 
The optimal eccentricity for texture segregation is also subject to attentional modulation, which indicates a certain susceptibility to top-down processing. Yeshurun and Carrasco (1998) showed that cueing the potential location of the target led to a performance increase at all eccentricities except for the fovea where it led to a decline. The authors attribute these differential effects to an attention-driven enhancement in spatial resolution (cf. Carrasco, Loula, & Ho, 2006), which would increase the mismatch between filter size and texture scale in foveal vision while reducing it in peripheral vision. More recently, it has been demonstrated that such an interpretation only applies to manipulations of transient attention. By contrast, for directed sustained attention, an increase of performance is found across all eccentricities including the fovea (Yeshurun, Montagna, & Carrasco, 2008). Thus, sustained attention may be a more flexible mechanism that is capable of both enhancing and reducing spatial resolution to improve performance. 
Unlike texture segmentation studies with their focus on the near periphery, experiments on contour integration have considered larger viewing fields. Such studies typically employ fields of Gabor elements that are positioned along a smooth path and embedded among distractor elements. Hess and Dakin (1997, 1999) found that the detectability of Gabor-defined contours shows a dependency on retinal eccentricity that cannot be easily explained in terms of low-level factors like acuity or contrast sensitivity. They used contours formed by Gabor elements with either the same or alternating phase relations between neighboring elements. For the former, detection performance displayed an eccentricity-dependent falloff that increased with curvedness, with performance for straight contours being almost constant up to eccentricities of 20°. By contrast, contours defined by alternating phase Gabors became undetectable at eccentricities beyond 10°, suggesting a qualitative change of contour processing in peripheral vision around that critical eccentricity. 
More recent work, however, has produced conflicting results that question this interpretation. Nugent, Keswani, Woods, and Peli (2003) replicated some of Hess and Dakin's findings but failed to observe a clear dissociation between same and alternating phase Gabor contours. Instead, they found a gradual decrease in performance with increasing eccentricity for values up to 30° in both conditions. For closed, recognizable shapes, Kuai and Yu (2006) demonstrated that detection performance for contours made up of alternating phase Gabors is almost constant for eccentricities up to 35°. Such easy recognition could be the result of top-down influences favoring the figure–ground segmentation for closed shapes (cf. Kovacs & Julesz, 1993) or of long-range interactions facilitating the processing of contours that are curved in one direction (Pettet, McKee, & Grzywacz, 1998). 
While further research is required to resolve the incoherent data regarding alternating phase Gabor contours, there is evidence that suggests—at least for same-phase Gabor contours—a theoretical link between the eccentricity dependence of contour integration and the phenomenon of crowding. May and Hess (2007) propose a model that combines elements of Pelli et al.'s (2004) crowding model (cf. Chapter 5.2: Letter crowding at low contrast) and Field, Hayes, and Hess' (1993) theoretical account of contour integration. According to the latter, elements along a smooth contour are integrated by an association field that is stronger along the axis of an element than orthogonal to it. May and Hess point out that the association field could be interpreted as an example of an integration field, which—in the context of Pelli et al.'s (2004) model—determines the spatial extent across which outputs of simple features are combined. One important prediction of May and Hess' account is that association fields increase in size with increasing eccentricity. To test their model, they compared the integration of snake and ladder contours derived from contour elements aligned either tangentially or perpendicularly to the path, respectively. In the periphery, they found the detection of ladder contours severely disrupted compared to snake contours, a result that is compatible with the idea that association fields in the periphery are larger than in the fovea. Using computer simulations applied to groups of three-letter stimuli made from short line segments, May and Hess further demonstrated that their model predicts three key characteristics highlighted by Pelli et al. (2004), namely (1) the independence of the critical spacing from letter size, (2) the linear scaling with eccentricity, and (3) a greater interference of flankers on the peripheral side of the target. 
6.2. Memorization and categorization of natural scenes
Real-world scenes take up the entire visual field, and even under laboratory conditions, depictions of natural scenes shown on a computer screen occupy a proportion of the visual field that typically includes both foveal and extrafoveal regions. There is ample evidence to suggest that observers can pick up and extract semantic information from natural scenes even at very brief presentation times down to less than 50 ms (e.g., Antes, Penland, & Metzger, 1981; Bacon-Mace, Mace, Fabre-Thorpe, & Thorpe, 2005; Fei-Fei, Iyer, Koch, & Perona, 2007; Loftus, 1972; Potter, 1976; Sanocki & Epstein, 1997). However, the impact of eccentricity on the encoding of information and scene-gist recognition has only recently been investigated more systematically. Velisavljevic and Elder (2008) examined visual short-term memory for natural scenes by measuring recognition performance for image fragments as a function of eccentricity for coherent and scrambled natural scenes. Images of coherent or scrambled natural scenes subtending 31° × 31° were briefly presented for 70 ms followed by two smaller image blocks sized 3.9°, a target block drawn from the presented image and a distractor block from an unseen image. Participants had to identify the target block in a forced-choice task. Even though the target blocks only contained image fragments rather than complete objects taken from the scene there was a distinct recognition advantage of coherent over scrambled scene images for targets presented near fixation. This advantage declined with increasing eccentricity in a roughly linear fashion and disappeared at a value of around 15°. Recognition thresholds for scrambled images were above chance, with no variation across the range of eccentricities tested. 
Control experiments showed that the coherent-image advantage could not be attributed to the greater saliency of image content near the centre of the photograph, and that its decline with increasing eccentricity was the result of a breakdown at the stage of detection and encoding rather than at that of retrieval. Inverting the images with regard to orientation and/or color reduced, but did not eliminate, the advantage of coherent scenes, nor did it affect the differences in eccentricity dependence. 
Together these results indicate that the advantage of coherent images is not the result of semantic cues. The dissociation between coherent and scrambled conditions also argues against the impact of low-level factors, such as visual acuity, which should have affected performance in both conditions in the same way. Instead it suggests that visual short-term memory relies to a substantial degree on mid-level configural cues regarding shape, figure/ground segmentation, and spatial layout. Such cues seem to be effective only within the central 30° of the visual field. Velisavljevic and Elder relate the ability to detect these cues to the field defined by the critical eccentricity for curvilinear contour-binding mechanisms proposed by Hess and Dakin (1997), even though the estimated spatial extent for the latter has been somewhat smaller (20°) and the dissociative nature of such a field now appears controversial (cf. Chapter 6.1: Texture segregation and contour integration). 
Larson and Loschky (2009) investigated the relative importance of central versus peripheral vision for recognizing scene gist, here defined as the ability to categorize it at the basic level with a single word or phrase. Scenes were presented for 106 ms in three experimental conditions: a “Window” condition involving a circular region showing the central portion of a scene and blocking peripheral information, a “Scotoma” condition in which the central portion of a scene was blocked out, and a “Control” condition showing the full image. The scene images subtended 27° × 27° of visual angle, while window and scotoma size varied between 1° and 13.6°. On each trial subjects had to decide whether a post-cue (“beach,” “forest,” “street”) matched the preceding target scene. 
The results showed that peripheral vision is more useful for gist recognition than central vision. For scotoma radii less than 11°, performance did not differ significantly from the control condition, whereas for window radii of only 11° or larger recognition accuracy approached the level of the control condition. A critical radius of 7.4° was found where the performance curves for the Scotoma and the Window conditions crossed, i.e., yielded equal performance. The advantage of the periphery proved to be due to a difference in size of the viewing field. When performance was normalized by viewing-field size, there was an advantage of central vision, indicating a higher efficiency for gist recognition. However, this central advantage could not be explained in terms of cortical magnification. Predicting the critical radius from cortical magnification functions (here: Florack, 2007; van Essen et al., 1984) based on the assumption that equal V1 activation would produce equal performance, Larson and Loschky obtained values in the range of 2.4° to 3.2°—substantially less than the empirically observed value of 7.4°. Thus, peripheral vision plays a more important role in gist recognition than predicted by cortical magnification. 
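The logic of such a cortical-magnification prediction can be illustrated with a minimal sketch. It is not Larson and Loschky's computation: it assumes a generic inverse-linear magnification function M(E) = M0 / (1 + E/E2) with an illustrative E2 of 0.75°, rather than the functions of Florack (2007) or van Essen et al. (1984), and it approximates the square image by a disc of 13.5° radius. The function names and parameter values are hypothetical. The sketch finds the radius at which the central window and the surrounding annulus map onto equal V1 area.

```python
import math


def v1_area(radius_deg, m0=1.0, e2=0.75):
    """V1 area (arbitrary units) onto which a central disc of the given radius maps.

    Assumes M(E) = m0 / (1 + E / e2) and integrates the areal magnification
    M(E)^2 over the disc: integral of 2*pi*E*M(E)^2 dE from 0 to radius.
    Both m0 and e2 are illustrative assumptions, not values from the studies cited.
    """
    x = radius_deg / e2
    return 2.0 * math.pi * (m0 * e2) ** 2 * (math.log1p(x) + 1.0 / (1.0 + x) - 1.0)


def critical_radius(image_radius_deg, m0=1.0, e2=0.75, iterations=60):
    """Radius at which window (0..r) and scotoma surround (r..R) cover equal V1 area."""
    target = 0.5 * v1_area(image_radius_deg, m0, e2)
    lo, hi = 0.0, image_radius_deg
    for _ in range(iterations):  # bisection; v1_area increases monotonically with radius
        mid = 0.5 * (lo + hi)
        if v1_area(mid, m0, e2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    image_radius = 13.5  # half-width of a 27 deg image, treated as a disc (assumption)
    r = critical_radius(image_radius)
    print(f"equal-V1-area radius: {r:.2f} deg")
    print(f"equal-visual-field-area radius: {image_radius / math.sqrt(2):.2f} deg")
```

Under any such function the equal-V1-area radius falls well inside the radius that halves the image area in the visual field, because the central representation is magnified. The exact value depends strongly on the assumed E2, so the sketch shows the reasoning only and should not be read as reproducing the 2.4° to 3.2° range reported above.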
One factor not controlled for in this study is the presence and spatial distribution of diagnostic objects that could facilitate recognition of scene gist (e.g., Bar & Ullman, 1996; Davenport & Potter, 2004; Friedman, 1979). However, an explanation in terms of such objects would imply that the periphery conveys more diagnostic information than the centre. Photographic pictures typically show the opposite effect, i.e., a bias toward the centre to show important details (see also Experiment 2 in Velisavljevic & Elder, 2008), even though it cannot be ruled out that this may be offset by the larger area across which information is being sampled in the periphery. Irrespective of the possible modulatory effects of diagnostic information, Larson and Loschky prefer to attribute the observed specialization of peripheral vision for gist recognition to the involvement of higher levels of processing beyond the primary visual cortex. A candidate region is the parahippocampal place area (PPA) for which a bias toward a more eccentric processing of feature information relating to buildings and scenes has been demonstrated (Hasson, Levy, Behrmann, Hendler, & Malach, 2002; Yeshurun & Levy, 2003). Such bias may be assisted by mid-level configural cues regarding shape and figure/ground segmentation, which, as demonstrated by Velisavljevic and Elder (2008), may be encoded across larger parts of the visual field for eccentricities up to approximately 15°. 
Few studies have investigated recognition performance in natural scenes for eccentricities above 10°. Thorpe, Gegenfurtner, Fabre-Thorpe, and Bülthoff (2001) examined the detection of animals in natural scenes that were briefly presented for 28 ms. Observers had to indicate the presence of an animal in a go/no-go task. The photographs could appear at random locations across almost the entire extent of the horizontal visual field. Accuracy was 93% for central vision and decreased linearly with increasing eccentricity. However, even at the most extreme eccentricity (70°), subjects scored 60.6% correct answers—significantly above chance (50%). This level was achieved despite the fact that the position of the image was unpredictable, ruling out the use of pre-cued attention to target locations. Successful recognition often occurred in the absence of conscious awareness (i.e., the subjects claimed to be guessing), but remained fairly unspecific. It did not allow the identification of animals beyond a mere superordinate categorical decision (i.e., animal present/absent). 
The mechanisms underlying such abilities to categorize objects in the far periphery are still unclear. Thorpe et al.'s analysis does not suggest any particular type of image feature that could support the task. Indeed, the large number of pictures and their variety seem to rule out any explanation based on the detection of a single diagnostic attribute. The use of simple heuristics based on properties of the power spectrum of natural images is also uncertain. Such techniques have been proposed (Torralba & Oliva, 2003) to explain the relative ease with which humans can spot animals in natural images in near-foveal view (e.g., Thorpe, Fize, & Marlot, 1996), but their actual use by humans has been questioned (Wichmann, Drewes, Rosas, & Gegenfurtner, 2010). In any case, their applicability would appear limited in case of extremely low-pass filtered images in the far periphery. Nevertheless, the considerable size of the pictures used in Thorpe et al.'s experiments (20° × 29°) makes it likely that target objects even at large eccentricities were shown above the acuity threshold. While crowding effects may prevent the identification of such stimuli, fragmentary feature information may still be sufficient to permit a coarse categorization at superordinate level. The latter may be assisted by an evolutionary specialization to spot animals (New, Cosmides, & Tooby, 2007), even though there is no evidence that learning or deprivation of foveal vision make its use more likely (Bourcart, Naili, Despretz, Defoort-Dhellemmes, & Fabre-Thorpe, 2010). More research is clearly needed in this area. 
6.3. Recognizing faces and facial expressions of emotions
Faces represent an important and particularly challenging type of stimulus for visual processing, but relatively few studies have specifically explored face recognition in peripheral vision. In an early study, Hübner, Rentschler, and Encke (1985) demonstrated that even for small eccentricities (here: 2°) size scaling according to cortical magnification (Rovamo & Virsu, 1979; cf. Chapter 3.1: The cortical magnification concept) was insufficient to equate foveal and extrafoveal recognition performance for faces embedded in spatially correlated noise. 
Mäkelä et al. (2001) measured contrast sensitivity for face identification as a function of image size (0.2° to 27.5°) and eccentricity (0° to 10°). The experiments involved a set of four black and white face images that were cropped to include only facial features and size-adjusted for equal interpupillary distance. Contrast thresholds were measured using a staircase procedure. In each trial, one stimulus, which the subject had to identify in a 4-AFC procedure, was shown for 500 ms. Similar to the findings of Hübner et al. (1985) for faces and Strasburger et al. (1994) for letter-like stimuli, pure size scaling proved insufficient to equate foveal performance in peripheral vision. As in the study by Strasburger et al., such equivalence could only be obtained by increasing both size and contrast. In a second experiment involving the identification of the face stimuli in two-dimensional spatial noise, the peripheral inferiority was found to be the result of a reduced efficiency in the use of contrast information for pattern matching rather than the consequence of an eccentricity-dependent attenuation in the peripheral retina and subsequent visual pathways. 
Further insight into the mechanisms underlying this decrease in recognition performance is provided by Martelli, Majaj, and Pelli (2005). These authors compared the impact of crowding on face and word recognition. They measured contrast thresholds of letters and face parts (here: the mouth region) in the absence and presence of flanking characters or other facial features, referred to as context. Stimuli were presented for 200 ms in the right visual field and with eccentricities of up to 12°. In each trial, the subject had to identify a target stimulus (one of five letters or one of three mouths, respectively). For peripheral vision, the presence of context features led to similar impairments, regardless of whether the target was a letter or a mouth (taken from a photograph or caricature). In a further experiment involving words and face caricatures only, the impairments could be compensated for by increasing the distance of the target features (letter/mouth) from the rest of the stimulus (Figure 22a). The critical distance (defined by the onset of the impairment, cf. Figure 22b) was found to vary proportionally with eccentricity (Figure 22c) and to be independent of stimulus size (Figure 22d), characteristics typically seen as the hallmark of crowding (Pelli et al., 2004). The proportionality constant of 0.34 reported by Martelli et al. is somewhat smaller than the rule-of-thumb value of 0.5 in Bouma's law (Bouma, 1970), but well within the range of proportionality values for crowding tasks (Pelli et al., 2004). The results suggest an extension of this law, originally established for character recognition to describe the interference between separate objects, by considering the possibility of internal crowding between parts belonging to the same object. 
Thus, even the recognition of a single object in peripheral view will deteriorate if diagnostic parts of this object are separated from each other by less than the critical crowding distance (see also Pelli & Tillman, 2008). Given the fundamental role of parts and features in structural models of object recognition (cf. Biederman, 1987; Hummel, 2001; Marr & Nishihara, 1978), these results imply that it is crowding that constitutes the major constraint on peripheral object recognition in general. 
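The proportionality rule discussed here can be made concrete with a short sketch. It assumes the simplest linear form of Bouma's law, critical spacing = k × eccentricity, with k = 0.5 as the rule-of-thumb value and k = 0.34 as the slope reported by Martelli et al. (2005). The function names are hypothetical and chosen for illustration only.

```python
def critical_spacing(eccentricity_deg, k=0.5):
    """Critical spacing in degrees under the linear rule k * eccentricity.

    k = 0.5 is Bouma's rule-of-thumb value; k = 0.34 is the slope
    reported by Martelli et al. (2005) for words and face caricatures.
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return k * eccentricity_deg


def is_crowded(part_spacing_deg, eccentricity_deg, k=0.5):
    """True if parts (or flankers) lie closer together than the critical spacing."""
    return part_spacing_deg < critical_spacing(eccentricity_deg, k)


if __name__ == "__main__":
    ecc = 12.0  # eccentricity used for the size manipulation in Figure 22d
    for k in (0.5, 0.34):
        print(f"k = {k}: critical spacing at {ecc} deg is {critical_spacing(ecc, k):.2f} deg")
    # A mouth 3 deg away from the other facial features, viewed at 12 deg eccentricity:
    print(is_crowded(3.0, ecc, k=0.34))
```

Under this rule a face part at 12° eccentricity would need to be separated from its context by roughly 4° (with k = 0.34) or 6° (with k = 0.5) to escape internal crowding. Target size does not enter the computation, in line with the size independence shown in Figure 22d.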
Figure 22.
 
Crowding in words and faces (modified from Martelli et al., 2005, Experiment 2). (a). Illustration of critical distance. When fixating the square, the identification of a target feature (here: the central letter in the words (top), or the shape of the mouth in the face caricature (bottom)) is impaired by surrounding features (left) unless there is sufficient spatial separation (right). (b). Threshold contrast for target identification as a function of part spacing. For each eccentricity, the floor break point of the fitted lines defines the critical spacing. (c). Critical spacing as a function of eccentricity of target. The data show a linear increase of critical distance with eccentricity (average slope: 0.34). The gray diamonds refer to estimates based on the data of the face identification study by Mäkelä et al. (2001). (d). Critical distance as a function of size of target (eccentricity 12°). The data show that critical distance is virtually unaffected by size (average slope: 0.007).
In addition to identity information, faces also convey cues about emotions. While the processing of facial identity and emotional expression is commonly assumed to involve separate functional and neural pathways (Bruce & Young, 1986; Hasselmo, Rolls, & Baylis, 1989; Sergent, Ohta, MacDonald, & Zuck, 1994; Ungerleider & Haxby, 1994), both are seen to rely on similar mechanisms for analyzing the configuration of facial components (Calder, Young, Keane, & Dean, 2000; Leder, Candrian, Huber, & Bruce, 2001). Unlike identification, the recognition of facial expressions is subject to effects of categorical perception (Etcoff & Magee, 1992), suggesting that emotions may be particularly discernible even in peripheral vision. 
Goren and Wilson (2008) compared categorization of emotional expressions in foveal and peripheral view (at an eccentricity of 8°) using sets of synthetic, bandpass-filtered face images. Facial expressions associated with the emotions happiness, fear, anger, and sadness were parametrically controlled through geometric changes of ten facial features (like brow distance and mouth width). Categorization thresholds for the four emotions were measured for face stimuli with a peak spatial frequency of 10 cycles per face (8 cpd), which was halved (and picture size doubled) for eccentric presentations. Despite the scaling, thresholds distinctly increased in peripheral vision for most emotions, by about 60%–120% relative to a foveal stimulus presentation. Only for happy faces was there no significant effect of viewing condition. 
Goren and Wilson conclude that emotion recognition in general may require high-spatial frequency information and therefore particularly suffer from the degradation of such frequencies in peripheral vision. They attribute the advantage of happy faces to their particular saliency. However, the origin of such saliency remains elusive. Another experiment in their study assessing discrimination thresholds between emotional and neutral bandpass-filtered faces found happy faces no more discernible than sad or fearful ones—emotions that are much harder to recognize in peripheral view. 
Calvo, Nummenmaa, and Avero (2010) assessed the recognition advantage of extrafoveally presented happy faces using a matching paradigm. Subjects were required to match a briefly presented face target with a probe word that could either represent the target emotion or not. The target face stimuli were shown for 150 ms at an eccentricity of 2.5° randomly to the left or right of fixation to avoid effects of covert attention. Happy faces attracted significantly faster correct responses than others and were less affected by stimulus inversion, a transformation known to disrupt configural processing, particularly in faces (e.g., Carey & Diamond, 1977; Yin, 1969). Calvo et al. interpret the happy-face advantage in peripheral vision as evidence of predominantly feature-based (rather than configural) processing. The latter conclusion contradicts Goren and Wilson's finding that, at least for foveally presented bandpass-faces, the categorization of happy faces showed the strongest impact of inversion. Given the many differences between the two studies, in particular with regard to stimulus choice (bandpass images vs. color photographs), eccentricity (8° vs. 2.5°), and the potential role of covert attention (in Goren & Wilson), the origin of the happy-face advantage in peripheral vision continues to be unclear. 
In summary, the studies reviewed in this chapter demonstrate that peripheral vision has the potential to provide information on more complex, distributed features and permits the recognition of behaviorally relevant cues. Generally speaking, peripheral recognition of scenes, objects and faces shows a dependence on eccentricity that does not follow the predictions of cortical size-scaling and basic acuity measures. Part of the reason may be that object recognition also relies on mid-level configural cues rather than isolated low-level features alone. Such configural cues may arise from processes of perceptual organization that integrate local features into contours and carve up contours into parts. As discussed in the preceding sections there is some evidence to suggest that both contour integration and part-based recognition are subject to—and indeed limited by—crowding, a potentially important generalization of the crowding phenomenon originally established in the domain of peripheral letter recognition. However, limitations imposed by crowding may be modulated and sometimes mitigated by top-down effects (e.g., in texture segregation and contour integration), affective processing (e.g., in the recognition advantage for happy faces) and the use of fragmentary information permitting a coarse categorization of scenes or objects even at larger eccentricities. The mechanisms underlying these modulating effects are not yet well understood. 
Chapter 7. Learning and spatial generalization across the visual field
The studies on extrafoveal pattern vision considered so far avoided effects of learning. This was achieved either by using familiar stimuli like letters, objects, faces and scenes, or (in case of degraded versions thereof) by including extensive practice sessions prior to the main experiment to familiarize observers with the material and ensure stable response behavior. In this section we will turn to work employing recognition tasks that explicitly address learning-induced changes of performance, either at a perceptual level or at a level involving the acquisition of new pattern categories. Intimately related to learning is the issue of generalization. Of particular relevance for peripheral vision, given its considerable variance across the visual field, is the question of spatial generalization, i.e., to what extent translation invariance of performance is obtained if a stimulus is presented at a retinal location different from the one used during learning. Spatial generalization is therefore one of the prerequisites for achieving object constancy in visual perception.
7.1. Learning
Practice can improve performance in peripheral vision in many tasks. Such improvement has typically been assessed in the parafovea and near periphery (roughly up to 10°), presumably because learning-induced changes are relatively easy to elicit in this eccentricity range. Most studies investigating the effect of practice have focused on perceptual learning, evaluating the effect of training on elementary visual functions like orientation discrimination, contrast sensitivity, and a range of acuity measures. For some of these functions, like orientation discrimination, bisection and Vernier acuity, performance in the visual periphery can be improved through training by factors of as much as three (Beard, Levi, & Reich, 1995; Crist, Kapadia, Westheimer, & Gilbert, 1997; Schoups, Vogels, & Orban, 1995). For other measures, in particular those assessing basic spatial resolution like Landolt C acuity or line resolution, the susceptibility to learning appears questionable (Westheimer, 2001). Perceptual learning experiments commonly employ training schedules that extend over several days and involve up to several thousand trials. During the training, performance improves along a trajectory that often shows a steep increase during the first few hundred trials followed by more gradual but significant improvements thereafter (Fahle, Edelman, & Poggio, 1995). However, perceptual learning shows considerable individual variability and does not occur in all subjects (Beard et al., 1995).
The specificity of training effects has been used to locate the neural substrate underlying perceptual learning. Such specificity has been reported for the discrimination of patterns of a similar orientation (Fiorentini & Berardi, 1981; Poggio, Fahle, & Edelman, 1992), spatial frequency (Fiorentini & Berardi, 1981) and retinal location (Fiorentini & Berardi, 1981; Kapadia, Gilbert, & Westheimer, 1994; Karni & Sagi, 1991; Schoups et al., 1995). Another approach has been to assess the transfer between the eyes. Here the results are more mixed and dependent on the particular function investigated. Complete or nearly complete specificity to the eye of training has been found for example in luminance detection (Sowden, Rose, & Davies, 2002), hyperacuity (Fahle, 1994; Fahle et al., 1995; Fahle & Edelman, 1993) and texture discrimination (Karni & Sagi, 1991). Complete or nearly complete generalization from the trained to the untrained eye has been reported for luminance contrast detection (Sowden et al., 2002), hyperacuity tasks (Beard et al., 1995), orientation discrimination (Schoups et al., 1995), phase discrimination (Fiorentini & Berardi, 1981), texture discrimination (Schoups & Orban, 1996) and identification of Gabor orientation (Lu, Chu, Dosher, & Lee, 2005). The observed pattern of specificity effects generally points toward a neural locus of learning within early visual areas, possibly at the level of V1 or V2, and may reflect changes in neural tuning (Poggio et al., 1992; Saarinen & Levi, 1995). However, performance improvements could in principle also be mediated by more than one mechanism rather than a unitary one and include multiple processes at various levels of the visual system (e.g., Beard et al., 1995; Lu & Dosher, 2004; Mollon & Danilova, 1996). Support for this notion comes from recent findings showing that the normal position specificity obtained for perceptual learning can be broken under certain conditions. 
Using a double-training paradigm involving two unrelated tasks (contrast discrimination and orientation discrimination) at separate retinal locations, Xiao et al. (2008) demonstrated a significant performance transfer for the task learned at one location to the second location that had been used for the other, apparently irrelevant, task. Zhang, Xiao, Klein, Levi, and Yu (2010) observed a similar transfer for orientation discrimination learning to a new test location by introducing at the latter a brief pre-test, which was too short to enable learning by itself. Zhang et al. interpret their findings as the result of an interaction of foveal and peripheral processing that may involve learning at more central cortical sites. Alternatively, the breakdown of position specificity could reflect statistical properties of the learning process that do not imply a specific brain implementation (Sagi, 2011). In any case, the question of the exact neuro-anatomical substrate underlying perceptual learning remains unresolved.
A major constraint of peripheral vision arises from crowding effects (cf. Chapter 5: Recognition of patterns in context—Crowding), and a few recent studies have considered the susceptibility of crowding-related performance measures to perceptual learning. Chung, Legge, and Cheung (2004) measured visual span profiles, as assessed by identification performance for sequences of three letters, so-called trigrams, along lines at 10° eccentricity in the upper or lower visual field. Recognition rates improved with repeated training over four consecutive days and were accompanied by a significant increase of reading speed, measured in a separate experiment at the same eccentricities using a rapid serial visual presentation (RSVP) paradigm. Performance, in terms of both letter recognition and reading speed, also transferred from the trained to the untrained vertical hemifield and was retained for at least three months after the training. In a follow-up study, Chung (2007) reconsidered the effect of training on the identification of middle letters within trigrams of varying letter separation. Here the training extended over six days but—unlike in Chung et al. (2004)—only employed one location at 10 deg eccentricity on the vertical meridian in the inferior visual field. Post-training tests revealed a performance increase by 88% for the trained letter separation, which transferred to untrained (wider) separations as well. However, unlike Chung et al. (2004) there was no significant effect on reading speed. The reasons for this are unclear but could be related to procedural differences, in particular the involvement of multiple training locations (rather than a single one) when determining the visual span profiles in Chung et al.'s study. 
Sun, Chung, and Tjan (2010) employed ideal observer analysis and a noise-masking paradigm to further explore the mechanisms underlying the learning effects in crowding. Similar to Chung (2007) observers were trained over six days to identify closely flanked letters at an eccentricity of 10 deg in their lower right visual quadrant. The training sessions were bracketed by a pre- and a post-test. The latter involved the same retinal locations as the training but letters were embedded in white noise and presented in flanked and unflanked conditions. Test performance was characterized in terms of equivalent input noise and sampling efficiency relative to an ideal observer model (Pelli, 1981). The results showed an improvement of letter identification both in the flanked and unflanked conditions. In case of unflanked stimuli, the improvement was mostly reflected in an increase of sampling efficiency. For flanked stimuli, the improvement typically manifested itself either in an increase of sampling efficiency or a decrease of the equivalent input noise. In the context of Pelli et al.'s (2004) crowding model this pattern of results can be interpreted in terms of a window for feature integration that—as a consequence of learning—is optimized with regard to its spatial extent. The optimization process aims to establish the best compromise between a low level of input noise originating from the flankers (which decreases with window size) and high sampling efficiency, i.e., a high number of valid features (which increases with window size). In an additional retention test Sun et al. also demonstrated that the learning-induced reduction of crowding persisted at least for six months. Whether this improvement is specific for the trained retinal location remains unclear. 
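How equivalent input noise and sampling efficiency are separated can be sketched with the linear amplifier form that is commonly used with noise-masking data, in which threshold energy E grows linearly with external noise level N as E = k (N + N_eq). This is a minimal illustration under that assumption, not the analysis of Sun et al.; the data values, the function names, and the ideal-observer slope `k_ideal` are hypothetical.

```python
import numpy as np


def fit_linear_amplifier(noise_levels, threshold_energies):
    """Least-squares fit of E = k * (N + N_eq); returns (k, N_eq)."""
    slope, intercept = np.polyfit(noise_levels, threshold_energies, 1)
    return slope, intercept / slope


def sampling_efficiency(k_observer, k_ideal):
    """Efficiency relative to an ideal observer whose threshold is k_ideal * N."""
    return k_ideal / k_observer


# Hypothetical, noise-free thresholds before and after training.
noise = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
pre_test = 20.0 * (noise + 2.0)    # slope 20, equivalent input noise 2
post_test = 10.0 * (noise + 2.0)   # slope halved, equivalent input noise unchanged

k_ideal = 2.0  # assumed ideal-observer slope, for illustration only
k_pre, neq_pre = fit_linear_amplifier(noise, pre_test)
k_post, neq_post = fit_linear_amplifier(noise, post_test)

print(sampling_efficiency(k_pre, k_ideal), neq_pre)
print(sampling_efficiency(k_post, k_ideal), neq_post)
```

In this made-up example, learning doubles the sampling efficiency while leaving the equivalent input noise unchanged. A learning effect carried by the noise term would instead appear as a shift of the fitted line's intercept with an unchanged slope.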
Studies on perceptual learning typically focus on discrimination tasks at an early stage of visual processing, employing stimuli of either very simple (e.g., lines, gratings) or at least highly familiar structure (e.g., letters). From a more general perspective, however, the recognition of patterns and objects relies on the previously acquired categories that act as determinants for perceptual classification later on (e.g., Bruner, 1957; Rosch, 1978). A number of studies have compared foveal and extrafoveal vision regarding the potential to learn new pattern categories. 
Jüttner and Rentschler (2000) demonstrated a dissociation of category and discrimination learning in extrafoveal vision with regard to a common set of unfamiliar gray-level patterns. In their study they compared performance in two tasks where the patterns were either assigned to one class out of two classes (discrimination) or to one class out of three classes (categorization)8. Both tasks involved the same set of fifteen compound Gabor gratings specified in a two-dimensional Fourier space. Within this low-dimensional, parametric feature space the patterns formed three clusters of five samples each (Figures 23a and 23b). Participants were trained in a supervised learning paradigm (Rentschler, Jüttner, & Caelli, 1994). During category learning, the subjects learned all three classes simultaneously (Figure 23c, top). By contrast, discrimination learning involved three consecutive experiments each employing a different pair of classes (I–II, I–III, and II–III) in counterbalanced order (Figure 23c, bottom). Each learning condition was performed either in foveal or in extrafoveal view (eccentricity 3 deg), with patterns in the latter condition being size-scaled according to cortical magnification (Rovamo & Virsu, 1979; cf. Chapter 3.1: The cortical magnification concept). In the discrimination task, observers showed fast learning in both the foveal and extrafoveal viewing condition (Figure 23d). By contrast, category learning of the identical stimuli was fast only in foveal view, whereas it proceeded much more slowly (by a factor of six) in extrafoveal vision. A variance reduction of the pattern classes by a factor of 100 (see inset in Figure 23a) reduced the dissociation between extrafoveal categorization and discrimination but did not remove it. A further experiment demonstrated a transfer from discrimination to subsequent categorization only for learning in foveal view but not in extrafoveal vision. 
Figure 23.
 
Dissociation of category and discrimination learning (modified from Jüttner & Rentschler, 2000). (a). The learning signals were given by a set of fifteen compound Gabor gratings, defined in a two-dimensional Fourier feature space. Within this feature space, the learning stimuli formed three clusters thus defining three classes. Two different sets of signals, A and B, were generated. They had the same configuration with respect to their cluster means (dashed triangle) and only differed in their mean class variance σm. For signal set B the latter was reduced by a factor of 100 relative to set A, as indicated by the circles. (b). Illustration of the actual gray-level representations of the patterns in set A. (c). Learning tasks. For category learning (top), the subjects were trained with all three classes (I–III) simultaneously. For discrimination learning (bottom) the subjects were trained only with pairs of pattern classes (i.e., I vs. II, II vs. III, and I vs. III) in three consecutive experiments. (d). Mean learning time as a function of eccentricity of training location. For set A (solid lines), observers show fast discrimination learning regardless of training location. By contrast, for categorization learning duration is greatly increased in extrafoveal viewing conditions. For set B (dashed lines) the dissociation between the two tasks is still significant but markedly reduced.
To further explore the nature of the observed dissociation between categorization and discrimination, Jüttner and Rentschler (2000) used the confusion error data to reconstruct and visualize the conceptual space in the two tasks in terms of a probabilistic virtual prototype (PVP) model (Jüttner & Rentschler, 1996, cf. Chapter 8.5.1: Statistical model of visual pattern recognition). Applied to the data of the discrimination learning task, the virtual prototype configurations combined across the three class pairings indicated well separated categories in both foveal and extrafoveal viewing conditions. By contrast, for category learning only the virtual prototypes in the foveal condition mirrored the triangular class configuration in physical feature space. For extrafoveal learning the prototype configuration showed an almost collinear arrangement. This indicates a reduced perceptual dimensionality in extrafoveal vision that affects categorization tasks involving the simultaneous separation of multiple classes along multiple feature dimensions much more severely than discrimination tasks requiring an intrinsically one-dimensional evaluation of stimulus information only (cf. Duda & Hart, 1973). 
The difficulty of extrafoveal pattern category learning can be overcome by prolonged training. By applying the PVP model to moving averages of the confusion errors during the learning process, Jüttner and Rentschler (1996) and Unzicker, Jüttner, and Rentschler (1999) showed that the acquisition of pattern categories is best described as a successive testing of hypotheses regarding the appearance of the patterns within each category. At least for foveal vision, such hypothesis testing can be simulated in terms of a quasi-propositional reasoning based on the part structure of each pattern (Jüttner, Langguth, & Rentschler, 2004; Rentschler & Jüttner, 2007; cf. Chapter 8.5.2: Representational complexity of peripheral vision). This suggests a neural locus for learning at a much later stage of visual processing compared with that for perceptual learning. Pattern category learning has indeed been found to display more complex effects of lateralization (Langguth, Jüttner, Landis, Regard, & Rentschler, 2009) and to be much less specific to the trained location in the visual field (Jüttner & Rentschler, 2008), as will be discussed in the following section.
7.2. Spatial generalization
Despite the considerable dependence on eccentricity of many elementary visual performance measures (cf. Chapter 3: Cortical magnification and the M-scaling concept) observers' ability to recognize familiar objects is surprisingly robust against displacements across the visual field (Biederman & Cooper, 1992; Ellis, Allport, Humphreys, & Collis, 1989; Stankiewicz & Hummel, 2002). For example, Biederman and Cooper asked participants to name common objects that were presented as line drawings (4 deg image size) centered at eccentricities of 2.4 deg to the left or right of fixation. In a second block the images were presented again at either the same or the complementary position. Naming latencies and error rates in the second block were found to be reduced as a result of priming. However, the size of the priming effect was translation invariant, i.e., the same regardless of whether the prime had been presented at the same location as the test or at a different one. A control experiment employing different exemplars with the same name in the two blocks demonstrated that a substantial part of the priming was visual, and therefore could not be attributed to simple name repetition. 
Unlike that of familiar objects, the recognition of unfamiliar objects shows much less potential for spatial generalization. A number of studies have employed paradigms involving same/different discriminations for sequentially flashed stimuli (e.g., Dill & Fahle, 1997a; Foster & Kahn, 1985; Larsen & Bundesen, 1998). Foster and Kahn (1985) sequentially presented random dot patterns at different retinal locations. Discrimination performance was found to decline linearly with increasing spatial separation, an effect that proved independent of eccentricity-dependent variations of acuity or attention. To explain their results, Foster and Kahn proposed a continuous compensation mechanism to achieve a translation-invariant normalization. However, Dill and Fahle (1997a) observed, using a similar paradigm, that the location specificity only applied to “same” trials, a finding that argues against explanations in terms of a continuous normalization process.
Other work has considered effects of learning to account for the contrasting results regarding the impact of translations on the recognition of familiar and unfamiliar objects. Nazir and O'Regan (1990) extended Foster and Kahn's paradigm by a learning phase, during which participants were trained to discriminate random dot patterns at a fixed location in the visual field. After having reached the learning criterion (here: 95% correct), they were tested for their ability to spatially generalize the acquired knowledge, i.e., to recognize the patterns either at the trained location, the center of the fovea, or at a mirror-symmetric location in the contralateral visual field. Discrimination accuracy dropped significantly at the two new testing locations, with error rate increasing from 5% (corresponding to the original 95% learning criterion) at the trained location to 25% at the new ones. There was no effect of distance between training and test location. In a similar study, Dill and Fahle (1997b) found evidence that discrimination learning might involve two mechanisms operating at different time scales and with different potential for spatial generalization: A fast mechanism that allows subjects to recognize patterns above chance even after a few trials and is invariant to translation; and a slow mechanism leading to further improvements which, however, are specific to the training location. Given its time course and spatial selectivity Dill and Fahle speculated that the latter may be based on perceptual learning (cf. Chapter 7.1: Learning). 
A further factor affecting translation invariance of pattern recognition is pattern structure. Dill and Edelman (2001) tested observers with sets of novel, animal-like stimuli (Figure 24a) in a sequential same–different matching task. The stimulus images (size: 3 × 2 deg) were presented at 4 deg eccentricity in one of the four quadrants, upper-left, upper-right, lower-left and lower-right. In each trial, the two patterns to be matched were presented either at the same location (“control” condition), or at locations that were spatially separated and involved either horizontally, vertically or diagonally adjacent quadrants. Complete invariance was observed for patterns that differed in constituent parts, regardless of whether the parts formed a (familiar) animal-like structure or were scrambled into (unfamiliar) spatial arrangements (Figure 24b). By contrast, translation invariance was broken if the two patterns only differed in structural composition, i.e., shared the same parts in different spatial configurations (Figure 24c). 
Figure 24.
 
Imperfect translation-invariance for recognizing configural changes in sequential pattern matching (modified from Dill & Edelman, 2001, Experiments 3 and 4). The two patterns to be matched were shown either at the same location (“control” condition), or at separate locations involving either horizontally, vertically or diagonally adjacent quadrants. (a). Examples of scrambled animal-like patterns. Stimuli within each column differ in their parts but share the same part configuration. Stimuli within each row consist of the same parts in different configurations. (b). Rate of correct responses as a function of spatial separation in the “same configuration – different part” condition. Solid line: “same” responses; dashed line: “different” responses. The data show a significant interaction of the two response types. However, the corresponding d′ values (red line) reveal no significant variation with separation. (c). As before but for correct responses in the “different configuration – same parts” condition. Again, the data show a significant interaction between “same” and “different” responses. Crucially, the corresponding d′ values display a significant effect of spatial separation. With permission from Pion Ltd, London.
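The relation between the response rates and the d′ values in Figure 24 can be illustrated with a short sketch. It uses the simple yes/no formula d′ = z(H) − z(F), treating a correct “different” response as a hit and an incorrect “different” response on “same” trials as a false alarm. This is an assumption made for illustration: same–different designs have dedicated observer models, and the rates below are hypothetical, not data from Dill and Edelman (2001).

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float, eps: float = 1e-3) -> float:
    """Sensitivity index d' = z(H) - z(F), with rates clipped away from 0 and 1."""
    def clip(p: float) -> float:
        return min(max(p, eps), 1.0 - eps)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(clip(hit_rate)) - z(clip(false_alarm_rate))


def d_prime_same_different(p_correct_same: float, p_correct_different: float) -> float:
    """Hits are correct 'different' responses; false alarms are errors on 'same' trials."""
    return d_prime(p_correct_different, 1.0 - p_correct_same)


# Hypothetical rates: the bias toward "same" reverses between the two conditions.
same_location = d_prime_same_different(p_correct_same=0.9, p_correct_different=0.7)
separated = d_prime_same_different(p_correct_same=0.7, p_correct_different=0.9)

print(round(same_location, 3), round(separated, 3))
```

Both hypothetical conditions yield the same d′ although the “same” and “different” rates cross over, which is the kind of pattern described for panel (b): an interaction between response types can reflect a shift of response bias rather than a loss of sensitivity.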
While the recognition of “structure only” stimuli may not show immediate invariance to translation, spatial generalization may be brought about by category learning. Jüttner and Rentschler (2008) traced the acquisition of categories of compound Gabor gratings—here used as unfamiliar gray-level images differing only in terms of their part structure given by the bright and dark bars along the horizontal symmetry axis (cf. Chapter 7.1: Learning). The training extended over periods of several hours in an interleaved learning and testing paradigm that either involved the same or different retinal locations at 3 deg to the left or right of fixation. The results showed that pattern categories acquired at one location became available at other locations even though there had been no position-specific feedback. 
Jüttner and Rentschler explained their findings in terms of a syntactic pattern recognition approach to category learning (Jüttner et al., 2004; Rentschler & Jüttner, 2007; cf. Chapter 8.5.2: Representational complexity of peripheral vision). It assumes that categories of compound Gabor gratings are described by production rules that combine multiple attributes representing either properties of individual pattern parts or those of part relations. Such rules could either involve a part-specific encoding of the visual field position of individual pattern components (yielding rules that are highly location specific), or encode the relative position for adjacent pattern components (which would produce rules that are translation invariant). A shift in the format of positional information during category acquisition would then become manifest in an emerging position invariance of visual recognition without requiring any position-specific feedback. These different ways of encoding positional information may have a correspondence in the increasing size of receptive fields along the higher stages of the ventral visual pathway in primates, in conjunction with the increasing preference of cells along this pathway for complex configural patterns rather than isolated pattern components (Tanaka, 1996, cf. Chapter 8.5.2: Representational complexity of peripheral vision).
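The difference between the two positional formats can be made concrete with a toy sketch. It is not the syntactic model of Jüttner et al. (2004); the part labels, the one-dimensional positions, and all function names are hypothetical and serve only to show why a relational code is unaffected by translation whereas a position-tagged code is not.

```python
from typing import FrozenSet, List, Tuple

# A part is a (label, horizontal position) pair; units are arbitrary.
Part = Tuple[str, int]


def absolute_code(pattern: List[Part]) -> FrozenSet[Part]:
    """Location-specific description: every part is tagged with its field position."""
    return frozenset(pattern)


def relative_code(pattern: List[Part]) -> Tuple[Tuple[str, str, int], ...]:
    """Relational description: offsets between horizontally adjacent parts."""
    ordered = sorted(pattern, key=lambda part: part[1])
    return tuple((a[0], b[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:]))


def translate(pattern: List[Part], shift: int) -> List[Part]:
    """Move the whole pattern to a new visual field position."""
    return [(label, x + shift) for label, x in pattern]


learned = [("bright", 0), ("dark", 1), ("bright", 3)]
shifted = translate(learned, 6)                            # same pattern, new location
rearranged = [("dark", 0), ("bright", 1), ("bright", 3)]   # same parts, new structure

print(absolute_code(learned) == absolute_code(shifted))    # False
print(relative_code(learned) == relative_code(shifted))    # True
print(relative_code(learned) == relative_code(rearranged)) # False
```

A rule written over the absolute code fails once the pattern is displaced, whereas a rule over the relative code still applies and remains sensitive to changes of part configuration.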
The notion of a representational shift during category acquisition could also explain the divergent findings regarding the translation-invariance of recognition in case of familiar (i.e., learnt) objects, as well as the lack of such invariance in case of unfamiliar (unlearnt) objects. Such shifts may not be the sole mechanism for achieving spatial invariance in the visual system. Rather, they could complement invariance mechanisms of more limited scope, which may be active at early and intermediate levels of feature processing and reflect automatic, adaptive responses to the spatio-temporal statistics of the visual environment (e.g., DeYoe et al., 1996; Wallis & Rolls, 1997; Wiskott & Sejnowski, 2002).
Chapter 8. Modeling peripheral form vision
Peripheral vision is inferior to foveal vision not only in terms of low-level functions but also of perceived form. This has been known since Aubert and Foerster (1857) and Lettvin (1976). Yet peripheral form vision received little attention until crowding became a popular topic. This can be attributed to the fact that vision research developed efficient methodologies for exploring the limits of perception but fell short of capturing form (see Shapley et al., 1990). The situation changed more recently, when a number of common interests between the cognitive sciences and the engineering community became apparent (e.g., Deco & Rolls, 2003; Pfeifer & Bongard, 2007; Rentschler, Caelli, Bischof, & Jüttner, 2000). Thus, it is possible to combine more traditional methodologies from psychophysics with formal concepts from computer vision, artificial neural networks, and pattern recognition. 
Interest in peripheral form vision has also increased for reasons of public health. The superiority of foveal vision may be lost in case of impaired development (amblyopia; e.g., Ciuffreda, Levi, & Selenow, 1991; Sireteanu, 2001; Stuart & Burian, 1962), degenerative disease (macular degeneration; e.g., Jager, Mieler, & Miller, 2008), or brain lesion (see Grüsser & Landis, 1991). Attempts have been made to improve, by way of learning, amblyopic vision (Banks, Campbell, Hess, & Watson, 1978) and cerebral amblyopia (Rentschler, Baumgartner, Campbell, & Lehmann, 1982). More recently, progress was made in developing retinal implants to enable helpful vision in case of degenerative disease (Shire et al., 2009; Zrenner et al., 2011). Yet it is unclear to what extent form vision can be restored under such conditions. 
This chapter focuses on possibilities of formally characterizing peripheral form vision. It is organized in six parts. In the first part (Chapter 8.1: Parts, structure, and form) we consider the notions of parts, structure, and form. The second part (Chapter 8.2: Role of spatial phase in seeing form) reviews models which are rooted in traditional psychophysics. That is, relationships of Fourier phase spectra and form vision are discussed. The limitations of this approach lead to the description of peripheral form vision in terms of local magnitude and phase within a multiresolution scheme. 
For reasons of historic development, the method of classification images is considered next (Chapter 8.3: Classification images indicate how crowding works). It does not fit into a common scheme with the other approaches but offers insight into letter recognition under non-crowding and crowding conditions. 
Novel concepts inspired by the progress in computer vision and artificial neural networks are discussed in Chapter 8.4: Computational models of crowding. One model of crowding is rooted in procedures of texture analysis and synthesis developed in computer vision. Another model of crowding and visual clutter uses similar processing strategies and embraces luminance and chromaticity channels. It determines the loss of information due to spatial averaging in terms of a measure of relative entropy. A feedforward–feedback model of crowding takes evidence of reciprocal coupling between cortical areas into account. 
The fifth part of the chapter (Chapter 8.5: Pattern categorization in indirect view) introduces methodologies of directly assessing form, or overall structure, by means of pattern classification with multiple categories. Peripheral form vision is thus characterized in terms of representational complexity and processing speed. The chapter concludes with results concerning the confusion of mirror-symmetric patterns in indirect view (Chapter 8.6: The case of mirror symmetry). 
8.1. Parts, structure, and form
The notions of structure and form refer to complexities, where multiplicities of parts are ordered by sets of relations (Whyte, 1968). Structure has a physical and static connotation. Form reflects development toward order, thus involving learning. It has the aspect of shape. Similarity of shapes that have no elements in common was evidence for Mach (1865) and von Ehrenfels (1890) of the existence of qualities of “Gestalt.” These are often referred to by saying: “The whole is more than the sum of its parts.” Yet Minsky and Papert (1971) noted that this is a vague and metaphorical statement unless it is specified what is meant by “parts” and by “sum.” They made this point by means of Figure 25. Many different points may be selected on the two patterns at the top, and it may be recorded what is seen within a small circle around each of these points. Unless records are kept of the circles' locations, the two figures yield identical views (Figure 25, bottom). Thus, the problem of characterizing the connectedness of local data is at the core of the attempt to understand form vision. The problem is known as the (perceptual) binding problem (e.g., Roskies, 1999; Singer, 1999; von der Malsburg, 1999). 
Figure 25.
 
Disconnected and connected figural elements and point-wise samples thereof (from Minsky & Papert, 1971).
In statistical physics, cooperative phenomena were analyzed on the basis of von Neumann's (1932) generalization of the concept of entropy from thermodynamics to quantum mechanics. Entropy has been conceived of as reflecting the amount of disorder in a physical system. Von Neumann's “microscopic entropy” is more precise in that it measures the lack of information about the microstates of a physical system. This enabled Watanabe (1985, Chap. 6) to derive a measure of the strength of structure as the difference between the sum of the entropies of the parts and the entropy of the whole. The connection between entropy and information was rediscovered by Shannon and Weaver (1949), who formulated a theory of information that has found wide application in fields such as telecommunications and computing (Brillouin, 1956).9 
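As a minimal sketch of the measure described above, the following Python snippet computes the sum of the part entropies minus the entropy of the whole for a toy system of two binary parts. The function names, the data layout, and the toy state lists are illustrative assumptions and are not taken from Watanabe (1985); Shannon entropy in bits is used for convenience.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def structure_strength(states):
    """Sum of the part entropies minus the entropy of the whole.

    states: list of tuples, each holding the jointly observed states of
    all parts (hypothetical data layout chosen for this sketch).
    """
    n_parts = len(states[0])
    part_entropies = sum(entropy([s[i] for s in states]) for i in range(n_parts))
    return part_entropies - entropy(states)

# Two binary parts that always agree: one part predicts the whole.
coupled = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Two binary parts that vary independently: no structure.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(structure_strength(coupled))      # 1.0
print(structure_strength(independent))  # 0.0
```

In the coupled case the whole has fewer states than its parts would allow, so the measure is positive; for independent parts it vanishes.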
The existence of structure implies that the knowledge of some parts allows one to predict the whole. That is, the variety of the states of the whole is restricted despite the variety of the states of its parts (Watanabe, 1985, Chap. 6). This touches upon the principle of “Prägnanz” of Gestalt psychology according to which percepts of a high degree of regularity are formed. The problem with applying information theory to form vision becomes obvious once one notes that it assumes a recipient of information. That is, the definition of “parts” of images, or patterns, is meaningful with regard to visual processing only. Given the rich body of knowledge of receptive field structures in neurophysiology, this would seem to be an easy task, and the definition of “sum” could be adopted from information theory. Yet vision research has focused on the limits of visibility, i.e., on threshold measurements. As a result, there is not yet a generally accepted theoretical framework for two-dimensional feature extraction that takes account of the reduction of redundancy as a fundamental characteristic of biological vision (see Zetzsche, Barth, & Wegmann, 1993). 
The difficulty of reliably defining pattern parts would thus seem to be the main obstacle for characterizing visual form within the framework of information theory. We know of one approach, where this problem has been solved by encoding the information contained within image regions and by measuring two-dimensional features in such terms (Boccignone, Ferraro, & Caelli, 2001; Ferraro, Boccignone, & Caelli, 1999). In that approach the increase in entropy across spatial scales during fine-to-coarse transformations was considered. Such transformations were mediated by diffusion operators. Thus, the idea of entropy propagation across scale space bears promise of characterizing foveal and peripheral form vision within a unified concept based on first principles of statistical physics (Ferraro & Boccignone, 2009). 
A rich literature exists on artificial neural networks, where the connections of units within the networks are structured in a way that is intimately related to the learning algorithms used to train the networks (Haykin, 1999; Kohonen, 1982, 1984, 2001). Unfortunately, we know of only a few such approaches in which the modeling of peripheral form vision has been attempted. In this chapter we focus on such models. With regard to Watanabe's above-mentioned concept of structure, they may be grouped into two categories. It is possible to make assumptions on the nature of parts and vary part entropies by locally introducing summary statistics. The effects of increasing the entropy of image parts can then be assessed by testing perceived (whole-) image structure (Chapter 8.1: Parts, structure, and form and Chapter 8.3: Classification images indicate how crowding works). Alternatively, one may directly judge the structure of the whole, or form, by studying pattern categorization (Chapter 8.5: Pattern categorization in indirect view). That possibility relies on the fact that, in the absence of diagnostic features, classification depends on the discovery of global differences in structure that enable the grouping of patterns along multiple dimensions. 
8.2. Role of spatial phase in seeing form
In earlier years, the (global) Fourier transform was considered a means of extracting characteristic measurements from stimulus patterns, termed features. The discrimination of patterns was predicted from their representations along some feature dimension. This led to the idea of foveal form vision depending on the contrast sensitivity for spatial frequency components plus the encoding of phase. Peripheral form vision was thus assumed to reflect shortcomings of encoding phase. The failure of this approach led to a model of form vision, where images were reconstructed from partial information of local phase (Chapter 8.2.1: Fourier model of form vision). These reconstructions can be regarded as first approximations of peripheral form vision (Chapter 8.2.2: Combined frequency-position representations for form vision). 
8.2.1. Fourier model of form vision
Consistent with the Fourier concept of form vision, image structure is lost in “amplitude-only” versions of a scene, where phase values are all set to zero. Not so in “phase-only” images, where phase information is left intact but the amplitude values are set to a non-zero constant over all spatial frequencies (Brettel, Caelli, Hilz, & Rentschler, 1982; Huang, Burnett, & Deczky, 1985; Oppenheim & Lim, 1981; Piotrowski & Campbell, 1982). Sensitivities to spatial phase were probed in a number of psychophysical studies using compound gratings (Barrett, Morrill, & Whitaker, 2000; Bennett & Banks, 1987, 1991; Braddick, 1981; Burr, 1980; Klein & Tyler, 1986; Lawden, 1983; Rentschler & Treutwein, 1985; Stephenson, Knapp, & Braddick, 1991). Stimulus patterns were composed of harmonic spatial frequency components in specific amplitude and phase relationships. These studies showed that, notwithstanding size scaling, patterns with identical probabilities of each luminance level (mirror-symmetric waveforms, 90/270 deg phase shift) are exceedingly difficult to discriminate in indirect view. No such difficulty exists for phase shifts producing differences in first-order statistics (0/180 deg pairs). 
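The distinction between “amplitude-only” and “phase-only” versions can be illustrated with a small sketch. It uses a one-dimensional signal as a stand-in for an image and a naive discrete Fourier transform from the Python standard library; the function names and the test signal are assumptions made for illustration and do not reproduce the stimuli of the cited studies.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def amplitude_only(x):
    """Keep the amplitude spectrum and set every phase to zero."""
    return idft([abs(c) for c in dft(x)])

def phase_only(x, constant=1.0):
    """Keep the phase spectrum and set every amplitude to a non-zero constant."""
    return idft([cmath.rect(constant, cmath.phase(c)) for c in dft(x)])

def peak_index(x):
    """Position of the largest sample."""
    return max(range(len(x)), key=lambda i: x[i])

# Hypothetical 1-D "image" with a single asymmetric feature peaking at index 3
signal = [0, 0, 1, 4, 2, 0, 0, 0]

# Phase-only keeps the feature at index 3; amplitude-only moves it to index 0.
print(peak_index(signal), peak_index(phase_only(signal)),
      peak_index(amplitude_only(signal)))
```

With all phases set to zero, the reconstruction is symmetric about the origin, so the location of the feature is lost, whereas the phase-only version retains it.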
Barrett et al. (2000) agreed with Rentschler and Treutwein (1985) and Stephenson et al. (1991) in that global phase is not encoded in human vision. They proposed that substantially different mechanisms mediate the two types of discrimination. The dependence of 0/180 deg discriminations on retinal eccentricity reflects functional characteristics of mechanisms mediating contrast sensitivity. For 90/270 deg discriminations, the relative positions of local features are registered, and a size-scaling factor more than ten times greater than the one for contrast detection is required to equate foveal and peripheral performance (Barrett et al., 2000). Bennett and Banks (1987, 1991) explained their findings in terms of a unified model based on even-symmetric and odd-symmetric mechanisms (Field & Nachmias, 1984). They attributed the difficulty with mirror-symmetric waveforms in indirect view to a reduction in the number or sensitivity of odd-symmetric mechanisms. 
Morrone, Burr, and Spinelli (1989) employed one-dimensional stimulus patterns composed of 256 cosine components. They succeeded in equating discrimination performance in foveal and peripheral vision by employing a common scaling factor. Unlike the stimuli employed by Bennett and Banks (1987, 1991) and Rentschler and Treutwein (1985), the variation of phase changed the nature of features (edge or bar) in their stimuli but did not entail apparent displacements thereof. Morrone and co-workers thus concluded that their task was not affected by positional uncertainty in the periphery. 
Using two-dimensional gray-level textures as stimuli, Harvey, Rentschler, and Weiss (1985) found a dramatic loss of discrimination sensitivities to band-limited phase distortion with parafoveal viewing. The effect was independent of the range of spatial frequency and the type of distortion, phase quantization or phase randomization (Hübner, Caelli, & Rentschler, 1988). More specifically, grayscale textures could not be discriminated from their phase-distorted versions below 22.5 deg phase resolution. This value compared fairly well with that of 30 deg phase resolution for the discrimination of compound gratings (Burr, 1980). Yet these findings did not prove the existence of phase encoding per se since measures of image distortion in the frequency domain and in the intensity domain predicted discrimination equally well (Hübner et al., 1988). 
8.2.2. Combined frequency-position representations for form vision
The theory of linear filtering at early stages of the visual system by a multiplicity of Gabor units of even and odd symmetry (Daugman, 1984; Marcelja, 1980), or wavelet transforms (Mallat, 1989), gained acceptance in the 1980s. Field (1987), Watson (1987a), and Zetzsche and Schönecker (1987) showed that the decomposition via localized band-pass filters enables the efficient reduction of image redundancy (see also Watson, 1993). The generalized Gabor scheme of image representation involves multiresolution schemes such as the Laplacian pyramid (Burt & Adelson, 1983) and oriented edge-operators (Daugman, 1985). It accounts for position-dependent sampling, oversampling, logarithmic frequency scaling, and phase quantization (Porat & Zeevi, 1988). 
We have used such an approach for analyzing amblyopic form vision (Treutwein, Rentschler, Scheidler, Zetzsche, & Boergen, 1996). To do so, we employed a polar representation of local amplitude, or magnitude, and local phase (Behar, Porat, & Zeevi, 1992; Morrone & Owens, 1987; Wegmann & Zetzsche, 1990; Zeevi & Porat, 1989; Zetzsche & Schönecker, 1987). Local magnitude is probably being computed by complex cells in the visual cortex (Adelson & Bergen, 1985; Morrone & Burr, 1988). The mechanism of encoding local phase remains to be revealed. Images reconstructed from local-phase-only tend to adequately reproduce edge relationships while compressing gray-level information; local-magnitude-only representation distorts edge information (Zeevi & Porat, 1989). 
Treutwein et al. (1996) modeled amblyopic form vision as local-magnitude-only vision with one bit resolution of local phase. They found morphic image distortions as have been reported from crowding in normal subjects (Figure 26). Such distortions are not obvious from the image reconstructions from “complex-cells-only” representations by Shams and von der Malsburg (2002). This is not necessarily surprising since ambiguities with image reconstruction from partial information are inevitable. They are typically taken care of by imposing ad hoc constraints that are not made explicit by summary descriptions of reconstruction algorithms. 
Figure 26.
 
Original images (left column) as seen with “complex cells-only” vision (right column). These simulations are obtained from a model of amblyopic vision and provide a first approximation of peripheral form vision (from Treutwein et al., 1996).
Our successful visualization of amblyopic form vision has implications for peripheral vision given the fact that “amblyopia represents a loss of the physiological superiority of the fovea” (Burian & Von Noorden, 1974, p. 245, their italics). Thus, it seems that image reconstruction from local-magnitude-only information approximates peripheral form vision fairly well. That procedure implies an increase in part entropies. This is obvious from the fact that local magnitude can be seen as a sort of probability density for observing a feature. Thereby, it remains unknown what type of feature, edge or bar, is present and where precisely it is located. 
To summarize, the measurement of sensitivities to spatial phase did not support the view that the selection of parts of Fourier image spectra characterizes form vision. Yet these experiments suggested that local energy detection is one mechanism of form perception, which is available, though with varying sensitivity, across the visual field. This is consistent with the result that image reconstruction from local-magnitude-only information within a multiresolution scheme approximates peripheral form vision fairly well. 
8.3. Classification images indicate how crowding works
We next review results obtained by using the method of “classification images.” That method enables the analysis of observer behavior in psychophysical tasks of letter identification. Results may be compared to data obtained under different model assumptions. We therefore consider classification images within the discussion of models of peripheral form vision. 
For inputs of Gaussian white noise, input–output cross-correlation provides the impulse response of a linear system (“reverse correlation”). A generalization of this technique builds on the functional representation of non-linear systems by Wiener (1958). Higher order kernels characterizing an unknown system may be obtained by cross-correlating the output of the system with a multidimensional product formed from the input (Lee & Schetzen, 1965; Orcioni, Pirani, & Turchetti, 2005). This requires, however, mathematical assumptions that do not necessarily hold for a biological system. Moreover, reliable error bounds for the representation of specific inputs cannot be obtained (Palm & Poggio, 1977; Poggio, 1981). With these caveats in mind, we report on the method of classification images, which has gained acceptance as a tool of analyzing visual processing following the studies by Ahumada and coworkers (Ahumada, 1996, 2002; Ahumada & Beard, 1999; Ahumada & Lovell, 1971; Beard & Ahumada, 1999). 
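The first-order case (“reverse correlation”) can be sketched as follows. The snippet drives a hypothetical three-tap linear system with Gaussian white noise and recovers its impulse response by input–output cross-correlation. The system `true_h`, the sample size, and the function name are assumptions chosen for illustration; higher order kernels are not covered.

```python
import random

def estimate_impulse_response(x, y, n_lags):
    """Cross-correlate output y with input x at lags 0..n_lags-1,
    normalized by the input variance."""
    n = len(x)
    var_x = sum(v * v for v in x) / n
    return [sum(y[t] * x[t - k] for t in range(k, n)) / (n * var_x)
            for k in range(n_lags)]

rng = random.Random(1)                            # fixed seed for repeatability
true_h = [0.5, 0.3, -0.2]                         # hypothetical linear system
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # Gaussian white-noise input
# Output of the linear system: convolution of the input with true_h
y = [sum(true_h[k] * x[t - k] for k in range(len(true_h)) if t - k >= 0)
     for t in range(len(x))]

h_hat = estimate_impulse_response(x, y, n_lags=3)
print([round(h, 2) for h in h_hat])
```

The estimate approaches `true_h` as the number of noise samples grows, because white-noise samples at different time points are uncorrelated.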
For generating classification images, an observer is presented with patterns of zero-mean white Gaussian noise, which do or do not contain the signal that is to be detected or identified (Ahumada, 2002). After a large number of trials, the noise samples presented on trials, on which the signal was reported to be absent, are averaged and subtracted from the average of the noise samples presented on trials, on which the signal was reported to be present. The difference is the classification image. Such templates, obtained in the presence of internal noise on the observer's side, display how well the image intensity values at a given pixel correlate with the observer's response. Psychophysically obtained classification images can be compared with templates derived under different model assumptions. This reveals which model best captures the observer's processing characteristics. Classification images have been registered so far with two response categories only. Yet it is possible to adapt the method to the use of multiple response categories (Dai & Micheyl, 2010; Watson, 1998). 
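The averaging procedure described above can be sketched with a simulated observer. Here the observer is assumed to cross-correlate the stimulus with a fixed template, add internal noise, and respond “present” when a criterion is exceeded; the template, signal, criterion, and trial count are all hypothetical values chosen for illustration and are not taken from the cited studies.

```python
import random

def classification_image(noise_fields, responses):
    """Mean noise on 'present' responses minus mean noise on 'absent' responses."""
    def mean_field(fields):
        return [sum(f[i] for f in fields) / len(fields)
                for i in range(len(fields[0]))]
    present = [nf for nf, r in zip(noise_fields, responses) if r]
    absent = [nf for nf, r in zip(noise_fields, responses) if not r]
    return [p - a for p, a in zip(mean_field(present), mean_field(absent))]

rng = random.Random(0)                       # fixed seed for repeatability
template = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]    # hypothetical observer template
signal = [0.0, 0.5, 0.5, 0.0, 0.0, 0.0]      # hypothetical signal

noise_fields, responses = [], []
for trial in range(4000):
    noise = [rng.gauss(0.0, 1.0) for _ in template]   # zero-mean white noise
    has_signal = trial % 2 == 0                        # signal on half the trials
    stimulus = [n + (s if has_signal else 0.0) for n, s in zip(noise, signal)]
    # Simulated observer: template match plus internal noise, fixed criterion
    decision = sum(t * v for t, v in zip(template, stimulus)) + rng.gauss(0.0, 0.5)
    noise_fields.append(noise)
    responses.append(decision > 0.5)

ci = classification_image(noise_fields, responses)
print([round(v, 2) for v in ci])
```

The resulting values are large at the two pixels weighted by the simulated template and close to zero elsewhere, which is the sense in which a classification image shows how pixel intensities correlate with the observer's response.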
Beard and Ahumada (1999) studied the detection of checkerboard and Gabor stimuli with foveal and parafoveal viewing. In a fixed-noise condition, they used the same noise sample throughout a series of trial blocks of a two-interval forced-choice paradigm. In a random-noise condition, they generated a new noise sample for each trial. With fixed noise, these authors found improved detection with larger improvement in the fovea. They explained it as a result of template learning and attributed the disadvantage of parafoveal viewing to positional uncertainty. Levi and Klein (2002) presented one-dimensional patterns composed of sinusoidal waveforms as test signals and as noise, and determined target detection as well as target position with foveal or parafoveal viewing. They found that classification images for target detection resemble test stimuli both in foveal and parafoveal vision. By contrast, these authors found position acuity to be much lower under parafoveal conditions, a result reflected in reduced observer efficiency and coarser classification images. 
Possibilities of uncovering non-linear processing characteristics from classification images were explored by Abbey and Eckstein (2002), Barth, Beard, and Ahumada (1999), Neri (2004), Solomon (2002), and Tjan and Nandy (2006). Neri studied non-linear aspects of the classification image technique using a comparative approach. This involved the generation of kernels to predict observer responses in a given task and their derivation from the second-order statistics of noise images. Tjan and Nandy (2006) proved that, given a high-contrast signal, the classification subimage from error trials contains a clear negative image of the observer's template for the input signal. This image is unaffected by intrinsic and extrinsic noise. The positive subimage from the alternative template is blurred, and the extent of blur is an estimate of spatial uncertainty. Tjan and Nandy found that, with peripheral viewing, templates are not distorted in shape and almost identical to those from foveal viewing. Yet the intrinsic spatial uncertainty is much higher with peripheral viewing. 
Letter identification under crowding conditions was investigated by Nandy and Tjan (2007), who dealt with the letters “X” and “O.” Using the method of classification images, they defined noise fields as either noise fields per se or as the sum of masking noise plus flankers. First-order templates were found reduced in contrast but undistorted in shape under flanking conditions. Nandy and Tjan then computed for each trial the correlations between pairs of noise pixels that systematically affected the observer's response. Thus, they were able to delimit second-order structural elements (oriented “dipoles”) used for the identification of target letters. These authors also estimated the spatial extent over which features are detected and used. They were led to conclude that crowding increases the amount of features invalid for target identification at the expense of valid features. As noted by Nandy and Tjan, these results support the account of feature source confusion of crowding (Krumhansl & Thomas, 1977; Pelli et al., 2004; Strasburger, 2005; Strasburger et al., 1991; Wolford, 1975). 
To summarize, measuring first-order classification images for letter identification confirmed the existence in peripheral vision of spatial uncertainty. It also revealed that crowding reduces the contrast of first-order templates but leaves their shape unaffected. Insight into the mechanism of crowding came from considering second-order statistics of external noise: Consistent with observations from the recognition of numerals, features for letter recognition can be extracted in foveal and peripheral view but “once this is accomplished, the peripheral mechanism no longer knows where a feature came from” (Nandy & Tjan, 2007, p. 22). 
8.4. Computational models of crowding
Recent models of crowding and visual clutter draw on developments in computer vision and neurophysiology. For texture analysis and synthesis, the goal was to describe a wide variety of textures within a common framework. The approach is rooted in the work of Julesz and co-workers (e.g., Caelli et al., 1978; Julesz, 1962, 1981), who explained texture perception in terms of joint probability distributions for intensities at sets of n pixels. Such descriptions are inconvenient in case of n > 2. This led to the combination in computer vision of filter theory and statistical modeling, where textures are conceived of as resulting from probability distributions on random fields. Thus it became possible to determine parameters for probability models underlying observed textures and to synthesize textures by sampling from these models (see Zhu, Wu, & Mumford, 1998). The crowding model by Balas et al. (2009) is based on such strategies (Chapter 8.4.1: Feedforward models of crowding). 
Designers of user interfaces and information displays are confronted with the problem of visual clutter. That is, too many objects in a display make the search for a target object slow or inaccurate (e.g., Rosenholtz, Li, Mansfield, & Jin, 2005). van den Berg, Cornelissen, and Roerdink (2009) presented a crowding model that also predicts clutter. It further accounts for the observation of chromaticity information contributing to crowding (van den Berg et al., 2007; Chapter 8.4.1: Feedforward models of crowding). 
Real world scenes involve occlusions, perspective and lighting conditions. Feedforward models of visual processing then fail to reliably predict recognition. Ambiguities of image interpretation can be overcome in computer vision by combining via feedback loops global object models with local analysis (e.g., Cheng, Caelli, & Sanchez-Azofeifa, 2006; Mumford, 1994). Neurophysiological results further indicate that primary visual cortex integrates global information from feedback loops with local spatial precision (Bullier, 2001; Lee et al., 1998). Such findings encourage the interpretation of foveal and peripheral form vision differing both in terms of local feature measurement and global information received via feedback. The crowding model by Jehee et al. (2007) is based on that idea (Chapter 8.4.2: A Feedforward–feedback model of crowding). 
8.4.1. Feedforward models of crowding
The theory that the “ground system” ignores relative position and evaluates statistics over the output of feature analyzers, advanced by Julesz and co-workers, received support from the crowding study of Parkes et al. (2001). These authors showed, by measurement and by prediction from a computational model, that observers judging whether a Gabor target is tilted relative to a surrounding array of Gabor distracters rely on estimating the average orientation of the Gabor patterns. 
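The idea of pooling orientation can be illustrated with a small sketch. It computes a circular mean over doubled angles, since orientation is periodic over 180 deg; this particular averaging rule, the function name, and the example array are assumptions for illustration and are not the model of Parkes et al. (2001).

```python
import math

def mean_orientation(angles_deg):
    """Circular mean of orientations (period 180 deg), in degrees."""
    s = sum(math.sin(math.radians(2 * a)) for a in angles_deg)
    c = sum(math.cos(math.radians(2 * a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) / 2

# Hypothetical display: one target tilted by 20 deg among eight vertical distracters
display = [20.0] + [0.0] * 8
pooled = mean_orientation(display)
print(round(pooled, 1))
```

The pooled value is shifted in the direction of the target tilt but is much smaller than the tilt itself, and it no longer indicates which element carried the tilt.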
Balas et al. (2009) generalized the findings of Parkes and co-workers by measuring, from a given image, a set of statistics for a pre-set region of spatial pooling. The models considered for these statistics were pixel intensity distributions, local autocorrelation functions, magnitude correlations between the states of neighbors in wavelet-based pyramid decompositions, and relative phase of wavelet features between neighboring scales. The synthesis began with an arbitrary image and iteratively applied constraints obtained from the measured statistics. The result was a new image sample having approximately the same statistics as the given image. Balas et al. used this technique to generate test patterns derived from arrays of letter targets, which were constrained by the measured summary statistics. Subjects viewed distorted test patterns in direct view and, in separate experiments, original test patterns in indirect view. Thus, it was possible to see whether human observers made the same errors with synthesized patterns as they did in indirect viewing of the original patches. Using this psychophysical procedure, Balas and co-workers were able to predict observer performance in letter identification under no-crowding and crowding conditions using magnitude correlations of wavelet states. By the same token, they answered the question of what an arbitrary image would look like in indirect view (Figure 27). 
Figure 27.
 
Crowding as a result of summary statistics within a model of texture analysis and synthesis (from Balas et al., 2009).
Several aspects of the approach by Balas et al. (2009) are noteworthy. The model provides a rigorous formulation of the account of feature source confusion of crowding as discussed in the context of crowding data (Chapter 5: Recognition of patterns in context—Crowding) and classification images (Chapter 8.3: Classification images indicate how crowding works). It is general in the sense that it applies not only to acuity test charts or Gabor patches but also to complex gray-level images. As a consequence, the model predicts “outer crowding” with arrays of characters (e.g., Nandy & Tjan, 2007; Strasburger et al., 1991) or of numerosity judgments with framed dot patterns (Parth & Rentschler, 1984). The model by Balas et al. also covers “inner crowding” as observed by Hübner et al. (1985) for faces masked by spatially correlated noise and by Martelli et al. (2005) for face caricatures. Thus, it qualifies as a model of peripheral form vision in general. Last but not least, it allows, in principle, for adaptive control of the extent of the spatial pooling area. With sufficiently small summation areas, the model would reflect foveal form vision, thus conforming to a unified account of peripheral and foveal form vision. 
van den Berg et al. (2009) pursued the idea that crowding is an important constituent of visual clutter. To do so, they used a computational architecture based on the decomposition of the RGB-input image into CIELab components, that is, into luminance, red/green-, and blue/yellow-images. The luminance components were submitted to multiscale decomposition, then to orientation decomposition, and finally to contrast filtering via difference-of-Gaussians (DOGs). The chromaticity components remained unaltered in these respects. Much as in the study by Balas et al. (2009), crowding was simulated by performing local averaging over integration fields within the images resulting from all channels. The loss of information induced by averaging was evaluated by computing Kullback–Leibler divergences (see Haykin, 1999, Section 10.2). That is, differences between the probability distributions of original and distorted component images were quantified in terms of relative entropy functions (Gibbs, 1914). The resulting measures of image degradation were pooled over orientation and chromaticity channels to obtain one global clutter value. That value correlated well with subjective clutter assessment and search performance in cluttered scenes, thus suggesting the existence of a close relationship between the phenomena of clutter and crowding. 
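The use of relative entropy to quantify the loss due to local averaging can be sketched as follows. The snippet applies a circular moving average to a one-dimensional intensity profile and computes the Kullback–Leibler divergence between the intensity histograms before and after averaging. The binning, the add-one smoothing, the direction of the divergence, and all names are assumptions made for this illustration; the multichannel architecture of van den Berg et al. (2009) is not reproduced.

```python
import math

def local_average(values, window):
    """Circular moving average, a crude stand-in for an integration field."""
    n = len(values)
    return [sum(values[(i + k) % n] for k in range(window)) / window
            for i in range(n)]

def histogram(values, n_bins):
    """Add-one smoothed histogram over [0, 1], normalized to sum to 1."""
    counts = [1] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits for two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# Hypothetical high-contrast profile and its locally averaged version
original = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
pooled = local_average(original, window=2)

p = histogram(original, n_bins=4)
q = histogram(pooled, n_bins=4)
loss = kl_divergence(p, q)
print(loss > 0)  # True
```

Averaging replaces the alternating extremes by a uniform mid-gray, so the two histograms differ strongly and the divergence is positive; for identical distributions it is zero.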
8.4.2. A feedforward–feedback model of crowding
As discussed by Bullier (2001), neurons in cortical areas V1 and V2 encode at high spatial precision. Corresponding to the high magnification factors of these early areas and limited axon length, horizontal connections within the areas cannot reach far in the visual field. This entails the core problem of form vision, namely the question of how local analysis and global information are integrated. A solution of this problem makes use of the fact that neurons in higher areas, such as MT, V4, TEO, and TE, have larger receptive fields and magnification factors are lower in these areas. It can be assumed therefore that the results of computations performed in higher areas are retro-injected via feedback connections to neurons of lower areas (Bullier, 2001; Lee et al., 1998). More generally, there is growing evidence from neuroanatomy and neurophysiology that the traditional interpretation of visual perception as a process, where “an input vector falls in at the eye, is fed forward through the system, and an output vector, possessing the virtues of invariance, emerges at the other end…” is inappropriate (Young, 2000, p. 141). On such grounds, Roelfsema, Lamme, Spekreijse, and Bosch (2002) designed a neural network with recurrent coupling, which combines a grouping operation for image elements with contour detection. 
Jehee et al. (2007) used the same model architecture composed of five areas corresponding to cortical areas V1, V2, V4, TEO, and TE. The lowest area in the model contains a number of units with one sort of feature selectivity and the same number of units with another sort of feature selectivity. At each higher level in the model, the number of units decreases by a certain factor, and the size of the receptive fields increases by the same factor. The input image is thus represented at a coarser resolution in each successive area. High-level neurons in the model initially distinguish between low-resolution aspects of input patterns and ignore details. After a number of feedforward–feedback cycles of processing, they display selectivity for spatial detail. With the same parameters, the model accounts for crowding: When stimulus representations fall within the “feedback window” of a single high-level unit, they are subjected to grouping and cannot be enhanced individually. Thus, the model bears similarity to the attentional account of crowding (He et al., 1996), where the attentional window is thought to be too large to select individual targets. Given the key role of cycles of processing, the model also accommodates the observation that peripheral form vision lacks temporal stability (e.g., Korte, 1923; Pelli et al., 2004; Rentschler, 1985; Tyler & Likova, 2007). 
In brief, computational models provide convincing descriptions of peripheral form vision and, more specifically, of crowding. Using efficient methodologies from computer vision, the feedforward model by Balas et al. (2009) allows the generation of gray-level images that can be used for psychophysically measuring and visualizing crowding. The model by van den Berg et al. (2009) has similar functional characteristics and embraces the processing of luminance and chromaticity information. It further evaluates the loss of image structure due to spatial integration in terms of relative entropies. The feedforward–feedback model by Jehee et al. (2007) is closer to the neuroanatomical and neurophysiological reality. It offers possibilities of formalizing the attentional account as well as dynamic characteristics of crowding. 
8.5. Pattern categorization in indirect view
Tyler and Likova (2007) argued that the functional and physiological causes of crowding are unsettled since concepts such as template matching, feature integration, and attentional feature conjunction fall short of explaining them. They attributed this to a lack of rules for matching sensed patterns to internalized templates and advocated the use of neurodynamic models like Hopfield neural networks to solve the problem. Hopfield nets allow template matching, i.e., the retrieval of pattern vectors (pixel matrices) stored in memory in response to the input of incomplete or noisy versions thereof (see Haykin, 1999, Chap. 14). While we agree that the use of formal concepts of pattern recognition bears promise of shedding light on the nature of peripheral form vision, we suspect that it is not the matching of sensed patterns to internalized templates as such that impairs peripheral form vision. We prefer an approach to pattern recognition that is based on a more general concept than template matching and assumes that stimuli are sorted and given meaning by assigning them to learned categories (Bruner, 1957; Bruner, Goodnow, & Austin, 1956; Watanabe, 1985). 
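Template retrieval by a Hopfield net, as referred to above, can be sketched in a few lines. The snippet stores two bipolar pattern vectors with a Hebbian rule and recalls one of them from a corrupted input by asynchronous updating. The patterns, the network size, and the function names are hypothetical choices for illustration; this is a textbook-style sketch and not the model proposed by Tyler and Likova (2007).

```python
def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns, zero self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, max_sweeps=10):
    """Asynchronous unit-by-unit updates until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            h = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if h >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# Two hypothetical stored templates (mutually orthogonal)
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
w = train_hopfield(stored)

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # first template with one element flipped
print(recall(w, noisy) == stored[0])    # True
```

The corrupted input settles into the stored template it most resembles, which is the sense in which such networks implement template matching.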
Our goal was to compare foveal and peripheral form vision in terms of human abilities of assigning patterns to so far unknown classes that are to be learned. This conforms to the more general definition of pattern recognition in the technical literature (e.g., Duda & Hart, 1973; Fu, 1976; Haykin, 1999; Watanabe, 1985). To achieve this, we used a psychophysical paradigm of supervised category learning for unfamiliar gray-level patterns (Caelli, Rentschler, & Scheidler, 1987). For analyzing categorization performance, we employed a new strategy of psychometrics with explicit reference to physical stimulus descriptions (PVP, see below). 
The material in this section is organized in two parts. In the first part, we introduce the model of Probabilistic Virtual Prototypes (PVP; see also Chapter 7.1: Learning). In the second part, we return to a specific experiment, the results of which explain the reduced perceptual dimensionality in indirect view (Jüttner & Rentschler, 2000; see also Chapter 7.1: Learning). 
8.5.1. Statistical model of visual pattern recognition
As discussed in Chapter 7.1: Learning, human performance in supervised category learning with unfamiliar gray-level patterns was measured in terms of time series of classification matrices (Caelli et al., 1987; Jüttner & Rentschler, 1996; Rentschler et al., 1994; Unzicker, Jüttner, & Rentschler, 1998; Unzicker et al., 1999). Classification data were predicted by using a probabilistic Bayesian classifier, operating on internalized feature vectors that result from the superposition of physical feature vectors and statistically independent error vectors. The latter are free parameters of the model. They are determined by minimizing the mean squared error between observed and predicted data. Probabilistic virtual prototypes (PVP) are obtained as class-specific mean internalized feature vectors. These internalized representations of pattern classes are back-projected into physical feature space, thus visualizing internalized and physical class representations within the same reference system. The PVP model was found to provide a more parsimonious account of perceptual categorization in peripheral vision than a number of standard models in the categorization literature (Unzicker et al., 1998). 
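The logic of the PVP model can be sketched in a few lines. This is a simplified illustration, not the published implementation. It assumes one error vector per class, equal class priors, and isotropic Gaussian likelihoods with a fixed spread `sigma`. The feature values and all function names are hypothetical.

```python
import math


def class_means(vectors, labels, n_classes):
    """Return the mean feature vector of each class."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for vec, cls in zip(vectors, labels):
        counts[cls] += 1
        for d in range(dim):
            sums[cls][d] += vec[d]
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def predict(physical, labels, errors, sigma=1.0):
    """Predicted classification matrix and virtual prototypes.

    errors[c] is the error vector added to every pattern of class c
    (a simplification assumed here). Rows index the true class,
    columns the predicted response.
    """
    n_classes = len(errors)
    internal = [
        [x + e for x, e in zip(vec, errors[cls])]
        for vec, cls in zip(physical, labels)
    ]
    prototypes = class_means(internal, labels, n_classes)
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for vec, cls in zip(internal, labels):
        # Equal priors and isotropic Gaussian likelihoods (assumed).
        likes = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(vec, proto))
                     / (2.0 * sigma ** 2))
            for proto in prototypes
        ]
        total = sum(likes)
        counts[cls] += 1
        for j in range(n_classes):
            matrix[cls][j] += likes[j] / total
    matrix = [[m / counts[c] for m in matrix[c]] for c in range(n_classes)]
    return matrix, prototypes


def mean_squared_error(observed, predicted):
    """Objective that a fit would minimize over the error vectors."""
    diffs = [
        (o - p) ** 2
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical two-dimensional features forming a triangular class layout.
physical = [(0.0, 0.0), (0.2, 0.0), (2.0, 0.0), (2.2, 0.0), (1.0, 2.0), (1.2, 2.0)]
labels = [0, 0, 1, 1, 2, 2]

veridical, protos_veridical = predict(physical, labels, [(0, 0), (0, 0), (0, 0)])
collapsed, protos_collapsed = predict(physical, labels, [(0, 0), (0, 0), (0, -2.0)])
```

With zero error vectors the virtual prototypes coincide with the physical class means. The second call shifts the third class onto the axis spanned by the other two, so the predicted matrix shows more confusions for that class. In an actual fit, the error vectors would be adjusted by an optimizer until `mean_squared_error` between observed and predicted matrices is minimal.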
Using the PVP approach, we analyzed foveal and extrafoveal category learning for sets of compound Gabor patterns (see Figure 24a for an example). The pattern sets formed triangular class configurations in the defining two-dimensional Fourier feature space. For foveal learning, the virtual prototypes mirrored the configuration of the physical class means (Figure 28). The dimensionality of the physical feature space thus remained fully preserved in the internal representation. For extrafoveal learning, the PVP configurations degenerated to quasi one-dimensional formations even though the input patterns were size-scaled according to cortical magnification theory. That is, observers distinguished between the learning patterns in indirect view essentially along a single perceptual dimension. This dimension was not necessarily aligned with any of the evenness/oddness Fourier components in physical feature space. This argues against the proposal that peripheral form vision is characterized by a reduced number, or sensitivity, of odd-symmetric filter mechanisms (Bennett & Banks, 1987, 1991). 
Figure 28.
 
Internal representations of pattern categories acquired in direct (centre column) and indirect view (left and right column) by two subjects (AD and KR) in a three-class learning paradigm involving a set of 15 compound Gabor patterns. The corners of the dotted triangles represent the class means of the pattern categories within the generating evenness/oddness Fourier feature space. Internalized class prototypes (open and closed symbols) were obtained by fitting the PVP model to the psychophysical classification matrix cumulated across the learning sequence of each observer. Learning duration, as indicated by the number of learning units to criterion (numbers at the triangle tip), increases nearly ten-fold in indirect view (from Jüttner & Rentschler, 1996).
The reduced perceptual dimensionality of extrafoveal vision is associated with an almost 10-fold increase in learning duration. Therefore, Unzicker et al. (1999) used the PVP approach to analyze the dynamics of category learning. They observed quasi-stationary periods of prototype configurations interspersed with abrupt configural transitions (Figure 29). That is, internal pattern representations did not evolve incrementally during learning. This suggests that peripheral form vision does not aim at matching sensed data with veridical pattern representations as in template matching. It is better understood as an inferential process (cf. Young, 2000) with a limited knowledge base. We will further discuss this hypothesis and its potential neurophysiological implications in the following section. 
Figure 29.
 
Dynamics of category learning in indirect view. Internal representations of pattern classes as in Figure 28. Observer C.Z. took 13 learning units to criterion. PVP configurations are obtained from locally averaging classification matrices by means of a Gaussian kernel with fixed spread parameter. Step size Δk is one learning unit. Decimal notations in brackets indicate the learning unit number and the root of the mean squared error of fit (from Unzicker et al., 1999). With permission from Elsevier.
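The local averaging mentioned in the caption of Figure 29 can be sketched as follows. The sketch assumes one classification matrix per learning unit and a Gaussian weight centered on the learning unit of interest. The function name, the default spread, and the handling of the sequence ends are assumptions of this illustration, not details taken from Unzicker et al. (1999).

```python
import math


def local_average(matrices, center, spread=1.0):
    """Gaussian-weighted average of classification matrices around `center`.

    matrices[k] is the classification matrix of learning unit k;
    `spread` is the fixed spread parameter of the kernel.
    """
    weights = [
        math.exp(-((k - center) ** 2) / (2.0 * spread ** 2))
        for k in range(len(matrices))
    ]
    total = sum(weights)  # normalize so that the weights sum to one
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [
            sum(w * m[r][c] for w, m in zip(weights, matrices)) / total
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Hypothetical one-cell "matrices" for three consecutive learning units.
sequence = [[[0.0]], [[1.0]], [[0.0]]]
smoothed = local_average(sequence, center=1, spread=1.0)
```

Shifting `center` in steps of one learning unit and fitting the model to each averaged matrix would yield a time series of prototype configurations of the kind shown in the figure.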
8.5.2. Representational complexity of peripheral vision
For categorization tasks with pattern assignment to one out of three classes, peripheral form vision was found to be reduced to a single perceptual dimension (Jüttner & Rentschler, 1996; Rentschler et al., 1994). This contrasts with the lack of an impairment found for categorization tasks where patterns are assigned to one out of two classes only (Jüttner & Rentschler, 2000, cf. Chapter 7.1: Learning). The structural difference between such tasks can be made explicit in the Relational Complexity Theory (RCT) proposed by Halford, Wilson, and Phillips (1998; see also Andrews & Halford, 2002; Halford et al., 2007). Categorization with two categories involves binary relations of the form (Arg-1 greater than Arg-2), with the arguments Arg-n being the (scalar) independent variables of similarity between input patterns and two class models stored in memory. Categorization with three categories involves ternary relations. These can be decomposed into conjoint binary relations of the form {(Arg-1 greater than Arg-2) and (Arg-1 greater than Arg-3)} but not into independent binary relations. 
Within RCT the relational complexity of cognitive processes is defined by the number of interacting variables that must be represented in parallel to implement that process. It uses a metric, the representational rank, by means of which cognitive functions can be ordered according to their conceptual complexity. Representations incorporating binary relations have Rank 3, whereas representations incorporating ternary relations have Rank 4 (Halford et al., 2007). Thus, within the RCT framework, classification tasks with two classes differ from those with three classes in terms of their relational complexity. 
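The structural difference between the two task types can be written out as decision rules. The sketch below is an illustration only. The arguments stand for the scalar similarities between an input pattern and the stored class models, the function names are hypothetical, and ties are assigned to the lower class index as an arbitrary convention.

```python
def choose_between_two(sim_1, sim_2):
    """Two classes: a single binary relation decides the response."""
    return 1 if sim_1 >= sim_2 else 2


def choose_among_three(sim_1, sim_2, sim_3):
    """Three classes: each response requires two binary relations jointly."""
    if sim_1 >= sim_2 and sim_1 >= sim_3:
        return 1
    if sim_2 >= sim_1 and sim_2 >= sim_3:
        return 2
    return 3  # neither class 1 nor class 2 is the maximum
```

In the two-class rule, two similarity values are compared once. In the three-class rule, no single pairwise comparison settles the response, since each branch is a conjunction in which three similarity values must be held at the same time.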
This structural difference may have implications for the connectivity of the central and peripheral visual field with cortical structures subserving cognitive processing. A key structure here is the prefrontal cortex (PFC). Its functions have been characterized in terms of a system that enables the construction and maintenance of representations for guiding action and thought (for reviews see Fuster, 2001; Mesulam, 1998). These functions may be explicitly linked to the processing of relational complexity (see Halford et al., 2007). 
PFC also plays an important role in pattern categorization. Freedman, Riesenhuber, Poggio, and Miller (2003) reported enhanced selectivity of cells in monkey inferotemporal cortex (IT) after training for diagnostic features relative to stimulus features irrelevant for categorization. Yet the combination of those features into explicit category descriptions occurred at the level of PFC rather than IT. Brain imaging studies have revealed a similar organizing principle in humans, with a distinction between task-independent, shape-selective representations dominant in the lateral occipital cortex, and lateral prefrontal areas that respond explicitly to category membership (e.g., Kourtzi, Betts, Sarkheil, & Welchman, 2005; Op de Beeck, Baker, DiCarlo, & Kanwisher, 2006; Vickery et al., 2009). 
The lower representational complexity of peripheral form vision might imply that there is a lack of connections between early representations of the peripheral visual field and PFC. However, there is currently no direct evidence for such an assumption. Tanaka (1996) reported that the invariance of responses for stimulus position is first achieved in anterior IT (TE) as its neurons with large receptive fields receive inputs from neurons in posterior IT (TEO) with the same selectivity but much smaller receptive fields. Therefore, cells in the peripheral TEO might not be numerous enough to provide sufficient sampling as the central visual field is magnified in TEO (Kobatake & Tanaka, 1994). Another possibility is a reduced representation of the peripheral visual field in PFC resulting from the activation of the bottom-up attention management in the dorsal visual stream during tasks of pattern recognition (Tsubomi et al., 2009). 
To summarize, studies on pattern categorization demonstrate that cognitive processing in peripheral vision is characterized by lower representational complexity and processing speed compared to foveal vision. The superiority of the latter can be attributed to the functional capacity of an attentional controller for action and thought. The neurophysiological substrate for this functionality is provided by the prefrontal cortex. The possibility is raised that the cognitive constraints of peripheral form vision reflect a limited access of the peripheral visual field to prefrontal cortex. Thus, it is unlikely that the functional shortcomings of peripheral form vision can be fully compensated by learning. 
8.6. The case of mirror symmetry
We are left with commenting on the confusion of mirror-symmetric patterns in indirect view. Experiments with compound gratings showed that, notwithstanding size scaling, the distinction of mirror-symmetric waveforms is exceedingly difficult in indirect view (Chapter 8.1: Parts, structure, and form). It is equally difficult to distinguish mirror-symmetric patterns that consist of the same number of line segments in various spatial relationships (Saarinen, 1987, 1988). 
One would expect therefore to encounter the same problem with letter recognition. However, Higgins, Arditi, and Knoblauch (1996) obtained the same size-scaling factor for normalizing detection and identification of mirror-symmetric letters (like b and d) in direct and indirect view. This is surprising given the fact that young children (Gross & Bornstein, 1978; Mach, 1922; McMonnies, 1992) and dyslexic readers (Willows, Kruk, & Corcos, 1993) confuse mirror-image letters even in direct view. The apparent contradiction is resolved by noting that expert readers avoid mirror-image reversals by relying on left–right body awareness and linguistic skills (McMonnies, 1992). Consistent with these observations, lesions of the inferior part of the left angular gyrus in parietal cortex entail a disorder of the body schema and left–right confusion (Gerstmann syndrome; Mayer et al., 1999). Yet adults confuse mirror-symmetric letters under crowding conditions (Chung, 2010). Cognitive strategies for breaking mirror-symmetry seem to be disabled under such conditions. 
Traditional concepts of signal processing—such as cross-correlation, linear filtering, multichannel representation, even-symmetric-only or odd-symmetric-only filters, non-linear transducer functions applied to multichannel systems, and separation of on- and off-responses—are insufficient for explaining how mirror-image confusion can be avoided (Zetzsche, Krieger, & Rentschler, 1994, unpublished report to the German Research Council, DFG). However, a hypothesis can be raised by arguing from the categorization of mirror-symmetric patterns in foveal vision. Rentschler and Jüttner (2007) showed that the duration of category learning dramatically increases in the presence of symmetry relations between pattern classes (see also Chapter 7.1: Learning). However, once the concept of mirror-symmetry had been acquired, classification skills could be readily generalized to novel tasks involving mirror-symmetry. 
Rentschler and Jüttner explained these observations by using a technique of syntactical pattern recognition, where complex patterns are encoded in terms of parts and part relations (Caelli & Bischof, 1997; Caelli & Dreier, 1994; Jain & Hoffman, 1988; Jüttner, Caelli, & Rentschler, 1997). Here, each part is characterized in terms of part-specific features (e.g., size, intensity, area), and each pair of parts in terms of part-relational features (e.g., distance, contrast, angle). Two characteristics of representation were found to be crucial: First, learning symmetry relations between pattern classes involves shifts of representation toward a format in which features are combined to generate higher order features. Similarly, Ullman, Vidal-Naquet, and Sali (2002) suggested that visual features of intermediate complexity, or fragments within patterns, enable classification. Such higher order features could be part of a hierarchy of representations of increasing complexity enabling perceptual expertise (Palmeri, Wong, & Gauthier, 2004). Second, at least for some pattern parts, explicit associations between part positions relative to a scene-based reference system, and part attributes, or features, need to be preserved. These associations enable the generation of rules such as “the small light blob is to the left of the big dark blob.” In the machine vision literature, this characteristic of syntactic pattern representations is termed “part-indexing” as opposed to “attribute-indexing,” where feature part associations are ignored (Bischof & Caelli, 1997; Caelli & Bischof, 1997). 
From these findings we propose that feature part associations are needed for the categorization of mirror-symmetric patterns. The confusion of such patterns would seem to imply that such associations cannot be established with peripheral vision. It is interesting to note that a loss of feature part associations is what Nandy and Tjan (2007) identified as the mechanism of crowding. This does not necessarily imply that the mechanisms of part-indexing for the distinction of left- and right letters and for “ordinary” letter recognition in non-crowding situations are identical. 
Chapter 9. Conclusions
The conclusions of this comprehensive review of peripheral vision may be encapsulated in twelve general statements: 
  • 1. Ophthalmology, optometry, psychology, and the engineering sciences have their own traditions of research on peripheral vision. To their disadvantage, these disciplines worked independently of each other for quite a long time.
  • 2. The variation of spatial scale is the major contributor to differences in performance across the visual field. It is well described by an inverse linear function (cf. Table 2) but the scaling parameters—slope and axis intercept—vary widely among visual functions. Levi's E2 value is a useful first yardstick for their comparison (Table 4), but often two parameters are required. To equalize performance across the visual field, scaling along non-spatial stimulus dimensions—in particular pattern contrast—is required along with size scaling (Melmoth & Rovamo, 2003; Strasburger et al., 1994). Results of recent fMRI studies support the spatial-scale model for which we summarize empirical values and derive a logarithmic retinocortical mapping function which matches the inverse linear law.
  • 3. With regard to peripheral letter recognition, three observations are noteworthy. First, letter acuity is similar at high contrast to other acuities, except hyperacuities. Second, results obtained for letter recognition at high contrast do not generalize to intermediate and low contrast (Strasburger et al., 1991). Peripheral letter contrast sensitivity can be quantified using a size-contrast trade-off function (Strasburger et al., 1994). Third, Riccò's law of spatial summation does not hold for letter recognition, and letter size plays a more important role in letter recognition than in detection tasks.
  • 4. Crowding is the loss of form vision as a consequence of target patterns appearing in the spatial context of distracter patterns. It occurs when the surrounding patterns are closer than a critical distance specified by Bouma's law (1970). The latter shows a formal analogy with M-scaling (Levi et al., 1985; Strasburger, 2005) and can be stated in terms of the retinocortical mapping.
  • 5. Crowding differs from low-level contour interactions, such as lateral masking and surround suppression. A first approach to understand this phenomenon involves a two-stage theory of feature detection and feature combination (Pelli et al., 2004; Strasburger, 2005; Strasburger & Rentschler, 1996).
  • 6. Crowding is also subject to modulations by transient and sustained attention (Averbach & Coriell, 1961; Fang & He, 2008; He et al., 1996). Transient attention has a gain control effect on target contrast thresholds, which is independent of cue size, but has no effect on target–flanker confusions (Strasburger, 2005). The details of the interaction between these attentional factors with feature detection and position coding are still unresolved.
  • 7. Target–flanker confusions are among the largest contributors to crowding (Chung & Legge, 2009; Strasburger et al., 1991). Such errors may result from letter source confusion (Strasburger, 2005) or feature source confusion (e.g., Livne & Sagi, 2010; May & Hess, 2007; Wolford & Chambers, 1983), but the binding mechanisms underlying letter confusions are still unclear.
  • 8. Regarding the recognition of scenes, objects and faces in peripheral vision, performance does not generally follow predictions from cortical size-scaling and acuity measures. This indicates that configural information plays a role in the recognition of complex stimuli. Such information may result from mid-level processes of perceptual organization that integrate local features into contours, and separate contours into parts of scenes or objects. There is some evidence of contour integration and part-based recognition being limited by crowding (e.g., Martelli et al., 2005; May & Hess, 2007). However, these constraints may be modulated and sometimes mitigated by top-down effects mediated by attention, by affective processing, and by the possibility to perform coarse categorizations based on fragmentary information. Peripheral vision therefore has a generic potential to permit the recognition of behaviorally relevant cues.
  • 9. Peripheral vision may improve in many tasks by way of learning. Such learning may occur at an early perceptual level, or at a higher level involving the acquisition of pattern categories. Perceptual learning is typically location specific. It affects elementary visual functions such as orientation discrimination, contrast sensitivity, and some types of acuity. It also reduces crowding (Chung et al., 2004). The neural locus of perceptual learning is generally assumed to be within early visual areas even though there is an ongoing debate on this issue. Pattern category learning is likely to involve more central stages of visual processing and shows less specificity to retinal location. Pattern categorization in extrafoveal vision is generally limited in that overall pattern similarity cannot be appreciated (Jüttner & Rentschler, 1996).
  • 10. Spatial generalization (or translation invariance) of pattern recognition across the visual field depends on familiarity and pattern structure. For familiar objects, recognition is robust against displacements of several degrees (e.g., Biederman & Cooper, 1992). For unfamiliar objects, immediate translation invariance is only obtained when diagnostic part information is available (Dill & Edelman, 2001). Otherwise, such invariance can result from prolonged category learning, even if the training only involves a single retinal location (Jüttner & Rentschler, 2008). This emerging translation invariance may indicate a representational shift from location-specific attributes to position-invariant relations.
  • 11. Image reconstruction from local-magnitude-only information in a multiresolution scheme approximates peripheral form vision fairly well (Treutwein et al., 1996). This approach is generalized by replacing structural information within image regions by summary statistics (Balas et al., 2009; van den Berg et al., 2009). Balas et al.'s model thus achieved the generation of gray-level images that can be used for psychophysically measuring and visualizing crowding. The neurocomputational model of Jehee et al. (2007) demonstrates how local analysis and global information are integrated via reciprocal coupling of cortical areas. The use of classification images for letter identification confirmed the existence of spatial uncertainty in peripheral vision and provided insight into the mechanism of crowding (Nandy & Tjan, 2007). In brief, computational models support the view that crowding reflects the loss of associations between features and pattern parts (cf. Pelli et al., 2004; Wolford, 1975).
  • 12. Cognitive functions in peripheral vision (Jüttner & Rentschler, 1996, 2000; Rentschler et al., 1994) can be characterized in terms of lower representational complexity (Halford et al., 1998, 2007) and processing speed. This might reflect a limited access of the peripheral visual field to prefrontal cortex. Thus, peripheral form vision is best understood as an inferential process with a limited data base. It is further suggested that the confusion of mirror-image patterns in peripheral form vision reflects the loss of feature part associations (part-indexing; Caelli & Bischof, 1997; Rentschler & Jüttner, 2007). Taken together, the limitations on pattern representation in peripheral vision appear to be as significant as those imposed on low level functions and resulting from crowding.
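The relation between the inverse linear law and the logarithmic retinocortical mapping named in statement 2 can be checked numerically. The sketch below assumes the common form M⁻¹(E) = M₀⁻¹ (1 + E/E₂) and obtains cortical distance by integrating M(E) from the fovea to eccentricity E. The parameter values are hypothetical and chosen for illustration only; they are not taken from the tables of this review.

```python
import math


def inverse_magnification(ecc_deg, m0, e2):
    """Inverse linear law: M^-1(E) = (1 + E / E2) / M0, in deg/mm."""
    return (1.0 + ecc_deg / e2) / m0


def cortical_distance(ecc_deg, m0, e2):
    """Integral of M(E) from 0 to E: d(E) = M0 * E2 * ln(1 + E / E2), in mm."""
    return m0 * e2 * math.log(1.0 + ecc_deg / e2)


M0 = 30.0  # foveal magnification in mm/deg (hypothetical value)
E2 = 1.0   # eccentricity at which M^-1 doubles, in deg (hypothetical value)

# Numerical slope of the mapping at 5 deg; it should equal M(5 deg).
step = 1e-6
slope = (cortical_distance(5.0 + step, M0, E2)
         - cortical_distance(5.0 - step, M0, E2)) / (2.0 * step)
magnification = 1.0 / inverse_magnification(5.0, M0, E2)
```

The slope of the logarithmic mapping agrees with the magnification factor predicted by the inverse linear law, which is the sense in which the two descriptions match. At E = E₂ the inverse magnification is twice its foveal value, which is what makes E₂ a convenient single-number summary.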
Appendix: Korte's account
Korte's treatise (1923) On the apprehension of Gestalt in indirect vision is a fine example of writing in the Gestalt tradition and was the decisive text on that topic at the time. Due to its importance for current research in peripheral vision and because the Gestalt tradition has been tragically discontinued, we would like to summarize the main points in Korte's treatise. After pointing out “the fundamental importance of seeing sidelong” in normal reading since “most letters are only seen extrafoveally,” Korte described the perceptual process of perceiving letters and words and extracted general perceptual rules from his observations. He stressed the dynamic character of perception by proceeding from the general to the specific. Korte claimed that recognition occurs in three consecutive phases. The first phase of the perceptual process involves the most common features of the visual impression perceived as a whole, e.g., roundedness, angularity, conspicuousness, length, etc. The second phase is the emergence of detail, and the third phase, the unequivocal identification (p. 43). 
Korte described the second phase, which is of particular relevance in the present context, most extensively. The second phase sets in when, as sensations change, something characteristic predominates, and the Gestaltungsdrang (“compulsion to configurate” or “desire for Gestalt formation”) sets in, “creating from the clearly perceived and the diffusely remaining, the image of a character.” In the second phase, Korte stated that perception is not static, but constantly changing, with a floating of details or of “features”: “It has already been mentioned that the perceptions fluctuate extraordinarily. They do not keep still while being observed, but are permanently moving to the extent that subjects frequently describe them as “dancing.” Especially horizontal lines, ticks, and arches “whirr about aimlessly, up one minute, down the next, then right, and so on, and letters are often confused for one another. Precise localization only succeeds close up (saccadic movements of the eyeball at a distance may play a role here)” (Korte, 1923, p. 40). 
In the second phase Korte also describes extensively that not only separate features but also whole characters hop about, which has not been taken note of so far: “Firm localization of detail becomes extremely difficult. It is possible for the first and, less so, for the last letter at most. Subjects reported, e.g., “Somewhere there is a dot of an ‘i’” or “somewhere is this or that letter.” Subject R reported “kä” resembled “two dancing manikins” and “two “o”s hopped about in the word.” Another subject reported for the syllable “wauß,” “The whole word jumps around.” Subject B, who was particularly experienced in indirect vision, saw a t and an o in “tot,” but was unable to say whether the o was on the right or on the left, or whether there was even half an o on either side of the t” (Korte, 1923, p. 41). 
Further on, Korte distinguished seven more specific “causes of misreading” in the second phase (Korte, 1923, p. 63 ff.). We can refer to them as Gestalt processes that underlie perceptual (and cognitive) errors in indirect vision: 
  • a) Absorption (“Aufsaugung”) and false amendment. “… a feature of a letter or a whole letter is added to another letter, or a detail becomes so dominant that it absorbs everything else.” Today this is referred to as the wrong allocation of features, which does not happen randomly, but follows certain rules.
  • b) False localization of details both of features (b1) and whole letters (b2) (p. 41; examples given above).
  • c) Puzzling intermediate perceptual states. “For most misreadings, one will be able to point out some reason, but there are also many which cannot be explained.” (Here Korte points out the dynamic character of the perceptual process.)
  • d) Prothesis and Metathesis: in rare cases, letters are added in front of or at the end of a word.
  • e) Shortening of the perceptual image in a certain area in the visual field (p. 65–70; more details below).
  • f) Change of details in the perceived whole (e.g., assimilation of roundedness).
  • g) False cognitive set. Here Korte explained the influence of knowing which font category (“Antiqua” vs. “Fraktur” and lower vs. upper case only) the letters were taken from and whether syllables were meaningful or not.
One of these processes, (e) “perceptual shortening,” has been singled out as the first description of crowding in two recent reviews (Levi, 2008 and Tyler & Likova, 2007, based upon the translation in Pelli et al., 2004, p. 1139). In the six-page description of perceptual shortening, Korte writes, among other things: “It is as if there were pressure on both sides of the word that tends to compress it. Then the stronger, i.e., the more salient or dominant letters, are preserved, and they quasi ‘squash’ the weaker, i.e., the less salient letters, between them” (p. 69). The emphasis in the chapter on perceptual shortening is that words often appear to have fewer letters than they actually do. His examples (Figure A1) show that perceptual shortening is not specific to the crowding effect (impaired recognition, not necessarily vanishing, of a center letter), at least no more than are causes (b) to (f) (note that the example given in Tyler & Likova, 2007, Figure 2a, fits Korte's mechanism (b) “false localization of detail,” whereas their Figure 2b does not fit Korte's description). 
Figure A1.
 
(a) Five of the ten examples of perceptual shortening provided by Korte (1923, p. 67) showing meaningless syllables (sif, läunn, diecro, goruff, läff) and how they were reported by Korte's subjects (“sif” reported four times as “ff”, twice as “ss”, etc.). (b) Examples of false localization of detail with regard to whole letters (p. 42). (c) Examples of false localization of detail within letters (p. 41). Material in the three graphs is copied from the original text and arranged, since the font (Fraktur, lower case) is not available in modern font sets.
In summary, Korte has provided us with a thorough, phenomenological description of how letters and words are perceived in indirect vision. He emphasized that the perceptual process is dynamic with intermediate processing stages, resembling the workings of an associative network (McClelland & Rumelhart, 1981). Two of Korte's notions—floating of features and floating of whole characters—are of particular interest for current theorizing. 
Acknowledgments
Very special thanks to Lewis O. Harvey Jr. who brought me (HS) to the study of peripheral vision, pointed out the importance of crowding, taught us adaptive techniques back then, and welcomed me in beautiful Boulder to write my book on peripheral vision. Thanks to Terry Caelli for spirited discussions and his enthusiasm. Thanks to Ernst Pöppel for his long-standing support of our work on the visual field. We thank Dorothe Poggel for contributing to the review of temporal resolution, Keiji Tanaka, Naoyuki Osaka, Gustavo Deco, Christoph Zetzsche, Mario Ferraro, Alan Cowey, and Denis Pelli for helpful discussions, and Barry Lee for a critical look at the neurophysiology. We are indebted to Barbara Herzberger and Manfred MacKeben for skillful language editing. Particular thanks go to Manfred MacKeben for thorough reading and insightful comments on clinical and applied aspects of peripheral vision, and always being there over the years. Cordial thanks also to Ruth Rosenholtz for her in-depth review and meticulous spotting of inconsistencies. We thank the Deutsche Forschungsgemeinschaft for continuous support. 
This work is dedicated to the memory of the late Jerome Ysroael (“Jerry”) Lettvin, a genius and a friend. 
Commercial relationships: none. 
Corresponding author: Hans Strasburger.