Research Article  |   April 2008
Visual short-term memory for natural scenes: Effects of eccentricity
Journal of Vision April 2008, Vol.8, 28. doi:https://doi.org/10.1167/8.4.28

      Ljiljana Velisavljević, James H. Elder; Visual short-term memory for natural scenes: Effects of eccentricity. Journal of Vision 2008;8(4):28. https://doi.org/10.1167/8.4.28.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

It is well established that a range of basic visual acuities and sensitivities decline with retinal eccentricity due in part to a decline in spatial sampling in the retina. However, it is also known that not all peripheral deficits can be explained entirely by such low-level factors, suggesting a specialization of central vision for certain visual tasks. Here, we examine visual short-term memory for natural scenes and ask whether low-level factors can fully account for variations in performance across the visual field. We measure local recognition performance as a function of eccentricity for both coherent and scrambled natural scenes. We find that while spatial coherence substantially increases recognition rates for targets near fixation, the benefit of spatial coherence vanishes in the periphery. These results suggest that low-level factors cannot fully explain the decline in visual short-term memory for natural scenes in the periphery and that mechanisms selective for global configuration are largely confined to the central visual field.

Introduction
Under natural viewing conditions, human observers sample their visual environment in a sequence of discrete fixations, typically 3–4 per second (Rayner & Pollatsek, 1992). Due to both optical (Campbell & Green, 1965; Fincham, 1951) and neural (Cowey & Rolls, 1974; Daniel & Whitteridge, 1961; De Monasterio & Gouras, 1975; Wiesel, 1960) factors, the quality of the data encoded by the visual system degrades as a function of retinal eccentricity. Typically, this degradation is characterized in terms of low-level factors that limit resolution such as spatial acuity (e.g., Harris & Fahle, 1996) or contrast sensitivity (e.g., Loschky, McConkie, Yang, & Miller, 2005; Strasburger, Harvey, & Rentschler, 1991; Virsu & Rovamo, 1979; Virsu, Rovamo, Laurinen, & Näsänen, 1982). 
It is known, however, that these low-level factors do not always completely account for peripheral performance on more complex tasks. Bennett and Banks (1987), for example, showed that phase discrimination performance for compound grating stimuli declines dramatically in the periphery, even when stimuli are scaled appropriately to compensate for the reduced sampling rate in the periphery. Hess and Dakin (1997) found evidence for a qualitative breakdown in the processing of local oriented elements into curvilinear groups in the periphery. Hess and Dakin argued that their findings are not easily explained in terms of low-level visual factors like acuity and contrast sensitivity, suggesting that additional, mid-level perceptual organization factors may limit peripheral performance on the more complex task of contour grouping. 
Still more complex is the task of coding and recalling arbitrary features of a complex natural scene. In a companion study (Velisavljevic & Elder, 2002), we have shown that, overall, recognition of local components of a natural visual scene depends upon the global coherence of that scene. In this study, we examine how recognition rates vary across the visual field. Can this variation in visual short-term memory (VSTM) be accounted for fully by low-level factors such as acuity and contrast sensitivity, or are there mid-level factors, e.g., perceptual organization mechanisms, at play? We begin with a review of previous work that examines the VSTM representation of natural scenes across the visual field. 
Previous research
For the most part, studies that examine VSTM of a natural scene across the visual field test performance for object recall. These studies (Antes & Metzger, 1980; Biederman, Mezzanotte, Rabinowitz, Francolini, & Plude, 1981; Henderson & Hollingworth, 1999; Nelson & Loftus, 1980) generally show some decrease in recognition performance for objects presented eccentrically during the encoding phase. It is unclear from these studies, however, whether this decrease can be explained by low-level factors and whether these results generalize to memory for arbitrary features of natural scenes. 
A natural scene typically consists of many surfaces, textures, and structures that are not best described as objects. Functional brain imaging studies suggest that the lateral–occipital complex may be specialized for the encoding of objects (e.g., Grill-Spector, Kourtzi, & Kanwisher, 2001), while the parahippocampal place area may be specialized for the encoding of the spatial layout and geometry of scenes (e.g., Epstein, DeYoe, Press, Rosen, & Kanwisher, 2001; Epstein & Kanwisher, 1998; Nakamura et al., 2000; Ranganath, DeGutis, & D'Esposito, 2004). Thus, different neural substrates may be involved in the processing of objects and more general scene content. 
Antes (1977) and Metzger and Antes (1983) tested memory for fragments of a natural scene rather than objects. Test images were divided into a 2 × 4 grid of 8 blocks and presented for 100 ms (Antes, 1977) or for a variable duration ranging from 10 to 1000 ms (Metzger & Antes, 1983). A short time later, a probe block taken from either the original photograph or a different photograph was presented in the position in which it occurred in the photograph from which it was sampled. The task was to indicate whether the probe block was taken from the test image. For the most part, performance was higher for central than peripheral probes; thus, it appears that the decline in VSTM as a function of eccentricity does generalize to arbitrary components of a natural scene. It remains unclear, however, whether this decline can be explained by standard low-level factors such as acuity and contrast sensitivity. 
Phillips (1974) hypothesized that VSTM depends on the perceptual organization of local elements into groups. This has now been demonstrated for simple objects (Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001) but not for complex natural scenes. In this study, we attempt to distinguish the contribution of low-level and higher-level factors in determining the variation in recognition performance for natural scenes over the visual field. 
Experiments
Our first experiment employed a method similar to that of Antes (1977). An image was briefly presented and was followed by a probe display containing two smaller image blocks, one of which (the “target” block) was drawn from the briefly presented image and the other (the “distractor” block) from a random image never seen by the observer (Figure 1). The observer's task was to identify the target block drawn from the image he or she had just seen. 
Figure 1
 
Example coherent (A) and scrambled (B) test images.
We used this method to examine local VSTM across the visual field for both coherent and scrambled images in order to evaluate the role of low-level and higher-level factors in local VSTM. For the purpose of this study, we define low-level factors as those depending only upon properties of the small target block to be recognized or upon non-configural properties of the image context (e.g., luminance and color statistics). Higher-level factors, on the other hand, are assumed to rely upon configural properties of the image context and could include scene layout information or even semantic interpretations of the scene. 
We reasoned that if low-level factors could completely explain VSTM recognition performance, variations in performance across the visual field should be similar for coherent and scrambled scenes, as neither the local cues present in the target nor the luminance and color statistics of the image as a whole are altered by scrambling. In other words, we would expect a similar decline in performance for the two conditions as the distance of the target from fixation increases. If, however, higher-level factors play a role, we would expect better VSTM recognition performance for central targets embedded in coherent scenes containing contextual configural cues compared to central targets embedded in scrambled scenes, which lack such cues. Since the sensitivity of mechanisms underlying configural coding appears to decline in the periphery (e.g., Hess & Dakin, 1997), we would expect that this benefit of coherence would diminish for eccentric targets. 
Our results do show this dependence on the coherence of the image context, thus suggesting an important role for higher-level factors in this task. A control experiment shows that this effect of coherence persists when images are inverted in orientation and/or color to reduce rapid access to semantic content, thus suggesting that the effect is rooted largely in mid-level configural cues relied on for scene memory. 
In Experiment 2, we used a similar method to determine the extent to which photographic bias (e.g., a tendency to locate the main subject of the photograph near the center of the frame) might explain the decline in recognition performance with eccentricity observed for coherent scenes in Experiment 1. Results suggest that while photographic bias is partially responsible for the decline in VSTM performance with eccentricity, observer factors account for the majority of the effect. 
In Experiment 3, we directly examined whether mid-level factors at the perceptual/encoding stage of processing could account for the results of Experiment 1. The task was to detect the presence of a small scrambled section in an otherwise coherent natural image. Results showed that observers became unable to detect local incoherence at 14 deg eccentricity, roughly the same eccentricity at which configural cues ceased to aid recognition in Experiment 1. These results support the interpretation that perceptual organization mechanisms responsible for encoding configural cues are critical for VSTM of natural scenes in central vision. 
Experiment 1
Methods
Our first experiment uses a paradigm similar to Biederman (1972) and Antes (Antes, 1977; Metzger & Antes, 1983) to test local VSTM for natural scenes. Participants viewed briefly presented natural images followed by a probe stimulus composed of two small image blocks. The task was to identify which of the probe blocks was drawn from the test image. 
Participants
Three female and seven male observers between the ages of 20 and 34 (mean = 27) received Can$8/hour for participation in the experiment. All observers were naïve to the purpose of the experiment and had normal or corrected-to-normal vision. 
Apparatus
A 21-in. Sony Trinitron® CRT monitor with a display resolution of 640 × 480 pixels and a refresh rate of 85 Hz was used to display the stimuli. The experiments were programmed with Matlab 5.2 using the Psychophysics toolbox (Brainard, 1997; Pelli, 1997) for stimulus presentation and synchronization between the graphics card and monitor. The participant was positioned with a head-chin rest 40 cm from the screen. 
Stimuli
As shown in Figure 1, we used coherent and scrambled test and mask images similar to those used by Biederman (1972). Test and mask images were drawn from a database of 14,267 JPEG color photographs of scenes that are part of the Print Artist Platinum Plus (v. 12.0) database marketed by Sierra Home. The database incorporates photographs from a variety of indoor and outdoor scenes. Each photograph was cropped to a resolution of 358 × 358 pixels and divided into an 8 × 8 grid of local image blocks displayed either in their natural order (coherent condition) or randomly scrambled (scrambled condition). 
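The block scrambling described above can be sketched in a few lines. This is only an illustration, not the authors' Matlab code: the function name `scramble_blocks`, the list-of-rows image representation, and the requirement that the image size be divisible by the grid size are all assumptions of the sketch (the 358-pixel images in the study do not divide evenly by 8, and the text does not say how that was handled).

```python
import random


def scramble_blocks(image, grid=8, seed=None):
    """Return a copy of `image` with its grid x grid blocks randomly permuted.

    `image` is a list of rows, each row a list of pixel values.
    Assumption of this sketch: height and width are divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    block_h, block_w = height // grid, width // grid

    # Destination slot i receives the block that started in source slot order[i].
    order = list(range(grid * grid))
    random.Random(seed).shuffle(order)

    out = [[None] * width for _ in range(height)]
    for dst, src in enumerate(order):
        dst_r, dst_c = divmod(dst, grid)
        src_r, src_c = divmod(src, grid)
        for y in range(block_h):
            for x in range(block_w):
                out[dst_r * block_h + y][dst_c * block_w + x] = (
                    image[src_r * block_h + y][src_c * block_w + x]
                )
    return out
```

Because whole blocks are moved intact, the set of pixel values (and hence the luminance and color statistics of the image as a whole) is unchanged; only the spatial arrangement of the blocks differs between the coherent and scrambled conditions.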
Mask images were used to ensure that the task was not testing iconic memory (Sperling, 1960). Each test and mask image was viewed only once by each observer. Test and mask images subtended 31 × 31 deg. 
Scrambling produces intensity and color discontinuities not present in the original image. To make the two conditions more comparable, a black two-pixel thick lattice was overlaid on all test and mask images to occlude block boundaries (Figure 1). 
The two probe blocks consisted of (1) a target block drawn from the test image and (2) a distractor block drawn from a random, unseen image. Each probe block subtended 3.9 deg of visual angle. Target and distractor blocks were selected with uniform probability from the 64 block locations. The left/right location of the target and distractor was randomized. 
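The stimulus geometry given above (a 31 × 31 deg image divided into an 8 × 8 grid, with fixation at the image center) implies a small set of possible target eccentricities. The sketch below enumerates them. It assumes that eccentricity is measured to the block center and that degrees of visual angle can be treated as a flat coordinate system; both are simplifications introduced here, and the names `block_eccentricities`, `IMAGE_DEG`, and `GRID` are hypothetical.

```python
import math
from collections import Counter

IMAGE_DEG = 31.0               # image width and height in deg (from the text)
GRID = 8                       # blocks per side (from the text)
BLOCK_DEG = IMAGE_DEG / GRID   # 3.875 deg, close to the reported 3.9 deg


def block_eccentricities(grid=GRID, block_deg=BLOCK_DEG):
    """Map (row, col) to the eccentricity (deg) of that block's center.

    Assumptions of this sketch: fixation at the image center, eccentricity
    measured to the block center, flat (small-angle) geometry.
    """
    half = (grid - 1) / 2.0
    return {
        (r, c): math.hypot((r - half) * block_deg, (c - half) * block_deg)
        for r in range(grid)
        for c in range(grid)
    }


ecc = block_eccentricities()
# Count how many of the 64 blocks share each distinct eccentricity.
distinct = sorted(Counter(round(e, 2) for e in ecc.values()).items())
for value, count in distinct:
    print(f"{value:5.2f} deg: {count} blocks")
```

Under these assumptions the block centers fall at nine distinct eccentricities, from about 2.7 deg for the four blocks around fixation to about 19.2 deg for the four corner blocks. How these locations were assigned to the eccentricity groups used in the analysis is specified by the paper's figures, not by this sketch.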
Procedure
Experiment 1 consisted of three parts. Experiments 1a and 1b examined VSTM for color and monochrome images. Experiment 1c examined the effects of inverting images in orientation and/or color to reduce access to semantic representations. Each participant completed these three parts in random order. Within each part, the conditions were blocked and performed in random order. Each condition consisted of 20 practice and 100 experimental trials. Prior to each condition, the experimenter presented an example of a test image, a mask image, and a target and distractor probe block to each observer. 
The stimulus sequence is illustrated in Figure 2. Each trial consisted of a fixation stimulus (1 s), followed by a test image (70.6 ms), a mask image (517 ms), and two probe blocks (displayed until response). The participant pressed one of two keys on a computer keyboard to indicate which block she/he believed was part of the test image (target probe). Feedback was given on each trial: A tone sounded if the response was incorrect. 
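On a CRT, stimulus durations are whole multiples of the refresh period, so the reported durations can be related to frame counts at the stated 85 Hz refresh rate. The short sketch below does that conversion; the frame counts it produces are an inference from the reported durations rather than values stated in the text, and the helper names are hypothetical.

```python
REFRESH_HZ = 85.0  # monitor refresh rate (from the Apparatus section)


def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in ms of n_frames refresh cycles."""
    return 1000.0 * n_frames / refresh_hz


def nearest_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Whole number of refresh cycles closest to the requested duration."""
    return round(duration_ms * refresh_hz / 1000.0)


for label, ms in [("fixation", 1000.0), ("test image", 70.6), ("mask image", 517.0)]:
    n = nearest_frames(ms)
    print(f"{label}: {n} frames = {frames_to_ms(n):.1f} ms")
```

The reported 70.6 ms and 517 ms are consistent with 6 and 44 refresh frames, respectively, and the 1 s fixation interval with 85 frames.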
Figure 2
 
Stimulus sequence.
Experiments 1a and 1b: VSTM for color and monochrome images
In Experiments 1a and 1b, we examined how recognition rates vary across the visual field for color and monochrome images in order to test the effects of local low-level factors and higher-level configural factors determining VSTM for natural scenes. A decline in performance for peripheral targets could arguably be due to a decline in the sensitivity of either low-level or higher-level mechanisms. However, if local low-level factors wholly determine performance, we expect a similar decline for both coherent and scrambled images since the local cues present in the target are not altered by scrambling. If, on the other hand, higher-level configural cues play a role, we expect coherent images to lead to higher performance near fixation and a steeper falloff with eccentricity since the sensitivity of mechanisms underlying configural coding appears to decline in the periphery (e.g., Hess & Dakin, 1997). 
We tested both color and monochrome images in order to understand the role that color plays in VSTM for natural scenes across the visual field. Since color provides a rich visual cue, we predicted that overall performance would be better for color images than for monochrome images. A natural second hypothesis is that this effect will diminish in the periphery since cone density and color sensitivity decline as a function of eccentricity (Curcio, Sloan, Packer, Hendrickson, & Kalina, 1987; Mullen, 1991; Newton & Eskew, 2003). However, the complexity of this visual task leaves open other possibilities. If performance depends not just on the features of the target but on the features and configuration of the scene context, the dependence of the color contribution on the eccentricity of the target may be reduced. 
Methods
Test images, mask images, and target and distractor probe blocks were color in Experiment 1a and monochrome in Experiment 1b. Experiments 1a and 1b were blocked, and the order of presentation was random. Both coherent and scrambled mask images were used. 
To examine the effect of retinal eccentricity, target blocks were clustered into three groups (Figure 3) with mean eccentricities of 6.2, 12.0, and 15.7 deg. 
Figure 3
 
Three target location groups, clustered by eccentricity.
Consequently, Experiments 1a and 1b were a 2 (color: present or absent) × 2 (test coherence: coherent or scrambled) × 2 (mask coherence: coherent or scrambled) × 3 (target eccentricity: near, intermediate, far) within-subject design. 
Results
A four-way within-subject ANOVA (color × test coherence × mask coherence × target eccentricity) was performed. Analysis revealed a main effect of color (F(1,9) = 124.86, p < .0001): Performance was significantly better on average for color images. There was also a main effect of test coherence (F(1,9) = 17.43, p = .002): Performance was significantly better on average for coherent images. A significant main effect of target eccentricity (F(2,18) = 12.53, p < .001) was also found. There was no significant effect of mask coherence (F(1,9) = 1.44, p = .26), and there were no significant interactions with mask coherence (all ps > .15). We therefore collapsed across mask coherence to further examine interactions between the other three factors (i.e., color, test coherence, and eccentricity). 
The results are plotted in Figure 4. The three-way interaction was not significant (F(2,18) = 1.77, p = .20). The color × test coherence interaction was also not significant (F(1,9) = .50, p = .50). Thus, the effect of color on performance does not seem to depend in any way on the configuration of the image. There was a marginally significant color × eccentricity interaction (F(2,18) = 3.40, p = .056) reflecting a slightly greater rate of decrease in performance as a function of eccentricity for monochrome images as compared to color images. Most importantly, however, there was a dramatic interaction between test coherence and eccentricity (F(2,18) = 13.42, p < .001). While performance is much higher for coherent images than scrambled images when targets are near fixation, performance for coherent images declines with eccentricity, whereas performance for scrambled images remains roughly constant. 
Figure 4
 
Recognition performance for color images (black lines) and monochrome images (gray lines). Lines represent maximum likelihood linear fits. Error bars represent ±1 standard error of the mean (SEM).
Regression analysis reveals that performance decreased as a function of eccentricity for coherent images, in both color (r² = .36, F(1,28) = 15.48, p = .001) and monochrome (r² = .59, F(1,28) = 40.98, p < .001) conditions. However, no significant effect of eccentricity was found for scrambled images in either the color (r² = .01, F(1,28) = 0.32, p = .57) or monochrome (r² = .001, F(1,28) = .02, p = .88) conditions. One possible explanation for the lack of variation in accuracy with eccentricity in the scrambled condition is a floor effect: Perhaps observers simply cannot do the task for scrambled images, regardless of the eccentricity of the target. However, the fact that performance for the scrambled condition is well above chance for both color (t(9) = 20.91, p < .001) and monochrome (t(9) = 6.26, p < .001) images rules this out. 
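For a linear regression with a single predictor, the F statistic and r² are tied together by the standard identity r² = F / (F + df_error), so the values reported above can be cross-checked against one another. The sketch below applies that identity; the function name is hypothetical, and the reading of the 28 error degrees of freedom as 30 data points (10 observers × 3 eccentricity groups) minus 2 fitted parameters is an inference, not something the text states.

```python
def r_squared_from_f(f_value, df_error):
    """r^2 implied by the F statistic of a one-predictor linear regression."""
    return f_value / (f_value + df_error)


# (reported F, reported r^2) for each regression; df_error = 28 throughout.
reported = {
    "color, coherent": (15.48, 0.36),
    "monochrome, coherent": (40.98, 0.59),
    "color, scrambled": (0.32, 0.01),
}

for label, (f_value, r2_reported) in reported.items():
    r2_implied = r_squared_from_f(f_value, 28)
    print(f"{label}: implied r^2 = {r2_implied:.3f}, reported r^2 = {r2_reported}")
```

The implied values agree with the reported r² values to the two decimal places given.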
The prediction equations (Table 1) can be used to show that the regression lines for coherent and scrambled images cross at 15.1 deg eccentricity for color images and 14.7 deg for monochrome images, suggesting that the benefit of global configuration for local VSTM is limited to the central 30 deg of the visual field (i.e., to within roughly 15 deg of fixation). 
Table 1
 
Prediction equations for linear regression of percent correct against eccentricity (deg). Note: * P = performance (% correct); x = target probe eccentricity (deg).
Condition | Prediction equation*
Experiment 1a: Color and coherent | P = 91.41 − 1.14x
Experiment 1a: Color and scrambled | P = 71.66 + 0.17x
Experiment 1b: Monochrome and coherent | P = 87.72 − 1.77x
Experiment 1b: Monochrome and scrambled | P = 60.98 + 0.05x
Experiment 1c: Inverted and coherent | P = 88.91 − 1.48x
Experiment 1c: Inverted and scrambled | P = 70.66 − 0.05x
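The eccentricity at which the coherent and scrambled regression lines cross follows directly from the prediction equations in Table 1: setting a₁ + b₁x = a₂ + b₂x gives x = (a₂ − a₁) / (b₁ − b₂). The sketch below carries out that arithmetic with the tabulated coefficients; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# (intercept, slope) pairs copied from Table 1: (coherent, scrambled).
TABLE_1 = {
    "1a color": ((91.41, -1.14), (71.66, 0.17)),
    "1b monochrome": ((87.72, -1.77), (60.98, 0.05)),
    "1c inverted": ((88.91, -1.48), (70.66, -0.05)),
}


def crossing(coherent, scrambled):
    """Eccentricity (deg) at which two lines P = a + b*x intersect."""
    (a1, b1), (a2, b2) = coherent, scrambled
    if b1 == b2:
        raise ValueError("parallel lines do not cross")
    return (a2 - a1) / (b1 - b2)


for label, (coherent, scrambled) in TABLE_1.items():
    print(f"Experiment {label}: lines cross at {crossing(coherent, scrambled):.1f} deg")
```

With the tabulated coefficients, the crossings come out at 15.1 deg (color), 14.7 deg (monochrome), and 12.8 deg (inverted).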
Discussion
The visual task in this experiment involved only recognition of local image blocks and did not depend explicitly on the global coherence of the image. Nevertheless, our results show that for central targets, human observers do rely heavily on the global configural context of the image for local VSTM. This finding is consistent with prior findings that suggest that VSTM depends upon a “chunking” process (i.e., a grouping of parts into meaningful wholes) to maximize use of limited memory resources (Luck & Vogel, 1997; Miller, 1956; Vogel et al., 2001). However, our results suggest that this reliance on higher-level cues such as global configural context for VSTM may be limited to central vision, possibly because low-level factors such as lower spatial acuity prevent the observer from perceiving configural properties of the image in the periphery. 
In addition to this indirect effect, low-level factors are also clearly contributing directly to VSTM since performance for scrambled images is well above chance. However, the fact that performance for scrambled images does not vary with the eccentricity of the target suggests that these low-level cues are likely to be statistical properties of the image context as opposed to properties of the target block per se. Such statistical representations may also underlie mechanisms for rapid scene categorization (Renninger & Malik, 2004; Steeves et al., 2004). 
We found color to be as important for peripheral targets as for targets near fixation. Given the known falloff in cone density and color sensitivity in the periphery (Curcio et al., 1987; Mullen, 1991; Newton & Eskew, 2003), this finding also points to the possibility that VSTM for peripheral targets is largely based upon statistical properties of the image as a whole rather than properties of the specific target block. Due to the spatial correlations known to be present in natural images (see, e.g., Field, 1987), centrally weighted color statistics will still be predictive of the colors present in peripheral targets. Thus, the superior performance for peripheral targets in color images over monochrome images does not necessarily reflect color processing in the periphery. Rather, it may be the color cues present in the central portion of the image that are being used to improve discrimination of these peripheral targets. 
Our results do not show any effect of mask coherence. This is interesting because scrambling makes a mask difficult to recognize conceptually, whereas coherent masks are easily recognizable, and there is a well-known conceptual masking effect in which recognizable masks reduce scene memory much more than unrecognizable (nonsense) masks (Intraub, 1984; Loftus & Ginn, 1984; Loftus, Hanna, & Lester, 1988; Potter, 1976). This result thus suggests that our task may not be indexing the same form of conceptual memory exercised in these previous experiments. Does this mean that performance in our task does not depend in any way on the semantics or the familiarity of image content? We explore this question in our next experiment. 
Experiment 1c
The results of Experiments 1a and 1b suggest that observers use global context for local VSTM in central vision. The benefit could derive from a geometrical representation, defining shapes, spatial relationships, and scene layout, that facilitates encoding and recall of local information near fixation (Hochberg, 1968, 1978, 1986; Sanocki, 2003; Sanocki & Epstein, 1997). Another possibility is that it is semantic, defining the type of scene (e.g., beach, office) or types of objects (e.g., people, houses, rocks) in the scene. It is known that observers can extract semantic information from a natural scene with viewing times under 125 ms (Antes, Penland, & Metzger, 1981; Loftus, 1972; Oliva & Schyns, 1997; Potter, 1976; Schyns & Oliva, 1994) and retain it in both short- and long-term memory (Intraub, 1981; Potter, 1975, 1976; Subramaniam, Biederman, & Madigan, 2000). 
A number of researchers (Intraub, 1984; Klein, 1982; Rock, 1974; Shore & Klein, 2000) have reported that access to semantic information is reduced if an upright image is inverted in orientation. We might expect a similar reduction if the image is inverted in color (Goffaux et al., 2005; Oliva & Schyns, 2000; Steeves et al., 2004). We decided to use both techniques to distinguish configural and semantic contributions to local VSTM in central vision (Figure 5). 
Figure 5
 
Images inverted in orientation (a and b), color (c and d), and orientation and color (e and f).
If the improvement in local VSTM within the central visual field is due primarily to semantic cues, we would expect the differences in local VSTM performance for coherent versus scrambled images witnessed in Experiments 1a and 1b to be substantially diminished when the images are inverted in orientation and/or color (i.e., we would expect a significant three-way interaction between test image coherence, eccentricity, and inversion). On the other hand, if the improvement is due primarily to non-semantic configural cues, we would expect these differences to persist even for inverted images (i.e., no significant interaction). 
Method
The method was identical to Experiments 1a and 1b with the following exceptions. Experiment 1c was a 2 (test coherence: coherent or scrambled) × 3 (inversion type: orientation inverted, color inverted, or color and orientation inverted) × 3 (eccentricity: near, intermediate, or far) within-subject design. Color inversion was effected by taking the 8-bit complement of each RGB color channel. Mask images were scrambled for all conditions and inverted in orientation and/or color in correspondence with the test images for each condition. 
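The two inversion operations can be sketched as follows. Color inversion follows the description above (the 8-bit complement, 255 − v, of each RGB channel). For orientation inversion the text does not say whether images were rotated by 180 deg or flipped vertically; the sketch assumes a 180 deg rotation. The function names and the list-of-rows image representation with RGB tuples are hypothetical.

```python
def invert_color(image):
    """Take the 8-bit complement (255 - v) of every RGB channel value."""
    return [[tuple(255 - v for v in pixel) for pixel in row] for row in image]


def invert_orientation(image):
    """Turn the image upside down.

    Assumption of this sketch: inversion is a 180 deg rotation
    (reverse the row order and reverse each row).
    """
    return [row[::-1] for row in image[::-1]]


# A tiny 2 x 2 RGB image used only for illustration.
example = [
    [(0, 128, 255), (10, 20, 30)],
    [(255, 255, 255), (0, 0, 0)],
]
both = invert_orientation(invert_color(example))
print(both)
```

Both operations are their own inverses and they commute, so the combined color-and-orientation condition does not depend on the order in which they are applied.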
Results
The results are shown in Figure 6. First, we conducted a three-way within-subject ANOVA (test coherence × inversion type × target eccentricity) on recognition performance. As there was no main effect of inversion type, and no interaction involving inversion type approached significance (all Fs < 1), we collapsed across inversion type in our subsequent analysis. Figure 7 shows these results compared against the results of Experiment 1a.
Figure 6
 
Recognition performance for images inverted in orientation (a), color (b), orientation and color (c), and pooled over inversion type (d). Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
Figure 7
 
Recognition performance for Experiments 1a (color images) and 1c (pooled over inversion type). Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
There was a main effect of target eccentricity (F(2,18) = 24.55, p < .001). Although there was no main effect of test coherence for the inverted images (F(1,9) = 1.86, p = .21), there was a significant interaction between test coherence and target eccentricity (F(2,18) = 14.64, p < .001). Surprisingly, performance for the most eccentric target group (mean eccentricity 15.7 deg) was actually better when images were scrambled (F(1,9) = 12.22, p = .007). 
While inversion appears to lower performance overall, there does not appear to be a qualitative difference between central and peripheral targets, or between coherent and scrambled images, in the effects of inversion on VSTM. To assess this objectively, we conducted a three-way within-subject ANOVA (inversion × test coherence × target eccentricity). As explained above, the inversion factor contrasted performance across the color image conditions in Experiment 1a and the pooled image inversion conditions in Experiment 1c. Although performance was worse overall for inverted images (F(1,9) = 12.33, p = .007), there were no two-way interactions between inversion and eccentricity (F(2,18) = 2.86, p = .08) or between inversion and coherence (F(1,9) = 3.27, p = .10). Further, the three-way interaction was not significant (F < 1), reflecting the lack of impact of inversion on the eccentricity pattern observed in Experiment 1a. These results suggest that the main effect of configural context on local VSTM in central vision is through mid-level configural cues rather than semantic cues. 
Linear regression analysis of percent correct against eccentricity also reveals little effect of inversion on the variation in performance with target eccentricity. As for non-inverted images, performance decreased as a function of eccentricity in the coherent condition (r² = .35, F(1,28) = 14.96, p = .001), but not in the scrambled condition (r² = .002, F(1,28) = 0.07, p = .80). Based on the prediction equations (Table 1), the regression lines cross at 12.8 deg, not too far off the 15.1 deg crossing estimated in Experiment 1a. The correlation coefficients for the color, monochrome, and inverted coherent test images (r = .60, .77, .59, respectively) were not statistically different (p > .05). For direct comparison, Figure 8 compares recognition performance for coherent conditions of Experiments 1a, 1b, and 1c.
Figure 8
 
Recognition performance for coherent test images in Experiments 1a, 1b, and 1c. Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
Figure 8 suggests a steady decline in performance with eccentricity for coherent scenes rather than a sudden drop outside the fovea. To show this in more detail, in Figure 9 we show performance for Experiment 1 plotted for all target eccentricities present in the image. The decline in performance as a function of eccentricity appears to be roughly linear over the range tested. 
Figure 9
 
Recognition performance for coherent test images in Experiments 1a, 1b, and 1c as a function of all target eccentricities. Error bars represent ±1 SEM.
Discussion
The fact that inverting images in color and/or orientation reduced performance overall suggests that semantic information, or at least familiarity, plays a role in the representation of natural scenes in memory. However, the absence of any interactions with inversion suggests that the main effect of configural context on local VSTM in central vision is through mid-level configural cues (e.g., shape, figure/ground, spatial layout) rather than semantic cues. 
In this experiment, we obtained the surprising result that for the most eccentric targets, local VSTM performance is actually better for scrambled images. While not reaching significance, there was a trend in the same direction for Experiments 1a and 1b as well. 
There are several possible explanations for this finding. One possibility is that for these peripheral targets, discrimination is based upon low-level cues (e.g., color and luminance statistics) derived from the central portion of the image. While for coherent images the statistics of peripheral targets may deviate substantially from the central statistics, for scrambled images the statistics in the center and the periphery are the same due to randomization of location. Thus, the centrally weighted statistics may be better predictors of peripheral targets for scrambled images than for coherent images, leading to better performance for eccentric targets presented within scrambled images. 
A second possible explanation for this effect is a difference in the allocation of spatial attention between the coherent and the scrambled conditions. It is possible that in the coherent condition the observer's attention is focused toward the central portion of the image, where the subject of the photograph may be expected to lie (Buswell, 1935; Mannan, Ruddock, & Wooding, 1997). In the scrambled condition, observers may distribute their attention more evenly over the whole image. 
A related but distinct explanation relates to the actual distribution of salient content over the photographic images used in the experiment. One reason for the decline in performance in the periphery may be that the most salient content of a natural image typically lies near the center of the frame (photographic bias). Since scrambling the image spreads this salient content uniformly over the visual field, peripheral targets may actually be easier to recall on average when the image is scrambled. We explore this issue in our next experiment. 
While all three of these explanations may play a role, we note that none of them can entirely explain the pattern of results for scrambled and coherent test images, as they would not explain why, for randomly selected targets, there is a large overall decrease in performance for scrambled images. The weight of evidence thus continues to suggest a substantial role for mid-level configural cues in local VSTM. 
Experiment 2: Photographic bias
In Experiment 1, we found that local VSTM performance declined substantially with eccentricity for coherent but not for scrambled images. We have suggested that this may be due to the selective use of global configural context for local VSTM within the central visual field. Here we explore a second factor, which may also play a role in this central bias. 
We used stock photographs in Experiment 1, and the fixation mark was positioned at the center of these images. It seems likely that the art of photographic composition will induce systematic differences between the central and the peripheral portions of a photograph. For example, photographs are often composed so that subjects are near the center of the frame (Figure 10). Thus, VSTM performance may have been better in the central visual field for coherent scenes in part because that was where the more distinctive or interesting content of the photograph lay. In contrast, for scrambled scenes, salient imagery would be evenly distributed across the image and therefore at all eccentricities. In fact, there might be more salient content in the visual periphery in the scrambled condition than in the coherent condition, which might make peripheral targets more memorable in the scrambled condition, possibly explaining the superior performance for peripheral targets presented in scrambled scenes observed in Experiment 1c.
Figure 10
 
A typical professionally composed photograph.
The objective of Experiment 2 is to evaluate the role of this potential second factor. By employing natural visual images cropped from both central and peripheral regions of high-resolution stock photographs, we decouple the observer eccentricity of the local targets from the image eccentricity (normalized distance in the image from its original center). We will use a multiple regression analysis to determine the relative importance of these two factors. If photographic bias contributes to the eccentricity effect with coherent test images, we would expect this effect to disappear or reverse when the salient image content is in the periphery. 
Method
Participants
Five male and five female observers between the ages of 22 and 30 (mean = 25) received Can$8 an hour for their participation. All observers were naïve to the purpose of the experiment and had normal or corrected-to-normal vision. 
Stimuli
All images in this experiment were derived from 20 CDs in the Corel Photo Library. The central 720 × 720 pixel portion of each photograph (originally, 1024 × 1536 pixels) was extracted to derive the test and the mask images. 
For each condition, the test image was a 360 × 360 pixel section taken from the center, top left, bottom left, top right, or bottom right of the original image (Figure 11). Novel photographs were used for each test image. As in Experiment 1, each stimulus subtended 31 × 31 deg visual arc, and an 8 × 8 black two-pixel lattice accentuated the 64 potential target blocks. Example stimuli for each condition, based on the image in Figure 11, are shown in Figure 12.
Figure 11
 
Partitioning of original image into five test images.
Figure 12
 
Example stimuli for each condition.
For each condition, mask images were derived in the same manner as the test images for that condition. Because the number of images in our high-resolution database was limited, a set of 10 images was randomly selected as mask images prior to the experiment. For each trial, a mask image was randomly selected from this set of 10 and randomly scrambled.
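The block scrambling applied to the mask images can be illustrated with a minimal sketch. This is not the authors' code: the function name `scramble_blocks`, the list-of-rows image representation, and the optional seeded `rng` argument are all assumptions made for illustration, and the black lattice drawn over the stimuli is omitted.

```python
import random

def scramble_blocks(image, grid=8, rng=None):
    """Return a copy of a square image (list of pixel rows) with its grid x grid blocks shuffled.

    Hypothetical helper: assumes the image side length is divisible by `grid`
    (e.g., 360 px / 8 = 45 px blocks).
    """
    rng = rng or random.Random()
    size = len(image)
    block = size // grid  # block side length in pixels

    # Cut the image into blocks, listed in row-major order.
    blocks = [
        [row[c * block:(c + 1) * block] for row in image[r * block:(r + 1) * block]]
        for r in range(grid)
        for c in range(grid)
    ]

    # Randomize block positions; pixel content inside each block is untouched.
    rng.shuffle(blocks)

    # Reassemble the shuffled blocks into a full image.
    out = []
    for r in range(grid):
        for y in range(block):
            row = []
            for c in range(grid):
                row.extend(blocks[r * grid + c][y])
            out.append(row)
    return out
```

Because only block positions change, the luminance and color statistics of the image as a whole are preserved, which is the property the coherent versus scrambled comparison relies on.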
In order to ensure a uniform distribution of locations, we used a slightly different method for selecting target blocks in this experiment. For each condition, all 64 possible locations were selected twice, for a total of 128 trials per condition. The order of presentation was random. Distractor blocks were randomly selected from novel images derived from the Corel database in the same way as the corresponding test images. 
Procedure
The procedure was the same as for Experiment 1. Each blocked condition consisted of 10 practice and 128 test trials. 
Results
The distance of the center of the target in the test image from an observer's fixation point ranged from 2.5 to 17.8 deg (Observer Eccentricity). The normalized distance of a target center (in the test image) relative to the center of the original image ranged from .07 to 1 (Normalized Image Eccentricity). 
Test images drawn from the center of the original image overlap with part of each of the off-center test images (Figure 13). As a result, a target location in this overlap region has a single image eccentricity but multiple observer eccentricities depending on the condition of the experiment. In Figure 13, the lightest-gray and darkest-gray regions each consist of 24 target block locations with mean normalized image eccentricity of 0.2 and 0.4, respectively. The mean observer eccentricity for these groups varies by condition. The gray square indicates the fixation point for the center condition, and the gray circles indicate the fixation points for each of the off-center conditions. For the center condition, the average observer eccentricity for the light and the dark gray groups was 6.65 and 14.65 deg, respectively. For the off-center conditions, this is reversed: The average observer eccentricity for the light and the dark gray groups was 14.65 and 6.65 deg, respectively. This decoupling of observer and image eccentricity allows us to disambiguate the effects of these factors on recognition performance.
Figure 13
 
Disambiguation of observer and image eccentricity. The square for the center condition and the circles for remaining conditions indicate fixation locations for each condition relative to the original image.
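The decoupling of the two eccentricity measures can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the 720 × 720 region is a 16 × 16 grid of 45-pixel blocks, that the center crop is exactly centered and the off-center crops are the four quadrants, that fixation is at the center of each test image, and that image eccentricity is normalized by the largest block-center distance. Observer eccentricity is left in block units because the conversion to degrees depends on display geometry. The names `CROP_OFFSETS` and `eccentricities` are hypothetical.

```python
import math

BLOCKS_PER_TEST = 8    # each test image is an 8 x 8 grid of target blocks
BLOCKS_PER_ORIG = 16   # assumed: the 720 x 720 region holds 16 x 16 blocks of 45 px

# Assumed (row, col) block offset of each 360 x 360 crop within the 720 x 720 region.
CROP_OFFSETS = {
    "center": (4, 4),
    "top_left": (0, 0),
    "top_right": (0, 8),
    "bottom_left": (8, 0),
    "bottom_right": (8, 8),
}

def eccentricities(condition, row, col):
    """Return (observer eccentricity in block units, normalized image eccentricity)."""
    # Observer eccentricity: distance of the block center from the test image center.
    half_test = BLOCKS_PER_TEST / 2
    observer = math.hypot(row + 0.5 - half_test, col + 0.5 - half_test)

    # Image eccentricity: distance of the same block from the original image center,
    # divided by the largest such distance (a corner block).
    r0, c0 = CROP_OFFSETS[condition]
    half_orig = BLOCKS_PER_ORIG / 2
    image = math.hypot(r0 + row + 0.5 - half_orig, c0 + col + 0.5 - half_orig)
    max_image = math.hypot(half_orig - 0.5, half_orig - 0.5)
    return observer, image / max_image

# The same original block seen in two conditions: identical image eccentricity,
# different observer eccentricity.
print(eccentricities("center", 0, 0))
print(eccentricities("top_left", 4, 4))
```

Under these assumptions the normalized image eccentricity runs from 1/15 (about .07) for the blocks adjacent to the original image center to 1 for the corner blocks, which agrees with the range reported above.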
Recognition performance for each observer and image eccentricity group is presented in Figure 14. A two-way within-subject ANOVA revealed a main effect of observer eccentricity: Performance was better for targets closer to fixation (F(1,9) = 21.73, p = .001). There was, however, also a main effect of image eccentricity: Performance was better for targets nearer the center of the photograph (F(1,9) = 9.87, p = .01). There was no interaction between the two factors (F(1,9) = .39, p = .55). In summary, recognition performance decreases as both observer and image eccentricity increase.
Figure 14
 
Recognition performance as a function of normalized image and observer eccentricity. Error bars represent ±1 SEM.
The foregoing analysis is convenient in that it illustrates the effect of both observer and image eccentricity while the other factor is held constant. However, it is somewhat limited in that image eccentricity is not allowed to vary over its full extent. Thus, to quantify the effect of observer and image factors in our experiments, we conducted a multiple linear regression analysis over observer and image eccentricity using the data from all four conditions. The two factors were found to account for a significant portion of the variance in performance (r = .24, r² = .06, adjusted r² = .06, F(2,427) = 13.4, p < .001). The partial regression coefficients (Table 2) were significant for both observer eccentricity, r = −.19, t(427) = −4.1, p < .001, and image eccentricity, r = −.11, t(427) = −2.3, p = .02. The effect of observer eccentricity was found to be 1.7 times larger than the effect of image eccentricity.
Table 2
 
Prediction equations. Note: * $P$ = Predicted performance (%), $x$ = Observer eccentricity (deg), $y$ = Normalized image eccentricity, $P_z$ = Standardized performance, $Z_x$ = Standardized observer eccentricity, and $Z_y$ = Standardized image eccentricity.
Prediction equations*
$P = 87.4 - 0.55x - 5.7y$
$P_z = -0.19 Z_x - 0.11 Z_y$
Discussion
In Experiment 1, we found that local VSTM performance declined substantially with eccentricity for coherent but not for scrambled images. The results of Experiment 2 suggest that roughly one third of this effect is due to image eccentricity, while the remaining two thirds is due to observer eccentricity. The influence of image eccentricity may result from greater salience or distinctiveness of image content near the center of the photograph. Given the absence of any decline in the periphery for scrambled images in Experiment 1, the effects of observer eccentricity seem most likely due to a selective reliance on global configural context within the central visual field. 
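As a worked illustration of the arithmetic behind these proportions, the sketch below evaluates the prediction equations from Table 2. The function name `predicted_performance` is hypothetical, and apportioning the effect by the relative magnitude of the standardized coefficients is an assumption about how the one-third and two-thirds figures can be read, not a procedure stated in the text.

```python
def predicted_performance(observer_ecc_deg, image_ecc_norm):
    """Predicted recognition performance (%) from the unstandardized equation in Table 2."""
    return 87.4 - 0.55 * observer_ecc_deg - 5.7 * image_ecc_norm

# Standardized coefficients from Table 2.
BETA_OBSERVER = -0.19
BETA_IMAGE = -0.11

# Relative size of the two effects (about 1.7).
ratio = BETA_OBSERVER / BETA_IMAGE

# Assumed apportioning: share of the summed coefficient magnitudes.
total = abs(BETA_OBSERVER) + abs(BETA_IMAGE)
observer_share = abs(BETA_OBSERVER) / total  # about 0.63, roughly two thirds
image_share = abs(BETA_IMAGE) / total        # about 0.37, roughly one third

print(round(ratio, 2), round(observer_share, 2), round(image_share, 2))
print(predicted_performance(2.5, 0.07), predicted_performance(17.8, 1.0))
```

The last line evaluates the equation at the smallest and largest eccentricities reported in the Results, which gives a sense of the predicted performance range across the conditions tested.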
Experiment 3: Coherence detection
The results of Experiment 1 have led us to propose that local VSTM depends upon global configural context, but only within the central visual field. This interaction between coherence and eccentricity in VSTM could be due to encoding, storage, or retrieval processes. One possible explanation at the encoding stage is that coherent configural cues cannot even be visually detected in the periphery. To assess this possibility, our final experiment employs a task in which participants are asked to detect scrambled patches within otherwise coherent images. Detection of local incoherence is measured as a function of the eccentricity of the incoherent patch. 
The results of this experiment will help to identify the stage of processing that limits our ability to exploit configural cues for peripheral VSTM. A finding that detection of incoherence diminishes in the periphery would support the notion that the bottleneck in exploiting configural cues for peripheral VSTM lies in the encoding stage. A finding that incoherence detection is relatively good across the visual field would suggest that the bottleneck may lie in storage or retrieval stages. 
Method
Participants
Six male and four female observers from York University with normal or corrected-to-normal vision received Can$10 each for participation in this experiment. 
Apparatus
The apparatus was identical to that used in Experiment 1.
Stimuli
We employed the same image database used in Experiment 1. On each trial the test image was with equal probability either completely coherent or contained a 3 × 3 block scrambled patch (Figure 15). The scrambled patch was centered at 2.5, 7.6, or 12.7 deg eccentricity, with equal probability.
Figure 15
 
Examples of two possible test stimuli: coherent and partially scrambled.
Design
Each condition consisted of 40 completely coherent test images and 40 partially scrambled test images. There were four possible locations for each eccentricity that were each sampled 10 times per condition (Figure 16). We employed both coherent and scrambled mask images, blocked in random order with test images randomly selected on each trial. For each block, each participant had 24 practice trials and 240 experimental trials. Each scene was viewed once by each subject. Thus, Experiment 3 was a 2 (mask image: coherent or scrambled) × 3 (eccentricity: 2.5, 7.6, and 12.7 deg) within-subject design.
Figure 16
 
Each square grid represents a test image divided into 64 blocks. Cross represents fixation relative to the test image. For each condition, the scrambled patch was centered at either 2.5, 7.6, or 12.7 deg eccentricity. For each eccentricity, there were four possible patch locations.
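The trial counts in this design can be checked with a short sketch that builds the experimental trial list for one mask block. The dictionary fields and the function name `build_block` are hypothetical, and recording coherent trials with no eccentricity or location is an assumption, since a coherent image contains no scrambled patch.

```python
import random

ECCENTRICITIES_DEG = (2.5, 7.6, 12.7)
LOCATIONS_PER_ECC = 4
REPEATS_PER_LOCATION = 10
COHERENT_PER_CONDITION = 40

def build_block(mask_type, rng=None):
    """Return a shuffled list of experimental trials for one mask block (hypothetical layout)."""
    rng = rng or random.Random()
    trials = []
    for ecc in ECCENTRICITIES_DEG:
        # 4 locations x 10 repeats = 40 partially scrambled trials per eccentricity.
        for location in range(LOCATIONS_PER_ECC):
            for _ in range(REPEATS_PER_LOCATION):
                trials.append({"mask": mask_type, "scrambled": True,
                               "ecc_deg": ecc, "location": location})
        # 40 completely coherent trials per condition; these contain no patch.
        for _ in range(COHERENT_PER_CONDITION):
            trials.append({"mask": mask_type, "scrambled": False,
                           "ecc_deg": None, "location": None})
    rng.shuffle(trials)
    return trials

block = build_block("coherent", rng=random.Random(1))
print(len(block))  # 3 conditions x 80 trials = 240
```

With three eccentricity conditions of 80 trials each, the list contains the 240 experimental trials per block described above, half of them coherent.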
Figure 16
 
Each square grid represents a test image divided into 64 blocks. Cross represents fixation relative to the test image. For each condition, the scrambled patch was centered at either 2.5, 7.6, or 12.7 deg eccentricity. For each eccentricity, there were four possible patch locations.
Procedure
Since pilot testing indicated that performance was at chance with the 70.6 ms presentation time used in Experiments 1 and 2, we increased presentation time to 150 ms. Each trial began with a 200 ms fixation period, followed by the presentation of the test image for 150 ms, the mask image for 500 ms, and a response screen displayed until a response was made (Figure 17). Observers indicated whether the test image was partially scrambled. Feedback was provided as in Experiments 1 and 2.
Figure 17
 
Stimulus sequence.
Results
Given the potential for subjective bias toward reporting either coherence or incoherence, we report performance using a d′ measure. A full factorial repeated measures ANOVA with mask type and eccentricity as factors showed that incoherence detection was unaffected by the coherence of the mask (F(1,9) = 2.40, p = .17). There was, however, a main effect of eccentricity: Incoherence sensitivity decreased as eccentricity increased (F(2,18) = 25.70, p < .001). To further analyze this effect, we collapsed across mask conditions (Figure 18).
Figure 18
 
Detection performance as a function of the eccentricity of the incoherent region (averaged over mask conditions). Error bars indicate ±1 SEM.
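For readers unfamiliar with the d′ measure, the following minimal sketch shows the standard signal detection computation, d′ = z(hit rate) − z(false alarm rate). The function name and argument layout are hypothetical, and the log-linear correction for extreme rates is an assumption; the text does not say how, or whether, rates of 0 or 1 were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    A hit is a "scrambled" response to a partially scrambled image; a false
    alarm is a "scrambled" response to a completely coherent image.
    """
    # Assumed log-linear correction: keeps both rates strictly between 0 and 1
    # so that the inverse normal transform is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# An observer who responds "scrambled" equally often to both image types has d' = 0.
print(d_prime(20, 20, 20, 20))
```

Because d′ depends on the difference between the two transformed rates, an overall tendency to respond "scrambled" or "coherent" shifts both rates together and leaves the measure largely unchanged.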
We conducted a linear regression analysis to determine the eccentricity at which participants could not distinguish between coherent and scrambled sections. Detection sensitivity (d′) decreased as the eccentricity of the scrambled patch increased, r² = .44, F(1,28) = 22.31, p < .001.
The prediction equation is as follows: d′ = 1.09 − 0.08x, where x = target eccentricity (deg). Based on this equation, observers become unable to do the task (d′ = 0) at an eccentricity of 14.0 deg. This falls within the range of eccentricities (12.8–15.1 deg) at which coherence no longer improved recognition performance in Experiment 1. Thus, the results of our final experiment add support to the hypothesis that the visual system is only able to extract coherent configural cues useful for local recognition from the central portion of the visual field.
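The zero crossing of the fitted line can be reproduced from the reported coefficients, as in the sketch below (variable and function names are hypothetical). With the coefficients as printed, the crossing falls at about 13.6 deg rather than 14.0 deg; the small difference presumably reflects rounding of the published intercept and slope, which is an assumption on our part.

```python
INTERCEPT = 1.09  # predicted d' at 0 deg, from the reported equation
SLOPE = -0.08     # change in d' per degree of eccentricity

def predicted_d_prime(ecc_deg):
    """Predicted d' at a given target eccentricity (deg)."""
    return INTERCEPT + SLOPE * ecc_deg

# Eccentricity at which the fitted line reaches d' = 0.
zero_crossing_deg = -INTERCEPT / SLOPE

print(round(zero_crossing_deg, 1))
for ecc in (2.5, 7.6, 12.7):
    print(ecc, round(predicted_d_prime(ecc), 2))
```

Either value lies within the 12.8–15.1 deg range cited from Experiment 1, so the comparison drawn in the text does not depend on the rounding.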
Discussion
Detection of image incoherence is difficult in the periphery, even when stimulus duration is increased. It should be noted that at least some of this effect could be attributable to image eccentricity as well as observer eccentricity. For example, it is possible that peripheral portions of the image are more homogeneous and therefore do not change as much in appearance when scrambled. However, given the results of Experiment 2 showing the observer eccentricity factor to be roughly 1.7 times stronger than the image eccentricity factor, it seems likely that difficulty in detecting incoherence in the periphery is largely due to the observer factor. This supports the hypothesis that the lack of difference in local VSTM performance between coherent and scrambled scenes in the periphery (Experiment 1) occurs at the perceptual encoding level, specifically due to the visual system's inability to extract coherent configural cues from peripheral portions of the visual field.
General discussion
The purpose of this study was to better understand the processes underlying visual short-term memory (VSTM) for local properties of natural scenes across the visual field. We were particularly interested in identifying any degradation in performance as a function of target eccentricity and whether this degradation is due to low-level or higher-level factors. 
To this end, we used a local VSTM recognition paradigm involving both coherent and scrambled natural images. We reasoned that if low-level factors could completely explain VSTM recognition performance, variations in performance across the visual field should be similar for coherent and scrambled scenes, as neither the local cues present in the target nor the luminance and color statistics of the image as a whole are altered by scrambling. If, however, higher-level factors play a role, we would expect better VSTM recognition performance for central targets embedded in coherent scenes containing contextual configural cues compared to central targets embedded in scrambled scenes, which lack such cues. Since the sensitivity of mechanisms underlying configural coding appears to decline in the periphery (e.g., Hess & Dakin, 1997), we would expect that this benefit of coherence would diminish for eccentric targets. 
Using this paradigm, we found in Experiments 1a and 1b that (1) for coherent images, performance declines steadily with the eccentricity of the target, whereas (2) for scrambled images, performance is above chance but is unaffected by the eccentricity of the target. Thus, while local VSTM appears to benefit from coherent configural context for targets in the central visual field, this benefit declines steadily to vanish outside the central 30 deg of the visual field. Crucially, Experiment 2 showed that, while the decline in performance with eccentricity specific to coherent images may be partially due to the greater salience of image content near fixation (photographic bias), the majority of the effect is due to observer factors. Together, these findings suggest that low-level factors cannot fully explain variations in VSTM performance over the visual field. 
Experiment 1c revealed an overall decrease in performance when images were inverted in orientation and/or color to reduce access to semantic cues, suggesting that semantic information, or at least familiarity, does play some role in local VSTM. However, we also found that this manipulation did not affect the variation in performance with eccentricity: as for normal images, scrambling inverted images reduced performance only within the central visual field. Together, these findings suggest that, within the central visual field, local VSTM for natural scenes relies to a substantial degree on mid-level configural cues (e.g., shape, figure/ground, spatial layout). 
VSTM is generally characterized as highly limited in capacity (Luck & Vogel, 1997; Miller, 1956; Phillips, 1974; Vogel et al., 2001). Miller (1956) argued that chunking smaller subunits into larger meaningful units effectively overcomes the limited capacity of short-term memory stores. This is consistent with our inference that VSTM for natural scenes relies on global configural context over the central visual field. However, Luck and Vogel (Luck & Vogel, 1997; Vogel et al., 2001) argued that the basic unit of chunking in VSTM is an object. If this is the case, scrambling may lower local VSTM performance by fragmenting the objects in the image. Given that many of our images contained large regions of texture (e.g., forest foliage, sky, water, etc.) rather than well-defined objects, it seems likely that the effect is more general than this. For example, scrambling of “background” portions of the image may disrupt the extraction of scene layout information important for scene encoding (Hochberg, 1968, 1978, 1986). 
Hess and Dakin (1997) argued that curvilinear contour binding mechanisms may be limited to a central region of the visual field within 10 deg of the fovea. Given the large differences in our paradigms and stimuli, these results seem in reasonable agreement with our own, and it may be that these contour mechanisms play a key role in extracting the configural structure required to extract an efficient representation of a natural scene. 
Our first two experiments do not identify whether the failure to exploit configural cues in the periphery is due to a breakdown at the encoding, storage, or retrieval stages of processing. Our third and final experiment showed that our ability to detect incoherence drops rapidly in the periphery, suggesting that the limiting factor may be an inability to even detect and encode configural cues in the periphery. 
In addition to configural and semantic cues, low-level factors are also clearly contributing directly to VSTM since performance for scrambled images is well above chance. However, the fact that performance for scrambled images was found not to vary with the eccentricity of the target suggests that these low-level cues are likely to be statistical properties of the image context as opposed to properties of the target block per se. 
We found color to be as important for peripheral targets as for targets near fixation. Given the known falloff in cone density and color sensitivity in the periphery (Curcio et al., 1987; Mullen, 1991; Newton & Eskew, 2003), this finding also points to the possibility that VSTM for peripheral targets is largely based upon statistical properties of the image as a whole rather than properties of the specific target block. 
Conclusions
Our goal in this paper has been to better understand how visual short-term memory for natural scenes varies over the visual field. We measured local recognition performance for natural scene content as a function of eccentricity, for both coherent and scrambled natural scenes. We found that while spatial coherence substantially increases recognition rates within the central visual field, the benefit of spatial coherence vanishes in the periphery. While a part of this central benefit derives from photographic bias that places more memorable content near the center of the frame, the majority of the effect is due to superior performance of the central visual mechanisms underlying recognition and memory. This central advantage for coherent but not for scrambled scenes suggests that the perceptual organization mechanisms involved in the grouping and coding of global spatial configuration may be largely confined to the central visual field. 
Acknowledgments
We thank the editor and reviewers for their insightful comments and suggestions on the previous draft. We also thank Ricardo Tabone and Bob Yhou for programming the experiments. This research was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada and Precarn IRIS. 
Commercial relationships: none. 
Corresponding authors: Ljiljana Velisavljević and James H. Elder. 
Emails: lvelisavljevic@gmail.com and jelder@yorku.ca. 
Address: Centre for Vision Research, York University, Rm. 003G, Computer Science Building, 4700 Keele Street, North York, Ontario, Canada M3J 1P3. 
References
Antes, J. R. (1977). Recognizing and localizing features in brief picture presentations. Memory & Cognition, 5, 155–161. [CrossRef] [PubMed]
Antes, J. R. Metzger, R. L. (1980). Influences of picture context on object recognition. Acta Psychologica, 44, 21–30. [CrossRef]
Antes, J. R. Penland, J. G. Metzger, R. L. (1981). Processing global information in briefly presented pictures. Psychological Research, 43, 277–292. [PubMed] [CrossRef] [PubMed]
Bennett, P. J. Banks, M. S. (1987). Sensitivity loss in odd-symmetric mechanisms and phase anomalies in peripheral vision. Nature, 326, 873–876. [PubMed] [CrossRef] [PubMed]
Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77–80. [PubMed] [CrossRef] [PubMed]
Biederman, I. Mezzanotte, R. J. Rabinowitz, J. C. Francolini, C. M. Plude, D. (1981). Detecting the unexpected in photointerpretation. Human Factors, 23, 153–164. [PubMed] [PubMed]
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433–436. [PubMed] [CrossRef] [PubMed]
Buswell, G. T. (1935). How people look at pictures: A study of the psychology and perception in art. Chicago: University of Chicago Press.
Campbell, F. W. Green, D. G. (1965). Optical and retinal factors affecting visual resolution. The Journal of Physiology, 181, 576–593. [PubMed] [Article] [CrossRef] [PubMed]
Cowey, A. Rolls, E. T. (1974). Human cortical magnification factor and its relation to visual acuity. Experimental Brain Research, 21, 447–454. [PubMed] [CrossRef] [PubMed]
Curcio, C. A. Sloan, Jr., K. R. Packer, O. Hendrickson, A. E. Kalina, R. E. (1987). Distribution of cones in human and monkey retina: Individual variability and radial asymmetry. Science, 236, 579–582. [PubMed] [CrossRef] [PubMed]
Daniel, P. M. Whitteridge, D. (1961). The representation of the visual field on the cerebral cortex in monkeys. The Journal of Physiology, 159, 203–221. [PubMed] [Article] [CrossRef] [PubMed]
De Monasterio, F. M. Gouras, P. (1975). Functional properties of ganglion cells of the rhesus monkey retina. The Journal of Physiology, 251, 167–195. [PubMed] [Article] [CrossRef] [PubMed]
Epstein, R. DeYoe, E. A. Press, D. Z. Rosen, A. C. Kanwisher, N. (2001). Neuropsychological evidence for a topographical learning mechanism in parahippocampal cortex. Cognitive Neuropsychology, 18, 481–508. [CrossRef] [PubMed]
Epstein, R. Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392, 598–601. [PubMed] [CrossRef] [PubMed]
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, Optics and Image Science, 4, 2379–2394. [PubMed] [CrossRef] [PubMed]
Fincham, E. F. (1951). The accomodation reflex and its stimulus. British Journal of Ophthalmology, 35, 381–393. [PubMed] [Article] [CrossRef] [PubMed]
Goffaux, V. Jacques, C. Mouraux, A. Oliva, A. Schyns, P. G. Rossion, B. (2005). Diagnostic colors contribute to the early stages of scene categorization: Behavioural and neuropsychological evidence. Visual Cognition, 12, 878–892. [CrossRef]
Grill-Spector, K. Kourtzi, Z. Kanwisher, N. (2001). The lateral occipital complex and its role in object recognition. Vision Research, 41, 1409–1422. [PubMed] [CrossRef] [PubMed]
Harris, J. P. Fahle, M. (1996). Differences between fovea and periphery in detection and discrimination of spatial offsets. Vision Research, 36, 3469–3477. [PubMed] [CrossRef] [PubMed]
Henderson, J. M. Hollingworth, A. (1999). The role of fixation position in detecting scene changes across saccades. Psychological Science, 10, 438–443. [CrossRef]
Hess, R. F. Dakin, S. C. (1997). Absence of contour linking in peripheral vision. Nature, 390, 602–604. [PubMed] [CrossRef] [PubMed]
Hochberg, J. Haber, R. N. (1968). In the mind's eye. Contemporary theory and research in visual perception. New York: Holt, Rinehart, & Winston.
Hochberg, J. (1978). Perception. Englewood Cliffs, NJ: Prentice Hall.
Hochberg, J. Boff,, K. J. Kaufman,, L. Thomas, J. P. (1986). Representation of motion and space in video and cinematic displays. Handbook of perception and human performance. (1, pp. 1–64). New York: Wiley.
Intraub, H. (1981). Rapid conceptual identification of sequentially presented pictures. Journal of Experimental Psychology. Human Perception and Performance, 7, 604–610. [CrossRef]
Intraub, H. (1984). Conceptual masking: The effects of subsequent visual events on memory for pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 115–125. [PubMed] [CrossRef] [PubMed]
Klein, R. (1982). Patterns of perceived similarity cannot be generalized from long to short exposure durations and vice-versa. Perception & Psychophysics, 32, 15–18. [PubMed] [CrossRef] [PubMed]
Loftus, G. R. (1972). Eye fixations and recognition memory for pictures. Cognitive Psychology, 3, 525–551. [CrossRef]
Loftus, G. R. Ginn, M. (1984). Perceptual and conceptual masking of pictures. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 435–441. [PubMed] [CrossRef] [PubMed]
Loftus, G. R. Hanna, A. M. Lester, L. (1988). Conceptual masking: How one picture captures attention from another picture. Cognitive Psychology, 20, 237–282. [PubMed] [CrossRef] [PubMed]
Loschky, L. C. McConkie, G. W. Yang, J. Miller, M. E. (2005). The limits of visual resolution in natural scene viewing. Visual Cognition, 12, 1057–1092. [CrossRef]
Luck, S. J. Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281. [PubMed] [CrossRef] [PubMed]
Mannan, S. K. Ruddock, K. H. Wooding, D. S. (1997). Fixation sequences made during visual examination of briefly presented 2D images. Spatial Vision, 11, 157–178. [PubMed] [CrossRef] [PubMed]
Metzger, R. L. Antes, J. R. (1983). The nature of processing early in picture perception. Psychological Research, 45, 267–274. [PubMed] [CrossRef] [PubMed]
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97. [PubMed] [CrossRef] [PubMed]
Mullen, K. T. (1991). Colour vision as a post-receptoral specialization of the central visual field. Vision Research, 31, 119–130. [PubMed] [CrossRef] [PubMed]
Nakamura, K. Kawashima, R. Sato, N. Nakamura, A. Sugiura, M. Kato, T. (2000). Functional delineation of the human occipito-temporal areas related to face and scene processing: A PET study. Brain, 123, 1903–1912. [PubMed] [Article] [CrossRef] [PubMed]
Nelson, W. W. Loftus, G. R. (1980). The functional visual field during picture viewing. Journal of Experimental Psychology: Human Learning and Memory, 6, 391–399. [PubMed] [CrossRef] [PubMed]
Newton, J. R. Eskew, Jr., R. T. (2003). Chromatic detection and discrimination in the periphery: A postreceptoral loss of color sensitivity. Visual Neuroscience, 20, 511–521. [PubMed] [CrossRef] [PubMed]
Oliva, A. Schyns, P. G. (1997). Coarse blobs or fine edges Evidence that information diagnosticity changes the perception of complex visual stimuli. Cognitive Psychology, 34, 72–107. [PubMed] [CrossRef] [PubMed]
Oliva, A. Schyns, P. G. (2000). Diagnostic colors mediate scene recognition. Cognitive Psychology, 41, 176–210. [PubMed] [CrossRef] [PubMed]
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. [PubMed] [CrossRef] [PubMed]
Phillips, W. A. (1974). On the distinction between sensory storage and short-term visual memory. Perception & Psychophysics, 16, 283–290. [CrossRef]
Potter, M. C. (1975). Meaning in visual search. Science, 187, 965–966. [PubMed] [CrossRef] [PubMed]
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509–522. [PubMed] [CrossRef] [PubMed]
Ranganath, C. DeGutis, J. D'Esposito, M. (2004). Category-specific modulation of inferior temporal activity during working memory encoding and maintenance. Cognitive Brain Research, 20, 37–45. [PubMed] [CrossRef] [PubMed]
Rayner, K. Pollatsek, A. (1992). Eye movements and scene perception. Canadian Journal of Psychology, 46, 342–376. [PubMed] [CrossRef] [PubMed]
Renninger, L. W. Malik, J. (2004). When is scene identification just texture recognition? Vision Research, 44, 2301–2311. [PubMed] [CrossRef] [PubMed]
Rock, I. (1974). The perception of disoriented figures. Scientific American, 230, 78–85. [PubMed] [CrossRef] [PubMed]
Sanocki, T. (2003). Representation and perception of scenic layout. Cognitive Psychology, 47, 43–86. [PubMed] [CrossRef] [PubMed]
Sanocki, T. Epstein, W. (1997). Priming spatial layout of scenes. Psychological Science, 8, 374–378.
Schyns, P. G. Oliva, A. (1994). From blobs to boundary edges: Evidence for time and spatial scale dependent scene recognition. Psychological Science, 5, 196–200. [CrossRef]
Shore, D. I. Klein, R. M. (2000). The effects of scene inversion on change blindness. Journal of General Psychology, 127, 27–43. [PubMed] [CrossRef] [PubMed]
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1–29. [CrossRef]
Steeves, J. K. Humphrey, G. K. Culham, J. C. Menon, R. S. Milner, A. D. Goodale, M. A. (2004). Behavioral and neuroimaging evidence for a contribution of color and texture information to scene classification in a patient with visual form agnosia. Journal of Cognitive Neuroscience, 16, 955–965. [PubMed] [CrossRef] [PubMed]
Strasburger, H. Harvey, Jr., L. O. Rentschler, I. (1991). Contrast thresholds for identification of numeric characters in direct and eccentric view. Perception & Psychophysics, 49, 495–508. [PubMed] [CrossRef] [PubMed]
Subramaniam, S. Biederman, I. Madigan, S. (2000). Accurate identification but no priming and chance recognition memory for pictures in RSVP sequences. Visual Cognition, 7, 511–535. [CrossRef]
Velisavljevic, L. Elder, J. H. (2002). What do we see in a glance? [Abstract]. Journal of Vision, 2(7), 493. [CrossRef] [PubMed]
Virsu, V. Rovamo, J. (1979). Visual resolution, contrast sensitivity and the cortical magnification factor. Experimental Brain Research, 37, 475–494. [PubMed] [CrossRef] [PubMed]
Virsu, V. Rovamo, J. Laurinen, P. Näsänen, R. (1982). Temporal contrast sensitivity and cortical magnification. Vision Research, 22, 1211–1217. [PubMed] [CrossRef] [PubMed]
Vogel, E. K. Woodman, G. F. Luck, S. J. (2001). Storage of features, conjunctions, and objects in working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114. [PubMed] [CrossRef] [PubMed]
Wiesel, T. N. (1960). Receptive fields of ganglion cells in the cat's retina. The Journal of Physiology, 153, 583–594. [PubMed] [Article] [CrossRef] [PubMed]
Figure 1
 
Example coherent (A) and scrambled (B) test images.
Figure 2
 
Stimulus sequence.
Figure 3
 
Three target location groups, clustered by eccentricity.
Figure 4
 
Recognition performance for color images (black lines) and monochrome images (gray lines). Lines represent maximum likelihood linear fits. Error bars represent ±1 standard error of the mean (SEM).
Figure 5
 
Images inverted in orientation (a and b), color (c and d), and orientation and color (e and f).
Figure 6
 
Recognition performance for images inverted in orientation (a), color (b), orientation and color (c), and pooled over inversion type (d). Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
Figure 7
 
Recognition performance for Experiments 1a (color images) and 1c (pooled over inversion type). Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
Figure 8
 
Recognition performance for coherent test images in Experiments 1a, 1b, and 1c. Lines represent maximum likelihood linear fits. Error bars represent ±1 SEM.
Figure 9
 
Recognition performance for coherent test images in Experiments 1a, 1b, and 1c as a function of all target eccentricities. Error bars represent ±1 SEM.
Figure 10
 
A typical professionally composed photograph.
Figure 11
 
Partitioning of original image into five test images.
Figure 12
 
Example stimuli for each condition.
Figure 13
 
Disambiguation of observer and image eccentricity. The square for the center condition and the circles for remaining conditions indicate fixation locations for each condition relative to the original image.
Figure 14
 
Recognition performance as a function of normalized image and observer eccentricity. Error bars represent ±1 SEM.
Figure 15
 
Examples of two possible test stimuli: coherent and partially scrambled.
Figure 16
 
Each square grid represents a test image divided into 64 blocks. Cross represents fixation relative to the test image. For each condition, the scrambled patch was centered at either 2.5, 7.6, or 12.7 deg eccentricity. For each eccentricity, there were four possible patch locations.
Figure 17
 
Stimulus sequence.
Figure 18
 
Detection performance as a function of the eccentricity of the incoherent region (averaged over mask conditions). Error bars indicate ±1 SEM.
Table 1
 
Prediction equations for linear regression of percent correct against eccentricity (deg). Note: * P = performance (% correct); x = target probe eccentricity (deg).
Condition                                  Prediction equation*
Experiment 1a: Color and coherent          P = 91.41 − 1.14x
Experiment 1a: Color and scrambled         P = 71.66 + 0.17x
Experiment 1b: Monochrome and coherent     P = 87.72 − 1.77x
Experiment 1b: Monochrome and scrambled    P = 60.98 + 0.05x
Experiment 1c: Inverted and coherent       P = 88.91 − 1.48x
Experiment 1c: Inverted and scrambled      P = 70.66 − 0.05x
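The Table 1 fits can be evaluated directly to see how the benefit of spatial coherence shrinks with eccentricity. The following is a minimal sketch in Python, not the authors' analysis code: the intercepts and slopes are copied from Table 1, while the dictionary layout and the function names (`predict`, `coherence_benefit`, `crossover`) are illustrative assumptions. The crossover eccentricity is simply where the two fitted lines intersect; it is an extrapolation of the linear fits and is not a quantity reported in the table.

```python
# Intercepts (% correct) and slopes (% correct per deg) copied from Table 1.
# The structure and names below are illustrative assumptions.
FITS = {
    "1a color": {"coherent": (91.41, -1.14), "scrambled": (71.66, 0.17)},
    "1b monochrome": {"coherent": (87.72, -1.77), "scrambled": (60.98, 0.05)},
    "1c inverted": {"coherent": (88.91, -1.48), "scrambled": (70.66, -0.05)},
}


def predict(fit, x):
    """Predicted percent correct at target eccentricity x (deg)."""
    intercept, slope = fit
    return intercept + slope * x


def coherence_benefit(experiment, x):
    """Coherent minus scrambled predicted performance at eccentricity x (deg)."""
    fits = FITS[experiment]
    return predict(fits["coherent"], x) - predict(fits["scrambled"], x)


def crossover(experiment):
    """Eccentricity (deg) at which the coherent and scrambled fits intersect."""
    a_c, b_c = FITS[experiment]["coherent"]
    a_s, b_s = FITS[experiment]["scrambled"]
    return (a_c - a_s) / (b_s - b_c)


for name in FITS:
    print(
        f"{name}: benefit at 0 deg = {coherence_benefit(name, 0.0):.2f} points, "
        f"fitted lines cross at {crossover(name):.1f} deg"
    )
```

Under these fits the predicted coherence benefit at fixation is roughly 18 to 27 percentage points, and the fitted lines intersect at roughly 13 to 15 deg, depending on the experiment.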
Table 2
 
Prediction equations. Note: * P = Predicted performance (%), x = Observer eccentricity (deg), y = Normalized image eccentricity, P_z = Standardized performance, Z_x = Standardized observer eccentricity, and Z_y = Standardized image eccentricity.
Prediction equations*
P = 87.4 − 0.55x − 5.7y
P_z = −0.19Z_x − 0.11Z_y
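As a worked illustration of the Table 2 equations, the sketch below evaluates the unstandardized regression at a hypothetical viewing condition and compares the two standardized coefficients. The coefficients come from Table 2; the function names and the example inputs (10 deg observer eccentricity, normalized image eccentricity of 1.0) are assumptions chosen only for illustration, and the valid range of the normalized image eccentricity is not stated here.

```python
def predicted_performance(observer_ecc_deg, image_ecc_norm):
    """Unstandardized Table 2 equation: P = 87.4 - 0.55x - 5.7y (% correct)."""
    return 87.4 - 0.55 * observer_ecc_deg - 5.7 * image_ecc_norm


# Standardized coefficients from Table 2 (unitless).
BETA_OBSERVER = -0.19
BETA_IMAGE = -0.11


def standardized_performance(z_observer, z_image):
    """Standardized Table 2 equation: P_z = -0.19 Z_x - 0.11 Z_y."""
    return BETA_OBSERVER * z_observer + BETA_IMAGE * z_image


# Hypothetical inputs, not conditions taken from the experiment.
print(round(predicted_performance(0.0, 0.0), 1))
print(round(predicted_performance(10.0, 1.0), 1))
print(round(BETA_OBSERVER / BETA_IMAGE, 2))
```

Because the standardized coefficients share a common scale, their ratio (about 1.7) gives a rough sense that observer eccentricity carries more weight than image eccentricity in this fit.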