Research Article  |   March 2005
The dynamics of visual pattern masking in natural scene processing: A magnetoencephalography study
Journal of Vision March 2005, Vol.5, 10. doi:10.1167/5.3.10
      Jochem W. Rieger, Christoph Braun, Heinrich H. Bülthoff, Karl R. Gegenfurtner; The dynamics of visual pattern masking in natural scene processing: A magnetoencephalography study. Journal of Vision 2005;5(3):10. doi: 10.1167/5.3.10.

      © 2016 Association for Research in Vision and Ophthalmology.

Abstract

We investigated the dynamics of natural scene processing and mechanisms of pattern masking in a scene-recognition task. Psychophysical recognition performance and the magnetoencephalogram (MEG) were recorded simultaneously. Photographs of natural scenes were briefly displayed and in the masked condition immediately followed by a pattern mask. Viewing the scenes without masking elicited a transient occipital activation that started approximately 70 ms after the pattern onset, peaked at 110 ms, and ended after 170 ms. When a mask followed the target an additional transient could be reliably identified in the MEG traces. We assessed psychophysical performance levels at different latencies of this transient. Recognition rates were reduced only when the additional activation produced by the pattern mask overlapped with the initial 170 ms of occipital activation from the target. Our results are commensurate with an early cortical locus of pattern masking and indicate that 90 ms of undistorted cortical processing is necessary to reliably recognize a scene. Our data also indicate that as little as 20 ms of undistorted processing is sufficient for above-chance discrimination of a scene from a distracter.

Introduction
Studies on masking
Pattern masking is widely used to study the dynamics of information processing. The earliest reports of visual masking date back to the 19th century (Baxt, 1871). Since then masking has been extensively used to study the dynamics of visual perception and information processing (Gegenfurtner & Rieger, 2000; Grill-Spector, Kushnir, Hendler, & Malach, 2000; Sperling, 1960; for reviews, see Breitmeyer & Ogmen, 2000; Bachmann, 1994; Breitmeyer, 1984; Kahneman, 1968). However, the neuronal basis of masking and the effect of the mask on the processing of the target are poorly understood (Breitmeyer & Ogmen, 2000; Enns & Di Lollo, 2000). One of the first electroencephalography (EEG) studies of visual masking, in which the authors attempted to investigate the dynamics of the target processing, was conducted by Donchin, Wicke, and Lindsley (1963). The authors recorded the EEG from an occipital electrode while they presented weak flashes of light followed by strong flashes at the same location, and measured the perceptual impact of the strong flash on the weak leading flash. They hypothesized that the activation was simply the arithmetic sum of the separate surface potentials from the two flashes, and compared the trace produced by the combined stimuli with the sum of the traces produced when these stimuli were presented alone. For short target-mask stimulus-onset asynchronies (SOAs), observers were unable to distinguish the two flashes, and the measured EEG and the synthetic trace were similar. The authors concluded that at SOAs too short to perceive the leading flash, “the evoked potentials for the second stimulus displace those for the first stimulus” and that the visual percept correlates with the changes in the visually evoked potential (VEP). Schiller and Chorover (1966) challenged these results. In a metacontrast masking paradigm, where the mask followed the target at an adjacent location, they found that the masking VEP did not correlate with the visual percept. 
Several subsequent studies have attempted to find some neurophysiological correlate of the phenomenal experience of simple, masked target stimuli in humans (e.g., Donchin & Lindsley, 1965; Kaitz, Monitz, & Nesher, 1985; Schwartz & Pritchard, 1981; Vaughan & Silverstein, 1968). The results of these EEG studies provide no clear picture of the neural correlates of masking and the way the mask changes the processing of the target. This might be because different kinds of masking were used that may have different neuronal substrates (Breitmeyer, 1984; Enns & Di Lollo, 2000), or because assumptions about the additivity of target and mask activity in the EEG traces were incorrect (Donchin & Lindsley, 1965; Kaitz et al., 1985; Schwartz & Pritchard, 1981). 
In addition to human EEG experiments, there are several recent neurophysiological masking studies in monkeys. Employing metacontrast masking, Macknik and Livingstone (1998) found a suppression of a late after-discharge in V1 neurons. The authors interpret this effect as a neuronal correlate of metacontrast masking. Other groups (Kovacs, Vogels, & Orban, 1995; Rolls & Tovee, 1994) have recorded from shape-selective neurons in macaque monkey temporal cortex. Kovacs et al. (1995) pattern masked simple shapes (stars, letters, and gratings) and found a reduction in the shape sensitivity of inferior temporal (IT) neurons at short target-mask intervals due to a reduced response duration. Rolls and Tovee (1994) recorded from face-selective neurons in the superior temporal sulcus (STS). They used complex stimuli such as faces or letters as pattern masks. Like Kovacs et al., they found that the backward mask reduced the firing duration of the neurons. They concluded that the mask interrupts the processing of the STS neurons. In addition, they found that at SOAs between 60–100 ms, there were distinct responses to two successive stimuli. 
However, in both studies the locus of the interaction between the face or shape and the mask remains unclear. Rolls and Tovee (1994) argue that the basis of masking could be either lateral inhibition of face-selective neurons by neurons that are activated by non-face stimuli or reduced information transfer from earlier visual areas (Rolls, Tovee, & Panzeri, 1999). According to these two hypotheses, the locus of the interaction could be in temporal cortex or at earlier processing stages, respectively. Keysers and Perrett (2002) suggest a third mechanism: competition between stimuli presented in rapid succession and close spatial proximity. Although they do not explicitly specify the location where the competition occurs, they seem to assume that the masking of complex stimuli occurs as a result of competition in higher visual areas such as STS. They predict that stimuli producing a stronger activation should win the competition with a weaker stimulus. 
Single cell studies of visual masking provide information in great temporal and anatomical detail, but they are restricted to looking at one location at a time. Grill-Spector et al. (2000) used functional magnetic resonance imaging (fMRI) to study the effects of the presentation duration of natural scenes in a backward masking paradigm. They compared brain activations in different visual areas with psychophysical recognition rates. In this study, a reduction of the scene-mask SOA to 120 ms had only a weak effect on the activity in primary visual cortex (V1) and in the lateral occipital complex (LOC), but shorter presentation durations reduced the activity in LOC more strongly than in V1. At 20 ms the activation in LOC approached the baseline, while in V1 the activation was still 70% of the activation produced by the unmasked presentation. These results are in accord with the single cell studies: Masking reduces the activity in higher visual areas but the effects in earlier visual areas are weak. 
Masking as a tool to study the dynamics of visual processing
It is known that natural scenes are processed very quickly and efficiently in the visual system (Allison, Puce, Spencer, & McCarthy, 1999; Bötzel & Grüsser, 1989; Thorpe, Fize, & Marlot, 1996). EEG studies have often concentrated on the latency of object category specific brain activity. The results of Thorpe et al. (1996) indicate that an animal in a natural scene can be detected by the human visual system within 150 ms. Faces elicit specific EEG components only 170–200 ms after the onset of the face-image on a screen (Allison et al., 1999; Bötzel & Grüsser, 1989). Only 60 ms of processing is sufficient to encode basic stimulus attributes such as color into short-term memory (Gegenfurtner & Rieger, 2000). Unfortunately, most of the EEG studies on scene and object processing have been performed without masking the stimuli and with an unlimited processing time, making it difficult to draw conclusions about the dynamics of the processing of objects or natural scenes. The studies of Kovacs et al. (1995), Rolls and Tovee (1994), and Grill-Spector et al. (2000) show that masking can be used to study the dynamics of the processing of complex visual stimuli because the mask imposes constraints on the processing of the target that can be exactly controlled by the timing of the mask. 
In the study described here we investigate the pattern backward masking of natural scenes with human subjects using integrated electrophysiological and psychophysical measurements. We recorded brain activity while subjects performed a scene-recognition task (Figure 1). In our paradigm, an abstract pattern mask followed a scene at one of several SOAs. We extracted the initial transient activation from the scene-mask transition and compared it to the brain activity recorded during the unmasked processing of the scene. We also measured psychophysical recognition performance at the various SOAs. By comparing these measures, we were able to evaluate both the dynamics of the processing of the scene and the dynamics of the target-mask interaction. Our approach allowed us to circumvent the strong assumption that the brain activity measured on the surface of the skull is the arithmetic sum of the activations produced by the target and mask when they are presented separately. From our results, we derived a simple model for pattern masking. 
Figure 1
 
In the psychophysical paradigm each trial started with a fixation spot. The target photograph was presented at one of several durations (stimulus-onset asynchronies, or SOAs), the time between target and mask onset. There was no gap between target and mask, and the mask remained on the screen for 500 ms. In the subsequent query phase the target photograph was presented together with a distracter photograph. Each photograph was used only once in the experiment.
Methods
Psychophysics
We measured recognition performance for natural scenes as a function of presentation duration. Recognition was tested in a two-alternative forced-choice match-to-sample task (Figure 1). Each trial started with a fixation spot displayed at the center of a projection screen for a random duration between 1 and 1.4 s. Then a digitized photograph of a natural scene (target) was displayed for a short duration (between 25 and 500 ms). Except for the unmasked condition, a pattern mask immediately followed the scene and was displayed for 500 ms. In the unmasked condition, the scene was presented for 500 ms. Following the mask presentation, a query screen with two images was presented. One of the images was the target and the other a distracter. The subject had to decide which of the two images was the target. The query screen remained visible until the subject gave a response by moving a finger on the left or right hand, which triggered a photoelectric switch. After the response, a blank screen was displayed for 1 s, and the next trial started automatically with the fixation point. 
To avoid possible confounding learning effects, each image was shown only once. The images were chosen from a commercially available database (Corel PhotoDisc). The mask images consisted of randomly placed and sized color parallelograms. The parallelograms drawn in the foreground were smaller than the ones drawn on the background, so that a single large parallelogram could not form a large uniform surface. The colors in the mask were randomly chosen from the target and the distracter colors. A new mask was generated on each trial. Pilot experiments indicated that the mask was very effective. The root mean square (RMS) contrast of the masks was on average approximately 15% higher than the contrast of the scenes. The recognition threshold was determined by fitting a cumulative Gaussian function to the data. 
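The threshold fit mentioned above can be illustrated with a minimal sketch. The text does not specify the fitting procedure, so the following are assumptions: the cumulative Gaussian is scaled between the 50% guessing rate and 100%, the fit is a least-squares grid search on a linear SOA axis, and the function names (`p_correct`, `fit_threshold`) are hypothetical. The demonstration data are synthetic and generated from the model itself; they are not the measured recognition rates.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct(soa_ms, mu, sigma):
    """2AFC psychometric function: 50% guessing floor, 100% ceiling (assumed).
    With this scaling, p_correct equals 0.75 exactly when soa_ms == mu."""
    return 0.5 + 0.5 * phi((soa_ms - mu) / sigma)

def fit_threshold(soas, rates, mu_grid, sigma_grid):
    """Least-squares grid search (an assumed procedure, not the authors')."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            err = sum((p_correct(s, mu, sigma) - r) ** 2
                      for s, r in zip(soas, rates))
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]

# Synthetic demonstration only: rates generated from mu = 44 ms and a
# hypothetical sigma = 20 ms at the four masked SOAs used in the study.
soas = [24, 37, 60, 92]
rates = [p_correct(s, 44, 20) for s in soas]
threshold_ms, spread_ms = fit_threshold(soas, rates, range(20, 81), range(5, 41))
```

Under this parameterization the fitted `mu` is directly the 75%-correct threshold, which is why no separate inversion step is needed.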
Stimulus presentation
The pictures were projected on a screen by means of a DLP-projector (model ddv810, Liesegang) with 800 × 600 pixels resolution and a refresh rate of 72 Hz. The subject was seated 0.9 m away from the screen, and the pictures (384 × 256 pixels) were 27.5 deg of visual angle wide and 18.8 deg high. The background of the presentation screen was gray with an average luminance of 32 cd/m², which was the average luminance of the images. The pictures were presented at a 16-bit color depth, and the voltage-to-intensity function of each of the three projector primaries was linearized by means of a lookup table. 
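As a quick plausibility check on the stimulus geometry, the physical size of the projected image can be recovered from the stated visual angles and viewing distance. This is a minimal sketch that assumes a flat screen viewed head-on with the image centered on the line of sight; the function name `extent_m` is hypothetical.

```python
import math

VIEWING_DISTANCE_M = 0.9   # from the text

def extent_m(angle_deg, distance_m=VIEWING_DISTANCE_M):
    """Size on a flat screen subtending angle_deg, centered on the line of sight."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

image_width_m = extent_m(27.5)    # roughly 0.44 m
image_height_m = extent_m(18.8)   # roughly 0.30 m
```

The resulting width-to-height ratio is close to the 384:256 pixel ratio of the pictures, as expected for approximately square pixels.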
Presentations were controlled by a personal computer running MS-DOS. The start of the target and mask presentation was monitored with photodiodes attached to the upper edge of the screen. A small white square was presented at this location while the target was on the screen. The signal from the photodiodes was recorded in parallel with the magnetoencephalogram (MEG) to provide a precise temporal marker for the appearance of target and mask on the screen. 
Physiological measurements
While the subjects performed the psychophysical task, event-related magnetic fields (EMFs) were recorded with a 151-channel whole head MEG system (CTF, model Omega) located in a magnetically shielded chamber. The DLP-projector was installed outside the shielded chamber, and the picture was projected via a hole in the wall of the chamber using a mirror system on the back of a transparent projection screen. The MEG recording started 200 ms before the target appeared on the screen, and the data were collected during a 700-ms period. The signal from the target photodiode served as a trigger for computing the EMF averages. Baseline activity was defined as the average activity during the 200 ms of pre-trigger recording. The MEG sampling rate was 625 Hz. During the experiment, the head of the subject was stabilized with a system of posts. 
Trials containing signals exceeding 1.5 × 10⁻¹² T, caused by eye movements or muscle contractions, were excluded from the data analysis. The time series was filtered with a 45-Hz low-pass FIR filter. 
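The epoch handling described above (amplitude-based trial rejection, pre-trigger baseline, averaging) can be sketched as follows. This is an illustration rather than the authors' pipeline: the order of rejection and baseline correction, the data layout (a list of channels per epoch, values in tesla), and all function names are assumptions, and the low-pass filter is omitted.

```python
from statistics import fmean

SAMPLING_RATE_HZ = 625         # from the text (1.6 ms per sample)
PRE_TRIGGER_S = 0.200          # from the text
REJECT_THRESHOLD_T = 1.5e-12   # from the text
N_BASELINE = round(PRE_TRIGGER_S * SAMPLING_RATE_HZ)  # 125 pre-trigger samples

def is_clean(epoch):
    """True if no sample on any channel exceeds the rejection threshold."""
    return all(abs(v) <= REJECT_THRESHOLD_T for channel in epoch for v in channel)

def baseline_correct(epoch):
    """Subtract each channel's mean over the pre-trigger samples."""
    corrected = []
    for channel in epoch:
        baseline = fmean(channel[:N_BASELINE])
        corrected.append([v - baseline for v in channel])
    return corrected

def average_clean_epochs(epochs):
    """Reject artifact epochs on the raw values (assumed order), then
    baseline-correct and average the remaining epochs per channel and sample."""
    clean = [baseline_correct(e) for e in epochs if is_clean(e)]
    if not clean:
        raise ValueError("all epochs were rejected")
    n_channels, n_samples = len(clean[0]), len(clean[0][0])
    return [[fmean(e[c][s] for e in clean) for s in range(n_samples)]
            for c in range(n_channels)]
```

Each epoch is expected to start with the 200 ms of pre-trigger recording, so the first `N_BASELINE` samples of every channel define the baseline.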
Subjects
Eight subjects participated in the experiment. All had normal or corrected-to-normal visual acuity. Subjects were paid for participation and gave their informed consent before the start of the experiment. 
The eight subjects were split into two groups of four subjects. Each group was tested in four stimulus conditions. In the first condition, only the target was presented for 500 ms without a mask (target-only condition). In the second condition, only a mask was presented without a preceding target (mask-only condition). In the third and fourth conditions, the target was presented for a short duration with a mask following immediately (target and mask conditions). For group A (group 1 in the Results), the target durations (SOAs) in the target and mask conditions were 37 ms and 92 ms. For group B (group 2 in the Results), the target durations were 24 ms and 60 ms. Note that every subject was tested in conditions one (target-only) and two (mask-only). The four target and mask SOAs were distributed across the two groups because each condition was repeated 120 times, and a testing session that included all six conditions would have been unacceptably long. 
The 480 measurements for each subject were split into three blocks of approximately equal length. Over these three blocks, the four trial types were randomly interleaved. Subjects had a short break between the blocks but were required to remain in the MEG. 
Results
Psychophysics
Figure 2 shows the recognition rate as a function of presentation duration. For every presentation duration, data were averaged across all subjects. In the mask-only condition (labeled as 0-ms SOA), the subjects had no information about a target, and consequently the recognition rate was close to the 50% chance level (46.8%, averaged over all eight subjects). When the target was presented without a subsequent mask (labeled as 500-ms SOA), the recognition rate was close to perfect (98.1%, averaged over all eight subjects). The recognition rate was significantly reduced when the SOA was 60 ms or shorter, t(3) = 8.8, p < .005, and an SOA of only 24 ms was sufficient to obtain a recognition rate significantly above guessing performance, t(3) = 3.15, p < .05. To reach threshold-level recognition performance (75% correct), an SOA of 44 ms was necessary. 
Figure 2
 
The psychophysical recognition performance. The performances for mask-only and scene-only presentations are plotted at 0-ms SOA and 500-ms SOA, respectively. The first significant reduction in recognition performance occurs when the SOA is reduced to 60 ms, and the 75% correct response criterion threshold is reached with a 44-ms SOA. Performance is still significantly better than guessing with an SOA of only 24 ms. Error bars indicate 95% confidence intervals.
Physiological measurements
The goal in our analysis was to track a temporal marker that signals the transition from scene to mask in the combined presentation of scene and mask. In the top row of Figure 3 we plot the raw event-related magnetic fields (EMFs) evoked by the onset of the mask alone for the first subject measured in each group. The time series from all channels are overlaid. The data of both subjects show a first small maximum approximately 75 ms after mask onset and a second, higher one after approximately 100 ms (gray highlighted). The first maximum is most likely the magnetic equivalent of the C1, which reflects the initial activation in retinotopic striate cortex (Martinez, Di Russo, Anllo-Vento, Sereno, Buxton, et al., 2001). The maximum of the second deflection is the magnetic equivalent of the P100, which includes activity from retinotopic extrastriate visual areas (Martinez et al., 2001). 
Figure 3
 
Top row: Single subject averaged event-related magnetic field (EMF) time courses in all 151 sensors elicited by the presentation of the mask alone. Data from the first subject measured in each group are shown. The data show an initial small deflection that peaks after 75 ms and a second higher deflection that peaks after 100 ms (gray highlighted). Then an intermediate minimum occurs after this second deflection. The maxima of the deflections are restricted to a subset of sensor traces. Second and third rows: The EMF difference between masked and unmasked scene presentations [(scene and mask) − scene]. This difference should reveal higher activation elicited by the transition from the scene to the mask. The difference waves are initially quite flat but show a deflection after a certain latency (gray highlighted). The difference waves for the short and long SOAs are plotted in the second and third row, respectively. The SOAs are given in the legend of each plot. Longer SOAs prolong the latency of this deflection. This indicates that the initial deflection in the EMF difference waves reflects activation that is elicited by the scene-mask transition. Again the effects appear to be restricted to a subset of sensors. The latencies of the scene-mask transition seem to be preserved in the difference waves but the form of the difference waves shows only limited similarity with the EMFs produced by the mask alone. The amplitudes of the difference waves are clearly reduced. This indicates that it is unlikely that simple subtraction approaches would correctly reveal residual scene activation after masking or that summation approaches would produce a meaningful prediction of the brain activity.
In the second and third row of Figure 3 we show the EMF difference waves between the masked and the unmasked scene presentations for the same subjects at the different SOAs. The visual stimulation in both conditions is initially the same until the display switches from the scene to the mask. Thus, it can be expected that the EMF difference waves fluctuate randomly until the transition from the scene to the mask leads to a change in brain activity. As can be seen, the difference curves are relatively flat up to a certain latency at which a sudden increase of the differential activity occurs (gray highlighted). The latency of this effect increases with increasing scene-mask SOA. The maxima of the initial EMF deflections elicited by the mask alone (Figure 3, top row) and in the difference plots (Figure 3, second and third rows) appear only in a subset of the channel traces. 
The spatial distribution of the magnetic fields (Figure 4) shows that the 100-ms EMF maximum elicited by the mask was concentrated over the occipital sensors in each subject, and we suspected that this maximum is a good candidate to determine the dynamics of brain activity caused by the scene-mask transition at different SOAs. For the subsequent analysis we used a subset of 29 occipital sensors that covered the location of the spatial activation peaks (Figure 4) and used the same group of sensors for all subjects in the data analysis. Reducing the sensor set focuses the analysis on the region of occipital cortex where the earliest signals from the scene-mask transition can be expected, and it increases the signal-to-noise ratio. 
Figure 4
 
The distribution of magnetic fields over the heads of all subjects 105 ms after the onset of the mask when the occipital activation peaked. Blue indicates magnetic field vectors emanating from the skull, and in the red areas the vectors point toward the skull. Yellow and bright blue indicate a stronger magnetic flux. The white dots represent the sensor locations. The marked sensors (thick white dots) were selected for the analysis.
We rectified the data in the sensor subset and then averaged over sensors and subjects in both groups (e.g., Fylan, Holliday, Singh, Anderson, & Harding, 1997). This gives a measure of occipital brain activity over time. The rectification is required because the magnetic fields elicited by an active part of the brain are bipolar. Simple averaging of the negative and positive portions of the bipolar fields does not lead to meaningful results, or might even lead to a cancellation of the magnetic fields (Figure 4). 
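The need for rectification can be seen in a toy example. The sketch below uses two hypothetical sensors that pick up the same source with opposite polarity; the values are illustrative, not measured data, and the function names are assumptions.

```python
def plain_mean(sensor_traces):
    """Sample-wise mean across sensors, keeping the sign of each field value."""
    n = len(sensor_traces)
    return [sum(trace[i] for trace in sensor_traces) / n
            for i in range(len(sensor_traces[0]))]

def rectified_mean(sensor_traces):
    """Sample-wise mean of absolute field values across sensors."""
    n = len(sensor_traces)
    return [sum(abs(trace[i]) for trace in sensor_traces) / n
            for i in range(len(sensor_traces[0]))]

# Two sensors on opposite sides of a bipolar field pattern (toy values in tesla).
outgoing = [0.0, 1e-13, 2e-13]
incoming = [0.0, -1e-13, -2e-13]

cancelled = plain_mean([outgoing, incoming])      # [0.0, 0.0, 0.0]
preserved = rectified_mean([outgoing, incoming])  # [0.0, 1e-13, 2e-13]
```

The signed average vanishes although both sensors register a growing field, whereas the rectified average retains the time course of the signal strength.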
The results of this analysis are plotted in Figure 5A and 5C for the mask-only and scene-only conditions for group 1 and 2, respectively. Both scene and mask evoke a strong activation that initially rises steeply and peaks around 100–110 ms after the images appeared on the screen. In group 2 an earlier but weaker intermediate activation peak is visible approximately 75 ms after the onset of the stimuli. This lower activation peak reflects the magnetic equivalent of the C1, the earliest measurable cortical activation in the EEG that is most likely generated in V1 (Martinez et al., 2001) and was already visible in the raw data (Figure 3, top row). In the data from group 1, a slight slope change after approximately 75 ms indicates the presence of the same activation. The latency of the higher 100-ms peak is somewhat shorter for the mask (group 1: 99.2 ms; group 2: 105.6 ms; mean: 102.4 ms) than for the scene (group 1: 107.2 ms; group 2: 112 ms; mean: 109.6 ms). The difference in latency between the two groups of subjects is quite small (mask: 6.4 ms; scene: 4.8 ms), indicating that these latencies are reproducible between groups. The 100-ms peak elicited by the mask (2 × 10⁻¹³ T) is higher than that elicited by the scene (1.54 × 10⁻¹³ T). This difference is statistically significant according to the criterion of Rugg, Doyle, and Wells (1995). This criterion seeks to avoid spurious differences by counting only differences that include more than five consecutive samples with difference p values less than .01. It is clear that the initial occipital deflection, which extends between approximately 70 ms (when the C1 occurs) and 170 ms (when an intermediate minimum occurs) and peaks at ca. 100 ms, has multiple underlying neuronal generators (Figure 3, top row). The width of the 100-ms peak elicited by the mask is 17.6 ms at 90% of the peak amplitude in the average over all subjects, and the width of the peak elicited by the scene is 25 ms. 
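The consecutive-sample criterion attributed above to Rugg, Doyle, and Wells (1995) amounts to a run-length filter on the per-sample p values. The following is a minimal sketch of that idea; how the per-sample p values were computed is not stated in the text, so they are taken as given, and the function name is hypothetical.

```python
def significant_runs(p_values, alpha=0.01, min_exceeding=5):
    """Return (first_index, last_index) for every run of MORE than
    min_exceeding consecutive samples with p < alpha."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start > min_exceeding:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(p_values) - start > min_exceeding:
        runs.append((start, len(p_values) - 1))
    return runs

# Toy p values: a run of exactly five samples is ignored, a run of six counts.
toy_p = [0.5] * 3 + [0.001] * 5 + [0.5] + [0.001] * 6 + [0.5]
accepted = significant_runs(toy_p)   # [(9, 14)]
```

At the 625-Hz sampling rate, six consecutive samples span 9.6 ms, so isolated single-sample differences cannot satisfy the criterion.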
Our goal in the following analysis is to track this peak of the occipital activation from the mask as a temporal marker that signals the transition between scene and mask in the combined presentation. To extract the activation peak caused by the scene-mask transition in the combined presentation of scene and mask, we subtracted the activation produced by the scene alone in our sensor set from the activation patterns that occurred in the combined presentations. The peak caused by the scene-mask transition in this difference was used as a temporal marker for the beginning of the interaction between scene and mask. The latency of this temporal marker can be seen as an upper limit for the arrival of mask activity in early visual cortex. 
Figure 5
 
A and C: The time course of the signal strength in the selected posterior sensors in group 1 (A) and 2 (C) averaged over all subjects. Both the target scene and mask produce an initial activation peak approximately 100 ms after stimulus onset. The initial peak from the mask is higher than from the scene, indicating a stronger neural activation by the mask. The early 75-ms peak in the raw data (Figure 3, top row) is visible as a lower peak in C, and as a slope change in A. B and D: The time courses of the signal strength for the scene alone and the combined scene-mask presentations at different SOAs for group 1 (B) and 2 (D). Signal time courses are shown for scene-only (red), scene-mask SOAs of 37 ms (B, blue), 92 ms (B, green), 24 ms (D, blue), and 60 ms (D, green). The dash-dotted difference curves were calculated by subtracting the “scene-only” activation from the activity elicited by the combined scene-mask presentation (e.g., blue curve − red curve). The positive difference peaks (marked by the arrows) are clearly recognizable (e.g., at 133 ms in the 37-ms SOA difference curve and at 192 ms in the 92-ms SOA difference curve). However, by looking, for example, at the 92-ms SOA time course (B, green) and the mask-only time course (A, black), it is clear that the activation strength measured in the combined scene-mask presentation is not simply the sum of the activations obtained in the separate presentations.
The time courses of occipital brain activation elicited by the combined presentation of scene and mask for subject groups 1 and 2 (see the Subjects section above) are shown in Figure 5B and 5D, respectively. The activation caused by the scene alone is shown by the red curve in this figure. The dotted lines in the figure represent the difference curves. In these difference curves, the initial activation peak from the mask was selected according to two criteria: it should be the highest peak in the examined 300-ms time interval, because the 100-ms activation from the mask alone was the highest activation in that interval, and it should be positive. This time interval was chosen because the longest SOA between target and mask was 92 ms, and the latency of the peak activity from the mask alone was around 100 ms. Both criteria were defined a priori by looking at the brain activation elicited by target and mask separately. Note that we do not assume that the activation from target and mask combines in a strictly additive way. The analysis assumes only that the transition from the scene to the mask produces a higher activation in the combined presentation. 
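The two selection criteria (highest and positive, within the examined 300-ms interval) translate into a simple search over the difference curve. The sketch below is an illustration under assumptions: the interval is taken to run from 0 to 300 ms after target onset, the traces are synthetic Gaussian bumps rather than recorded data, and the helper names are hypothetical.

```python
import math

SAMPLE_PERIOD_MS = 1.6   # 625-Hz sampling rate, from the text

def difference_peak(times_ms, combined, scene_only, window_ms=(0.0, 300.0)):
    """Return (latency_ms, amplitude) of the highest positive value of
    combined - scene_only inside the window, or None if no value is positive."""
    best = None
    for t, c, s in zip(times_ms, combined, scene_only):
        d = c - s
        if window_ms[0] <= t <= window_ms[1] and d > 0:
            if best is None or d > best[1]:
                best = (t, d)
    return best

def bump(t, center_ms, height, width_ms):
    """Gaussian-shaped toy activation (synthetic, for demonstration only)."""
    return height * math.exp(-((t - center_ms) / width_ms) ** 2)

# Synthetic traces: a scene response plus an extra, later bump for the mask.
times = [i * SAMPLE_PERIOD_MS for i in range(188)]   # 0 to about 299 ms
scene = [bump(t, 109.6, 1.5e-13, 25.0) for t in times]
masked = [s + bump(t, 139.4, 0.5e-13, 15.0) for s, t in zip(scene, times)]
latency_ms, amplitude = difference_peak(times, masked, scene)
```

Note that the search makes no additivity assumption about the real data: it only requires that the combined presentation produces a higher activation than the scene alone at some latency.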
We used a correlation analysis to test the assumption that the difference maxima we extracted at the various SOAs reflect the initial activation peak due to the mask. If this assumption holds, the latency of the difference peak can be predicted by adding the SOA to the latency of the initial activation peak in the mask-alone presentation. For example, the initial activation peak caused by the mask alone occurred at 102.4 ms. With a target-mask SOA of 37 ms, the difference peak should therefore occur at 102.4 ms + 37 ms = 139.4 ms. In Figure 6 we plot the latency of the initial difference maximum as a function of the SOAs we used. The correlation coefficient is r = 0.989, t(2) = 9.5, p < .01. The predicted slope of the regression is 1; the experimentally obtained slope is 1.04. This indicates that the difference maxima we extracted do in fact reflect the initial activation peak caused by the scene-mask transition. 
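The arithmetic behind this prediction and the reported test statistic can be checked with a short sketch. The four SOAs are taken from the Figure 5 caption, and treating them as the complete set is an assumption consistent with the two degrees of freedom of the reported t test. The function names are hypothetical, and the t value is computed from the reported r with the standard formula t = r·sqrt(n − 2)/sqrt(1 − r²), not from the raw latencies, which are not all listed in the text.

```python
import math

MASK_ONLY_PEAK_MS = 102.4      # latency of the mask-alone peak (from the text)
SOAS_MS = [24, 37, 60, 92]     # SOAs named in the Figure 5 caption (assumed complete)

def predicted_latency(soa_ms, mask_peak_ms=MASK_ONLY_PEAK_MS):
    """Predicted difference-peak latency: mask-alone peak latency plus SOA."""
    return mask_peak_ms + soa_ms

def t_from_r(r, n):
    """t statistic for a Pearson correlation r based on n data points."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

predictions = {soa: predicted_latency(soa) for soa in SOAS_MS}
t_value = t_from_r(0.989, len(SOAS_MS))

for soa, latency in predictions.items():
    print(f"SOA {soa} ms: predicted difference peak at {latency:.1f} ms")
print(f"t({len(SOAS_MS) - 2}) = {t_value:.2f}")
```

Under these assumptions the computed t value rounds to the reported t(2) = 9.5.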
Figure 6
 
The measured latency of the difference peaks plotted against the respective SOA (black dots). The dash-dotted line represents the prediction for the latency of the difference peak (see text for an explanation). The prediction was calculated as the sum of the latency of the mask-only peak (see Figure 5A and C) and the respective SOA.
The next question we asked was what part of the brain activity associated with the processing of the target could be affected by the mask. The gray square in Figure 6 marks the temporal range in which additional activation from the mask was associated with a significant reduction in psychophysical recognition performance. It can be seen that the reduction in recognition performance occurs when the initial activation from the mask overlaps the processing of the target within the first 170 ms after the onset of the target on the screen. 
In Figure 7 the information from the physiological recordings and psychophysical measurements is merged to explore the dynamics of the interaction between the pattern mask and scene. The effects of the mask activation within the time course of target processing become clearer when recognition performance is plotted as a function of the latency of the mask-activation peak and overlaid on the activation in the undistorted scene processing. Accordingly, in Figure 7 the height of the bars indicates recognition performance at the target-mask SOAs shown on the upper abscissa, and the latency of the activation peak produced by these masks is shown on the lower abscissa (e.g., 160-ms latency at a 60-ms SOA; also see Figure 5). The red curve in this graph is plotted against the same timescale and shows the average time course of the occipital activation produced by the unmasked natural scenes. The first activity peak produced by the processing of the scene begins at approximately 70 ms when the activation rises steeply, crests at 110 ms, and then descends to a local minimum at 172 ms. With mask SOAs of 24–60 ms, the mask activity peak overlaps with this scene activity peak. This interval coincides with the interval in which there is a significant reduction in recognition performance, with the shorter SOAs producing both the most overlap and the greatest reduction in recognition performance. Interestingly, at a 92-ms target-mask SOA, where we found no reduction in recognition performance, the mask-generated peak had a latency of 190 ms and did not overlap with the initial scene-activation deflection. Our data suggest that psychophysical recognition performance is reduced when the first activation peak produced by the mask overlaps with the first activation deflection generated by the scene. The greater this overlap, the more recognition performance is reduced. 
Activity related to the processing of the scene that occurs later than approximately 170 ms after the scene onset does not seem to be vulnerable to the initial activation elicited by the pattern mask in our recognition paradigm. 
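The overlap criterion can be made concrete with a small sketch. It uses predicted mask-peak latencies (mask-alone latency of 102.4 ms plus SOA) in place of the measured ones, and an approximate 70–170 ms window for the initial scene deflection. Both choices, and the function name, are assumptions for illustration only.

```python
MASK_ONLY_PEAK_MS = 102.4            # mask-alone peak latency (from the text)
SCENE_DEFLECTION_MS = (70.0, 170.0)  # approximate extent of the first scene deflection
SOAS_MS = [24, 37, 60, 92]           # SOAs named in the Figure 5 caption

def mask_peak_overlaps_scene(soa_ms,
                             window=SCENE_DEFLECTION_MS,
                             mask_peak_ms=MASK_ONLY_PEAK_MS):
    """True if the predicted mask peak falls inside the scene deflection."""
    latency = mask_peak_ms + soa_ms
    return window[0] <= latency <= window[1]

overlapping = [soa for soa in SOAS_MS if mask_peak_overlaps_scene(soa)]
print(overlapping)
```

Under these assumptions only the 24-, 37-, and 60-ms SOAs place the mask peak inside the window, which matches the SOA range in which recognition performance was reduced.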
Figure 7
 
The overlap between the neural activity produced by the scene and mask. The red curve represents the activation elicited by the processing of the scene alone averaged over all subjects (plotted against the red axes). Note the conspicuous deflection between approximately 70 ms and 170 ms that peaks at 110 ms. The height of the bars indicates the recognition performance (the black right ordinate) at the different mask SOAs (the upper black abscissa). The position of the bars on the lower red abscissa shows the latency of the activation peak in the difference wave for each of these SOAs. There is only a reduction in recognition performance when the mask activations overlap the scene-activation deflection. The error bars indicate the standard error of the mean.
Discussion
Scene processing dynamics and scene-pattern mask interaction
We investigated the dynamics of the processing of natural scenes by disturbing the processing of the scene after various temporal intervals with a pattern mask. The comparison of psychophysically determined recognition rates and the brain activity measured in the MEG allowed us to specify an interval in which the arrival of new information in visual cortex (e.g., by a pattern mask) disturbs the cortical processing of a scene (Figure 6). In our scene-recognition task this period begins 70 ms after the onset of the scene, when the first cortical visual activation can be observed in early visual cortex, and lasts up to 170 ms. After 170 ms, activation peaks produced by the scene-mask transition have no detrimental effect on scene-recognition performance. The initial MEG deflections produced by the target and mask certainly reflect neuronal activity in multiple visual areas. The activation was located over occipital sensors and most likely includes the CI (N75) and the P100 components. The early part of this deflection, including the peak around 100 ms, is thought to be generated in retinotopic visual areas early in the processing hierarchy. This has been shown in MEG and with event-related potential studies using intracranial recordings (Arroyo, Lesser, Poon, Webber, & Gordon, 1997) and source localizations (Martinez et al., 1999; Martinez et al., 2001; Rowley & Roberts, 1995; Shigeto, Tobimatsu, Yamamoto, Kobayashi, & Kato, 1998). We assume that the mask-induced activity peak we tracked in the difference waves reflects an upper limit for the time lag of the interaction between target and mask in early retinotopic visual areas. We make this assumption because the mask-induced deflection underlying that peak is extended in time, and it is possible that earlier components of the mask-induced activation [e.g., the CI (N75)] could interact with the processing of the target. 
Other arguments also speak for a scene-pattern mask interaction at early processing stages: The pattern mask has to be presented at the same spatial location as the scene photograph to be effective, and the low-level features (colors and orientation) of our mask and the scene are very similar (i.e., they cannot be coded in different neuronal populations in retinotopic areas). 
Generally, little is known about the nature of the information that can be extracted from a natural scene during these short processing intervals. Like others (e.g., Thorpe et al., 1996; Grill-Spector et al., 2000) we did not put special emphasis on this issue. Nevertheless, it appears highly likely to us that relatively high-level information was extracted in trials with successful recognition. Studies of object-specific processing indicate that during the first 150 ms of the cortical processing of an object the brain activity is driven by basic stimulus features, while activity after 170 ms is sensitive to object category or is task-specific (Allison et al., 1999; Bötzel & Grüsser, 1989; Johnson & Olshausen, 2003; VanRullen & Thorpe, 2001). These reports are commensurate with the results from our correlation analysis, which suggests that only processes earlier than 170 ms are vulnerable to the activation generated by a pattern mask in our scene-recognition task. According to this interpretation, visual processing would reach a state after approximately 170 ms where it becomes independent of a low-level representation of the scene in early retinotopic visual areas. After this interval the visual system may have developed abstract representations of at least parts of the scene that are immune to interference by a simply structured mask. The neural substrate for the integration of information from earlier processing stages during the first 170 ms could be object-specific areas, such as area LOC. The BOLD-fMRI activity in area LOC correlates with the psychophysical scene-recognition performance (Grill-Spector et al., 2000). 
It seems unlikely to us that subjects used simple color information encoded during the scene presentation to distinguish between the scenes in the query phase at the critical short SOAs. Gegenfurtner and Rieger (2000) investigated the dynamics of the contribution of color to scene recognition with the identical paradigm. Their results indicate that color in natural scenes initially (during the first 30 ms) supports the image coding process, probably by easing the segmentation process. Longer presentation durations (longer than 60 ms) led to an advantage in retrieval, presumably by enhancing the representation in memory. In the present study, we tried to force subjects to use as much information as possible by presenting a wide variety of scene types (e.g., landscapes, animals, flowers, indoor scenes, cars, food, and people) in the query phase. This should keep subjects from developing strategies that are specific to the discrimination between a narrow range of scene classes (e.g., the mean scene color). 
Recent electrophysiological studies on the dynamics of visual processing of natural scenes (e.g., Johnson & Olshausen, 2003; Thorpe et al., 1996; VanRullen & Thorpe, 2001) used unmasked presentations to study the dynamics of scene processing and concentrated on the latency of brain activity that is specific for an object class. Thorpe et al. (1996) attribute a deflection that begins to differ from baseline 150 ms after the scene onset to object-specific processing in natural scenes. 
In these studies subjects had in principle extended processing times because visual persistence lengthens the availability of a visual stimulus far beyond the physical presentation duration (Sperling, 1960) when no mask follows the stimulus. This is indicated by the fact that the subjects could classify the scenes perfectly, although they were presented for only 20–40 ms. This makes it difficult to obtain information about the dynamics of the information extraction process. In the present experiment we interfered with the processing of the scene at different latencies with a pattern mask. A comparison of our psychophysical and physiological data indicated that only 20 ms of undistorted processing is sufficient to process enough information to discriminate two scenes better than chance. However, to reach maximum performance in our scene-recognition paradigm, 60 to about 90 ms of undistorted cortical scene processing was necessary. This parallels results from single-cell recordings in macaque IT cortex and STS. Cells in these regions also show some shape-selective responses at target-mask SOAs as short as 20 ms (Kovacs et al., 1995; Rolls & Tovee, 1994). But the shape discrimination improves in single macaque IT cells only during the first 80 ms of their response (Kovacs et al., 1995). Rolls et al. (1999) found a similar interval in a face-discrimination task when they analyzed the information available in the spike trains of macaque IT cells at different target-mask SOAs in STS cells. The information available did not differ much between 100- and 60-ms integration duration, but dropped rapidly when the target-mask SOA was reduced below 60 ms. 
We cannot distinguish from our MEG data whether the scene is processed in the visual system in a purely feed-forward manner (Thorpe et al., 1996) or if feedback loops play a role (Bullier, 2001; Enns & Di Lollo, 2000). It seems likely that during the first few tens of milliseconds information is extracted predominantly via feed-forward pathways. Therefore, the significant recognition performance obtained at our shortest SOA (24 ms) might be based largely on feed-forward processing. On the other hand, the processing duration required to reach full recognition performance is sufficiently long to allow feedback connections to play a role (Bullier, 2001). 
Masking
Earlier EEG studies on the neural basis of visual masking in humans have led to diverging conclusions about the neuronal basis of masking (Donchin & Lindsley, 1965; Enns & Di Lollo, 2000; Kaitz et al., 1985; Schwartz & Pritchard, 1981; Vaughan & Silverstein, 1968). Some of the discrepancies may be caused by the use of different types of masking, which may have different underlying neuronal mechanisms (Breitmeyer, 1984; Enns & Di Lollo, 2000). In addition, the approach used to determine the residual target processing after masking might be problematic. This was done by subtracting the brain activity measured during mask-only presentations from the brain activity measured during combined presentations of target and mask (e.g., Donchin & Lindsley, 1965; Kaitz et al., 1985; Schwartz & Pritchard, 1981). Inherent to this approach is the strong assumption that the activity measured at the scalp in the combined presentation of target and mask is simply the sum of the activity obtained when both stimuli are presented separately. It can be seen in Figure 3, Figure 5B, and 5D that this assumption may not be justified. Even the initial deflection in the difference waves has a much lower amplitude than would have been expected from the presentation of the mask alone. This suggests that the difference waves that are thought to reflect the residual activation from the target after masking can contain artificially created deflections due to the overestimation of the mask activation. In addition, the shape of the difference waves has only limited similarity with the waveform elicited by the isolated presentation of the mask. The interpretation of the difference waves is especially problematic if it is not known which proportion of the difference wave reflects target processing and when mask processing begins. We tried to circumvent this problem by making a weaker and falsifiable assumption. 
In our approach we tried to extract an early response to the pattern mask and tracked it over various target-mask onset asynchronies. Our data show that this is possible even for very short SOAs. This early mask response defines an upper limit for the time when the pattern mask begins to interfere with the processing of the target, and its latency can be compared to psychophysical performance and the undistorted target processing. 
It seems unlikely to us that a large amount of pattern backward masking occurs at precortical processing stages. The integration of target and pattern mask seems to occur predominantly at the cortical level as shown by, for example, Turvey (1973) in experiments using dichoptic masking. Presenting the target and the pattern mask to the two eyes separately leads to masking. This is different from masking by light, where both stimuli have to be presented to the same eye. 
Our results agree with a simple, physiologically motivated model for pattern backward masking of natural scenes. This model presumes the major processing stages that are contained in many other models of visual processing (Kosslyn, 1999; Lennie, 1998; Marr, 1982; Sperling, 1963). There is a representation of the scene in a visual buffer, whether it is called raw primal sketch (Marr, 1982) or iconic memory (Sperling, 1963), that encodes the physical properties of the stimulus. This representation is continuously analyzed by higher level processing stages and transformed into more and more abstract entities, such as surfaces or objects that are transferred into memory. In this model, the pattern mask would exert its detrimental effect by overwriting the template of the stimulus in the sensory buffer and replacing it with a representation of the mask. Similar physiological mechanisms were suggested by Bullier (2001) and Lamme and Roelfsema (2000). The overwriting of the sensory buffer is possible because the mask produces a much stronger initial activation peak than the target (Figure 5A and 5C) that is presumably due to the higher RMS contrast of the mask. Mask contrast has been shown to be an important determinant of the strength of a pattern mask (Turvey, 1973). Both masking by integration (Enns & Di Lollo, 2000; Turvey, 1973) and masking by competition (Keysers & Perrett, 2002) are compatible with this view. The sooner and more severe the disruption of the analysis of the template, the less information is extracted and the greater the reduction in recognition rates (Turvey, 1973). The visual buffer may be implemented in the retinotopic early visual areas (Bullier, 2001; Lamme & Roelfsema, 2000). This view is supported by studies in which the authors looked at the effects of masking at higher level visual-processing stages. 
A pattern mask decreased the neuronal responses to the target object and the information available in the spike train in macaque IT and STS when the SOA between target and pattern mask was reduced (Kovacs et al., 1995; Rolls & Tovee, 1994; Rolls et al., 1999). Keysers and Perrett (2002) and Keysers, Xiao, Foldiak, and Perrett (2001) found the same effect in STS neurons when the change rate in a rapid-serial-visual-presentation paradigm increased. In humans, Grill-Spector et al. (2000) found a reduction of the BOLD-fMRI activation in shape- and object-specific brain areas, such as LOC and the fusiform face area, when the target-to-pattern mask SOAs were sufficiently short. The reduction in activation correlated with the decrease in recognition rate at short SOAs. 
Conclusions
In summary, we have shown that it is possible to reliably identify, in MEG time series data, an initial activation peak evoked by the pattern mask in a combined presentation of a target and mask. By integrating psychophysical and physiological data, we found that the scene processing during the initial 170 ms after the scene onset seems to be vulnerable to new information from a pattern backward mask. Brain processes initiated later are independent of new information arriving in the early visual areas, at least in a recognition task like ours with the simple pattern mask. Only 20 ms of undistorted cortical processing of the scene was sufficient to discriminate a scene from a distracter at a rate that was better than chance, but between 60 ms and 90 ms were necessary to reach the maximal recognition performance. Our results are consistent with the assumption that the pattern mask overwrites a visual buffer that is presumably located in the early, retinotopic visual areas, and thereby disturbs the transfer of information to subsequent processing stages. 
Acknowledgments
We want to thank Robert Fendrich for helpful comments on the manuscript. JWR was supported by Land Sachsen-Anhalt Grant FKC0017IF0000 to the Magdeburg Leibniz program and Bundesministerium für Bildung und Forschung Grant 01GO0202. 
Commercial relationships: none. 
Corresponding author: Jochem Rieger. 
Address: Leipzigerstr. 44, 39120 Magdeburg, Germany. 
References
Allison, T., Puce, A., Spencer, D. D., & McCarthy, G. (1999). Electrophysiological studies of human face perception. I. Potentials generated in occipitotemporal cortex by face and non-face stimuli. Cerebral Cortex, 9(5), 415–430.
Arroyo, S., Lesser, R. P., Poon, W.-T., Webber, R. W. S., & Gordon, B. (1997). Neuronal generators of visual evoked potentials in humans: Visual processing in the human cortex. Epilepsia, 38, 600–610.
Bachmann, T. (1994). Psychophysiology of visual masking. New York: Nova Science Publishers.
Baxt, N. (1871). Über die Zeit, welche nöthig ist, damit ein Gesichtsausdruck zum Bewußtsein kommt und über die Größe (Extension) der bewußten Wahrnehmung bei einem Gesichtseindrucke von gegebener Dauer. Archiv für die gesamte Physiologie, 4, 325–336.
Bötzel, K., & Grüsser, O.-J. (1989). Electric brain potentials evoked by pictures of faces and no-faces: Search for ‘face-specific’ EEG-potentials. Experimental Brain Research, 77, 349–360.
Breitmeyer, B. G. (1984). Visual masking: An integrative approach (Vol. 4). Oxford: Oxford University Press.
Breitmeyer, B. G., & Ogmen, H. (2000). Recent models and findings in backward visual masking: A comparison, review, and update. Perception & Psychophysics, 62, 1572–1595.
Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–107.
Donchin, E., Wicke, J. D., & Lindsley, D. B. (1963). Cortical evoked potentials and perception of paired flashes. Science, 141, 1285–1286.
Donchin, E., & Lindsley, D. B. (1965). Visually evoked response correlates of perceptual masking and enhancement. Electroencephalography and Clinical Neurophysiology, 19, 325–335.
Enns, J. T., & Di Lollo, V. (2000). What’s new in visual masking? Trends in Cognitive Sciences, 4(9), 345–352.
Fylan, F., Holliday, I. E., Singh, K. D., Anderson, S. J., & Harding, G. F. A. (1997). Magnetoencephalographic investigation of human cortical area V1 using colour stimuli. Neuroimage, 6, 47–57.
Gegenfurtner, K. R., & Rieger, J. (2000). Sensory and cognitive contributions of color to the recognition of natural scenes. Current Biology, 10, 805–808.
Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3(8), 837–843.
Johnson, J. S., & Olshausen, B. A. (2003). Timecourse of neural signals of object recognition. Journal of Vision, 3(7), 499–512. http://journalofvision.org/3/7/4/, doi: 10.1167/3.7.4.
Kahnemann, D. (1968). Methods, findings and theory in studies of perceptual masking. Psychological Bulletin, 6, 404–425.
Kaitz, M., Monitz, J., & Nesher, R. (1985). Electrophysiological correlates of visual masking. International Journal of Neuroscience, 28(3–4), 261–268.
Keysers, C., Xiao, D. K., Foldiak, P., & Perrett, D. I. (2001). The speed of sight. Journal of Cognitive Neuroscience, 13(1), 90–101.
Keysers, C., & Perrett, D. I. (2002). Visual masking and RSVP reveal neural competition. Trends in Cognitive Sciences, 6(3), 120–125.
Kosslyn, S. M. (1999). If neuroimaging is the answer, what is the question. Proceedings of the Royal Society of London B, 354, 1283–1294.
Kovacs, G., Vogels, R., & Orban, G. A. (1995). Cortical correlate of backward masking. Proceedings of the National Academy of Sciences U.S.A., 92, 5587–5591.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579.
Lennie, P. (1998). Single units and cortical organization. Perception, 27, 889–936.
Macknik, S. L., & Livingstone, M. S. (1998). Neuronal correlates of visibility and invisibility in the primate visual system. Nature Neuroscience, 1(2), 144–149.
Marr, D. (1982). Vision. New York: W.H. Freeman.
Martinez, A., Anllo-Vento, L., Sereno, M. I., Frank, L. R., Buxton, R. B., & Dubowitz, D. J. (1999). Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature Neuroscience, 2, 364–369.
Martinez, A., DiRusso, F., Anllo-Vento, L., Sereno, M. I., Buxton, R. B., & Hillyard, S. A. (2001). Putting spatial attention on the map: Timing and localization of stimulus selection processes in striate and extrastriate visual areas. Vision Research, 41, 1437–1457.
Rolls, E. T., & Tovee, M. J. (1994). Processing speed in the cerebral-cortex and the neurophysiology of visual masking. Proceedings of the Royal Society of London B, 257(1348), 9–15.
Rolls, E. T., Tovee, M. J., & Panzeri, S. (1999). The neurophysiology of backward visual masking: Information analysis. Journal of Cognitive Neuroscience, 11(3), 300–311.
Rowley, H. A., & Roberts, T. P. (1995). Functional localization by magnetoencephalography. Neuroimaging Clinics of North America, 5(4), 695–710.
Rugg, M. D., Doyle, M. C., & Wells, T. (1995). Word and non-word repetition within and across modality: An event-related potential study. Journal of Cognitive Neuroscience, 7, 209–227.
Schiller, P. H., & Chorover, S. L. (1966). Metacontrast: Its relation to evoked potentials. Science, 153, 1398–1399.
Schwartz, M., & Pritchard, W. S. (1981). AERs and detection in tasks yielding U-shaped backward masking functions. Psychophysiology, 18, 678–685.
Shigeto, H., Tobimatsu, S., Yamamoto, T., Kobayashi, T., & Kato, M. (1998). Visual evoked cortical magnetic responses to checkerboard pattern reversal stimulation: A study on the neural generators of N75, P100 and N145. Journal of Neurological Sciences, 156(2), 186–194.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1–29.
Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19–31.
Thorpe, S., Fize, D., & Marlot, S. (1996). The speed of processing in the human visual system. Nature, 381, 520–522.
Turvey, M. T. (1973). On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 80(1), 1–52.
VanRullen, R., & Thorpe, S. (2001). The time course of visual processing: From early perception to decision making. Journal of Cognitive Neuroscience, 13, 454–461.
Vaughan, H. G., & Silverstein, L. (1968). Metacontrast and evoked potentials: A reappraisal. Science, 160, 207–208.
Figure 1
 
In the psychophysical paradigm each trial started with a fixation spot. The target photograph was presented for one of several durations. Because there was no gap between target and mask, this duration equals the stimulus-onset asynchrony (SOA), the time between target and mask onset. The mask remained on the screen for 500 ms. In the subsequent query phase the target photograph was presented together with a distracter photograph. Each photograph was used only once in the experiment.
Figure 2
 
The psychophysical recognition performance. The performances for mask-only and scene-only presentations are plotted at 0-ms SOA and 500-ms SOA, respectively. The first significant reduction in recognition performance occurs when the SOA is reduced to 60 ms, and the 75% correct response criterion threshold is reached with a 44-ms SOA. Performance is still significantly better than guessing with an SOA of only 24 ms. Error bars indicate 95% confidence intervals.
Figure 3
 
Top row: Single-subject averaged event-related magnetic field (EMF) time courses in all 151 sensors elicited by the presentation of the mask alone. Data from the first subject measured in each group are shown. The data show an initial small deflection that peaks after 75 ms and a second higher deflection that peaks after 100 ms (gray highlighted). Then an intermediate minimum occurs after this second deflection. The maxima of the deflections are restricted to a subset of sensor traces. Second and third rows: The EMF difference between masked and unmasked scene presentations [(scene and mask) − scene]. This difference should reveal higher activation elicited by the transition from the scene to the mask. The difference waves are initially quite flat but show a deflection after a certain latency (gray highlighted). The difference waves for the short and long SOAs are plotted in the second and third row, respectively. The SOAs are given in the legend of each plot. Longer SOAs prolong the latency of this deflection. This indicates that the initial deflection in the EMF difference waves reflects activation that is elicited by the scene-mask transition. Again the effects appear to be restricted to a subset of sensors. The latencies of the scene-mask transition seem to be preserved in the difference waves but the form of the difference waves shows only limited similarity with the EMFs produced by the mask alone. The amplitudes of the difference waves are clearly reduced. This indicates that it is unlikely that simple subtraction approaches would correctly reveal residual scene activation after masking or that summation approaches would produce a meaningful prediction of the brain activity.
Figure 4
 
The distribution of magnetic fields over the heads of all subjects 105 ms after the onset of the mask when the occipital activation peaked. Blue indicates magnetic field vectors emanating from the skull, and in the red areas the vectors point toward the skull. Yellow and bright blue indicate a stronger magnetic flux. The white dots represent the sensor locations. The marked sensors (thick white dots) were selected for the analysis.
Figure 5
 
A and C: The time course of the signal strength in the selected posterior sensors in group 1 (A) and 2 (C) averaged over all subjects. Both the target scene and mask produce an initial activation peak approximately 100 ms after stimulus onset. The initial peak from the mask is higher than from the scene, indicating a stronger neural activation by the mask. The early 75-ms peak in the raw data (Figure 3, top row) is visible as a lower peak in C, and as a slope change in A. B and D: The time courses of the signal strength for the scene alone and the combined scene-mask presentations at different SOAs for group 1 (B) and 2 (D). Signal time courses are shown for scene-only (red), scene-mask SOAs of 37 ms (B, blue), 92 ms (B, green), 24 ms (D, blue), and 60 ms (D, green). The dash-dotted difference curves were calculated by subtracting the “scene-only” activation from the activity elicited by the combined scene-mask presentation (e.g., blue curve − red curve). The positive difference peaks (marked by the arrows) are clearly recognizable (e.g., at 133 ms in the 37-ms SOA difference curve and at 192 ms in the 92-ms SOA difference curve). However, by looking, for example, at the 92-ms SOA time course (B, green) and the mask-only time course (A, black), it is clear that the activation strength measured in the combined scene-mask presentation is not simply the sum of the activations obtained in the separate presentations.
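The difference-curve calculation described in this caption can be illustrated with a minimal sketch. The sample values below are invented toy numbers, not measured data, and the function names `difference_wave` and `peak_latency` are hypothetical helpers introduced only for this illustration; the sketch assumes both time courses are sampled at the same time points.

```python
def difference_wave(scene_mask, scene_only):
    """Subtract the scene-only time course from the scene-mask time course."""
    if len(scene_mask) != len(scene_only):
        raise ValueError("time courses must have the same number of samples")
    return [m - s for m, s in zip(scene_mask, scene_only)]


def peak_latency(times_ms, wave):
    """Return the time point at which the wave reaches its largest value."""
    peak_index = max(range(len(wave)), key=wave.__getitem__)
    return times_ms[peak_index]


# Toy values for illustration only (NOT data from the study).
times_ms = [0, 50, 100, 150, 200]
scene_only = [0.0, 1.0, 4.0, 2.0, 1.0]
scene_mask = [0.0, 1.0, 4.5, 5.0, 1.5]

diff = difference_wave(scene_mask, scene_only)
diff_peak_ms = peak_latency(times_ms, diff)
print(diff, diff_peak_ms)
```

In this toy case the positive difference peak falls at the sample where the combined presentation departs most from the scene-only trace, which is the quantity marked by the arrows in panels B and D.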
Figure 6
 
The measured latency of the difference peaks plotted against the respective SOA (black dots). The dash-dotted line represents the prediction for the latency of the difference peak (see text for an explanation). The prediction was calculated as the sum of the latency of the mask-only peak (see Figure 5A and C) and the respective SOA.
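As a rough worked example of this prediction, the sketch below adds the SOA to the mask-only peak latency and compares the result with the two group 1 difference-peak latencies quoted in the Figure 5 caption (133 ms at the 37-ms SOA and 192 ms at the 92-ms SOA). The mask-only peak latency of 100 ms is an assumption taken from the "approximately 100 ms" stated in that caption, not an exact measured value, and the function name is hypothetical.

```python
def predicted_difference_peak_latency(mask_peak_latency_ms, soa_ms):
    """Predicted latency = mask-only peak latency + scene-mask SOA."""
    return mask_peak_latency_ms + soa_ms


# Assumption: approximate mask-only peak latency from the Figure 5 caption.
MASK_PEAK_LATENCY_MS = 100.0

# Group 1 difference-peak latencies quoted in the Figure 5 caption (SOA -> ms).
measured_ms = {37: 133, 92: 192}

for soa_ms, observed_ms in measured_ms.items():
    predicted_ms = predicted_difference_peak_latency(MASK_PEAK_LATENCY_MS, soa_ms)
    deviation_ms = observed_ms - predicted_ms
    print(f"SOA {soa_ms} ms: predicted {predicted_ms} ms, "
          f"measured {observed_ms} ms, deviation {deviation_ms} ms")
```

Under this assumed 100-ms mask latency, the two quoted peaks deviate from the prediction by only a few milliseconds, which is the kind of agreement the dash-dotted line is meant to convey.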
Figure 7
 
The overlap between the neural activity produced by the scene and mask. The red curve represents the activation elicited by the processing of the scene alone averaged over all subjects (plotted against the red axes). Note the conspicuous deflection between approximately 70 ms and 170 ms that peaks at 110 ms. The height of the bars indicates the recognition performance (the black right ordinate) at the different mask SOAs (the upper black abscissa). The position of the bars on the lower red abscissa shows the latency of the activation peak in the difference wave for each of these SOAs. There is only a reduction in recognition performance when the mask activations overlap the scene-activation deflection. The error bars indicate the standard error of the mean.