Journal of Vision, October 2019, Volume 19, Issue 12
Open Access Article

A crash in visual processing: Interference between feedforward and feedback of successive targets limits detection and categorization

Jacob G. Martin, Patrick H. Cox, Clara A. Scholl, Maximilian Riesenhuber

Citation: Jacob G. Martin, Patrick H. Cox, Clara A. Scholl, Maximilian Riesenhuber; A crash in visual processing: Interference between feedforward and feedback of successive targets limits detection and categorization. Journal of Vision 2019;19(12):20. https://doi.org/10.1167/19.12.20.
Abstract

The human visual system can detect objects in streams of rapidly presented images at presentation rates of 70 Hz and beyond. Yet, target detection is often impaired when multiple targets are presented in quick temporal succession. Here, we provide evidence for the hypothesis that such impairments can arise from interference between “top-down” feedback signals and the initial “bottom-up” feedforward processing of the second target. Although it has recently been shown that feedback signals are important for visual detection, we found that this “crash” in neural processing affected both the detection and categorization of both targets. Moreover, experimentally reducing such interference between the feedforward and feedback portions of the two targets substantially improved participants' performance. The results indicate a key role of top-down re-entrant feedback signals and show how their interference with a successive target's feedforward processing determines human behavior. These results are not just relevant for our understanding of how, when, and where capacity limits in the brain's processing abilities can arise, but also have ramifications spanning topics from consciousness to learning and attention.

Introduction
The human brain can perceive and categorize images shown in rapid succession with remarkably brief presentation times, at rates of up to 70 images each second (Potter, 1975; S. Thorpe, Fize, & Marlot, 1996; Perrett, 2001; Evans & Treisman, 2005). 
Drawing inspiration from Hubel and Wiesel's ground-breaking work (Hubel & Wiesel, 1962, 1968), as well as more recent work (Riesenhuber & Poggio, 2000; Hong, Yamins, Majaj, & DiCarlo, 2016), researchers have modeled this ability as the result of a single feedforward pass of neuronal activity through the ventral stream processing hierarchy. The feedforward pass is thought to start from shape and object representations in occipitotemporal cortex and then proceed to task circuits in prefrontal cortex (Ungerleider & Haxby, 1994; VanRullen & Thorpe, 2001; Riesenhuber & Poggio, 2002; VanRullen & Thorpe, 2002; Freedman, Riesenhuber, Poggio, & Miller, 2003; Serre, Oliva, & Poggio, 2007). In line with these results, electroencephalography (EEG) studies in which participants detected animals in natural scenes have found voltage differences over frontal electrodes between target and distractor images within 170–180 ms (S. Thorpe et al., 1996; Rousselet, Fabre-Thorpe, & Thorpe, 2002). 
However, in recent years, studies have shown that visual processing in this hierarchy does not end after the initial feedforward pass. Indeed, there is considerable evidence of a feedback period around 200–250 ms during which higher level frontal areas send re-entrant signals into lower level visual areas (Williams et al., 2008; Camprodon, Zohary, Brodbeck, & Pascual-Leone, 2012). Many studies have indicated that this feedback activity plays a key and causal role in visual awareness (Lamme & Roelfsema, 2000; Del Cul, Baillet, & Dehaene, 2007; Boehler, Schoenfeld, Heinze, & Hopf, 2008; Koivisto, Railo, Revonsuo, Vanni, & Salminen-Vaparanta, 2011; Muggleton, Banissy, & Walsh, 2011; Camprodon et al., 2012; Edelman & Gally, 2013; Koivisto, Salminen-Vaparanta, Grassini, & Revonsuo, 2016; Fahrenfort, Leeuwen, Olivers, & Hogendoorn, 2017). In effect, even though neurons in frontal cortex appear to have already differentiated the presence of the visual target around 170 ms, awareness of that particular target seems to rely on the presence of feedback around 230 ms (Pollen, 1999; Lamme, 2000; Meyer, 2012). 
Besides giving rise to the awareness that something of interest was just seen, what other roles might these feedback signals serve? Experimental work in mice (Larkum, Senn, & Lüscher, 2004; De Pasquale & Sherman, 2013) and monkeys (Li, Piëch, & Gilbert, 2008), along with theoretical work (Grossberg, 1999; Ahissar & Hochstein, 2004; Del Cul et al., 2007; Seitz & Dinse, 2007; Gilbert & Li, 2012), has also indicated that this feedback period can play an important role in learning. Furthermore, as seen from the recent successes of deep learning algorithms, in which learning happens “top-down” following an initial feedforward pass, there are computational reasons for having the frontal cortex pass these feedback signals back to the very same occipitotemporal neurons that did the initial feedforward processing of the stimulus of interest. Indeed, feedback in all deep learning frameworks plays the crucial computational role of training the network to recognize stimuli. By using a feedback period that reshapes the tuning of the same feedforward neurons to provide more informative signals to the top level of the network on subsequent presentations, deep learning techniques have recently matched and even surpassed human image classification performance (Krizhevsky, Sutskever, & Hinton, 2012; He, Zhang, Ren, & Sun, 2014; Taigman, Yang, & Ranzato, 2014). 
In summary, detection seems to be accomplished by the feedforward pass, and the subsequent feedback signals seem to be a substrate for both conscious awareness and learning. These findings give rise to the prediction that feedback can affect subsequent feedforward signals, and vice versa. Indeed, given the timing of the neuronal response latencies, the feedforward (0–150 ms) and feedback (200–250 ms) periods of multiple targets can intermingle with each other when two successive visual targets are presented very close together in time. This interference hypothesis depends on the assumption that, by and large, the same neurons participate in both the feedforward and feedback waves. Indirect evidence for this assumption comes from fMRI studies of imagery that have reported similar activation patterns for perception and imagery of the same stimulus (Stokes, Thompson, Cusack, & Duncan, 2009). At the behavioral level, the interaction of two stimuli presented in rapid succession has been studied extensively. Studies of the so-called “Attentional Blink” usually focus on impairments in the processing of a second target presented shortly after a first target. This effect has been postulated to arise from attentional or working memory bottlenecks in parietal and frontal cortex (Broadbent & Broadbent, 1987; Raymond, Shapiro, & Arnell, 1992; S. Luck, Vogel, & Shapiro, 1996; Shapiro, Raymond, & Arnell, 1997; Marois, 2005; Nieuwenhuis, Holmes, Gilzenrat, & Cohen, 2005; Hommel et al., 2006; Del Cul, Baillet, & Dehaene, 2007; Dux & Marois, 2009; Martens & Wyble, 2010; Marti & Dehaene, 2017). Of relevance for the present study, it has also been proposed that this Attentional Blink “is mediated by feedback mechanisms triggered by processing in higher level brain areas that project back to earlier areas” (Martens & Wyble, 2010, p. 5). A related phenomenon, called “Backward Masking” (BM; Breitmeyer, 1984; Vorberg, Mattler, Heinecke, Schmidt, & Schwarzbach, 2003; Breitmeyer, 2007), refers in some respects to the opposite situation, wherein the perception of a briefly presented first stimulus is affected by a rapidly following second stimulus. One hypothesis to explain this observation is that backward masking “derives its effectiveness, at least partly, from disrupting re-entrant processing, thereby interfering with the neural mechanisms of figure-ground segmentation and visual awareness itself” (Fahrenfort, Scholte, & Lamme, 2007). On the other hand, it has also been argued that feedforward activity, rather than feedback activity, better accounts for the fact that a second stimulus can mask the first (Macknik & Martinez-Conde, 2007). 
In this study, we showed participants streams of images at a rapid rate (12 Hz) in various configurations to strategically intermingle different portions of the feedforward and feedback signals of multiple targets. Next, we analyzed the effects on both the brain's EEG responses and the participants' detection and categorization performance. The results provide a unified account of how interactions between feedforward and feedback signals give rise to bottlenecks in the visual system's ability to detect and categorize visual stimuli, and show how these bottlenecks can be mitigated. 
Methods
Stimuli
Image stimuli for both experiments were drawn from a large commercially available image library containing 1200 photographs of natural scenes that have been used in previous psychophysical studies (S. Thorpe et al., 1996). The image stimuli were segregated into 600 images with animals (targets) and 600 images of natural scenes and buildings (distractors). We converted the images to grayscale and resized them to 384 × 256 pixels, which subtended approximately 3.2° width by 5.8° height of visual angle. We presented all images on a gray background. 
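As an illustration of this preprocessing step, a minimal MATLAB sketch might look as follows (the folder names and file pattern are hypothetical; the original conversion script is not part of the published record):

```matlab
% Hypothetical sketch: convert each library photograph to grayscale and
% resize it to 384 x 256 pixels (width x height), as described above.
files = dir(fullfile('stimuli', '*.jpg'));        % assumed input folder
for k = 1:numel(files)
    img = imread(fullfile('stimuli', files(k).name));
    if size(img, 3) == 3
        img = rgb2gray(img);                      % drop color information
    end
    img = imresize(img, [256 384]);               % rows x cols = 256 x 384
    imwrite(img, fullfile('stimuli_gray', files(k).name));
end
```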
Presentation hardware
Stimuli were presented on a Samsung SyncMaster 2233RZ monitor with a 5 ms response time and 120 Hz refresh rate, driven by two SLI-linked NVIDIA GeForce 9800GT GPUs at a screen resolution of 1680 × 1050. Previous research and our own photodiode-based analyses indicated that the 2233RZ monitor had accurate timing (Wang & Nikolić, 2011). The display was controlled by a custom-built workstation running Gentoo Linux with a 64-bit kernel that was tuned for real-time processing. The paradigm was programmed in MATLAB R2008a (MathWorks, Natick, MA) using Psychtoolbox version 3 (Brainard, 1997; Pelli, 1997). We recorded all target onset presentation times with a photodiode that was synchronized with the EEG signals. Participants entered their responses by using their right hand on the numeric keypad of a PS/2 keyboard. 
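Although the paradigm code itself is not reproduced here, the timing logic can be sketched in Psychtoolbox terms: at a 120 Hz refresh rate, a 12 Hz stream corresponds to holding each image for 10 refreshes. A minimal sketch, assuming a cell array images of preloaded grayscale matrices:

```matlab
% Illustrative Psychtoolbox-3 loop: one image per 10 refreshes of a
% 120 Hz display, i.e., 12 images/s with no blanks between images.
win = Screen('OpenWindow', max(Screen('Screens')), 128); % gray background
ifi = Screen('GetFlipInterval', win);                    % ~8.33 ms at 120 Hz
framesPerImage = 10;
vbl = Screen('Flip', win);
for k = 1:numel(images)
    tex = Screen('MakeTexture', win, images{k});
    Screen('DrawTexture', win, tex);
    % Schedule each flip one image duration after the previous one.
    vbl = Screen('Flip', win, vbl + (framesPerImage - 0.5) * ifi);
    Screen('Close', tex);
end
Screen('CloseAll');
```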
Task, presentation, and participants for Experiment 1 (the single-stream EEG experiment)
Participants performed a standard Animal/no Animal target detection task (S. Thorpe et al., 1996; Evans & Treisman, 2005; Serre et al., 2007) on rapid serial visual presentation (RSVP) streams. Each trial involved a stream of images that was presented foveally at a rate of 12 images/s with no blanks in between images. The computer pseudorandomly chose the images to be included in the streams for each trial, and all images within any given trial were unique. Trials either contained zero targets (all distractors), one target embedded in a stream of distractors (“T1-only”), or two targets embedded in a stream of distractors at a specified distance apart (“Lag” trials). Each trial started with an intertrial delay which varied randomly between 200–400 ms, and was then followed by a fixation cross which appeared for 300 ms, a blank gray screen (300 ms), between eight and 14 initial distractors, a potential target image, eight images (that on Lag trials contained a second target image at the appropriate position), and a final set of seven additional distractor images. After each stream, a small square cued the participant to respond manually and without a time limit (see Figure 1). Participants were told that each stream could contain zero, one, or two images containing one or more animals, and that, after each stream was presented, they were to enter the number of images containing animals that they saw on the number pad by pressing 0, 1, or 2. 
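The composition of a single stream can be summarized with a short sketch (a hypothetical reconstruction of the trial-assembly logic; drawUnique is an assumed helper that samples image indices without replacement):

```matlab
% Hypothetical assembly of the image sequence for one Lag trial.
% distractorPool: indices of distractor images; T1, T2: target indices.
% drawUnique (assumed helper) samples indices without replacement.
nInitial = randi([8 14]);                       % 8-14 initial distractors
lag      = 3;                                   % e.g., a Lag 3 trial
post     = drawUnique(distractorPool, 8);       % eight post-T1 positions
post(lag) = T2;                                 % T2 at the specified lag
stream   = [drawUnique(distractorPool, nInitial), T1, post, ...
            drawUnique(distractorPool, 7)];     % seven final distractors
```

On T1-only trials the same skeleton applies with all eight post-T1 positions left as distractors, and on distractor-only trials both target slots are filled with distractors.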
Figure 1
 
Experiment 1: Paradigm and detection performance (N = 14). (A) Illustration of the experimental paradigm. Participants counted instances of animal target images amongst distractor images in RSVP streams presented at 12 Hz. (B) Participant accuracies for the number of targets detected within each condition. Chance level was 33% (response choices were zero animals, one animal, and two animals, and each correct response choice was equally likely). “No targets” refers to trials without target images, and T1-only refers to trials containing only a single target image. Lag trials are those in which we presented two target images, wherein the second target appeared at the specified lag in time from the first (e.g., Lag 1 had no intervening distractors, Lag 2 trials had one intervening distractor, and so on). Error bars denote the 95% CI of the means in each condition. Target and distractor images in (A) Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Each experimental session contained 10 blocks of 120 trials each. Within the entire experiment, 400 trials were distractor-only trials that contained no targets, 400 trials were T1-only trials that contained only one animal image target, and 400 trials were Lag trials that contained two targets. The lag refers to the temporal position of the second target image, T2, relative to the first target image, T1. For example, Lag 1 means that T2 appeared directly after T1, whereas Lag 2 means that T2 was the second image to appear after T1. To obtain a large number of trials for each condition, each experimental session contained only even or odd lags. Thus, during an experimental session, each subject was presented with either 100 trials each of Lags 1, 3, 5, and 7 or 100 trials each of Lags 2, 4, 6, and 8. Each of the 10 blocks of 120 trials contained 40 distractor-only trials, 40 T1-only trials, and 40 Lag trials (10 trials for each of the four lag conditions). Therefore, each correct response had an equal probability within each block and across the entire experiment. We calculated the detection accuracy for each condition by computing the fraction of correct trials within that condition. 
Fourteen participants (two left-handed, eight males, six females, aged 18–34) with normal or corrected-to-normal vision participated in a total of nineteen separate EEG recording sessions. To increase the number of trials for each condition, the odd (1, 3, 5, and 7) and even (2, 4, 6, and 8) lag trials were separated into two sessions with session order counterbalanced and run on separate days. Only some of the participants returned for the second session, so that some participants completed only the odd lags and some only the even lags. Ten of the nineteen experimental sessions used odd lags, and nine of the nineteen had even lags. Five subjects completed both even lags and odd lags. Georgetown University's Institutional Review Board approved the experimental procedures and protocol, and we obtained written informed consent from each subject prior to participating in the study. 
Task, presentation, and participants for Experiment 2 (the two-stream behavioral experiment)
In Experiment 2, six participants (one left-handed, one male, five females, aged 18–21) with normal or corrected-to-normal vision participated in a behavioral-only dual stream experiment. Georgetown University's Institutional Review Board approved the experimental procedures and protocol, and we obtained written informed consent from each participant prior to each experiment. Participants performed a standard Animal/no Animal target detection task (S. Thorpe et al., 1996; Evans & Treisman, 2005; Serre et al., 2007) on two side-by-side synchronous 12 Hz RSVP streams 3° to the left and right of a central fixation cross (see Figure 5B and C). Each trial involved a dual stream of images that were presented simultaneously at a rate of 12 images/s. The computer pseudorandomly chose the images for the dual image streams for each trial, and all images within any given trial were unique. Trials either contained zero targets (all distractors), one target embedded in a stream of distractors (“T1-only”), or two targets embedded in a stream of distractors at a specified distance apart (“Lag” trials). Each trial started with an intertrial delay which varied randomly between 200–400 ms, and was then followed by a fixation cross which appeared for 300 ms, a blank gray screen (300 ms), dual streams which each contained between eight and 14 different distractor images, a potential target image, eight images (which on Lag trials contained a second target image at the appropriate location), and a final set of seven additional distractor images. After each trial, a small square cued the participant to respond. Participants were told that each pair of streams could contain zero, one, or two images containing one or more animals, and that, after each dual stream was presented, they were to enter the number of images containing animals that they saw on a number pad by pressing 0, 1, or 2. Participants were further instructed that target images could appear in either the left or the right stream, and that, in two-target trials, the second target could appear in the same or a different stream as the first target. 
Each participant completed 10 blocks of 120 trials each. Each of the 10 blocks of 120 trials contained 40 distractor-only trials, 40 T1-only trials, 20 Lag 1 same side trials, and 20 Lag 1 different side trials. Within each block, the target in T1-only trials was counterbalanced to appear in either the left or the right stream (20 left, 20 right). Likewise, the 20 Lag 1 same side trials were constructed so that 10 trials had both targets appearing on the left side and 10 trials had both targets appearing on the right side. Furthermore, the 20 Lag 1 different side trials were constructed so that half of the trials had T1 appear in the left stream with T2 in the right stream, and the other half had T1 appear in the right stream and T2 in the left stream. As in the EEG experiment, each correct response had an equal probability within each block and across the entire experiment. Participants did a single practice trial of each trial type before the experiment started. 
Task, presentation, and participants for Experiment 3 (the two-stream EEG experiment)
In Experiment 3, eighteen participants (two left-handed, eight females, ten males, aged 19–36) with normal or corrected-to-normal vision participated in one EEG recording session each. Georgetown University's Institutional Review Board approved the experimental procedures and protocol, and we obtained written informed consent from each participant prior to each experiment. The participants performed a standard Animal/no Animal target detection task (S. Thorpe et al., 1996; Evans & Treisman, 2005; Serre et al., 2007) plus an additional categorization task on two parallel RSVP streams of equal durations. The two separate nonidentical parallel RSVP streams were positioned 3° to the left and right of a central fixation cross and presented synchronously at a rate of 12 images/s each (Rousselet et al., 2002; Rousselet et al., 2004). We informed the participants that each stream could contain zero, one, or two images containing one or more animals and asked them to count and remember the animals presented. We also instructed them to maintain central fixation throughout the trial. The computer pseudorandomly created image streams for each trial, and all images within any given trial were unique. Trials contained either zero targets (all distractors), one target embedded in one of the streams of distractors (“T1-only” trials), two targets embedded in the same stream of distractors at the specified time apart (“Lag same side” trials), or one target in each of the two streams at the specified time apart (“Lag different side” trials). Each trial started with an intertrial delay which varied randomly between 200–400 ms, and was then followed by a fixation cross which appeared for 300 ms, a blank gray screen (300 ms), between eight and 14 initial distractors, a potential target image, eight images (that on Lag trials contained a second target image at the appropriate position), and a final set of seven additional distractor images. After each stream, a small square cued the participants to respond (see Figure 1A), and they entered the number of images containing animals that they saw on a number pad by pressing 0, 1, or 2. After reporting the number of images with animals seen, the participants selected the category of each detected animal from a shuffled list of seven animal categories (canine, feline, bear, horse, deer, primate, and bird) plus a “don't know” option. If the participant reported seeing one or two images containing animal(s), then they were asked one or two categorization questions, respectively. No animal category was repeated within any trial in which two targets were presented. Given the close temporal proximity of the two targets and the potential to confuse their sequential order, we analyzed the data by ignoring the order in which the participants categorized the two targets. That is, we scored the categorization of T1 or T2 as correct regardless of whether the participant knew it was the first or second target. 
Each EEG experimental session had 1,000 trials in 10 blocks total: 100 no-target trials, 50 trials of T1-only Left (L), 50 trials of T1-only Right (R), 100 each of the four Lag 1 trial types (LL, LR, RL, RR) and 100 each of the four Lag 7 trial types (LL, LR, RL, RR). Each of the 10 blocks contained 100 trials split into 10 no-target trials, five trials of T1-only Left (L), five trials of T1-only Right (R), 10 each of the four Lag 1 trial types (LL, LR, RL, RR), and 10 each of the four Lag 7 trial types (LL, LR, RL, RR). 
Electrophysiological methods
EEG signals were recorded using an Electrical Geodesics Net Amps 300 system with a Hydrocel GSN 128-channel EEG net at a 500 Hz sampling rate. We referenced EEG signals to the vertex, which was located on the top of the head at the intersection of the midpoint between the nasion and inion and the midpoint between the right and left preauricular points. Before each of the ten blocks, we ensured that the electrode impedance for each channel was below 50 kΩ. The raw binary EEG data file was preprocessed using a block-specific direct current trend removal and filtered using EEGLAB version 9 (Delorme & Makeig, 2004). Each block was bandpass-filtered between 2 Hz and 30 Hz by first low-pass and then high-pass filtering, using a 48-point low-pass filter with a 5 Hz transition bandwidth and a 750-point high-pass filter with a 0.3 Hz transition bandwidth. We segmented the EEG signals for each trial on boundaries from 200 ms before initial target image onset to 1,200 ms after initial target image onset. In distractor-only trials, we used a randomly selected distractor image appearing after the eight through 14 initial distractor images to mark the stimulus onset. For each channel and trial, we subtracted the baseline EEG voltage over the 200 ms prestimulus period from the entire trial (Luck, 2014). We removed trials that contained muscle and eye blink artifacts by eliminating any trial in which any channel's voltage exceeded 75 μV. To preserve the consistency of behavioral and electrophysiological data, we calculated behavioral performance on the same set of trials as used for the EEG analysis. For participants who completed both the even and odd lag sessions of Experiment 1, we combined the EEG and behavioral data into a single experimental data unit by concatenating the postprocessed EEG and behavioral data from each experimental session. 
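In EEGLAB terms, the epoching, baseline removal, and artifact rejection steps correspond roughly to the following sketch (the event name 'T1' is an assumption; the filters described above were applied blockwise before epoching):

```matlab
% Sketch of epoching, baseline removal, and artifact rejection,
% assuming an EEGLAB dataset EEG with stimulus-onset events named 'T1'.
EEG = pop_epoch(EEG, {'T1'}, [-0.2 1.2]);   % -200 to +1200 ms epochs
EEG = pop_rmbase(EEG, [-200 0]);            % subtract prestimulus baseline
% Reject any trial in which any channel exceeds +/-75 microvolts.
bad = squeeze(any(any(abs(EEG.data) > 75, 1), 2));
EEG = pop_rejepoch(EEG, find(bad), 0);
```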
EEG source localization
We averaged EEG potentials using a participant-weighted approach, first averaging each participant's EEG signals and then computing the mean of these averages across participants. Source localization was subsequently accomplished using an unconstrained LCMV beamforming method from the Brainstorm EEG analysis software (Van Veen, van Drongelen, Yuchtman, & Suzuki, 1997; Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011) on the subject-averaged correct T1-only EEG signals for each of the 128 channels. The data covariance matrix was regularized with a parameter of 0.1. The noise covariance matrix was computed from the single trial data in the period from 200 ms to 2 ms before stimulus onset. The DC offset was removed by subtracting the average voltage from 200 ms to 2 ms before stimulus onset from each trial. 
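For reference, the LCMV spatial filter underlying this step has the standard textbook form below (not the authors' exact implementation; Brainstorm scales the regularization term internally), where L(r) is the lead field at source location r, C the data covariance, gamma the regularization parameter of 0.1 noted above, and x(t) the channel data vector:

```latex
W(r) = \left[ L(r)^{\top} C_{\gamma}^{-1} L(r) \right]^{-1} L(r)^{\top} C_{\gamma}^{-1},
\qquad C_{\gamma} = C + \gamma I,
\qquad \hat{s}(r,t) = W(r)\, x(t).
```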
Statistical tests
Unless indicated otherwise, all statistical tests on EEG signals used p values adjusted for the false discovery rate with a q cutoff of 0.05 (Benjamini & Yekutieli, 2001; Groppe, 2011). 
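For concreteness, the Benjamini–Yekutieli step-up procedure can be written in a few lines of MATLAB (a sketch of the standard procedure, not the authors' code; p is the vector of uncorrected p values from the channelwise tests):

```matlab
% Benjamini-Yekutieli FDR control at q = 0.05 (valid under dependence).
q    = 0.05;
m    = numel(p);
ps   = sort(p(:));
c    = sum(1 ./ (1:m));                 % BY correction constant c(m)
crit = (1:m)' * q / (m * c);            % step-up critical values
k    = find(ps <= crit, 1, 'last');     % largest rank passing its criterion
if isempty(k), pThresh = 0; else, pThresh = ps(k); end
reject = p <= pThresh;                  % tests declared significant
```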
Granger causality
Granger causality (Granger, 1969) has been used to investigate causal relationships between prefrontal and early visual neural recording sites in monkey electrophysiology recordings (Cui, Xu, Bressler, Ding, & Liang, 2008; Gregoriou, Gotts, Zhou, & Desimone, 2009; Seth, Barrett, & Barnett, 2015). We computed the population average of differences in Granger causality values between T1-only and Distractor-only trials at the single trial level for each subject using the BSMART toolbox (Cui et al., 2008), with a model order of 10 and a sliding window of 40 ms (results were robust with respect to variations in model order and window size). We computed the single trial Granger causalities for each subject using the aforementioned parameters between all pairs of a subset of channels within the 10–20 system, which corresponded to Occipital (O1, Oz, O2), Temporal (T3, T4, T5, T6), and Frontal (Fp1, Fz, Fp2) electrodes. We calculated Granger causalities at each time point for correct T1-only and Distractor-only trials separately for each channel pair. We then averaged the Granger causality values for each channel pair over the 2–30 Hz interval for each condition and subsequently divided the two to obtain a Granger causality gain (Cohen et al., 2012) measurement for T1-only trials over Distractor-only trials for each subject at the single trial level for that particular channel pair. We then averaged over all possible pairings of frontal to occipitotemporal electrodes (and vice versa) and plotted the resulting values at each time point. Thus, each causality gain value at a particular time point was the magnitude of the single trial Granger causality increase from Distractor-only to T1-only trials within a 40 ms window centered at that time point between the referenced channel groups. Finally, we performed paired sample t tests at each time point on the average 2–30 Hz frontal to occipitotemporal Granger causality (and vice versa) between T1-only and Distractor-only trials. 
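Schematically, once the single trial Granger causality spectra have been computed for a channel pair in each condition (the BSMART calls themselves are omitted here), the causality gain reduces to a band average followed by a ratio. A sketch under those assumptions:

```matlab
% gcT1, gcDist: frequency x time Granger causality arrays for one channel
% pair, from correct T1-only and Distractor-only trials, respectively;
% freqs: the frequency axis in Hz.
fBand = (freqs >= 2 & freqs <= 30);     % band of interest
gT1   = mean(gcT1(fBand, :), 1);        % band-averaged causality, per time
gDist = mean(gcDist(fBand, :), 1);
gain  = gT1 ./ gDist;                   % causality gain: T1-only / distractor
```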
Single trial EEG classification: Figure 8
We summarized all Experiment 3 classification results (Figure 8) by the mean and confidence intervals at each time point of the average linear-discriminant classification performance over 100 iterations of 10-fold cross-validation. We used MATLAB's classify() function to train the linear discriminant classifiers at each time point. Participant-specific classifiers were trained individually for each channel grouping (all frontal or posterior contralateral channels in the 10–20 EEG system) at each time point, using the voltages in that channel grouping at that time point as features. The resulting classifiers aimed to discriminate trials in which participants correctly detected one target from trials in which participants correctly responded that there were no targets. We trained each T1-only participant-specific classifier for each channel grouping and time point separately for right T1-only targets and left T1-only targets. Afterwards, we tested these channel-, time-, location-, and participant-specific T1-only classifiers separately on Lag 1 same side trials in which T1 was the only target correctly categorized and trials in which T2 was the only target correctly categorized. That is, we tested each T1-only classifier at each corresponding time point on the appropriate Lag 1 same side trials: T1-only left (right) trained classifiers were tested on all Lag 1 same side LL (RR) trials, respectively. Next, we aggregated the results of the classifiers for each participant and side and plotted the mean and confidence intervals of the distributions of the areas under the Receiver Operating Characteristic curves for each iteration of each participant's classification fold. Finally, we calculated statistics for the various comparisons as the paired t test of participant-specific average classification values. 
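A per-participant, per-time-point version of this classification scheme might look as follows (a sketch under stated assumptions: X is a trials × channels matrix of voltages at one time point for one channel group, and y is 1 for correct T1-only trials and 0 for correct no-target trials):

```matlab
% One iteration of 10-fold cross-validated linear discriminant
% classification, summarized by the area under the ROC curve.
cv  = cvpartition(y, 'KFold', 10);
auc = zeros(cv.NumTestSets, 1);
for f = 1:cv.NumTestSets
    tr = training(cv, f);
    te = test(cv, f);
    % classify() returns posterior probabilities as its third output;
    % for 0/1 labels, column 2 corresponds to the target-present class.
    [~, ~, post] = classify(X(te, :), X(tr, :), y(tr), 'linear');
    [~, ~, ~, auc(f)] = perfcurve(y(te), post(:, 2), 1);
end
meanAUC = mean(auc);   % one point on the curves in Figure 8
```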
Single trial EEG animal category classification: Figure 9
To classify the categories of animals in Experiments 1 and 3 (see Figure 9), we used MATLAB's RUSBoost ensemble classification method at each time sample between 0 ms and 500 ms after stimulus onset. We used the RUSBoost method because it handles imbalances in the frequency of occurrence of the animal categories. The learners were linear discriminant classifiers without regularization (i.e., Gamma was 0). There were 500 learning cycles for each time point. We built and tested the single trial classifiers for each participant individually. The training and testing data consisted of the mean raw voltage in the channels used within 60 ms windows centered at each time point. For each time point, we did 10 iterations of 10-fold cross-validation and reported the mean and standard error of the classification accuracy over the resulting 100 iterations for each subject. Statistical tests compared the mean classification accuracies across participants at each time point with the chance level. 
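One plausible way to set up such a classifier in MATLAB (a sketch; the exact function calls used by the original pipeline are not specified in the text) is via fitcensemble with a linear discriminant learner template:

```matlab
% Sketch of the category classifier at one time point. Xwin holds the
% mean voltage in a 60 ms window around t for the selected channels
% (trials x channels); category holds the animal label for each trial.
lda = templateDiscriminant('DiscrimType', 'linear', 'Gamma', 0);
mdl = fitcensemble(Xwin, category, ...
                   'Method', 'RUSBoost', ...      % undersamples majority classes
                   'NumLearningCycles', 500, ...
                   'Learners', lda);
cvm = crossval(mdl, 'KFold', 10);                 % 10-fold cross-validation
acc = 1 - kfoldLoss(cvm);                         % mean classification accuracy
```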
Results
Experiment 1
To test the hypothesis that feedback and feedforward activity can interfere with each other, we first conducted a high-density 128-channel EEG experiment in which fourteen participants did a standard rapid serial visual presentation (RSVP) animal detection task (Evans & Treisman, 2005). During Experiment 1, participants counted images of animals within a single short burst of images presented at a rate of 12 Hz (Figure 1). Each trial consisted of a burst of images that had either no target images (Distractor trials), one target image (T1-only trials), or two different target images separated by zero to seven distractors (Lag 1 to Lag 8 trials). The participant's task was to passively count the animals during the stream and subsequently enter the number of animals that they saw. 
Behaviorally, whenever two targets were presented within roughly 400 ms of each other, subjects had lower detection performance on short Lag trials compared to T1-only trials (Lag 1 to Lag 4 accuracies were each significantly different from T1-only trial accuracies; p < 0.01, paired t test, df = 13; Figure 1B). If target detection were an independent probabilistic process, the predicted two-target accuracy for each participant should be the square of each participant's T1-only accuracy, and performance below this level would indicate interference between the two recognition processes. Indeed, in our data, only behavioral accuracies in conditions Lag 1 to Lag 3 were significantly lower than what would be expected if target detection were indeed an independent process (Lag 1: p = 3.2713 × 10−5, Lag 2: p = 8.3593 × 10−5, Lag 3: p = 0.0037, Lag 4 and greater: all p ≥ 0.5704, paired sample left-tailed t tests, df = 8). 
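This benchmark is simply the product rule for independent events. If p denotes a participant's T1-only detection accuracy, independent detection of two targets predicts

```latex
p(\text{both detected}) = p \cdot p = p^{2}
```

so, for example, a participant who detects a single target 90% of the time would be expected to report both targets on 0.9 × 0.9 = 81% of two-target trials; accuracies reliably below this level indicate interference between the two detection processes.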
Analysis of EEG data from trials in which there was only one animal target in the stream revealed that, starting at around 100 ms and reaching a maximum difference around 150 ms, T1-only image streams had significantly larger voltage in channels over occipitotemporal (OT) areas than distractor-only streams (Figure 2B, FDR corrected p < 0.05). These voltage differences were followed by significantly more activation in frontal channels from around 170 ms to around 220 ms (Figure 2B). Beamforming analyses estimated that the neural sources for the earlier posterior signal were located in occipitotemporal cortex, and the later anterior signal was located in frontal cortex (Figure 2A; Van Veen et al., 1997). Therefore, trials with only one animal image target had EEG signals whose neural sources and timing corresponded well with both feedforward models of object recognition and human electrophysiological data (Ungerleider & Haxby, 1994; S. Thorpe et al., 1996; Liu, Agam, Madsen, & Kreiman, 2009). 
Figure 2
 
Experiment 1: source estimations, electrophysiological responses, and Granger causality for correct single target trials (N = 14). Time 0 corresponds to the target onset. (A) Pseudoneural activity indices obtained with beamforming signal source estimation for correct T1-only trials centered at the indicated time points. (B) EEG voltages for correct T1-only trials. The legend at the far left indicates the electrode sets for each group of potentials (frontal, temporal, parietal, and occipital) and each row in the colored plot to the right represents a single channel from that group. The circular outline of the head is aligned to the level of the eyes. All group-averaged T1-only responses were truncated to zero (shown in white) when a paired-sample t test over all participants between T1-only and Distractor-only participant averaged voltages indicated an FDR corrected p value greater than 0.05. (C) Average and standard error of the mean of the ratio between the single trial Granger causality for correct T1-only trials and the single trial Granger causality in correct Distractor trials (“Causality Gain”) at the centers of a sliding 40 ms temporal window. OT (Occipitotemporal) and Frontal refer to the respective groups of channels in the 10–20 system. Each causality gain value at each particular time point represents the average of the magnitude of the relative single trial Granger causality increases from Distractor-only to T1-only trials for all pairs of channels between the indicated channel groups. The black line indicates the average Granger causality ratio for the OT channel to frontal channel influence, and the gray line shows the Granger causality value ratio from frontal to OT channels. Shaded areas indicate the standard error of the mean of the causality ratio over all participants. The colored horizontal bars below the curve indicate the time intervals in which a paired sample t test between the average target and distractor Granger causality values across single participants resulted in p values below 0.01 for the condition corresponding to the respective color.
The 170–220 ms frontal potentials occurred before the 230–270 ms potentials in occipitotemporal channels. To test the hypothesis that the earlier 170–220 ms frontal channel activity caused the 230–270 ms occipitotemporal channel activity, we computed Granger causality between all pairs of occipitotemporal and frontal channels in correct T1-only trials (Granger, 1969; Cui et al., 2008; Gregoriou et al., 2009). The Granger analysis measured the strength of causal influence between the two channel groups within a 40 ms window surrounding each time point (channels were selected in an unbiased fashion based on the 10–20 electrode mapping system). We used the single trial EEG signals and not the activity of the beamforming sources because source localizations are based on the average of many trials and produce a single result, whereas the statistical power of Granger causality should be greater and more convincing when done on single trials. The Granger causality analysis revealed that visual target processing in correct T1-only trials involved significant increases in Granger causality from occipitotemporal channels to frontal channels within 40 ms time windows surrounding every point in the interval 201–214 ms (t test, p < 0.01, df = 13), and from frontal channels to occipitotemporal channels within 40 ms time windows centered at every point in the interval 248–260 ms (t test, p < 0.01, df = 13; Figure 2C). Moreover, the increase in frontal to occipitotemporal Granger causality in correct T1-only trials over that in distractor-only trials was larger than the change in Granger causality within occipitotemporal channels during 40 ms time windows centered at every point in the interval 256–262 ms (t test, p < 0.01, df = 13). Thus, at the single trial level, there was evidence for an increase in the modulation of neural activity in occipitotemporal channels by frontal channels on target trials. The time frames of these Granger-causal modulations largely corresponded with the timing of the neural sources that were localized in occipitotemporal and frontal cortices (see Figure 3). These results support the hypothesis that target presence was associated with a strong, re-entrant modulation of occipitotemporal channels by frontal channels. 
Figure 3
 
Experiment 1: source estimations for correct single target trials (N = 14). Time 0 corresponds to the target onset. Pseudoneural activity indices were obtained with beamforming signal source estimations for correct T1-only trials centered at the indicated time points.
Next, we analyzed EEG signals in two-target trials. While responses for long lags clearly showed the signals associated with each individual target (target “templates”), the signals at progressively shorter lags had increasing degrees of overlapping interference between the two templates (Figure 4). The possibility that such interference between the two target templates was related to the behavioral impairment raised the question of whether we could increase participants' detection performance by reducing interference in some other way. One potential way of reducing interference would be to exploit the fact that early processing of lateralized visual stimuli is known to occur in the contralateral hemisphere (Rousselet et al., 2002; Del Cul, Baillet, & Dehaene, 2007). However, the lateralization of later processing is less well understood. We tested whether the feedback signals at 230 ms in Experiment 1 were lateralized according to target location by analyzing T1-only trials in which targets were entirely contained within either the left or the right third of the image. In these trials, the feedback potential at 230 ms was stronger in contralateral occipitotemporal channels than in ipsilateral occipitotemporal channels (see Figure 5A), indicating that the later re-entrant processing in occipitotemporal channels was also lateralized according to target location. We were therefore able to explore the interference hypothesis further by limiting the behavioral analyses to trials that had the two targets lateralized on opposite sides of the image, so that their respective late and early neural processing would putatively be less likely to interfere. According to the interference hypothesis, detection accuracy for T1 and T2 should increase in a lateralized visual arrangement in which two targets appeared in different hemifields, because the feedback triggered by T1 should not have “crashed” into the low-level visual processing of T2, and vice versa. Lag 1 trials should especially show this effect because the 230 ms T1 feedback signal and the 150 ms T2 feedforward signal should coincide in OT when targets were presented 83 ms apart. Indeed, a re-analysis of the data in Experiment 1 showed that behavioral performance for Lag 1 trials was on average 11% higher when the centers of successive targets appeared on opposite sides of fixation than when they appeared on the same side of fixation (paired t test, p = 0.0022, df = 13). 
Figure 4
 
Experiment 1: average EEG voltages within each type of two-target “Lag” trial (N = 14). Color range for voltages corresponds to the color bar shown at the far right. Animal 1 onset (T1) is at 0 ms and Animal 2 onset (T2) is marked with an appropriately shifted vertical black line for each Lag. Channels are grouped according to the electrode groupings shown in the legend at the far left (from top: Frontal, Temporal, Parietal, Occipital). The circular outline of the head is aligned to the level of the eyes. There are two clear repetitions of the same target “template” at Lag 8 which slowly mix together when approaching Lag 1. For clarity, voltage averages that were between −0.5 and 0.5 microvolts were truncated to zero.
Figure 5
 
Reducing interference improved detection performance. (A) Experiment 1: average EEG difference of absolute values of voltage in each channel at 230 ms (left plot) and significance of differences (right plot) in each channel for correct T1-only trials between targets that were entirely contained within the left third of the image and targets that were entirely contained in the right third of the image (t test, df = 13, displayed as the negative base-10 logarithm of the p value, so that a value of 2 corresponds to p = 0.01). (B) Experiment 2: an example Lag 1 trial in Experiment 2, in which six participants performed target detection on two simultaneously presented side-by-side image streams while fixating on a central fixation cross. The example shown is a different sides trial in which the two consecutive targets appeared on different sides. (C) Experiment 2: an example Lag 1 same sides trial in which the two targets appeared on the same side. (D) Detection accuracy in Experiment 2. Error bars denote the standard error of the mean (N = 6) and the p values are from a paired t test (df = 5). Images in (B) and (C) are Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Experiment 2
To further investigate the effect of lateralized target presentation on detection performance, we conducted a separate behavioral-only experiment (Experiment 2) in which six participants counted instances of animals in two side-by-side concurrent streams of images while maintaining central fixation (see Figure 5B, C, and D). This second experiment verified that presenting T1 and T2 in different lateralized streams was associated with an increase in awareness of the number of targets in Lag 1 conditions (27.4% to 44.2%, paired t test, p = 0.0009; see Figure 5D). 
Experiment 3
The first two experiments support the hypothesis that detection performance for successive targets was impaired due to interference between the feedforward and feedback processes of multiple targets. Yet, the experiments leave open the question of which target was affected in a pair because in Experiments 1 and 2 we only asked participants to report the number of targets that they detected. To answer this question, we did a third experiment on 18 participants to determine which of the two targets the participants missed in two-target trials. In the third experiment, after each 12 Hz burst of two parallel image streams, participants entered the number of targets they saw, and then selected the category for each detected target from a shuffled list (canine, feline, bear, horse, deer, primate, bird, and don't know). In the analyses for this experiment, we calculated the percent correct without requiring that targets were identified in the correct order. 
As in the analysis of Experiment 1, EEG potentials and their corresponding beamforming source localizations in occipitotemporal channels were largest contralateral to the target location (see Figure 6C and D and the Appendix Movies A1 and A2). Although some studies have shown that performance differs for targets in the left versus the right hemifield (Śmigasiewicz et al., 2010; Śmigasiewicz, Asanowicz, Westphal, & Verleger, 2014; Goodbourn & Holcombe, 2015; Holcombe, Nguyen, & Goodbourn, 2017; Ransley, Goodbourn, Nguyen, Moustafa, & Holcombe, 2018), we did not find any significant differences in performance between left and right target sides in any condition for either counting or categorization (see Supplementary Figure S1). Accuracy for counting the animals was lower for Lag 1 same side trials than for different side trials (see Figure 7A). Interestingly, although T2|T1 categorization accuracy (i.e., the ability to categorize T2 correctly given that T1 was categorized correctly) increased from 30% to 38% across the same/different manipulation, T1|T2 categorization accuracy increased more than twice as much, from 21% to 42% (Figure 7B). Accuracy for correctly categorizing both targets in Lag 1 trials increased from 11% on same side trials to 19% on different side trials (the chance level for correctly categorizing both targets was 2.0%). Thus, reducing the expected neuronal signal interference between T1 and T2 by presenting the animals in separate visual hemifields improved the ability to report and categorize both targets, but more strongly improved the categorization of the first target (Figure 7B). In the majority of Lag 1 trials, subjects detected just one target (a mean of 73% of all Lag 1 same side trials and a mean of 56% of all Lag 1 different side trials; Figure 7C). In this subset of Lag 1 trials, categorization accuracy strongly improved for the first target when the two consecutive targets were in different hemifields versus when they were in the same hemifield (from 33% to 48%, paired t test, p = 1.833 × 10−8, Figure 7D). 
Figure 6
 
Experiment 3: two-stream target detection and categorization source localizations for correctly categorized T1-only trials (N = 18). Participants were told to fixate on a central fixation cross between the two streams. T1-only trials contained only one animal that appeared either in the left (A) or the right hemifield (B). (C) Beamforming neural source localizations at each indicated time for T1-only left trials. (D) Beamforming neural source localizations at each indicated time for T1-only right trials. Images in (A) and (B) Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Figure 7
 
Experiment 3: behavioral results (N = 18). (A) Percent of the time each subject detected the correct number of targets in each condition. (B) Behavioral results for categorization of animals across different conditions. T1|T2 (respectively T2|T1) means that categorization accuracy was scored on T1 (T2) for trials in which T2 (T1) had been categorized correctly. T1&T2 means that participants categorized both targets correctly. Same sides and Different sides correspond to whether the two targets appeared in the same stream or not. Lag indicates when the second target was presented (Lag 1 is directly afterwards at T1+83 ms and Lag 7 is T1+583 ms). Results of paired t tests are indicated above each comparison (df = 17). (C) Frequency of responses in Lag 1 trials, separated into trials in which only 1 target was reported before the categorization quiz (Reported 1) and trials in which two targets were reported before the categorization quiz (Reported 2). (D) Behavioral categorization accuracy in Lag 1 trials, separated into trials in which only 1 target was reported before the categorization quiz (Reported 1) and trials in which two targets were reported before the categorization quiz (Reported 2). Categorization accuracy was calculated within each type of trial (e.g., the number of correctly categorized trials for T1 (or T2) out of all Lag 1 trials in which a participant reported two targets and T1 and T2 were presented on the same side). White bars denote the accuracy for the categorization of T1, and gray bars denote the accuracy for the categorization of T2. Error bars show 95% CI. Black dotted line is the chance level for a single categorization. Results of paired t tests are indicated above each comparison (df = 17).
Figure 8
 
Experiment 3: classifiers trained on T1-only and tested on Lag-1 trials in which only one target was reported (N = 18). (A) Single-trial classification results obtained by training on T1-only trials (left and right separately) and testing on Lag 1 same side trials (LL and RR separately) showed differences for frontal (top row) and occipitotemporal channels (bottom row) only after the initial feedforward pass of T1. The legends at the left show the channels that were used for the frontal classifications (top row) and the occipitotemporal classifications (bottom row). The outline of the head is aligned to the level of the eyes. The detection performance (ROC area) of the T1-only based classifier differed depending on whether T1 was detected or T2 was detected starting at ∼200 ms in OT channels, whereas no significant difference was found between classifications before ∼200 ms (see Materials and methods).
Figure 9
 
Classification of target category from EEG signals. (A) Experiment 1 (N = 14): Accuracies for classifying the target category based on the mean of the 60 ms of Occipital and Temporal EEG signals surrounding each time point (10–20 electrodes only). The results in the top plot show the mean classification accuracy (dark line) and standard error of the mean (light gray shaded areas) for each subject and time point over 30 iterations of 10-fold cross-validation. The bottom plot shows the results of a t test between subject performances and chance level at each time point. (B) Experiment 3 (N = 18): Accuracies for classifying the target category based on the mean of the 60 ms of Occipital and Temporal EEG signals surrounding each time point (10–20 electrodes only). Each classifier was trained and tested separately for T1-only Left and T1-only Right trials, and the results of the two were aggregated. The results in the top plot show the mean classification accuracy (dark line) and standard error of the mean (light gray shaded areas) for each subject and time point over 30 iterations of 10-fold cross-validation. The bottom plot shows the results of a t test between subject performances and chance level at each time point.
On the other hand, categorization accuracy for the second target correspondingly decreased (from 51% to 37%, paired t test, p = 1.117 × 10−7, df = 17). In Lag 1 trials in which both targets were detected, categorization performance for T1 increased when the second target appeared in the opposite hemifield (from 62% to 78%, paired t test, p = 0.0005, df = 17), whereas T2 categorization performance did not differ between the same- and different-hemifield conditions (79%, paired t test, p = 0.6659, df = 17). 
Eye movements towards the first target might be expected to benefit a second target presented on the same side; we observed the opposite. Nevertheless, because eye movements could be a potential confound, we performed an additional analysis of the electrooculography (EOG) channels to verify that eye movements were minimal during the streams. EOG potentials in response to saccades are known to have amplitudes ranging from 5 microvolts/° to 20 microvolts/° (Constable et al., 2017; Duchowski, 2017). Given a Cz reference electrode, the polarity of the EOG signal is positive at the electrode towards which the eye saccades. However, in our data for Experiment 3, after T1 onset there was no difference in EOG potentials between left and right T1-only trials (Wilcoxon rank sum tests, p > 0.30, df = 17). There was also no difference between left and right EOG electrodes in Lag 1 same side and Lag 1 different side trials (Wilcoxon rank sum tests, p > 0.17, df = 17). Furthermore, the EOG potentials after target onset were all smaller than 26.9 microvolts. These analyses indicate that there were no large saccadic eye movements in response to animal onset. 
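The logic of this check amounts to comparing mean post-onset EOG amplitudes across conditions and confirming that no deflection approaches saccade size. The following minimal Python sketch illustrates the idea; the array names, sampling rate, and analysis window are illustrative assumptions, not our exact pipeline.

```python
# A minimal sketch of the EOG saccade check, assuming per-trial EOG epochs
# (in microvolts, Cz reference) are already in NumPy arrays of shape
# (n_trials, n_samples); the sampling rate and post-onset window are
# illustrative assumptions, not the exact parameters used here.
import numpy as np
from scipy.stats import ranksums

FS = 1000                              # assumed sampling rate (Hz)
POST_ONSET = slice(0, int(0.5 * FS))   # 0-500 ms after target onset

def mean_post_onset(epochs):
    """Mean post-onset EOG potential per trial."""
    return epochs[:, POST_ONSET].mean(axis=1)

def saccade_check(eog_left_trials, eog_right_trials):
    # A lateralized saccade would produce opposite-polarity deflections on
    # the two sides, so compare post-onset amplitudes across conditions...
    _, p = ranksums(mean_post_onset(eog_left_trials),
                    mean_post_onset(eog_right_trials))
    # ...and confirm no deflection approaches saccade size (at 5-20
    # microvolts/deg, even a small saccade should exceed ~10 microvolts).
    max_amp = max(np.abs(eog_left_trials[:, POST_ONSET]).max(),
                  np.abs(eog_right_trials[:, POST_ONSET]).max())
    return p, max_amp
```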
To obtain additional evidence for the hypothesis that the feedback period of T1 was critical for the ability to both detect and categorize T1, we separately trained frontal-channel and occipitotemporal-channel classifiers to detect animal targets based on single-target trials: one on T1-left versus distractor trials, and another on T1-right versus distractor trials (see Methods). Next, we tested these single-trial classifiers on both T1 and T2 targets in Lag 1 LL and Lag 1 RR trials, respectively. The results indicated significant differences in frontal-channel classifier performance between Lag 1 same side trials in which subjects detected only one target and correctly categorized T1 and trials in which subjects detected only one target and correctly categorized T2. These differences in classification performance over frontal channels occurred only for the P300 signal (t test, p < 0.0001, df = 17; see Figure 8A, top row), but not during the initial frontal activity around 170–220 ms. In classifiers using contralateral posterior channels, there were differences in performance for the same comparison during the feedback portion of the signal, starting around 200 ms (t test, p < 0.00001, df = 17; see Figure 8A, bottom row). Thus, whether a participant correctly categorized T1 or T2 when only one target was detected was associated with significant differences in the performance of subject-specific contralateral T1-only trained classifiers during the re-entrant processing period of T1, but not during its initial feedforward processing period. 
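The transfer analysis reduces to a few steps: train a detector on T1-only versus distractor trials, score held-out Lag 1 trials with it, and summarize performance as ROC area at each time point. The sketch below illustrates this with scikit-learn; the per-channel window-mean features and the linear discriminant classifier are assumptions for illustration, not necessarily the exact classifier described in our Methods.

```python
# A sketch of the train-on-T1-only / test-on-Lag-1 transfer analysis; the
# feature choice and classifier are illustrative assumptions, and
# `lag1_labels` stands in for the binary outcome of interest (e.g.,
# whether T1 or T2 was correctly categorized on each Lag 1 trial).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

def window_features(epochs, t_idx, half_width):
    """Mean voltage per channel around t_idx:
    (n_trials, n_channels, n_samples) -> (n_trials, n_channels)."""
    return epochs[:, :, t_idx - half_width:t_idx + half_width].mean(axis=2)

def transfer_auc(t1_only, distractor, lag1, lag1_labels, t_idx, hw=30):
    # Train a detector on single-target versus distractor trials...
    X_train = np.vstack([window_features(t1_only, t_idx, hw),
                         window_features(distractor, t_idx, hw)])
    y_train = np.r_[np.ones(len(t1_only)), np.zeros(len(distractor))]
    clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
    # ...then score the held-out Lag 1 trials and summarize as ROC area.
    scores = clf.decision_function(window_features(lag1, t_idx, hw))
    return roc_auc_score(lag1_labels, scores)
```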
Finally, we investigated the possibility that the feedback signals at 230 ms contained information about the category of the animal, and not only about whether the subject detected the target. For Experiment 1, we built ensemble category classifiers for each subject individually, based on the mean of the 60 ms of signals surrounding every time point, using the Random Undersampling Boosting ensemble machine learning method with linear discriminant learners (RUSBoost in MATLAB; see Methods). Because there were over 30 animal categories in Experiment 1, some categories had far fewer trials than others. Therefore, we limited the classification to trials containing the four best-represented categories (felines, birds, canines, and deer). We found a small, but significantly above-chance, cross-validated classification accuracy when classifying which animal category appeared in T1-only trials at 230 ms (see Figure 9A). For Experiment 3, we trained classifiers on all seven of the animal categories that appeared in the experiment (canine, feline, bear, horse, deer, primate, and bird) because their frequencies were approximately equal. Each subject-specific classifier was cross-validated at each time point on T1-only left-side and T1-only right-side trials. To obtain more training trials, we also included Lag 7 same and different side trials (whose T1 signals could not have been affected by T2, which appeared 583 ms after T1). Even though there were few training examples (about 100 per category for each subject), we still found above-chance classification of the animal's category at 230 ms (see Figure 9B). 
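As a rough open-source analogue of this per-time-point decoding, the sketch below uses imbalanced-learn's RUSBoostClassifier with 30 repetitions of 10-fold cross-validation. Note that our analysis used MATLAB's RUSBoost with linear discriminant learners; imbalanced-learn's boosting implementation requires base learners that accept per-sample weights, which scikit-learn's discriminant analysis does not, so the sketch keeps the default shallow decision-tree learners.

```python
# A minimal sketch of category decoding at one time point, assuming the
# features X (trials x channels: mean voltage in a 60 ms window around the
# time point) and integer animal-category labels y are already extracted.
# Default tree learners are substituted for the discriminant learners used
# in the MATLAB analysis (see the note in the text above).
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def category_decoding_accuracy(X, y, seed=0):
    clf = RUSBoostClassifier(n_estimators=50, random_state=seed)
    # 30 iterations of 10-fold cross-validation, as in Figure 9.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=30,
                                 random_state=seed)
    return cross_val_score(clf, X, y, cv=cv).mean()
```

Comparing the resulting accuracy at each time point against the chance level for the number of categories reproduces the logic of the t tests shown in the bottom panels of Figure 9.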
Discussion
The human visual system detects targets in complex natural scenes with remarkable speed, an ability demonstrated directly by rapid serial visual presentation (RSVP) paradigms. Based on Hubel and Wiesel's seminal work (1962, 1968), this ability has been posited to rely on a feedforward processing hierarchy that starts in occipitotemporal cortex and travels to target detection circuits in frontal cortex (S. J. Thorpe & Fabre-Thorpe, 2001; Riesenhuber & Poggio, 2002; VanRullen & Thorpe, 2002; Serre et al., 2007). After this initial feedforward period, a wave of feedback into early visual areas has been shown to be key for successful detection of the target (Lamme & Roelfsema, 2000; Boehler et al., 2008; Muggleton et al., 2011; Camprodon et al., 2012; Edelman & Gally, 2013; Koivisto et al., 2016; Fahrenfort et al., 2017). Our results show how feedforward and feedback signals associated with multiple targets can interfere and thereby affect the conscious perception of both targets. 
In more detail, we conducted a series of behavioral and EEG experiments in which subjects detected animals embedded in natural scenes amongst distractor images (natural scenes and buildings). The images were presented serially and rapidly (at a rate of 12 images per second) in various configurations. These single-task paradigms, in which subjects performed the same task on all stimuli, allowed us to focus on subjects' ability to detect and categorize two animal targets in the absence of additional factors such as task switching (Kawahara, Zuvic, Enns, & Di Lollo, 2003). In Experiment 1, successful target detection was associated with frontal areas triggering a Granger-caused wave of feedback into occipitotemporal visual areas. The timing of the feedback signals in our experiments is consistent with human EEG studies on the conscious awareness of visual stimuli, which have likewise reported target-related reactivations in visual areas at 200–280 ms (Del Cul et al., 2007; Koivisto et al., 2016; Fahrenfort et al., 2017). Furthermore, when the expected interference between the feedforward and feedback signals of the two targets was reduced by displaying the targets in different hemifields, participants showed improved detection and categorization performance relative to trials in which the feedforward and feedback portions of the two targets interfered (Experiments 1, 2, and 3). That is, interference between this posterior feedback and the ipsilateral feedforward signals triggered by subsequent targets was associated with a perceptual crash that reduced the brain's ability to detect both targets rapidly. While both targets benefited from this reduction in interference, T1 benefited the most. This result provides evidence that, when the two targets were presented in the same hemifield, the feedforward activity evoked by T2 interfered with the top-down feedback activity that was important for the detection and categorization of T1. 
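The "Granger-caused" feedback referenced here (see Figure 2C) rests on the standard variance-ratio definition of Granger causality: channel X Granger-causes channel Y to the extent that adding X's past to an autoregressive model of Y reduces the residual variance. The following toy sketch computes this quantity and a target-versus-distractor "causality gain" for one channel pair; our analysis used the BSMART toolbox on single trials, so the model order, window handling, and trial format here are illustrative assumptions only.

```python
# A toy sketch of single-trial Granger causality and the "causality gain"
# of Figure 2C, assuming each trial supplies two equal-length channel time
# series; model order and trial format are illustrative assumptions.
import numpy as np

def granger_causality(x, y, order=5):
    """GC from x to y: ln(var of restricted residuals / var of full residuals)."""
    n = len(y)
    Y = y[order:]
    # Restricted model: predict y from its own past only.
    own = np.column_stack([y[order - k:n - k] for k in range(1, order + 1)])
    # Full model: add x's past as extra regressors.
    full = np.column_stack([own] + [x[order - k:n - k, None]
                                    for k in range(1, order + 1)])
    resid_restricted = Y - own @ np.linalg.lstsq(own, Y, rcond=None)[0]
    resid_full = Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]
    return np.log(resid_restricted.var() / resid_full.var())

def causality_gain(target_trials, distractor_trials, order=5):
    """Mean single-trial GC on target trials divided by that on distractor
    trials, for one (source, sink) channel pair; trials are (x, y) tuples."""
    def mean_gc(trials):
        return np.mean([granger_causality(x, y, order) for x, y in trials])
    return mean_gc(target_trials) / mean_gc(distractor_trials)
```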
The experiments not only support the importance of feedback for detection, but also show how feedback is part of the categorization process. First, detection and categorization were clearly dissociated in Experiment 3: there were instances in which subjects detected both targets yet could categorize only one of them correctly (T2 was more often correctly categorized than T1). That is, even when subjects detected both targets, categorization performance for the first target was significantly lower in the high-interference, Lag 1 same side, condition (see Figure 7D). Furthermore, in both Experiments 1 and 3 we were able to build above-chance single-trial classifiers for the category of each target using the EEG signals corresponding to the feedback period around 230 ms. Taken together, these results indicate that the feedback portions of the signal were important not only for detecting the target, but also for categorizing it. 
The interplay between feedforward and feedback signals in the visual processing hierarchy may also be relevant to the Attentional Blink (AB) and Backward Masking (BM) phenomena. Previous studies have shown that detection is often impaired when two successive visual targets are presented within about 400 ms of each other in the midst of a stream of task-irrelevant stimuli. This fundamental limitation of human vision, the “Attentional Blink,” has been postulated to arise from attentional or working memory bottlenecks in parietal and frontal cortex (Broadbent & Broadbent, 1987; Raymond et al., 1992; S. J. Luck, Vogel, & Shapiro, 1996; Shapiro et al., 1997; Marois & Ivanoff, 2005; Nieuwenhuis et al., 2005; Hommel et al., 2006; Del Cul et al., 2007; Dux & Marois, 2009; Martens & Wyble, 2010; Marti & Dehaene, 2017), and in subcortical structures (Nieuwenhuis et al., 2005; Colzato, Slagter, de Rover, & Hommel, 2011). Our results show that avoiding the “crash” between the feedback activity from T1 and the feedforward activity from T2 can mitigate Attentional Blink-like effects, but does not completely abolish them. For instance, in Experiment 3, performance at Lag 1 improved from same side to different side trials but did not recover to the level of performance at Lag 7 (see Figure 7B), demonstrating that other types of bottleneck effects may still be relevant. On the other hand, several studies of the neural basis of the AB are in agreement with our findings, including work showing the involvement of striate (Williams et al., 2008) and extrastriate cortex (Marois, Yi, & Chun, 2004; Dell'Acqua, Sessa, Jolicoeur, & Robitaille, 2006). In addition, some theories of the blink include a role for feedback (Wyble, Bowman, & Nieuwenstein, 2009). 
A phenomenon related to the Attentional Blink is Backward Masking (B. G. Breitmeyer, 1984, 2007; Vorberg et al., 2003), in which perception of a briefly presented first stimulus is affected by a rapidly following second stimulus. Attentional Blink and Backward Masking studies have used a wide variety of visual targets (e.g., letters, numbers) and paradigms (e.g., with or without task switching, with or without blanks between stimuli). It is therefore difficult to compare the results of our RSVP animal detection and categorization studies (which used no blanks between stimuli, found no Lag 1 sparing, used no letters or numbers, and had only one target category) with results from more complex Attentional Blink or Backward Masking paradigms that might involve additional cognitive processes. For example, a dual-stream Attentional Blink study that used red characters from different languages (Chinese, Hebrew, German) as T1 and a black number as T2 found no Attentional Blink at Lag 1 for T1, but did find an Attentional Blink for T2 when T1 was detected and T2 appeared in a different position than T1 (Śmigasiewicz et al., 2010). Another study, in which number targets appeared in multiple streams of character distractors, showed that T2 was missed more often when it was presented closer to T1 in the visual field (Kristjánsson & Nakayama, 2002). A further dual-stream study with number targets embedded in streams of distractor letters showed a right visual field advantage for T2, whereas our study found no significant differences in performance between hemifields for either the categorization or the detection of T1 or T2 (Bergerbest, Shilkrot, Joseph, & Salti, 2017; see Supplementary Figure S1). Nevertheless, our work provides a high-level theoretical framework for natural images that makes several predictions testable in future experiments. 
Conclusion
Hubel and Wiesel showed that vision is accomplished in part by a simple-to-complex feedforward processing pass beginning in visual cortex. More recently, re-entrant activity has been shown to be crucial for both learning and awareness, and has inspired deep learning frameworks that use feedback to optimize processing in low-level areas for specific tasks. However, there is a price to pay for such an architecture. Our study shows that interference between “bottom-up” and “top-down” signals can impair the ability to process two images in rapid succession. Thus, multiplexing different cognitive operations along the visual hierarchy (e.g., object processing, awareness, and possibly learning) can create interference bottlenecks that limit object detection and categorization abilities. Yet, our results also point to ways of improving cognitive abilities by mitigating these bottlenecks. For example, placing the feedback processing of the first target on a separate cortical track from the feedforward processing of the second target substantially improved detection and identification of both targets, with the first target gaining the most. These findings open the door to further studies of the complex neural dynamics underlying human vision. 
Acknowledgments
We thank Dr. Simon Thorpe for advice on experimental design and Dr. Ramakrishna Chakravarthi for his comments on the manuscript. 
This research was supported in part by an NSF CAREER award (#0449743). 
JGM and MR conceived the hypotheses and experimental protocols. JGM programmed the experiments, ran the human participants, analyzed and processed the data, and prepared all figures. JGM and MR wrote the main manuscript text. PHC assisted with data collection and helped edit the paper. CAS assisted with data collection and analysis. 
Commercial relationships: JGM is on several patents, none of which are related to this paper: PCT Patent PCT/EP2017/079767; PCT Patent EP16306525; US Patent 12/632,062; US Patent 7,630,992. 
Corresponding author: Jacob G. Martin. 
Address: Georgetown University Medical Center, Department of Neuroscience, Washington, DC, USA. 
References
Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8 (10), 457–464, https://doi.org/10.1016/j.tics.2004.08.011.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29 (4), 1165–1188, https://doi.org/10.1214/aos/1013699998.
Bergerbest, D., Shilkrot, O., Joseph, M., & Salti, M. (2017). Right visual-field advantage in the attentional blink: Asymmetry in attentional gating across time and space. Attention, Perception, and Psychophysics, 79 (7), 1979–1992, https://doi.org/10.3758/s13414-017-1356-z.
Boehler, C. N., Schoenfeld, M. A., Heinze, H.-J., & Hopf, J.-M. (2008). Rapid recurrent processing gates awareness in primary visual cortex. Proceedings of the National Academy of Sciences, USA, 105 (25), 8742–8747, https://doi.org/10.1073/pnas.0801999105.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10 (4), 433–436. Available from http://www.ncbi.nlm.nih.gov/pubmed/9176952
Breitmeyer, B. G. (1984). Visual masking: An integrative approach. Oxford, UK: Oxford University Press.
Breitmeyer, B. G. (2007). Visual masking: Past accomplishments, present status, future developments. Advances in Cognitive Psychology, 3 (1–2), 9–20, https://doi.org/10.2478/v10053-008-0010-7.
Broadbent, D. E., & Broadbent, M. H. (1987). From detection to identification: Response to multiple targets in rapid serial visual presentation. Perception and Psychophysics, 42 (2), 105–113. Available from http://www.ncbi.nlm.nih.gov/pubmed/3627930
Camprodon, J., Zohary, E., Brodbeck, V., & Pascual-Leone, A. (2012). Two phases of V1 activity for visual recognition of natural images. Journal of Cognitive Neuroscience, 22 (6), 1262–1269, https://doi.org/10.1162/jocn.2009.21253.
Cohen, M. X., Bour, L., Mantione, M., Figee, M., Vink, M., Tijssen, M. A.,… Denys, D. (2012). Top-down-directed synchrony from medial frontal cortex to nucleus accumbens during reward anticipation. Human Brain Mapping, 33 (1), 246–252, https://doi.org/10.1002/hbm.21195.
Colzato, L. S., Slagter, H. A., de Rover, M., & Hommel, B. (2011). Dopamine and the management of attentional resources: Genetic markers of striatal D2 dopamine predict individual differences in the attentional blink. Journal of Cognitive Neuroscience, 23 (11), 3576–3585. Available from http://bernhard-hommel.eu/Colzato_AB_genetic.pdf
Constable, P. A., Bach, M., Frishman, L. J., Jeffrey, B. G., Robson, A. G., & International Society for Clinical Electrophysiology of Vision. (2017). ISCEV standard for clinical electro-oculography (2017 update). Documenta Ophthalmologica. Advances in Ophthalmology, 134 (1), 1–9, https://doi.org/10.1007/s10633-017-9573-2.
Cui, J., Xu, L., Bressler, S. L., Ding, M., & Liang, H. (2008). BSMART: A Matlab/C toolbox for analysis of multichannel neural time series. Neural Networks, 21 (8), 1094–1104, https://doi.org/10.1016/j.neunet.2008.05.007.
De Pasquale, R., & Sherman, S. M. (2013). A modulatory effect of the feedback from higher visual areas to V1 in the mouse. Journal of Neurophysiology, 109 (10), 2618–2631, https://doi.org/10.1152/jn.01083.2012.
Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS Biology, 5 (10), e260, https://doi.org/10.1371/journal.pbio.0050260.
Dell'Acqua, R., Sessa, P., Jolicœur, P., & Robitaille, N. (2006). Spatial attention freezes during the attention blink. Psychophysiology, 43 (4), 394–400, https://doi.org/10.1111/j.1469-8986.2006.00411.x.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134, 9–21, https://doi.org/10.1016/j.jneumeth.2003.10.009.
Duchowski, A. T. (2017). Eye tracking methodology. Cham, Switzerland: Springer. Available from http://link.springer.com/10.1007/978-3-319-57883-5
Dux, P. E., & Marois, R. (2009). The attentional blink: A review of data and theory. Attention, Perception and Psychophysics, 71 (8), 1683–1700, https://doi.org/10.3758/APP.71.8.1683.
Edelman, G. M., & Gally, J. A. (2013). Reentry: A key mechanism for integration of brain function. Frontiers in Integrative Neuroscience, 7 (August), 1–6, https://doi.org/10.3389/fnint.2013.00063.
Evans, K. K., & Treisman, A. (2005). Perception of objects in natural scenes: Is it really attention free? Journal of Experimental Psychology. Human Perception and Performance, 31 (6), 1476–1492, https://doi.org/10.1037/0096-1523.31.6.1476.
Fahrenfort, J. J., van Leeuwen, J., Olivers, C. N. L., & Hogendoorn, H. (2017). Perceptual integration without conscious access. Proceedings of the National Academy of Sciences, USA, 114 (14), 3744–3749, https://doi.org/10.1073/pnas.1617268114.
Fahrenfort, J. J., Scholte, H. S., & Lamme, V. A. F. (2007). Masking disrupts reentrant processing in human visual cortex. Journal of Cognitive Neuroscience, 19 (9), 1488–1497, https://doi.org/10.1162/jocn.2007.19.9.1488.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23 (12), 5235–5246. Available from http://www.ncbi.nlm.nih.gov/pubmed/12832548
Gilbert, C. D., & Li, W. (2012). Adult visual cortical plasticity. Neuron, 75 (2), 250–264, https://doi.org/10.1016/j.neuron.2012.06.030.
Goodbourn, P., & Holcombe, A. (2015). “Pseudoextinction”: Asymmetries in simultaneous attentional selection. Journal of Experimental Psychology. Human Perception and Performance, 41 (2), 364–384, https://doi.org/10.1037/a0038734.
Granger, C. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37 (3), 424–438.
Gregoriou, G. G., Gotts, S. J., Zhou, H., & Desimone, R. (2009, May 29). High-frequency, long-range coupling between prefrontal and visual cortex during attention. Science, 324 (5931), 1207–1210, https://doi.org/10.1126/science.1171402.
Groppe, D. M., Urbach, T. P., & Kutas, M. (2011). Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review. Psychophysiology, 48 (12), 1711–1725, https://doi.org/10.1111/j.1469-8986.2011.01273.x.
Grossberg, S. (1999). The link between brain learning, attention, and consciousness. Consciousness and Cognition, 8 (1), 1–44, https://doi.org/10.1006/ccog.1998.0372.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) (pp. 1026–1034). Washington, DC: IEEE Computer Society.
Holcombe, A. O., Nguyen, E. H., & Goodbourn, P. T. (2017). Implied reading direction and prioritization of letter encoding. Journal of Experimental Psychology: General, 146 (10), 1420–1437, https://doi.org/10.1037/xge0000357.
Hommel, B., Kessler, K., Schmitz, F., Gross, J., Akyürek, E., Shapiro, K., & Schnitzler, A. (2006). How the brain blinks: Towards a neurocognitive model of the attentional blink. Psychological Research, 70 (6), 425–435, https://doi.org/10.1007/s00426-005-0009-3.
Hong, H., Yamins, D. L. K., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 19 (4), 613–622, https://doi.org/10.1038/nn.4247.
Hubel, D., & Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160, 106–154.
Hubel, D., & Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195 (1), 215–243.
Kawahara, J. I., Zuvic, S. M., Enns, J. T., & Di Lollo, V. (2003). Task switching mediates the attentional blink even without backward masking. Perception and Psychophysics, 65 (3), 339–351, https://doi.org/10.3758/BF03194565.
Koivisto, M., Railo, H., Revonsuo, A., Vanni, S., & Salminen-Vaparanta, N. (2011). Recurrent processing in V1/V2 contributes to categorization of natural scenes. Journal of Neuroscience, 31 (7), 2488–2492, https://doi.org/10.1523/JNEUROSCI.3074-10.2011.
Koivisto, M., Salminen-Vaparanta, N., Grassini, S., & Revonsuo, A. (2016). Subjective visual awareness emerges prior to P3. European Journal of Neuroscience, 43 (12), 1601–1611, https://doi.org/10.1111/ejn.13264.
Kristjánsson, Á., & Nakayama, K. (2002). The attentional blink in space and time. Vision Research, 42 (17), 2039–2050, https://doi.org/10.1016/S0042-6989(02)00129-3.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (pp. 1097–1105).
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23 (11), 571–579, https://doi.org/10.1016/S0166-2236(00)01657-X.
Lamme, V. A. F. (2000). Neural mechanisms of visual awareness: A linking proposition. Brain and Mind, 1 (3), 385–406, https://doi.org/10.1023/A:1011569019782.
Larkum, M. E., Senn, W., & Lüscher, H. R. (2004). Top-down dendritic input increases the gain of layer 5 pyramidal neurons. Cerebral Cortex, 14 (10), 1059–1070, https://doi.org/10.1093/cercor/bhh065.
Li, W., Piëch, V., & Gilbert, C. D. (2008). Learning to link visual contours. Neuron, 57 (3), 442–451, https://doi.org/10.1016/j.neuron.2007.12.011.
Liu, H., Agam, Y., Madsen, J. R., & Kreiman, G. (2009). Timing, timing, timing: Fast decoding of object information from intracranial field potentials in human visual cortex. Neuron, 62 (2), 281–290. Available from http://www.ncbi.nlm.nih.gov/pubmed/19409272, https://doi.org/10.1016/j.neuron.2009.02.025
Luck, S. J. (2014). An introduction to the event-related potential technique (2nd ed.). Cambridge, MA: MIT Press.
Luck, S. J., Vogel, E. K., & Shapiro, K. L. (1996, October 17). Word meanings can be accessed but not reported during the attentional blink. Nature, 383 (6601), 616–618, https://doi.org/10.1038/383616a0.
Macknik, S. L., & Martinez-Conde, S. (2007). The role of feedback in visual masking and visual processing. Advances in Cognitive Psychology, 3 (1), 125–152, https://doi.org/10.2478/v10053-008-0020-5.
Marois, R. (2005). Two-timing attention. Nature Neuroscience, 8 (10), 1285–1286, https://doi.org/10.1038/nn1005-1285.
Marois, R., & Ivanoff, J. (2005). Capacity limits of information processing in the brain. Trends in Cognitive Sciences, 9 (6), 296–305, https://doi.org/10.1016/j.tics.2005.04.010.
Marois, R., Yi, D.-J., & Chun, M. M. (2004). The neural fate of consciously perceived and missed events in the attentional blink. Neuron, 41, 465–472.
Martens, S., & Wyble, B. (2010). The attentional blink: Past, present, and future of a blind spot in perceptual awareness. Neuroscience and Biobehavioral Reviews, 34 (6), 947–957, https://doi.org/10.1016/j.neubiorev.2009.12.005.
Marti, S., & Dehaene, S. (2017). Discrete and continuous mechanisms of temporal selection in rapid visual streams. Nature Communications, 8 (1): 1955, https://doi.org/10.1038/s41467-017-02079-x.
Meyer, K. (2012, January 27). Another remembered present. Science, 335 (6067), 415–416, https://doi.org/10.1126/science.1214652.
Muggleton, N. G., Banissy, M. J., & Walsh, V. Z. (2011). Cognitive neuroscience: Feedback for natural visual stimuli. Current Biology: CB, 21 (8), 282–283, https://doi.org/10.1016/j.cub.2011.03.024.
Nieuwenhuis, S., Holmes, B. D., Gilzenrat, M. S., & Cohen, J. D. (2005). The role of the locus coeruleus in mediating the attentional blink: A neurocomputational theory. Journal of Experimental Psychology: General, 134 (3), 291–307, https://doi.org/10.1037/0096-3445.134.3.291.
Pelli, D. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10 (4), 437–442. Available from http://www.ingentaconnect.com/content/vsp/spv/1997/00000010/00000004/art00016
Perrett, D. I. (2001). The speed of sight. Journal of Cognitive Neuroscience, 13 (1), 90–101.
Pollen, D. (1999). On the neural correlates of visual perception. Cerebral Cortex, 9 (1), 4–19, https://doi.org/10.1093/cercor/9.1.4.
Potter, M. (1975, March 14). Meaning in visual search. Science, 187 (4180), 965–966. Available from http://www.sciencemag.org/content/187/4180/965.short
Ransley, K., Goodbourn, P. T., Nguyen, E. H., Moustafa, A. A., & Holcombe, A. O. (2018). Reading direction influences lateral biases in letter processing. Journal of Experimental Psychology: Learning Memory and Cognition, 44 (10), 1678–1686, https://doi.org/10.1037/xlm0000540.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology. Human Perception and Performance, 18 (3), 849–860. Available from http://www.ncbi.nlm.nih.gov/pubmed/1500880
Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204, https://doi.org/10.1038/81479.
Riesenhuber, M., & Poggio, T. (2002). Neural mechanisms of object recognition. Current Opinion in Neurobiology, 12 (2), 162–168. Available from http://www.ncbi.nlm.nih.gov/pubmed/12015232
Rousselet, G. A., Fabre-Thorpe, M., & Thorpe, S. J. (2002). Parallel processing in high-level categorization of natural images. Nature Neuroscience, 5 (7), 629–630, https://doi.org/10.1038/nn866.
Rousselet, G. A., Thorpe, S. J., & Fabre-Thorpe, M. (2004). Processing of one, two or four natural scenes in humans: The limits of parallelism. Vision Research, 44 (9), 877–894, https://doi.org/10.1016/j.visres.2003.11.014.
Seitz, A. R., & Dinse, H. R. (2007). A common framework for perceptual learning. Current Opinion in Neurobiology, 17 (2), 148–153, https://doi.org/10.1016/j.conb.2007.02.004.
Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, USA, 104 (15), 6424–6429, https://doi.org/10.1073/pnas.0700622104.
Seth, A. K., Barrett, A. B., & Barnett, L. (2015). Granger causality analysis in neuroscience and neuroimaging. The Journal of Neuroscience, 35 (8), 3293–3297, https://doi.org/10.1523/JNEUROSCI.4399-14.2015.
Shapiro, K., Raymond, J., & Arnell, K. (1997). The attentional blink. Trends in Cognitive Sciences, 1 (8), 291–296, https://doi.org/10.1016/S1364-6613(97)01094-2.
Śmigasiewicz, K., Asanowicz, D., Westphal, N., & Verleger, R. (2014). Bias for the left visual field in rapid serial visual presentation: Effects of additional salient cues suggest a critical role of attention. Journal of Cognitive Neuroscience, 27 (2), 266–279, https://doi.org/10.1162/jocn_a_00714.
Śmigasiewicz, K., Shalgi, S., Hsieh, S., Möller, F., Jaffe, S., Chang, C. C., & Verleger, R. (2010). Left visual-field advantage in the dual-stream RSVP task and reading-direction: A study in three nations. Neuropsychologia, 48 (10), 2852–2860, https://doi.org/10.1016/j.neuropsychologia.2010.05.027.
Stokes, M., Thompson, R., Cusack, R., & Duncan, J. (2009). Top-down activation of shape-specific population codes in visual cortex during mental imagery. Journal of Neuroscience, 29 (5), 1565–1572, https://doi.org/10.1523/JNEUROSCI.4657-08.2009.
Tadel, F., Baillet, S., Mosher, J. C., Pantazis, D., & Leahy, R. M. (2011). Brainstorm: A user-friendly application for MEG/EEG analysis. Computational Intelligence and Neuroscience, 2011: 879716, https://doi.org/10.1155/2011/879716.
Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014). Deepface: Closing the gap to human-level performance in face verification. 2014 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1701–1708). Washington, DC: IEEE Computer Society. https://doi.org/10.1109/CVPR.2014.220.
Thorpe, S., Fize, D., & Marlot, C. (1996, June 6). Speed of processing in the human visual system. Nature, 381 (6582), 520–522, https://doi.org/10.1038/381520a0.
Thorpe, S. J., & Fabre-Thorpe, M. (2001, January 12). Seeking categories in the brain. Science, 291 (5502), 260–263.
Ungerleider, L. G., & Haxby, J. V. (1994). ‘What' and ‘where' in the human brain. Current Opinion in Neurobiology, 4 (2), 157–165, https://doi.org/10.1016/0959-4388(94)90066-3.
Van Veen, B. D., van Drongelen, W., Yuchtman, M., & Suzuki, A. (1997). Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Transactions on Biomedical Engineering, 44 (9), 867–880.
VanRullen, R., & Thorpe, S. J. (2001). The time course of visual processing: From early perception to decision-making. Journal of Cognitive Neuroscience, 13 (4), 454–461. Available from http://www.ncbi.nlm.nih.gov/pubmed/11388919
VanRullen, R., & Thorpe, S. J. (2002). Surfing a spike wave down the ventral stream. Vision Research, 42 (23), 2593–2615. Available from http://www.ncbi.nlm.nih.gov/pubmed/12446033
Vorberg, D., Mattler, U., Heinecke, A., Schmidt, T., & Schwarzbach, J. (2003). Different time courses for visual perception and action priming. Proceedings of the National Academy of Sciences, USA, 100 (10), 6275–6280, https://doi.org/10.1073/pnas.0931489100.
Wang, P., & Nikolić, D. (2011). An LCD monitor with sufficiently precise timing for research in vision. Frontiers in Human Neuroscience, 5: 85, https://doi.org/10.3389/fnhum.2011.00085.
Williams, M. A., Baker, C. I., de Beeck, H. P. O., Shim, W. M., Dang, S., Triantafyllou, C., & Kanwisher, N. (2008). Feedback of visual object information to foveal retinotopic cortex. Nature Neuroscience, 11 (12), 1439–1445, https://doi.org/10.1038/nn.2218.
Wyble, B., Bowman, H., & Nieuwenstein, M. (2009). The attentional blink provides episodic distinctiveness: Sparing at a cost. Journal of Experimental Psychology. Human Perception and Performance, 35 (3), 787–807, https://doi.org/10.1037/a0013902.
Appendix
 
Movie A1
 
Experiment 1: Beamforming source localization for correct T1-only trials (N = 14). Target onset was at 0 ms. The movie is also available as Supplementary Movie S1 and at https://www.youtube.com/watch?v=_HfnHd3EIEM.
 
Movie A2
 
Experiment 3: The left (right) column of the movie corresponds to correct T1-only left (right) trials, respectively. Target onset was at 0 ms. The movie is also available as Supplementary Movie S2 at https://www.youtube.com/watch?v=nMcMf8pVwHw.
Figure 1
 
Experiment 1: Paradigm and detection performance (N = 14). (A) Illustration of the experimental paradigm. Participants counted instances of animal target images amongst distractor images in RSVP streams presented at 12 Hz. (B) Participant accuracies for the number of targets detected within each condition. Chance level was 33% (response choices were zero animals, one animal, and two animals, and each correct response choice was equally likely). “No targets” refers to trials without target images, and T1-only refers to trials containing only a single target image. Lag trials are those in which we presented two target images, wherein the second target appeared at the specified lag in time from the first (e.g., Lag 1 had no intervening distractors, Lag 2 trials had one intervening distractor, and so on). Error bars denote the 95% CI of the means in each condition. Target and distractor images in (A) Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Figure 2
 
Experiment 1: source estimations, electrophysiological responses, and Granger causality for correct single target trials (N = 14). Time 0 corresponds to the target onset. (A) Pseudoneural activity indices obtained with beamforming signal source estimation for correct T1-only trials centered at the indicated time points. (B) EEG voltages for correct T1-only trials. The legend at the far left indicates the electrode sets for each group of potentials (frontal, temporal, parietal, and occipital) and each row in the colored plot to the right represents a single channel from that group. The circular outline of the head is aligned to the level of the eyes. All group-averaged T1-only responses were truncated to zero (shown in white) when a paired-sample t test over all participants between T1-only and Distractor-only participant averaged voltages indicated an FDR corrected p value greater than 0.05. (C) Average and standard error of the mean of the ratio between the single trial Granger causality for correct T1-only trials and the single trial Granger causality in correct Distractor trials (“Causality Gain”) at the centers of a sliding 40 ms temporal window. OT (Occipitotemporal) and Frontal refer to the respective groups of channels in the 10–20 system. Each causality gain value at each particular time point represents the average of the magnitude of the relative single trial Granger causality increases from Distractor-only to T1-only trials for all pairs of channels between the indicated channel groups. The black line indicates the average Granger causality ratio for the OT channel to frontal channel influence, and the gray line shows the Granger causality value ratio from frontal to OT channels. Shaded areas indicate the standard error of the mean of the causality ratio over all participants. The colored horizontal bars below the curve indicate the time intervals in which a paired sample t test between the average target and distractor Granger causality values across single participants resulted in p values below 0.01 for the condition corresponding to the respective color.
Figure 3
 
Experiment 1: source estimations for correct single target trials (N = 14). Time 0 corresponds to the target onset. Pseudoneural activity indices were obtained with beamforming signal source estimations for correct T1-only trials centered at the indicated time points.
Figure 4
 
Experiment 1: average EEG voltages within each type of two-target “Lag” trial (N = 14). Color range for voltages corresponds to the color bar shown at the far right. Animal 1 onset (T1) is at 0 ms and Animal 2 onset (T2) is marked with an appropriately shifted vertical black line for each Lag. Channels are grouped according to the electrode groupings shown in the legend at the far left (from top: Frontal, Temporal, Parietal, Occipital). The circular outline of the head is aligned to the level of the eyes. There are two clear repetitions of the same target “template” at Lag 8 which slowly mix together when approaching Lag 1. For clarity, voltage averages that were between −0.5 and 0.5 microvolts were truncated to zero.
Figure 5
 
Reducing interference improved detection performance. (A) Experiment 1: average EEG difference of absolute values of voltage in each channel at 230 ms (left plot) and significance of differences (right plot) in each channel for correct T1-only trials between targets that were entirely contained within the left third of the image and targets that were entirely contained in the right third of the image (t test, df = 13, displayed as the negative base-10 logarithm of the p value, so that a value of 2 corresponds to p = 0.01). (B) Experiment 2: an example Lag 1 trial in Experiment 2, in which six participants performed target detection on two simultaneously presented side-by-side image streams while fixating on a central fixation cross. The example shown is a different sides trial in which the two consecutive targets appeared on different sides. (C) Experiment 2: an example Lag 1 same sides trial in which the two targets appeared on the same side. (D) Detection accuracy in Experiment 2. Error bars denote the standard error of the mean (N = 6) and the p values are from a paired t test (df = 5). Images in (B) and (C) are Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Figure 6
 
Experiment 3: two-stream target detection and categorization source localizations for correctly categorized T1-only trials (N = 18). Participants were told to fixate on a central fixation cross between the two streams. T1-only trials contained only one animal that appeared either in the left (A) or the right hemifield (B). (C) Beamforming neural source localizations at each indicated time for T1-only left trials. (D) Beamforming neural source localizations at each indicated time for T1-only right trials. Images in (A) and (B) Copyright © 2019 Jacob G. Martin and Corel and its licensors. All rights reserved.
Figure 7
 
Experiment 3: behavioral results (N = 18). (A) Percent of the time each subject detected the correct number of targets in each condition. (B) Behavioral results for categorization of animals across different conditions. T1|T2 (respectively T2|T1) means that categorization accuracy was scored on T1 (T2) for trials in which T2 (T1) had been categorized correctly. T1&T2 means that participants categorized both targets correctly. Same sides and Different sides correspond to whether the two targets appeared in the same stream or not. Lag indicates when the second target was presented (Lag 1 is directly afterwards at T1+83 ms and Lag 7 is T1+583 ms). Results of paired t tests are indicated above each comparison (df = 17). (C) Frequency of responses in Lag 1 trials, separated into trials in which only 1 target was reported before the categorization quiz (Reported 1) and trials in which two targets were reported before the categorization quiz (Reported 2). (D) Behavioral categorization accuracy in Lag 1 trials, separated into trials in which only 1 target was reported before the categorization quiz (Reported 1) and trials in which two targets were reported before the categorization quiz (Reported 2). Categorization accuracy was calculated within each type of trial (e.g., the number of correctly categorized trials for T1 (or T2) out of all Lag 1 trials in which a participant reported two targets and T1 and T2 were presented on the same side). White bars denote the accuracy for the categorization of T1, and gray bars denote the accuracy for the categorization of T2. Error bars show 95% CI. Black dotted line is the chance level for a single categorization. Results of paired t tests are indicated above each comparison (df = 17).
Figure 8
 
Experiment 3: classifiers trained on T1-only and tested on Lag-1 trials in which only one target was reported (N = 18). (A) Single-trial classification results obtained by training on T1-only trials (left and right separately) and testing on Lag 1 same side trials (LL and RR separately) showed differences for frontal (top row) and occipitotemporal channels (bottom row) only after the initial feedforward pass of T1. The legends at the left show the channels that were used for the frontal classifications (top row) and the occipitotemporal classifications (bottom row). The outline of the head is aligned to the level of the eyes. The detection performance (ROC area) of the T1-only based classifier differed depending on whether T1 was detected or T2 was detected starting at ∼200 ms in OT channels, whereas no significant difference was found between classifications before ∼200 ms (see Materials and methods).
Figure 9
 
Classification of target category from EEG signals. (A) Experiment 1 (N = 14): Accuracies for classifying the target category based on the mean of the 60 ms of Occipital and Temporal EEG signals surrounding each time point (10–20 electrodes only). The results in the top plot show the mean classification accuracy (dark line) and standard error of the mean (light gray shaded areas) for each subject and time point over 30 iterations of 10-fold cross-validation. The bottom plot shows the results of a t test between subject performances and chance level at each time point. (B) Experiment 3 (N = 18): Accuracies for classifying the target category based on the mean of the 60 ms of Occipital and Temporal EEG signals surrounding each time point (10–20 electrodes only). Each classifier was trained and tested separately for T1-only Left and T1-only Right trials, and the results of the two were aggregated. The results in the top plot show the mean classification accuracy (dark line) and standard error of the mean (light gray shaded areas) for each subject and time point over 30 iterations of 10-fold cross-validation. The bottom plot shows the results of a t test between subject performances and chance level at each time point.