Open Access
Article  |   July 2024
Measuring attentional selection of object categories using hierarchical frequency tagging
Florian Gagsch, Christian Valuch, Thorsten Albrecht
Journal of Vision July 2024, Vol. 24, 8. https://doi.org/10.1167/jov.24.7.8
Abstract

In the present study, we used Hierarchical Frequency Tagging (Gordon et al., 2017) to investigate with electroencephalography how different levels of the neural processing hierarchy interact with category-selective attention during visual object recognition. We constructed stimulus sequences of cyclically wavelet-scrambled face and house stimuli at two different frequencies (f1 = 0.8 Hz and f2 = 1 Hz). For each trial, two stimulus sequences of different frequencies were superimposed and additionally augmented by a sinusoidal contrast modulation at f3 = 12.5 Hz. This allowed us to simultaneously assess higher level processing using semantic wavelet-induced frequency-tagging (SWIFT) and processing at earlier visual levels using steady-state visually evoked potentials (SSVEPs), along with their intermodulation (IM) components. To investigate the category specificity of the SWIFT signal, we manipulated the category congruence between target and distractor by superimposing two sequences containing stimuli from the same or different object categories. Participants attended to one stimulus (target) and ignored the other (distractor). Our results showed successful tagging of different levels of the cortical hierarchy. Using linear mixed-effects modeling, we detected different attentional modulation effects on lower versus higher processing levels. SWIFT and IM components were substantially increased for target versus distractor stimuli, reflecting attentional selection of the target stimuli. In addition, distractor stimuli from the same category as targets elicited stronger SWIFT signals than distractor stimuli from a different category, indicating category-selective attention. In contrast, for IM components, this category-selective attention effect was largely absent, indicating that IM components probably reflect more stimulus-specific processing.

Introduction
The mammalian brain handles large amounts of incoming sensory data and has evolved neural mechanisms to extract and interpret the relevant, meaningful information that is important for survival (Logothetis & Sheinberg, 1996; Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). In the visual domain, the brain manages to detect and recognize visual objects and categories from simple differences in light and shadow that reach the eyes. It can also prioritize certain objects or categories of objects that are most important for the current needs of the organism at the expense of those that are secondary. These fascinating functions rely on a hierarchy of visual brain areas that jointly solve the task of seeing and making sense of the world.
Neural processing of object categories in humans
Non-invasive studies with humans—for example, using functional magnetic resonance imaging (fMRI)—have identified several distinct brain areas that respond selectively to object categories. The fusiform face area (FFA) in the extrastriate cortex is highly category selective for faces (Andrews, Schluppeck, Homfray, Matthews, & Blakemore, 2002; Grill-Spector, Knouf, & Kanwisher, 2004; Kanwisher, McDermott, & Chun, 1997; Tong, Nakayama, Vaughan, & Kanwisher, 1998). Together with the occipital face area (OFA) (Gauthier et al., 2000), it forms a main part of the so-called “core system” of face perception (Haxby, Hoffman, & Gobbini, 2000). Palmisano et al. (2023) showed that stimulating this network can induce illusory perception of faces. Also, lesions of the FFA are associated with an inability to recognize faces (Barton, Press, Keenan, & O'Connor, 2002; Wada & Yamamoto, 2001). In electroencephalography (EEG) research, the presentation of face stimuli results in an event-related potential (ERP) peaking around 170 ms, which has been termed the N170 (Bentin, Allison, Puce, Perez, & McCarthy, 1996). It has been shown to correlate with the face-selective activation of the FFA in fMRI (Yovel, Sadeh, Podlipsky, Hendler, & Zhdanov, 2008), with the perceptual awareness of faces (Harris, Wu, & Woldorff, 2011), as well as interpretation of a stimulus as a face (Bentin, Sagiv, Mecklinger, Friederici, & von Cramon, 2002; Caharel et al., 2013; George, Jemel, Fiori, Chaby, & Renault, 2005). Another example of category selectivity is the parahippocampal place area (PPA), which is strongly responsive to the category of scenes and houses (Epstein & Kanwisher, 1998; Tong et al., 1998), and stimulation of the PPA can result in visual hallucinations of scenes (Mégevand et al., 2014). 
Visual selection of object categories
Visual perception does not merely aggregate sensory signals in a bottom–up process to form categorical representations. It also involves the influence of top–down processes, such as attention, that regulate neuronal processing across various levels of the visual hierarchy. When several visual objects compete for processing, the visual system is able to bias processing toward task-relevant stimulus features (Kastner & Ungerleider, 2000; Treue & Martinez-Trujillo, 2007). Attention thus leads to a top–down enhancement of neurons tuned to features associated with relevant objects (Andersen, Hillyard, & Müller, 2013; Saenz, Buracas, & Boynton, 2002; Serences & Boynton, 2007; Störmer & Alvarez, 2014; Treue & Martinez Trujillo, 1999). This top–down modulation can be observed not only for basic visual features, such as orientation or color, but also for complex objects and categories higher up the processing hierarchy (Baldauf & Desimone, 2014; Furey et al., 2006; Gazzaley, Cooney, McEvoy, Knight, & D'Esposito, 2005; Lueschow et al., 2004; O'Craven, Downing, & Kanwisher, 1999; Quek, Nemrodov, Rossion, & Liu-Shuang, 2017; Störmer, Cohen, & Alvarez, 2019; Thorat & Peelen, 2022). Such an enhancement can also occur for stimuli that are not task relevant per se but share similarities with the relevant target stimuli. If an irrelevant distractor shares the category of a target, it evokes an enhanced category-selective response compared to a non-target from a different category (Lueschow et al., 2004), even if spatial attention is not directed to the distractor (Störmer et al., 2019; Thorat & Peelen, 2022). Importantly, neural activation differences can result not only from an enhancement of attended target categories but also from suppression of unattended distractor categories (Furey et al., 2006; Gazzaley et al., 2005; Quek et al., 2017).
Measuring visual processing using EEG and frequency tagging
Although classical ERPs such as the N170 have long been employed as a valuable tool in studying category-selective processing and attention, they come with inherent limitations. Interpreting ERPs to isolate category-selective responses can be complex due to differences in the latency and topography of the ERP and the presence of noise and artifacts. Additionally, accurately defining these components and achieving reliable differences in ERPs often requires recording and averaging a large number of trials due to the low signal-to-noise ratio (SNR).
An alternative approach is to present visual stimuli periodically at a fixed temporal frequency, generating a stable and measurable response in the frequency domain of the EEG (Regan, 1966). This method allows for “tagging” the processing of the stimulus using the specific presentation frequency. This steady-state visual evoked potential (SSVEP) is produced by modulating the low-level features of a stimulus at frequencies commonly ranging from 3 to 20 Hz (Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015). One major advantage of this method compared to the classical ERP is the high SNR, as artifacts and stimulus-unrelated activity are unlikely to follow a pattern that falls within the narrow frequency band of the SSVEP. This allows isolating an objective response to a stimulus with a relatively short trial duration, explaining its wide adoption in research on basic visual processes (Norcia et al., 2015), spatial attention (Morgan, Hansen, & Hillyard, 1996; Toffanin, de Jong, Johnson, & Martens, 2009), feature-based attention (Störmer & Alvarez, 2014; Wang, Clementz, & Keil, 2007), object recognition (Kaspar, Hassler, Martens, Trujillo-Barreto, & Gruber, 2010; Martens, Wahl, Hassler, Friese, & Gruber, 2012; Minami, Azuma, & Nakauchi, 2020), and conscious perception (Brown & Norcia, 1997; Tononi, Srinivasan, Russell, & Edelman, 1998). However, when using SSVEPs alone, it is difficult to disentangle tagging of higher order visual processes from tagging of neuronal processing on lower levels of the visual hierarchy (Norcia et al., 2015; but see Rossion, Torfs, Jacques, & Liu-Shuang, 2015). To better separate these processing stages, Koenig-Robert and VanRullen (2013) introduced a new frequency-tagging approach using cyclic wavelet scrambling.
The semantic wavelet-induced frequency-tagging (SWIFT) method periodically modulates local contours of an image while keeping luminance, contrast, and distribution of spatial frequency constant across time (Koenig-Robert & VanRullen, 2013). Because the detection and processing of these latter features take place at the bottom of the cortical processing hierarchy (Hubel & Wiesel, 1968; Nauhaus, Nielsen, Disney, & Callaway, 2012; Riesenhuber & Poggio, 1999; Sclar, Maunsell, & Lennie, 1990), the method avoids tagging of lower visual areas such as V1 and V2 (Koenig-Robert, VanRullen, & Tsuchiya, 2015). Therefore, image sequences created with the SWIFT method generate a frequency-tagging response that can be seen as isolating higher order visual processes. Because contours carry the semantic information of an image, the authors describe SWIFT as a marker for semantic object recognition (Koenig-Robert & VanRullen, 2013). Indeed, they showed that a SWIFT signal is only detectable when the image of the SWIFT sequence is recognized within the scrambling noise (Koenig-Robert & VanRullen, 2013). An unrecognized image or an abstract texture did not generate a reliable SWIFT response, validating SWIFT as a marker for object recognition. SWIFT can also serve as an all-or-none marker of awareness in binocular rivalry (Koenig-Robert & VanRullen, 2012), which is an advantage over the classical SSVEP that differs only gradually between aware and unaware conditions (de Heering, Beauny, Vuillaume, Salvesen, & Cleeremans, 2020; Smout & Mattingley, 2018; Toffanin et al., 2009; Tononi et al., 1998). The modulation of SWIFT by spatial attention appears considerably more pronounced than that of the SSVEP (Koenig-Robert & VanRullen, 2013), suggesting an origin in higher level visual cortices, because selective attention is seen as a top–down process originating from higher cortical structures (Lauritzen, D'Esposito, Heeger, & Silver, 2009; Saalmann, Pigarev, & Vidyasagar, 2007). This interpretation was supported by a subsequent fMRI study that found category-selective frequency-tagged blood oxygen level–dependent (BOLD) signals in the FFA and PPA for SWIFT sequences of faces and scenes, whereas activation in early visual areas remained constant (Koenig-Robert et al., 2015).
Hierarchical Frequency Tagging (HFT) combines both approaches, SSVEP and SWIFT, by overlaying SWIFT stimulus sequences with a modulation of simple features at a different frequency (e.g., a contrast modulation) (Gordon, Koenig-Robert, Tsuchiya, van Boxtel, & Hohwy, 2017). This approach allows tagging different levels of the visual hierarchy simultaneously. Whereas the SSVEP reflects processing of low-level visual features and is confined to early visual areas, the SWIFT signal reflects more categorical or semantic processing in higher visual areas (Koenig-Robert & VanRullen, 2013; Koenig-Robert et al., 2015). HFT also allows the investigation of interactions between higher and lower visual areas by exploiting the so-called intermodulation (IM) components. In general, IM components are additional observable frequencies in the EEG spectrum that are not present in the initial input signal but are generated through their neuronal interaction. For example, when two SSVEP stimuli are presented simultaneously to tag visual processing with two individual frequencies (e.g., f1 and f2), additional IM components can appear at linear combinations of the fundamental frequencies with non-zero integer coefficients (e.g., f1 + f2, 2f1 + f2). These IM components can be seen as evidence for neural integration of processes that are tagged by the fundamental frequencies (for a detailed discussion of the IM research body, see Gordon, Hohwy, Davidson, van Boxtel, & Tsuchiya, 2019). Previous studies showed that IM components correlate with perceptual binding of visual features into integrated forms (Aissani, Cottereau, Dumas, Paradis, & Lorenceau, 2011), the formation of an illusory percept resulting from visual integration of individually tagged objects into something more holistic (Alp, Kogo, Van Belle, Wagemans, & Rossion, 2016; Gundlach & Müller, 2013), and the perception of motion synchrony of point light displays and their integration into human-like shapes (Alp, Nikolaev, Wagemans, & Kogo, 2017). Boremanse, Norcia, and Rossion (2014) showed that the IM signals resulting from the presentation of two contrast-modulated face halves were strongly diminished or absent when the face halves were more difficult to integrate due to spatial misalignment or the fact that they belonged to different identities. Other studies used IM as a marker for attentional selection of multiple, individually tagged, visual stimuli (Kim, Tsai, Ojemann, & Verghese, 2017), as well as binocular integration (Brown, Candy, & Norcia, 1999; Zhang, Jamison, Engel, He, & He, 2011) and hemifield integration (Sutoyo & Srinivasan, 2009), in binocular rivalry paradigms. Gordon et al. (2017) framed their results in terms of a predictive coding account of perception (Friston, 2005; Rao & Ballard, 1999), according to which the IM signal is seen as a marker for fit and integration of predictive top–down signals (SWIFT) and sensory bottom–up (SSVEP) signals (Coll, Whelan, Catmur, & Bird, 2020; Gordon et al., 2017; Gordon, Tsuchiya, Koenig-Robert, & Hohwy, 2019). Gordon et al. (2017) showed diverging effects on the three HFT components by manipulating stimulus predictability, arguing for dissociable neural processes. A replication study by Coll et al. (2020) supported the notion of IM being a marker for integration of predictive top–down signals and sensory bottom–up signals in connection with the theory that a dysfunction of this integration is correlated with autistic traits.
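To make the origin of IM components concrete, the following minimal R sketch (purely illustrative; not taken from any of the cited studies) passes the sum of two sinusoids at the SWIFT frequencies used here through a squaring nonlinearity. The output spectrum contains peaks at f1 + f2 and f2 − f1 that are absent from the linear input:

```r
# Illustrative demonstration: a static nonlinearity applied to the sum of two
# sinusoids creates intermodulation peaks at combination frequencies.
fs <- 512                                   # sampling rate in Hz
t  <- seq(0, 30 - 1/fs, by = 1/fs)          # 30 s of samples
f1 <- 0.8; f2 <- 1.0                        # tagging frequencies
x  <- sin(2*pi*f1*t) + sin(2*pi*f2*t)       # linear superposition: power only at f1, f2
y  <- x^2                                   # squaring nonlinearity ("neural interaction")
pow   <- Mod(fft(y))^2                      # power spectrum
freqs <- (seq_along(y) - 1) * fs / length(y)
# y now contains energy at f2 - f1 = 0.2 Hz, f1 + f2 = 1.8 Hz, 2f1 = 1.6 Hz,
# and 2f2 = 2 Hz (plus a DC term); the strongest low-frequency bins:
sel <- freqs < 3
round(freqs[sel][order(pow[sel], decreasing = TRUE)][1:5], 2)
```

A purely linear superposition contains power only at the fundamental frequencies; combination frequencies therefore index a nonlinear, interactive stage of processing.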
In a subsequent study, Gordon, Tsuchiya, et al. (2019) utilized the HFT method with two superimposed SWIFT sequences—a face and a house—tagged by different frequencies. Their findings revealed enhanced IM responses for attended stimuli compared with unattended ones, along with an attentional modulation of SWIFT (Gordon, Tsuchiya, et al., 2019, supplement 1). More recently, Koenig-Robert, Pace, Pearson, and Hohwy (2023) investigated the time course of HFT signals in a free foraging task. Their results demonstrated increased SWIFT signals associated with correct target recognition, as well as a positive correlation of IM signals with predictability of perceptual information, replicating earlier findings (Koenig-Robert & VanRullen, 2013; Gordon, Tsuchiya, et al., 2019). 
In sum, the HFT method has proven promising for investigating different levels of the visual hierarchy, with SWIFT as a marker for object recognition originating in higher levels of this hierarchy that likely tags processing in category-selective areas such as the FFA and PPA (Koenig-Robert et al., 2015). However, additional research and a clear interpretation of SWIFT are necessary to better understand if and how HFT captures interactions between lower and higher order visual processes and what kind of integration is measured by IM signals.
The present study
We investigated the impact of category-selective attention on different levels of the visual hierarchy using the HFT method. Building upon Gordon, Tsuchiya, et al. (2019), we introduced a manipulation to investigate the influence of the target category on the SWIFT signal of an irrelevant distractor, as well as on other HFT-evoked responses (SSVEP and IM). Additionally, we sought to validate HFT as a method for simultaneously tagging various visual processing levels using EEG, particularly aiming to confirm SWIFT as an objective marker for category-selective processing, thereby extending previous findings on category-selective tagging in fMRI (Koenig-Robert et al., 2015) and other studies that have used SWIFT or HFT (Coll et al., 2020; Koenig-Robert et al., 2023; Koenig-Robert & VanRullen, 2013; Gordon et al., 2017; Gordon, Tsuchiya, et al., 2019). 
We employed a 2 × 2 × 2 within-participant design with attention, category, and category congruence as independent variables. We manipulated attention by making one of the two stimuli in each trial task relevant, allowing us to measure and compare EEG responses to target and distractor frequencies. The category variable was defined by the stimulus being either a face or a house. In our experiment, target and distractor objects were either exemplars of different categories (category-incongruent conditions, such as a face and a house) or exemplars of the same category (category-congruent conditions, such as two different faces). 
Because SWIFT and IM have been shown to be enhanced by attention (Gordon, Tsuchiya, et al., 2019; Koenig-Robert & VanRullen, 2013), we hypothesized that the SWIFT and IM signals of a target stimulus should be enhanced compared to the SWIFT (H1) and IM signals (H2) of a distractor stimulus. Based on previous findings of global enhancement of category processing by attention (Baldauf & Desimone, 2014; Furey et al., 2006; Gazzaley et al., 2005; Lueschow et al., 2004; O'Craven et al., 1999; Quek et al., 2017; Störmer et al., 2019; Thorat & Peelen, 2022), we expected that category congruence (vs. category incongruence) should also enhance distractor-evoked SWIFT signals (H3). Regarding the distractor-associated IM signals, our hypothesis was more exploratory. Assuming that integration of bottom–up distractor signals should be enhanced when they contain categorical information similar to the target, we also expected stronger IM signals in category-congruent compared to category-incongruent trials (H4). Finally, because the SSVEP should tag mostly low-level visual processing, we did not expect to find an effect of category congruence on the global SSVEP signal (H5).
Methods
Participants
Participants were 24 healthy, right-handed psychology students (four males; M = 20.3 years of age, SD = 1.90) of Georg-August-University Göttingen, Germany. All reported having normal or corrected-to-normal vision and no neurological or psychiatric condition. All participants gave informed written consent before attending the experiment and received course credit for their participation. Four participants were excluded from all analyses due to poor EEG data, so the final sample comprised 20 participants (three males; M = 20.4 years of age, SD = 2.06). All experimental procedures were in accordance with the tenets of the Declaration of Helsinki.
Stimulus construction
Image selection
We used a set of 40 face images and 40 house images to construct the SWIFT sequences. All images were scaled to a size of 11.5° of visual angle (600 × 600 pixels), grayscaled, and equalized in luminance and contrast using the SHINE toolbox (Willenbockel et al., 2010). Face images were partly adopted from Coll et al. (2020). Additional face images were sourced using Google Images (https://www.google.com/imghp). We selected images labeled as “free to use, share, or modify, even commercially.” These images included portrait-like photographs featuring individuals of various genders, ages, and ethnicities against natural backgrounds. The house images were taken from the Pasadena Houses 2000 dataset (Perona & Helle, 2000) and the Pasadena Buildings dataset (Aly, Welinder, Munich, & Perona, 2009) showing frontal and diagonal views of houses within a natural environment. 
SWIFT sequences
The SWIFT sequences were constructed using MATLAB and the Wavelet Toolbox (MathWorks, Natick, MA), together with custom code provided by Koenig-Robert and VanRullen (2013). The following parameters were used for the wavelet decomposition: wavelet family = discrete Meyer wavelet, number of harmonics = 5, decomposition levels = 6, frame rate of monitor = 100 Hz. The normalization option of the function was not used because it led to luminance differences between sequences.
Two original SWIFT sequences were created for each stimulus, one for each frequency (f1 = 0.8 Hz and f2 = 1 Hz). During each trial, a 0.8-Hz SWIFT sequence of one specific stimulus was presented, blended with a 1-Hz SWIFT sequence of another stimulus via alpha blending (see Figure 1). The blended SWIFT sequences were additionally augmented with a contrast modulation at a frequency of 12.5 Hz.
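The actual sequences were built with the MATLAB pipeline cited above; the following R sketch merely illustrates the compositing logic under simplifying assumptions (frames as grayscale matrices in [0, 1], equal-weight alpha blending, full-depth sinusoidal contrast modulation around mid-gray; the function and argument names are ours):

```r
# Illustrative sketch of the compositing step: blend two SWIFT frame sequences
# and impose a 12.5-Hz sinusoidal contrast modulation at a 100-Hz frame rate.
blend_trial <- function(seq_a, seq_b, frame_rate = 100, f_ssvep = 12.5) {
  n    <- min(length(seq_a), length(seq_b))
  t    <- (seq_len(n) - 1) / frame_rate             # frame onset times in seconds
  gain <- 0.5 + 0.5 * sin(2 * pi * f_ssvep * t)     # sinusoidal contrast gain
  lapply(seq_len(n), function(i) {
    blended <- 0.5 * seq_a[[i]] + 0.5 * seq_b[[i]]  # alpha blending of the two SWIFTs
    0.5 + (blended - 0.5) * gain[i]                 # scale contrast around mid-gray
  })
}
```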
Figure 1.
 
Illustration of trial sequence creation. Two SWIFT sequences are pictured as a series of individual frames and combined via alpha blending into the final trial sequence. Colored frames indicate the original stimulus (red = face; green = house). The face image is a private photograph and is used here for illustrative purposes only; the original stimuli are available from the authors on request. The house image is taken from the Pasadena Houses 2000 dataset (Perona & Helle, 2000).
Presenting two blended and overlapping SWIFT sequences posed a challenge to maintaining the recognizability of the original images. The predominant scrambled noise from each SWIFT sequence could obscure the cyclically appearing original image of its partner stimulus, potentially disrupting the semantic information that is crucial for the SWIFT signal to appear (Koenig-Robert & VanRullen, 2013). To achieve clear distinctiveness of each individual image, we carefully determined compatible SWIFT stimuli and assigned each one to a partner stimulus, either from the congruent category (e.g., face 1 and face 2) or from the incongruent category (e.g., face 1 and house 1). Next, we needed rigorous counterbalancing to disentangle effects of specific stimulus features from experimental effects, especially because of probable remaining differences in distinctiveness between stimulus pairs. Due to the fixed assignment of stimulus pairs, conventional randomization of experimental variables across trials was not possible, and counterbalancing of congruence, SWIFT frequency (0.8 Hz/1 Hz), and attentional role (target/distractor) was done manually beforehand. For each participant, we ensured that each SWIFT stimulus was presented once with a congruent partner (i.e., face/face or house/house) and once with an incongruent partner (i.e., face/house), as well as once at 0.8 Hz and once at 1 Hz. Attentional roles were kept unchanged for each participant to prevent inhibited responses due to negative priming effects on target stimuli that had a distractor role in a previous trial (Tipper & Cranston, 1985). Nevertheless, attentional roles as well as SWIFT frequencies were switched and counterbalanced across participants (see Figure 2 for an illustration). In sum, 40 face and 40 house SWIFT stimuli were created at each frequency, resulting in 160 SWIFT stimuli combined into 80 SWIFT pairs (40 congruent and 40 incongruent), plus another 80 SWIFT pairs with switched attentional roles and frequencies. In the actual experiment, the blended trial sequences were presented in random order.
Figure 2.
 
Example stimulus pairings across trial sequences for two exemplary participants with counterbalancing of stimulus identity, frequency, congruence, and attentional role. In each trial, a 1-Hz SWIFT sequence and a 0.8-Hz SWIFT sequence were superimposed with alpha blending (juxtaposed in the figure for illustration), one serving as a target and one serving as distractor. Each stimulus had one fixed partner in the congruent condition and in the incongruent condition to ensure visual recognizability. The target was defined before each trial. Target and distractor roles of individual stimuli were fixed for each participant to prevent negative priming, and attentional roles were counterbalanced across participants; for example, the exemplary participant A was only presented with the man as a target, but participant B was only presented with the woman as a target. Congruent and incongruent trials were presented in a randomized manner. Face images are private photographs and are used here for illustrative purposes only; the original stimuli are available from the authors on request. The house images are taken from the Pasadena Houses 2000 dataset (Perona & Helle, 2000) and the Pasadena Buildings dataset (Aly et al., 2009).
To ensure attentional engagement with the target, the counting task of Gordon, Tsuchiya, et al. (2019) was adopted. For that, noise sequences were created to achieve more variability in the number of countable original images across trials. The most scrambled frame of each original sequence was used to create a noise SWIFT sequence for each stimulus. This additional noise sequence resembled the source sequence in its low-level visual features except that it contained no original image of a house or a face. During each trial, these noise sequences were intertwined randomly with the original SWIFT sequences. The frame sequences were rearranged to always end with the most scrambled frame to ensure a fluent transition between original and noise sequences. The total number of noise sequences in each trial was determined by one of five noise levels (10%, 15%, 20%, 25%, or 30%), which were counterbalanced across all conditions. All sequences had a duration of 35 seconds and comprised 28 and 35 cycles for the slow and fast sequence, respectively. Disregarding the noise cycles (10%–30%), this resulted in 20 to 32 target images per trial (see Supplementary Material for an example video clip).
Electrophysiological recording
EEG data was recorded using a BioSemi ActiveTwo system (BioSemi, Amsterdam, the Netherlands) with 64 active silver/silver chloride (Ag/AgCl) electrodes mounted on an elastic cap according to the international extended 10-20 system, at a sampling rate of 512 Hz. Two additional electrodes were placed at the outer canthi of the eyes and one below the left eye to measure horizontal and vertical eye movements, respectively. Additionally, two electrodes were placed on the left and right mastoid for optional offline re-referencing. In active electrodes, the signal is already (passively) amplified within the electrodes by impedance transformation, which leads to very low output impedances (<1 ohm). Therefore, electrode impedance is no longer a valid measure of the quality of the contact between electrode and skin. Instead, the manufacturer recommends keeping the electrode offset (i.e., the voltage difference between each electrode and the common mode sense) stable and below ±25 mV. In the present experiment, all electrode offsets were below ±20 mV.
Procedure
The whole procedure was performed under the Guidelines for Experimental Laboratory Work During the Covid-19 Pandemic of the Department of Experimental Psychology of the Georg-Elias-Müller-Institute of Psychology (University of Göttingen) to ensure the health of the participants and the investigator. After participants received an introduction to the procedure and gave informed consent, they were escorted into a dimly lit room. They were seated comfortably in an armchair at a distance of 100 cm from the presentation screen. Demographic information was assessed verbally. The EEG recording cap and external electrodes were applied, and the input keyboard was provided. The participants were instructed to keep movements and eye blinks to a minimum during the trials.
The experiment was run via Psychtoolbox 3 (Brainard, 1997; Kleiner et al., 2007) and presented on a 19-inch CRT monitor with a resolution of 1024 × 768 pixels and 100-Hz refresh rate. It started with a short instruction screen and a practice trial, followed by 80 experimental trials. Each trial began with a static screen showing the target image for the counting task. After the space bar was pressed, a fixation point and the two alpha blended SWIFT sequences (one with f1 = 0.8 Hz and one with f2 = 1 Hz) were presented at the center of the screen. The SWIFT sequences could contain either one house and one face sequence (incongruent condition) or two sequences of one image category (congruent condition). Participants were instructed to keep their gaze on the fixation point and to attend to one of the sequences by counting the target image while ignoring the distractor sequence. After a trial duration of 35 seconds, a response screen was presented, and participants had to type in the counted number of target images. After a block of 10 trials, participants were instructed to take a short break. The mean duration of the experiment including breaks was 80.60 minutes (SD = 10.36; range, 70–123). After the experiment, participants gave a subjective report on nine predefined questions regarding their phenomenology and behavior during the task. The whole procedure including preparation and debriefing took about 140 minutes. 
Analyses
Behavioral analyses
To evaluate whether participants were engaged in the counting task, we calculated the deviation between the counted images and the actually presented target images for each trial. The mean deviation and standard deviation were calculated separately for each condition. Trials on which the deviation fell further than 2.5 SD from the participant's mean deviation in the respective condition were marked as incorrect. Because trial performance was good across all participants (number of incorrect trials: M = 1.81%; SD = 1.60%; range, 0%–6.25%), no participants or trials were excluded on the basis of behavioral data.
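A minimal base R sketch of this scoring rule, assuming a trial-level data frame beh with columns participant, condition, counted, and presented (column names are illustrative):

```r
# Flag trials whose count deviation lies beyond 2.5 SD of the participant's
# mean deviation within the respective condition.
beh$deviation <- beh$counted - beh$presented
cell <- interaction(beh$participant, beh$condition)    # participant x condition cells
mu   <- ave(beh$deviation, cell, FUN = mean)           # cell mean deviation
sdev <- ave(beh$deviation, cell, FUN = sd)             # cell SD of deviation
beh$incorrect <- abs(beh$deviation - mu) > 2.5 * sdev  # outlier flag per trial
mean(beh$incorrect) * 100                              # percentage of flagged trials
```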
EEG preprocessing
EEG data was analyzed using EEGLAB 2019.1 (Delorme & Makeig, 2004) and ERPLAB 7.0.0 (Lopez-Calderon & Luck, 2014). Raw EEG data were bandpass filtered between 0.2 and 30 Hz (IIR Butterworth filter, 24 dB/octave) and segmented into 30-second epochs starting 5 seconds after stimulus onset. Each epoch contained an integer number of SWIFT cycles (24 cycles at 0.8 Hz and 30 cycles at 1 Hz). The first 5 seconds of each trial (i.e., five cycles at 1 Hz, four cycles at 0.8 Hz) were excluded from the analysis to avoid stimulus-onset artifacts. An independent component analysis (ICA) was used to remove signals associated with eye movements and blinks (Jung et al., 2000). To improve the performance of the ICA, new epochs 4 seconds long were created, and bad channels and epochs were excluded. The Cz electrode served as reference for the ICA. For participants who showed no clear vertical or horizontal electrooculogram (EOG) components, the ICA was conducted over selected epochs that contained eye movements according to a prior EOG artifact detection. On average, M = 1.95 ICs (SD = 1.0; range, 1–5) were flagged as artifacts and removed from the data. Remaining artifacts were detected using a simple threshold algorithm with a cutoff of 30 µV for the EOG channel and 200 µV for scalp channels. Four participants with more than 20% of rejected EEG data (within one condition or across all conditions) were excluded from further analyses. After artifact rejection, the data were re-referenced to the average reference.
Frequency-domain analysis
Spectral power was extracted by applying the fast Fourier transform (FFT) over each trial. The time window for the FFT was 30 seconds (starting 5 seconds after trial onset), resulting in a frequency resolution of 1/(30 s) = 0.033 Hz. This ensured that the frequency bins could be centered around the frequencies of interest. The evoked power (evoPOW) was calculated as the squared absolute value of the complex Fourier spectra averaged across trials (Keitel et al., 2019):
\[
\mathrm{evoPOW}(f) = {\left| \frac{1}{n}\sum_{i=1}^{n} Z_i(f) \right|}^2, \qquad \text{with } n = \text{number of trials}
\]
This measure represents the stimulus-evoked power that is phase-locked to the stimulus presentation (Keitel et al., 2019). It ensures that activity not attributable to the stimulus phase has less impact on the power distribution. This matters because an important portion of the frequencies of interest, specifically the lower IM components, overlaps with the alpha band, which might show inverse modulations by attention (Keitel et al., 2019).
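A compact sketch of this computation for a single channel, assuming epochs is an n_trials × n_samples matrix of 30-second segments sampled at 512 Hz (function and variable names are ours):

```r
# Evoked power: average the complex spectra across trials first, then take the
# squared magnitude, so that non-phase-locked activity averages out.
evoked_power <- function(epochs, fs = 512) {
  spectra   <- t(apply(epochs, 1, fft))   # complex Fourier spectrum per trial
  mean_spec <- apply(spectra, 2, mean)    # average complex spectra (keeps phase-locked part)
  data.frame(freq   = (seq_len(ncol(epochs)) - 1) * fs / ncol(epochs),
             evoPOW = Mod(mean_spec)^2)   # squared absolute value of the mean spectrum
}
```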
Next, SNRs were calculated by dividing the evoPOW at each frequency by the mean evoPOW of 16 surrounding frequencies (eight on each side; from f – 0.3 Hz to f + 0.3 Hz), leaving out the immediately adjacent bins, as well as the frequencies of interest including all harmonics (Gordon et al., 2017; Gordon, Tsuchiya, et al., 2019; Jacques, Retter, & Rossion, 2016; Norcia et al., 2015; Rossion et al., 2015). Frequencies of interest were the two first-order SWIFT frequencies (f1 = 0.8 Hz and f2 = 1 Hz), as well as their first-order harmonics (2f1 = 1.6 Hz and 2f2 = 2 Hz), the SSVEP frequency and its first harmonic (f3 = 12.5 Hz and 2f3 = 25 Hz), and the IM frequencies as linear combinations of the SSVEP and the SWIFT frequencies, including their harmonics (f3 − 2f2 = 10.5 Hz, f3 − 2f1 = 10.9 Hz, f3 − f2 = 11.5 Hz, f3 − f1 = 11.7 Hz, f3 + f1 = 13.3 Hz, f3 + f2 = 13.5 Hz, f3 + 2f1 = 14.1 Hz, and f3 + 2f2 = 14.5 Hz). Theoretically, many more IM frequencies could have been analyzed, but we restricted our selection based on Gordon et al. (2017). For exploratory purposes, the linear combination of the two SWIFT frequencies was also observed (f1 + f2 = 1.8 Hz).
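The following sketch implements this baseline normalization, simplified in that it does not additionally exclude other frequencies of interest and their harmonics from the baseline bins (the published analysis did):

```r
# SNR per frequency bin: power relative to the mean of 8 bins on each side,
# skipping the immediately adjacent bins.
snr_spectrum <- function(pow, n_side = 8, skip = 1) {
  sapply(seq_along(pow), function(i) {
    lo  <- (i - skip - n_side):(i - skip - 1)   # baseline bins below bin i
    hi  <- (i + skip + 1):(i + skip + n_side)   # baseline bins above bin i
    idx <- c(lo, hi)
    idx <- idx[idx >= 1 & idx <= length(pow)]   # clip at the spectrum edges
    pow[i] / mean(pow[idx])
  })
}
```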
Statistical analyses
Statistical analyses were performed in R 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria) using the packages nlme 3.1-152 (Pinheiro, Bates, DebRoy, & Sarkar, 2021), lme4 1.1.35 (Bates, Mächler, Bolker, & Walker, 2015), and RLRsim 3.1.8 (Scheipl, Greven, & Kuechenhoff, 2008). The average SNR for each frequency bin between 0.3 and 28 Hz (832 bins) was calculated for each participant across all conditions and a broad posterior region of interest (ROI; CPz, CP1–CP6, TP7–TP10, Pz, P1–P8, POz, PO3, PO4, PO7–PO10, Oz, O1, and O2) (Gordon et al., 2017). Following Gordon et al. (2017), the presence of SWIFT, SSVEP, and IM components was validated by one-tailed t-tests of the SNR with p values adjusted for false discovery rate (FDR) using the method of Benjamini and Yekutieli (2001). For further analyses, the SWIFT, SSVEP, and IM signals were averaged across their respective fundamentals and first-order harmonics that differed significantly from 1.
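In R, this detection step amounts to something like the following, assuming snr_mat is a participants × bins matrix of SNRs averaged across conditions and the posterior ROI (variable names are illustrative):

```r
# One-tailed t-tests of SNR against 1 per frequency bin, FDR-adjusted.
p_raw <- apply(snr_mat, 2, function(x)
  t.test(x, mu = 1, alternative = "greater")$p.value)   # H1: SNR > 1
p_adj <- p.adjust(p_raw, method = "BY")                 # Benjamini-Yekutieli FDR
which(p_adj < 0.01)                                     # bins with a reliable signal
```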
To evaluate the effects of attention, congruence, and stimulus category, as well as their interactions, on the SWIFT, SSVEP, and IM signals, linear mixed-effects (LME) models with sum contrasts were used. Although it is still relatively uncommon to use LME models to analyze electrophysiological data, several recent studies have shown the advantages of these models over traditional methods, as they offer the possibility to take several nested or crossed random factors into account (e.g., Aarts, Verhage, Veenvliet, Dolan, & Van Der Sluis, 2014; Heise, Mon, & Bowman, 2022; Koerner & Zhang, 2017; Yu et al., 2022). Due to the hierarchically nested structure of the EEG data, channels nested in participants served as random effects via random intercepts. Although the spatial dependence between electrodes (i.e., signal correlations) could theoretically challenge the normality assumptions of the LME models, our analysis of diagnostic plots indicated satisfactory adherence to these assumptions (see Supplementary Material). Consequently, we chose to prioritize the random-effects model over the fixed-effects model. This decision was driven by its benefits, such as accounting for interindividual signal variability between channels across participants (e.g., arising from interindividual differences in cortex folding or electrode placement). We concluded that these benefits outweighed any potential drawbacks (see Supplementary Material S2 for a more detailed argumentation). The averaged signal was used as the dependent variable and was log2 transformed to produce a more normal distribution and better homoscedasticity for the linear models (Gordon et al., 2017). In contrast to Gordon et al. (2017), who used a uniform 30-channel posterior ROI for all three signals (CPz and CP1–CP6, TP7–TP10, Pz and P1–P8, POz, PO3, PO4, PO7–PO10, Oz, O1, and O2), we chose the channels based on the topographies of the detected signals. We chose a 16-channel occipital–parietal ROI for SSVEP and IM (Iz, O1/O2, Oz, P1–P6, PO3/PO4, PO7/PO8, POz, and Pz) and a more parietal–temporal spanning 19-channel ROI for the averaged SWIFT signal (Iz, O1/O2, Oz, P3–P10, PO3/PO4, PO7/PO8, POz, and TP7/TP8). The activations at central electrodes were not considered in the analyses of SWIFT.
The models were built sequentially, starting with a basic form predicting the dependent variable using a fixed intercept. Complexity increased gradually by adding fixed effects and their interactions, with model differences evaluated using likelihood ratio tests. The contribution of the random effects was assessed by comparing the final model with and without random intercepts using simulation-based restricted likelihood ratio tests (RLRTs) from the RLRsim package. Models were refitted for this procedure using the lme4 package. In addition, we report the intraclass correlation coefficients (ICCs), which can be interpreted as the proportion of variance explained by the random effect. Interaction effects were further analyzed with separate LME models. Effects and differences were deemed statistically significant at an alpha level of 0.01, a more stringent threshold compared to the conventional 0.05 level, aiming to minimize type I errors and fortify the reliability of our findings. Effects of stimulus category and congruence on the behavioral data were also analyzed using LME models, with the proportional deviation from the actual number of targets (i.e., absolute deviation in relation to the varying number of actual targets in a trial) as the outcome variable. Non-parametric tests were used additionally when assumptions of the LME models were violated. Because sum contrasts were used for the LME models, slopes were interpreted in relation to the grand mean (intercept). The variables were coded as follows: attention = target (+1), distractor (–1); congruence = congruent (+1), incongruent (–1); and category = face (+1), house (–1).
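For concreteness, the final SWIFT model under this coding scheme could be specified as follows in lme4 notation (column names are illustrative; the reported models were fitted with nlme, with lme4 used only for refitting in the RLRT procedure):

```r
library(lme4)

# Hypothetical data frame `d`: one row per participant x channel x condition
# cell, with the log2-transformed averaged SNR in `log2_snr`.
d$attention  <- factor(d$attention,  levels = c("target", "distractor"))
d$congruence <- factor(d$congruence, levels = c("congruent", "incongruent"))
d$category   <- factor(d$category,   levels = c("face", "house"))
for (v in c("attention", "congruence", "category")) {
  contrasts(d[[v]]) <- contr.sum(2)   # +1/-1 sum coding as described above
}

# All main effects and two-way interactions; random intercepts for channels
# nested in participants.
m_swift <- lmer(log2_snr ~ (attention + congruence + category)^2 +
                  (1 | participant / channel), data = d)
summary(m_swift)
```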
Results
Behavioral results
The mean proportional deviation across all conditions and participants was 0.12 (SD = 0.16; range, 0–1). An LME model was built using target category and congruence to predict the proportional deviation, with participants as a random intercept. Adding the interaction of category and congruence did not improve the model, χ2(1) = 0.23, p = 0.633. Intercepts varied across participants (SD = 0.06; 99% confidence interval [CI], 0.04–0.09; p < 0.001; ICC = 0.14), and inclusion of participants as a random factor significantly improved the model (RLRT = 185.22; p < 0.001). Consistent with the subjective reports (see Supplementary Material S1), category congruence of target and distractor led to higher error scores compared with incongruence, β = 0.02, t(1578) = 6.33, p < 0.001 (see Figure 3). The main effect of target category did not reach significance, β = –0.01, t(1578) = –1.62, p = 0.11. Because diagnostic plots indicated that the assumptions of homogeneity of variance and normality of residuals were violated by the model (see Supplementary Material for diagnostic plots), a Wilcoxon signed-rank test was computed additionally. The mean proportional deviation of each participant differed between the congruent and incongruent conditions (V = 0; p < 0.001) but not between target categories (V = 130; p = 0.368), thus replicating the LME model results.
Figure 3.
 
Data on behavioral performance. Mean proportional deviation from actual number of targets in the counting task as a function of target category and congruence. Error bars indicate 99% confidence intervals. Category congruence led to higher error scores compared with category incongruence for faces and houses (**p < 0.001).
Frequency detection
The presence of the SWIFT, SSVEP, and IM signals was confirmed by the respective SNRs that differed significantly from 1 (see Figure 4). Table 1 shows the results of the one-tailed t-tests for all frequency bins of interest. The fundamental frequency of the SSVEP signal (f3 = 12.5 Hz) showed the highest SNR, followed by its first harmonic, 2f3 = 25 Hz. The two fundamental SWIFT frequencies were clearly detectable (f1 = 0.8 Hz and f2 = 1 Hz), as were their first-order harmonics (2f1 = 1.6 Hz and 2f2 = 2 Hz). The harmonics of f1 = 0.8 Hz were consistently detectable up to the sixth-order harmonic: 6f1 = 4.8 Hz; M = 1.46; 99% CI, 1.25–infinity; t(19) = 5.47, FDR-adjusted p < 0.01. The harmonics of f2 = 1 Hz were even detectable up to the eighth-order harmonic: 8f2 = 8 Hz; M = 1.34; 99% CI, 1.22–infinity; t(19) = 7.12; FDR-adjusted p < 0.001. Additional higher order harmonics with an SNR greater than 1 are not reported here (for more extensive t-test data, see Supplementary Table S1). The signals of the four first-order IM frequencies also differed from 1 (f3 − f2 = 11.5 Hz, f3 − f1 = 11.7 Hz, f3 + f1 = 13.3 Hz, f3 + f2 = 13.5 Hz; all FDR-adjusted p < 0.01). The same applied to three of four second-order IM frequencies (f3 − 2f1 = 10.9 Hz, f3 + 2f1 = 14.1 Hz, f3 + 2f2 = 14.5 Hz; all FDR-adjusted p < 0.01). Only one second-order IM component did not differ from 1 (f3 – 2f2 = 10.5 Hz; FDR-adjusted p = 0.27). The linear combination of the two SWIFT frequencies (f1 + f2 = 1.8 Hz) also did not show an SNR greater than 1 (FDR-adjusted p = 1.00).
Figure 4.
 
Evoked power for SWIFT (blue), IM (magenta), and SSVEP (red) frequencies averaged across all conditions, posterior channels, and participants (log2 transformed for better scaling). SNRs for other frequencies (spaced by 0.03 Hz) within the frequency range are drawn in gray. For all frequencies, filled and open circles indicate SNRs that are significant and non-significant, respectively (see Table 1 and Supplementary Table S1 for FDR-adjusted t-tests).
Table 1.
 
Results of one-tailed t-tests for SNRs of frequencies of interest. All frequency bins from 0.30 Hz to 28 Hz (832 bins) were tested, and p values were FDR adjusted. See Supplementary Table S1 for further tested frequencies.
Topography of SWIFT, SSVEP, and IM
To map the topography of each signal, the signal was averaged across all conditions and participants, as well as across the respective fundamentals and all harmonics of interest. As illustrated in Figure 5, the highest SNRs for the averaged SSVEP signal (f3 and 2f3) were found centrally over occipital and parieto-occipital electrodes. The averaged SWIFT signal (f1, f2, 2f1, and 2f2) was lateralized more to the right, with the strongest signal over parietal, parieto-occipital, and temporoparietal electrodes, while also showing peaks at central electrodes. The averaged IM signal (f3 − 2f2, f3 − 2f1, f3 − f2, f3 − f1, f3 + f1, f3 + f2, f3 + 2f1, and f3 + 2f2) had a distribution more similar to the more centralized SSVEP, with peaks over occipital and parieto-occipital electrodes. Topographies for individual frequencies are illustrated in Supplementary Figure S1.
Figure 5.
 
Scalp topography of the averaged SSVEP (f3 = 12.5 Hz and 2f3 = 25 Hz), averaged SWIFT (f1 = 0.8 Hz, f2 = 1 Hz, 2f1 = 1.6 Hz, and 2f2 = 2 Hz), and averaged IM signal (f3 − 2f2, f3 − 2f1, f3 − f2, f3 − f1, f3 + f1, f3 + f2, f3 + 2f1, and f3 + 2f2). SNR was log2 transformed for better scaling. See Supplementary Figure S1 for the topography of all individual frequencies.
Modulatory effects of attention, congruence, and category on HFT components
To assess the effects of the experimental manipulation on SWIFT, SSVEP, and IM, the respective signals were averaged across the fundamental frequencies and the harmonics of interest that reached significance (SWIFT = f1, f2, 2f1, and 2f2; SSVEP = f3 and 2f3; IM = f3 − f2, f3 − f1, f3 + f1, f3 + f2, f3 + 2f1, and f3 + 2f2). Because the second-order IM of f2 (f3 − 2f2 = 10.5 Hz) was excluded, f3 − 2f1 = 10.9 Hz was also excluded for the sake of balancing. The topographies of the detected signals resulted in the following channel selection: a 16-channel occipital–parietal ROI for SSVEP and IM (Iz, O1/O2, Oz, P1–P6, PO3/PO4, PO7/PO8, POz, and Pz) and a more parietotemporally spanning 19-channel ROI for the SWIFT signal (Iz, O1/O2, Oz, P3–P10, PO3/PO4, PO7/PO8, POz, and TP7/TP8).
Modulation of SWIFT
The LME model with attention, congruence, and category, as well as all two-way interactions, explained the log2(SNR) of the averaged SWIFT signal best, as the three-way interaction of attention, congruence, and category did not improve the model, χ2(1) = 0.12, p = 0.725. The model showed variance in intercepts across participants (SD = 0.48; 99% CI, 0.31–0.74; ICC = 0.14) and further across channels (SD = 0.39; 99% CI, 0.33–0.46; ICC = 0.08). The inclusion of participants as random intercepts significantly improved the model (RLRT = 707.51; p < 0.001), as did the inclusion of channels nested in participants (RLRT = 196.52; p < 0.001). The model revealed main effects of attention, β = 0.63, t(5694) = 39.08, p < 0.001, and category, β = –0.08, t(5694) = –5.04, p < 0.001. The main effect of congruence did not reach significance, β = 0.04, t(5694) = 2.26, p = 0.024, but the two-way interaction between congruence and attention did, with an attenuation of the attention effect under congruence compared to incongruence, β = –0.12, t(5694) = –7.32, p < 0.001. The two-way interaction of attention and category also reached significance, β = 0.09, t(5694) = 5.45, p < 0.001, as did the two-way interaction between congruence and category, β = 0.07, t(5694) = 4.08, p < 0.001. Diagnostic plots for the models are presented in the Supplementary Material.
To disentangle the interaction effects, two separate LME models were built for face and house stimuli (illustrated in Figure 6). Whereas the main effect of attention was present for face stimuli, β = 0.72, t(2657) = 31.89, p < 0.001, as well as house stimuli, β = 0.54, t(2657) = 25.65, p < 0.001, the main effect of congruence was only present for face stimuli, β = 0.10, t(2657) = 4.54, p < 0.001, but not for house stimuli, β = –0.03, t(2657) = –1.38, p = 0.167. The two-way interaction between attention and congruence reached significance for both faces, β = –0.12, t(2657) = –5.49, p < 0.001, and houses, β = –0.11, t(2657) = –5.31, p < 0.001.
Figure 6.
 
SWIFT and IM SNR as a function of attention, congruence, and category. Error bars indicate 99% confidence intervals. Significant slopes are indicated by *p < 0.01 and **p < 0.001.
To further evaluate the interaction effects, LME models for target and distractor were built separately for each category. For face targets, SWIFT did not differ between the congruent and incongruent conditions, β = –0.02, t(1139) = –0.83, p = 0.405, but SWIFT of house targets showed a decline in SNR in the congruent compared with the incongruent condition, β = –0.14, t(1139) = 5.68, p < 0.001. For distractor SWIFT, both face and house distractors showed an increased SNR in the congruent (vs. incongruent) condition, β = 0.23, t(1139) = 7.42, p < 0.001, and β = 0.08, t(1139) = 2.78, p = 0.005, respectively.
Modulation of IM components
The most parsimonious model for the log2(SNR) of IM that was significantly better than the null model was the simplest model, with attention as the only predictor. There was a main effect of attention, with target-associated IM being greater than distractor-associated IM, β = 0.12, t(4799) = 8.77, p < 0.001. The full model including all main effects and interactions was only marginally better than the simplest model with only attention as predictor, χ2(6) = 13.91, p = 0.031, as it contained only one significant predictor next to attention—namely, the three-way interaction of attention, congruence, and category, β = –0.04, t(4793) = –2.65, p = 0.008. The full model showed variance in intercepts across participants (SD = 0.18; 99% CI, 0.12–0.29; ICC = 0.04), and inclusion of participants as random intercepts improved model fit (RLRT = 120.07; p < 0.001). The variance in intercepts across channels was different from zero (SD = 0.10; 99% CI, 0.04–0.22; ICC = 0.01), and model fit improved marginally (RLRT = 2.84; p = 0.04). Diagnostic plots for the models are presented in the Supplementary Material.
Independent LME models for each stimulus category revealed that the main effect of attention was present for both faces, β = 0.13, t(2239) = 6.75, p < 0.001, and houses, β = 0.11, t(2239) = 5.71, p < 0.001. The two-way interaction between attention and congruence reached significance only for faces, β = –0.07, t(2239) = –3.41, p < 0.001, but not for houses, β = 0.01, t(2239) = 0.37, p = 0.709. For congruence alone, there was no main effect for either faces, β = –0.03, t(2239) = –1.33, p = 0.182, or houses, β = 0.004, t(2239) = 0.21, p = 0.835. 
Separate LME models for target and distractor, as well as for each category, showed that the IM for face targets significantly declined with congruence, β = –0.09, t(959) = –3.46, p < 0.001, whereas the increase of face distractor IM did not reach significance, β = 0.04, t(959) = 1.50, p = 0.133. For house stimuli, the IM signal did not change significantly as a function of congruence, neither for house targets, β = 0.01, t(959) = 0.43, p = 0.667, nor for house distractors, β = –0.003, t(959) = –0.12, p = 0.908.
Modulation of SSVEP
The LME model containing congruence, target category, and their interaction as fixed effects was best in predicting the log2(SNR) of the averaged SSVEP. The model showed variance in intercepts across participants (SD = 0.66; 99% CI, 0.45–1.01; ICC = 0.13; RLRT = 370.97; p < 0.001) and further across channels (SD = 1.57; 99% CI, 1.44–1.71; ICC = 0.67; RLRT = 2586.40; p < 0.001). Significant effects were the main effect of congruence, β = 0.07, t(2237) = 4.28, p < 0.001, and target category, β = 0.12, t(2237) = 7.22, p < 0.001, as well as their interaction, β = 0.10, t(2237) = 5.71, p < 0.001. Separate LME models were constructed for each target category, as well as for each congruence condition. As illustrated in Figure 7, the effect of congruence was only present for faces, β = 0.17, t(959) = 7.84, p < 0.001, and not for houses, β = –0.02, t(959) = –1.05, p = 0.294. In the incongruent condition, there was no difference between categories, β = 0.03, t(959) = 1.13, p = 0.261, whereas in the congruent condition the combination of two faces led to a higher SSVEP signal compared to two houses, β = 0.22, t(959) = 8.57, p < 0.001. Diagnostic plots for the models are presented in the Supplementary Material.
Figure 7.
 
SSVEP log2(SNR) as a function of congruence and target category. The presentation of two faces led to significantly higher SSVEP signal compared to other conditions (**p < 0.001). Error bars indicate 99% confidence intervals.
Discussion
The present study investigated the effects of category-selective attention on the signals evoked by the HFT method introduced by Gordon et al. (2017). Participants were presented with two superimposed SWIFT sequences augmented with a contrast modulation, with instructions to count the appearances of target objects while ignoring distractors. The results showed successful frequency tagging for SWIFT, SSVEP, and IM frequencies. Regarding the attentional modulation of SWIFT and IM, previous findings were replicated and extended. As hypothesized (H1 and H2), target-associated SWIFT and IM signals were larger than the respective distractor-associated signals. Notably, distractor-associated SWIFT signals were amplified when the distractor shared the category of the target, supporting the hypothesis of global enhancement of category processing by attention (H3). However, contrary to our expectations, the accompanying IM signal did not exhibit this modulation (H4). Additionally, we observed globally enhanced SSVEP responses when two faces were paired, in contrast to two houses or one stimulus of each category (H5). Our results also yielded some unexpected findings concerning the modulation of target-associated signals by category congruence depending on the stimulus category. These findings suggest complex interactions between category-selective attention and neural responses, highlighting the need for further investigation into the underlying mechanisms.
SWIFT as a category-selective response
The topographies, as well as the different modulations found for SWIFT, IM, and SSVEP, imply that the HFT components indeed tag different levels of processing. The peak of the SWIFT signal at lateral-occipital and temporoparietal electrodes resembles the topography of other category-selective EEG responses, such as the N170 (Bentin et al., 1996; Bentin et al., 2002; Harris et al., 2011; Störmer et al., 2019), face- and house-selective magnetoencephalography (MEG) components (Baldauf & Desimone, 2014; Furey et al., 2006; Lueschow et al., 2004), and the category-selective responses evoked by rapid visual stream paradigms (Jacques et al., 2016; Quek et al., 2017; Rekow, Baudouin, Brochard, Rossion, & Leleu, 2022; Rossion et al., 2015). The observed right lateralization also fits with common findings on the lateralization of face processing (Rossion, 2014). This hints at an origin in category-selective cortical areas such as the OFA/FFA and PPA (Andrews et al., 2002; Baldauf & Desimone, 2014; Barton et al., 2002; Epstein, Harris, Stanley, & Kanwisher, 1999; Grill-Spector et al., 2004; Kanwisher et al., 1997; Koenig-Robert et al., 2015; Mégevand et al., 2014; O'Craven et al., 1999; Palmisano et al., 2023; Tong et al., 1998; Wada & Yamamoto, 2001; Yovel et al., 2008). 
The average differences in SWIFT signals between targets and distractors mirror other findings of attentional modulation reported for the SWIFT/HFT method (Koenig-Robert & VanRullen, 2013; Gordon, Tsuchiya, et al., 2019), similar phase-scrambling methods (Baldauf & Desimone, 2014), fast periodic visual stimulation paradigms (Quek et al., 2017), and classical SSVEP paradigms (Morgan et al., 1996; Störmer & Alvarez, 2014; Toffanin et al., 2009; Wang et al., 2007). This effect can be explained by the sensory gain control mechanism of selective attention (Hillyard, Vogel, & Luck, 1998). In our counting task, there was a need to bias stimulus-specific, target-associated visual processing over distractor-associated processing. This biasing has been shown to occur through enhancement of task-relevant and suppression of task-irrelevant neuronal processing already in early parts of the visual cortex (Kastner & Ungerleider, 2000; Treue & Martinez-Trujillo, 2007). The differences in SWIFT signals between the congruence conditions, on the other hand, cannot be explained solely by stimulus-specific attentional modulation. 
As previous studies have shown, the biasing of target-relevant processing also occurs at the categorical level of visual processing. When a complex visual stimulus (e.g., a face) is attended, category-selective neuronal activity is enhanced, whereas irrelevant category processing is suppressed. This has been shown for other category-selective EEG, MEG, and fMRI responses (Baldauf & Desimone, 2014; Furey et al., 2006; Gazzaley et al., 2005; Lueschow et al., 2004; Quek et al., 2017; Störmer et al., 2019; Thorat & Peelen, 2022). Our results regarding the congruence modulation of distractor SWIFT signals therefore support the assumption that SWIFT represents a distillation of higher-order visual processing (Koenig-Robert & VanRullen, 2013; Koenig-Robert et al., 2015). As long as target and distractor were from different categories, attention could effectively enhance processing in the category-selective areas necessary for target processing while suppressing the alternative category-selective areas. This mechanism appeared to be less effective when target and distractor shared the same category, as both stimuli demanded the same category-selective cortical areas. A stronger distractor SWIFT signal could therefore result from either an enhancement of category-selective processing or a reduction of its suppression. Unfortunately, we cannot clearly distinguish between these processes in the present study due to the lack of a suitable attentional baseline condition. The subjective reports (see Supplementary Material S1) and the behavioral data indicated that this failed biasing had an impact on the phenomenology and thus impaired performance in the counting task. The higher error scores in the congruent condition might be a consequence of the distractors being more consciously perceived, and therefore more distracting, as a result of category-selective attention. As one participant put it, it was easier to suppress the irrelevant stimulus when it was from another category. Although this remains anecdotal, it fits with previous findings of SWIFT being tied to conscious recognition (Koenig-Robert & VanRullen, 2012; Koenig-Robert & VanRullen, 2013). Category-selective OFA/FFA and PPA activity has been shown to correlate with the conscious experience of faces and houses, respectively (Andrews et al., 2002; Barton et al., 2002; Mégevand et al., 2014; Palmisano et al., 2023; Tong et al., 1998; Wada & Yamamoto, 2001), further supporting the hypothesis that SWIFT originates from these category-selective areas. The persistent differences between congruent target and distractor SWIFT signals indicate that sensitivity to individual stimulus characteristics remained intact. The FFA and PPA are known to be capable of differentiating between individual exemplars and mid-level features within their preferred categories (i.e., different faces or places based on their unique features) (Coggan, Baker, & Andrews, 2019; Epstein et al., 1999; Epstein, Higgins, Jablonski, & Feiler, 2007; Kanwisher & Yovel, 2006). Moreover, although Koenig-Robert et al. (2015) did not identify labeling of low-level visual areas in their fMRI study, they observed partial labeling of mid-level visual areas. This suggests that the SWIFT signal in our paradigm does not solely reflect the processing of generic categories through overarching or purely semantic processing. Instead, it retains the capacity to discern individual instances within those categories, similar to other category-selective EEG markers (Rossion, 2014). 
Integration of SWIFT and SSVEP: The IM component
The topography of the IM signal exhibited similarities to that of the SSVEP. In contrast to SWIFT, the SSVEP signal was centered over parieto-occipital electrodes, consistent with an origin in primary visual cortex (Di Russo et al., 2007) and suggesting successful tagging of low-level visual processing. The resemblance in topography between the IM and SSVEP signals suggests that the integrating process tagged by IM occurs at earlier stages of the visual system (Gordon et al., 2017). 
An attentional modulation of the IM signal was reported earlier (Gordon, Tsuchiya, et al., 2019), suggesting that it is related to specific processing of one of the two individual stimuli that was modulated by a general attentional biasing process (Hillyard et al., 1998; Kastner & Ungerleider, 2000; Treue & Martinez-Trujillo, 2007). The absence of a congruence effect for the IM signals of distractors, as found for the associated SWIFT signals, yields two implications. First, it indicates that the process tagged by IM may not be modulated by category-selective attention. Second, it suggests that the IM component is not solely driven by the SWIFT signal, pointing to a potentially separable process. The isolated congruence effect observed for the IM of face targets, which was not present for the SWIFT signal, supports this notion, aligning with findings of independence or even inverse effects between SWIFT and IM in previous studies (Coll et al., 2020; Gordon et al., 2017; Gordon, Tsuchiya, et al., 2019; Koenig-Robert et al., 2023). Additionally, the observed attenuation of the congruent face-target IM could not be explained by the signal pattern found for the SSVEP, as the SSVEP appeared to be enhanced in the congruent face condition. This latter finding was surprising but could be explained by a stronger attentional capture of faces in general (Langton, Law, Burton, & Schweinberger, 2008) and a subsequent enhancement of attended low-level features of the alpha-blended sequence in early visual cortex (Davidson, Mithen, Hogendoorn, van Boxtel, & Tsuchiya, 2020; Smout & Mattingley, 2018). In sum, these findings indicate that different mechanisms were tagged by SWIFT, SSVEP, and especially IM. However, the question remains as to exactly which process the IM component marks and how to explain the isolated effect on the IM of face targets. 
IM components in general are associated with the integration of neuronal information represented by the driving fundamental frequencies (Gordon, Hohwy, et al., 2019). Gordon et al. (2017) pointed out that “the presence of IMs in themselves therefore cannot point conclusively at specific computational or neuronal processes to which the IMs could be mapped” (p. 9). The IM should therefore be evaluated in the context of each experimental paradigm individually. Importantly, the IM frequencies combining f3 with the SWIFT frequencies were clearly detectable in the EEG spectrum, whereas the linear combinations of the two SWIFT frequencies themselves were not. This indicates that an integration of frequencies does not occur automatically for every pair of presented frequencies and that an actual process integrating information between early visual cortex and higher category-selective areas took place. 
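To make explicit which frequencies such an analysis inspects, the following sketch enumerates the IM components f3 ± f1, f3 ± f2, f3 ± 2f1, and f3 ± 2f2 (the set listed in Figure 5) and reads out SNR at the corresponding FFT bins. The SNR definition (power in the target bin relative to neighboring bins), the sampling rate, and the epoch length are our assumptions for illustration, not the authors' exact pipeline.

```python
# Sketch: enumerate the IM frequencies analyzed above and estimate SNR
# at each as power in the target bin divided by the mean power of
# neighboring bins. Illustrative only; parameters are assumptions.
import numpy as np

f1, f2, f3 = 0.8, 1.0, 12.5
im_freqs = sorted(f3 + s * k * f for f in (f1, f2) for k in (1, 2) for s in (-1, 1))
# -> [10.5, 10.9, 11.5, 11.7, 13.3, 13.5, 14.1, 14.5], as in Figure 5

def snr_at(eeg, fs, freq, n_neighbors=10):
    """SNR = power at freq / mean power of n_neighbors bins on each side."""
    spectrum = np.abs(np.fft.rfft(eeg)) ** 2
    freqs = np.fft.rfftfreq(eeg.size, d=1.0 / fs)
    i = int(np.argmin(np.abs(freqs - freq)))             # closest FFT bin
    neighbors = np.r_[spectrum[i - n_neighbors:i],
                      spectrum[i + 1:i + 1 + n_neighbors]]
    return spectrum[i] / neighbors.mean()

fs = 500.0                                # hypothetical sampling rate (Hz)
eeg = np.random.randn(int(30 * fs))       # stand-in for one 30-s epoch
for f in im_freqs:
    print(f"{f:5.2f} Hz: SNR = {snr_at(eeg, fs, f):.2f}")
```

A 30-s epoch yields a bin spacing of about 0.033 Hz, consistent with the 0.03-Hz spacing noted in the caption of Figure 4.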
Previous IM studies investigated the integration of visual processes tagged by SSVEPs that were generated at the same level of visual processing (Aissani et al., 2011; Alp et al., 2016; Alp et al., 2017; Boremanse et al., 2014; Brown et al., 1999; Gundlach & Müller, 2013; Sutoyo & Srinivasan, 2009; Zhang et al., 2011). The resulting IM signal in these cases mostly represented an integration process that can be seen as more elaborate than the process tagged by the fundamental frequencies, such as the integration of point-light displays into human-like shapes (Alp et al., 2017) or the perceptual binding of elements into integrated forms (Aissani et al., 2011). It is difficult to draw on these previous results when interpreting the present findings because the IM component in our study presumably tags the integration of high-level and low-level visual processing. Adhering to the predictive processing framework, as utilized in previous studies (Coll et al., 2020; Gordon et al., 2017; Gordon, Tsuchiya, et al., 2019; Koenig-Robert et al., 2023), IM components are considered markers of the efficient integration, or fit, between predictive top–down signals (SWIFT) and sensory bottom–up signals (SSVEP). Within this framework, the neuronal integration of face-target processing was impaired when the distractor stimulus was also a face, whereas integration was enhanced when the distractor was a house. As attention modulates this integration (Gordon, Tsuchiya, et al., 2019), the reduction in the IM signal of face targets could signify a disruption of integration due to attentional capture by the congruent distractor (Ariga & Yokosawa, 2008; Folk, Remington, & Johnston, 1992; Wu, Liu, & Fu, 2016). In that case, however, one would also anticipate an effect on the attention-sensitive SWIFT signal. Another explanation could be that the assignment of higher order top–down signals to low-level visual information was less clear in the congruent condition, potentially hindering sufficient integration and leading to more prediction errors, resulting in a reduction of the IM signal (Coll et al., 2020; Gordon et al., 2017; Gordon, Hohwy, et al., 2019; Koenig-Robert et al., 2023). Nevertheless, because we did not deliberately manipulate predictability in our study, an interpretation in light of the predictive coding framework remains speculative. Additionally, the absence of this effect for the IM of house targets remains unexplained. In sum, more research on the mechanism captured by the IM components of the HFT method is necessary to better understand the isolated modulation of the face-target IM signal. 
The effect of category congruence on the SWIFT signal of house targets
The observed reduction in SWIFT signals for congruent house targets raises questions about the underlying modulation mechanism and its specificity to houses. One explanation could be that distractors that share task-relevant visual features or more abstract properties can capture attention, thereby impairing the processing of the target (Ariga & Yokosawa, 2008; Folk et al., 1992; Wu et al., 2016). Gordon, Tsuchiya, et al. (2019) reported that the SWIFT signal disappeared completely when attention was withdrawn from the SWIFT stimulus and captured by a fixation cross task. The decreased counting task accuracy in the congruent condition of our paradigm indeed suggests that congruent distractors were more difficult to ignore. Nevertheless, there was no effect of target category on accuracy. Further, one would expect a reduction of target SWIFT especially when faces are distractors, because faces are biologically relevant stimuli that capture attention even without task relevance (Langton et al., 2008). In fact, the present results suggest that congruent distractor faces did capture category-selective attention. Therefore, it does not seem likely that the SWIFT reduction for congruent house targets resulted from attentional capture. 
According to some of the subjective reports, counting congruent house targets was challenging due to difficulties in differentiating the blended house stimuli. The house stimuli were reported as being more similar to each other regarding the position and content of distinctive features. The house stimuli, primarily composed of straight lines and edges, also posed challenges in creating easily distinguishable congruent pairs during stimulus construction. Recognizing a house, and thus eliciting a category-selective response, relies on its features appearing in their characteristic combination. Partial occlusion by noise from an object with similar lines and edges could impair this defining configuration. If the recognizability of the house target was impaired in some trials due to the congeneric house noise of the distractor SWIFT sequence, then the decline of house-target SWIFT would be plausible, as SWIFT is tied to recognizability (Koenig-Robert & VanRullen, 2013). In contrast to houses, faces contain more distinctive features, such as the eyes, which can produce a face-selective ERP response in isolation (Itier, Alain, Sedore, & McIntosh, 2007). As Rossion (2014) put it: “Unlike meaningful shapes that can be formed of simple meaningless elements (e.g., a triangle formed of three connected lines), a facial part such as an eye is meaningful by itself and may suffice to activate other parts or an entire face representation by completion” (p. 315). Thus, faces might have suffered less semantic disruption in the congruent condition. Because participants almost unanimously stated that they concentrated on salient features to solve the task, this likely mitigated the impact of reduced house recognizability on behavioral performance. However, if the overall visibility of houses was compromised, this raises the question of why there was an enhancement of SWIFT for congruent house distractors compared with incongruent ones. One possible interpretation is that the congruency effect reflects not only enhancement of congruent distractors but also suppression of incongruent distractors (Furey et al., 2006; Gazzaley et al., 2005; Quek et al., 2017). To disentangle these two possibilities, a neutral baseline condition would be necessary. Gordon, Tsuchiya, et al. (2019) attempted this with their central fixation cross task, but the SWIFT signal was not detectable in that condition, which illustrates the difficulty of constructing a neutral baseline. Perhaps a less demanding neutral task, or a task that directs attention to both SWIFT sequences, could yield promising results. 
Limitations of the present study and future directions
There are some limitations regarding the task design that future studies could improve on, such as the potential confounds concerning stimulus construction and the possible disruption of visibility in the congruent (house) condition. The absence of the congruence effect on house-target IM (vs. face-target IM) could be a result of these confounds. Future studies could opt for stimuli that are less similar in their physical appearance or use attentional paradigms that do not require two blended SWIFT sequences. The phenomenology of the SWIFT sequences could also be assessed more systematically in future studies, given its association with conscious recognition (Koenig-Robert & VanRullen, 2013). In the present study, subjective reports served as supplementary rather than primary data sources. Nevertheless, they provided valuable insights into potentially systematic processes. For example, some participants perceived a third stimulus next to target and distractor, raising the question of whether this could be objectively detected via the SWIFT or IM responses. Due to the limited number of participants and the post-experiment assessment of phenomenology, these observations could not be analyzed in detail. In summary, investigating the correlation between phenomenology and HFT components could provide further insights into their interpretation. 
Because the IM signals were quite weak compared with the SWIFT and SSVEP signals, some effects might have gone undetected with the present method of analysis or the moderate number of participants. Moreover, a relatively wide range of IM frequencies was averaged in the present study. Because IM components might behave differently depending on the order of the harmonics observed (Gordon, Hohwy, et al., 2019; Gordon, Tsuchiya, et al., 2019), there could have been different effects on different harmonics. Due to the small amplitude and less clear topography of the individual IM frequencies (see Supplementary Figure S1), we chose to aggregate them instead of analyzing individual frequencies. Nevertheless, future studies could concentrate on evoking clearer IM signals and analyzing potential differences between them. 
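As an illustration of this aggregation choice (again a hypothetical sketch, not the authors' pipeline, and assuming log transformation before averaging), per-bin SNRs can be averaged across the eight IM frequencies before entering the mixed-effects analysis:

```python
# Sketch: aggregate log2(SNR) across the eight IM bins (cf. Figure 5)
# before statistical modeling. Data structures are hypothetical.
import numpy as np

im_freqs = [10.5, 10.9, 11.5, 11.7, 13.3, 13.5, 14.1, 14.5]  # f3 +/- f1, f2, 2f1, 2f2
rng = np.random.default_rng(0)
snr_per_bin = {f: rng.random(30) + 1.0 for f in im_freqs}    # stand-in: SNR per trial

log_snrs = np.stack([np.log2(snr_per_bin[f]) for f in im_freqs])
aggregated = log_snrs.mean(axis=0)   # one value per trial, entered into the LME
```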
Originally, Gordon et al. (2017) developed and used the HFT method to study processes within the predictive coding framework (Friston, 2005; Rao & Ballard, 1999). The current study, on the other hand, mainly focused on the process of category-selective attention and its influence on the hierarchically organized visual system. Because we addressed somewhat different theoretical questions than Gordon et al. (2017), a theoretical assessment of our (IM) results was less straightforward. Nevertheless, our research validates the HFT method as a promising tool to study a wide variety of processes in the visual system. We hope that future research will make further use of this method to broaden the understanding of its different signal components. 
Conclusions
Taken together, the present study successfully adopted the HFT method and provided additional insights into the properties of its different components. SWIFT, especially, remains of interest, as it has been shown to share properties with other markers of category-selective processing. The advantages of the frequency-tagging approach and its ability to tag processing at different levels of the visual hierarchy make it a valuable tool for research on higher order visual processing. The HFT method remains promising, but more research is needed to clearly understand the interplay of its components. 
Acknowledgments
This publication was supported by two publication funds: “NiedersachsenOPEN,” funded by zukunft.niedersachsen, and the Open Access Publication Funds/transformative agreements of Göttingen University. 
Commercial relationships: none. 
Corresponding author: Thorsten Albrecht. 
Email: thorsten.albrecht@biologie.uni-goettingen.de. 
Address: Georg-Elias-Müller Institute for Psychology, Georg-August Universität Göttingen, Goßlerstr. 14, Göttingen 37073, Germany. 
References
Aarts, E., Verhage, M., Veenvliet, J. V., Dolan, C. V., & Van Der Sluis, S. (2014). A solution to dependency: Using multilevel analysis to accommodate nested data. Nature Neuroscience, 17(4), 491–496, https://doi.org/10.1038/nn.3648. [CrossRef] [PubMed]
Aissani, C., Cottereau, B., Dumas, G., Paradis, A.-L., & Lorenceau, J. (2011). Magnetoencephalographic signatures of visual form and motion binding. Brain Research, 1408, 27–40, https://doi.org/10.1016/j.brainres.2011.05.051. [CrossRef] [PubMed]
Alp, N., Kogo, N., Van Belle, G., Wagemans, J., & Rossion, B. (2016). Frequency tagging yields an objective neural signature of Gestalt formation. Brain and Cognition, 104, 15–24, https://doi.org/10.1016/j.bandc.2016.01.008. [CrossRef] [PubMed]
Alp, N., Nikolaev, A. R., Wagemans, J., & Kogo, N. (2017). EEG frequency tagging dissociates between neural processing of motion synchrony and human quality of multiple point-light dancers. Scientific Reports, 7(1), 44012, https://doi.org/10.1038/srep44012. [CrossRef] [PubMed]
Aly, M., Welinder, P., Munich, M., & Perona, P. (2009). Scaling object recognition: Benchmark of current state of the art techniques. In 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops (pp. 2117–2124). IEEE.
Andersen, S. K., Hillyard, S. A., & Müller, M. M. (2013). Global facilitation of attended features is obligatory and restricts divided attention. Journal of Neuroscience, 33(46), 18200–18207, https://doi.org/10.1523/JNEUROSCI.1913-13.2013. [CrossRef]
Andrews, T. J., Schluppeck, D., Homfray, D., Matthews, P., & Blakemore, C. (2002). Activity in the fusiform gyrus predicts conscious perception of Rubin's vase–face illusion. NeuroImage, 17(2), 890–901, https://doi.org/10.1006/nimg.2002.1243. [PubMed]
Ariga, A., & Yokosawa, K. (2008). Contingent attentional capture occurs by activated target congruence. Perception & Psychophysics, 70(4), 680–687, https://doi.org/10.3758/PP.70.4.680. [PubMed]
Baldauf, D., & Desimone, R. (2014). Neural mechanisms of object-based attention. Science, 344(6182), 424–427, https://doi.org/10.1126/science.1247003. [PubMed]
Barton, J. J. S., Press, D. Z., Keenan, J. P., & O'Connor, M. (2002). Lesions of the fusiform face area impair perception of facial configuration in prosopagnosia. Neurology, 58(1), 71–78, https://doi.org/10.1212/WNL.58.1.71. [PubMed]
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48, https://doi.org/10.18637/jss.v067.i01.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188, https://doi.org/10.1214/aos/1013699998.
Bentin, S., Allison, T., Puce, A., Perez, E., & McCarthy, G. (1996). Electrophysiological studies of face perception in humans. Journal of Cognitive Neuroscience, 8(6), 551–565, https://doi.org/10.1162/jocn.1996.8.6.551. [PubMed]
Bentin, S., Sagiv, N., Mecklinger, A., Friederici, A., & von Cramon, Y. D. (2002). Priming visual face-processing mechanisms: Electrophysiological evidence. Psychological Science, 13(2), 190–193, https://doi.org/10.1111/1467-9280.00435. [PubMed]
Boremanse, A., Norcia, A. M., & Rossion, B. (2014). Dissociation of part-based and integrated neural responses to faces by means of electroencephalographic frequency tagging. European Journal of Neuroscience, 40(6), 2987–2997, https://doi.org/10.1111/ejn.12663.
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436, https://doi.org/10.1163/156856897X00357. [PubMed]
Brown, R. J., Candy, T. R., & Norcia, A. M. (1999). Development of rivalry and dichoptic masking in human infants. Investigative Ophthalmology & Visual Science, 40(13), 3324–3333. [PubMed]
Brown, R. J., & Norcia, A. M. (1997). A method for investigating binocular rivalry in real-time with the steady-state VEP. Vision Research, 37(17), 2401–2408, https://doi.org/10.1016/s0042-6989(97)00045-x. [PubMed]
Caharel, S., Leleu, A., Bernard, C., Viggiano, M.-P., Lalonde, R., & Rebaï, M. (2013). Early holistic face-like processing of Arcimboldo paintings in the right occipito-temporal cortex: Evidence from the N170 ERP component. International Journal of Psychophysiology, 90(2), 157–164, https://doi.org/10.1016/j.ijpsycho.2013.06.024.
Coggan, D. D., Baker, D. H., & Andrews, T. J. (2019). Selectivity for mid-level properties of faces and places in the fusiform face area and parahippocampal place area. European Journal of Neuroscience, 49(12), 1587–1596, https://doi.org/10.1111/ejn.14327.
Coll, M.-P., Whelan, E., Catmur, C., & Bird, G. (2020). Autistic traits are associated with atypical precision-weighted integration of top-down and bottom-up neural signals. Cognition, 199, 104236, https://doi.org/10.1016/j.cognition.2020.104236. [PubMed]
Davidson, M. J., Mithen, W., Hogendoorn, H., van Boxtel, J. J., & Tsuchiya, N. (2020). The SSVEP tracks attention, not consciousness, during perceptual filling-in. eLife, 9, e60031, https://doi.org/10.7554/eLife.60031. [PubMed]
de Heering, A., Beauny, A., Vuillaume, L., Salvesen, L., & Cleeremans, A. (2020). The SSVEP tool as a marker of subjective visibility. BioRxiv, 588236, https://doi.org/10.1101/588236.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9–21, https://doi.org/10.1016/j.jneumeth.2003.10.009. [PubMed]
Di Russo, F., Pitzalis, S., Aprile, T., Spitoni, G., Patria, F., Stella, A., … Hillyard, S. A. (2007). Spatiotemporal analysis of the cortical sources of the steady-state visual evoked potential. Human Brain Mapping, 28(4), 323–334, https://doi.org/10.1002/hbm.20276. [PubMed]
Epstein, R. A., Higgins, J. S., Jablonski, K., & Feiler, A. M. (2007). Visual scene processing in familiar and unfamiliar environments. Journal of Neurophysiology, 97(5), 3670–3683, https://doi.org/10.1152/jn.00003.2007. [PubMed]
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392(6676), 598–601, https://doi.org/10.1038/33402. [PubMed]
Epstein, R., Harris, A., Stanley, D., & Kanwisher, N. (1999). The parahippocampal place area: Recognition, navigation, or encoding? Neuron, 23(1), 115–125, https://doi.org/10.1016/S0896-6273(00)80758-8. [PubMed]
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18(4), 1030–1044, https://doi.org/10.1037/0096-1523.18.4.1030. [PubMed]
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836, https://doi.org/10.1098/rstb.2005.1622.
Furey, M. L., Tanskanen, T., Beauchamp, M. S., Avikainen, S., Uutela, K., Hari, R., … Haxby, J. V. (2006). Dissociation of face-selective cortical responses by attention. Proceedings of the National Academy of Sciences, USA, 103(4), 1065–1070, https://doi.org/10.1073/pnas.0510124103.
Gauthier, I., Tarr, M. J., Moylan, J., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). The fusiform “face area” is part of a network that processes faces at the individual level. Journal of Cognitive Neuroscience, 12(3), 495–504, https://doi.org/10.1162/089892900562165. [PubMed]
Gazzaley, A., Cooney, J. W., McEvoy, K., Knight, R. T., & D'Esposito, M. (2005). Top-down enhancement and suppression of the magnitude and speed of neural activity. Journal of Cognitive Neuroscience, 17(3), 507–517, https://doi.org/10.1162/0898929053279522. [PubMed]
George, N., Jemel, B., Fiori, N., Chaby, L., & Renault, B. (2005). Electrophysiological correlates of facial decision: Insights from upright and upside-down Mooney-face perception. Cognitive Brain Research, 24(3), 663–673, https://doi.org/10.1016/j.cogbrainres.2005.03.017.
Gordon, N., Hohwy, J., Davidson, M. J., van Boxtel, J. J. A., & Tsuchiya, N. (2019). From intermodulation components to visual perception and cognition-a review. NeuroImage, 199, 480–494, https://doi.org/10.1016/j.neuroimage.2019.06.008. [PubMed]
Gordon, N., Koenig-Robert, R., Tsuchiya, N., van Boxtel, J. J., & Hohwy, J. (2017). Neural markers of predictive coding under perceptual uncertainty revealed with Hierarchical Frequency Tagging. eLife, 6, e22749, https://doi.org/10.7554/eLife.22749. [PubMed]
Gordon, N., Tsuchiya, N., Koenig-Robert, R., & Hohwy, J. (2019). Expectation and attention increase the integration of top-down and bottom-up signals in perception through different pathways. PLoS Biology, 17(4), e3000233, https://doi.org/10.1371/journal.pbio.3000233. [PubMed]
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area subserves face perception, not generic within-category identification. Nature Neuroscience, 7(5), 555–562, https://doi.org/10.1038/nn1224. [PubMed]
Gundlach, C., & Müller, M. M. (2013). Perception of illusory contours forms intermodulation responses of steady state visual evoked potentials as a neural signature of spatial integration. Biological Psychology, 94(1), 55–60, https://doi.org/10.1016/j.biopsycho.2013.04.014. [PubMed]
Harris, J. A., Wu, C.-T., & Woldorff, M. G. (2011). Sandwich masking eliminates both visual awareness of faces and face-specific brain activity through a feedforward mechanism. Journal of Vision, 11(7):3, 1–12, https://doi.org/10.1167/11.7.3.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233, https://doi.org/10.1016/S1364-6613(00)01482-0. [PubMed]
Heise, M. J., Mon, S. K., & Bowman, L. C. (2022). Utility of linear mixed effects models for event-related potential research with infants and children. Developmental Cognitive Neuroscience, 54, 101070, https://doi.org/10.1016/j.dcn.2022.101070. [PubMed]
Hillyard, S. A., Vogel, E. K., & Luck, S. J. (1998). Sensory gain control (amplification) as a mechanism of selective attention: Electrophysiological and neuroimaging evidence. Philosophical Transactions of the Royal Society B: Biological Sciences, 353(1373), 1257–1270, https://doi.org/10.1098/rstb.1998.0281.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1), 215–243, https://doi.org/10.1113/jphysiol.1968.sp008455. [PubMed]
Itier, R. J., Alain, C., Sedore, K., & McIntosh, A. R. (2007). Early face processing specificity: It's in the eyes! Journal of Cognitive Neuroscience, 19(11), 1815–1826, https://doi.org/10.1162/jocn.2007.19.11.1815. [PubMed]
Jacques, C., Retter, T. L., & Rossion, B. (2016). A single glance at natural face images generates larger and qualitatively different category-selective spatio-temporal signatures than other ecologically-relevant categories in the human brain. NeuroImage, 137, 21–33, https://doi.org/10.1016/j.neuroimage.2016.04.045. [PubMed]
Jung, T. P., Makeig, S., Humphries, C., Lee, T. W., Mckeown, M. J., Iragui, V., … Sejnowski, T. J. (2000). Removing electroencephalographic artifacts by blind source separation. Psychophysiology, 37(2), 163–178, https://doi.org/10.1111/1469-8986.3720163. [PubMed]
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11), 4302–4311, https://doi.org/10.1523/JNEUROSCI.17-11-04302.1997.
Kanwisher, N., & Yovel, G. (2006). The fusiform face area: A cortical region specialized for the perception of faces. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1476), 2109–2128, https://doi.org/10.1098/rstb.2006.1934.
Kaspar, K., Hassler, U., Martens, U., Trujillo-Barreto, N., & Gruber, T. (2010). Steady-state visually evoked potential correlates of object recognition. Brain Research, 1343, 112–121, https://doi.org/10.1016/j.brainres.2010.04.072. [PubMed]
Kastner, S., & Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience, 23, 315–341, https://doi.org/10.1146/annurev.neuro.23.1.315. [PubMed]
Keitel, C., Keitel, A., Benwell, C. S. Y., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119–3129, https://doi.org/10.1523/JNEUROSCI.1633-18.2019.
Kim, Y.-J., Tsai, J. J., Ojemann, J., & Verghese, P. (2017). Attention to multiple objects facilitates their integration in prefrontal and parietal cortex. Journal of Neuroscience, 37(19), 4942–4953, https://doi.org/10.1523/JNEUROSCI.2370-16.2017.
Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., & Broussard, C. (2007). What's new in Psychtoolbox-3. Perception, 36(14), 1–16.
Koenig-Robert, R., Pace, T., Pearson, J., & Hohwy, J. (2023). Time-resolved hierarchical frequency-tagging reveals markers of predictive processing in the action-perception loop. Authorea Preprints, https://doi.org/10.22541/au.167338513.30310833/v1.
Koenig-Robert, R., & VanRullen, R. (2012). Semantic wavelet-induced frequency tagging (SWIFT) tracks perceptual awareness alternations in an all-or-none fashion. Journal of Vision, 12(9), 114, https://doi.org/10.1167/12.9.114.
Koenig-Robert, R., & VanRullen, R. (2013). SWIFT: A novel method to track the neural correlates of recognition. NeuroImage, 81, 273–282, https://doi.org/10.1016/j.neuroimage.2013.04.116. [PubMed]
Koenig-Robert, R., VanRullen, R., & Tsuchiya, N. (2015). Semantic wavelet-induced frequency-tagging (SWIFT) periodically activates category selective areas while steadily activating early visual areas. PLoS One, 10(12), e0144858, https://doi.org/10.1371/journal.pone.0144858. [PubMed]
Koerner, T. K., & Zhang, Y. (2017). Application of linear mixed-effects models in human neuroscience research: A comparison with Pearson correlation in two auditory electrophysiology studies. Brain Sciences, 7(3), 26, https://doi.org/10.3390/brainsci7030026. [PubMed]
Langton, S. R. H., Law, A. S., Burton, A. M., & Schweinberger, S. R. (2008). Attention capture by faces. Cognition,107(1), 330–342, https://doi.org/10.1016/j.cognition.2007.07.012. [PubMed]
Lauritzen, T. Z., D'Esposito, M., Heeger, D. J., & Silver, M. A. (2009). Top–down flow of visual spatial attention signals from parietal to occipital cortex. Journal of Vision, 9(13):18, 1–14, https://doi.org/10.1167/9.13.18. [PubMed]
Logothetis, N. K., & Sheinberg, D. L. (1996). Visual object recognition. Annual Review of Neuroscience, 19(1), 577–621, https://doi.org/10.1146/annurev.ne.19.030196.003045. [PubMed]
Lopez-Calderon, J., & Luck, S. J. (2014). ERPLAB: An open-source toolbox for the analysis of event-related potentials. Frontiers in Human Neuroscience, 8, 213, https://doi.org/10.3389/fnhum.2014.00213. [PubMed]
Lueschow, A., Sander, T., Boehm, S. G., Nolte, G., Trahms, L., & Curio, G. (2004). Looking for faces: Attention modulates early occipitotemporal object processing. Psychophysiology, 41(3), 350–360, https://doi.org/10.1111/j.1469-8986.2004.00159.x. [PubMed]
Martens, U., Wahl, P., Hassler, U., Friese, U., & Gruber, T. (2012). Implicit and explicit contributions to object recognition: Evidence from rapid perceptual learning. PLoS One, 7(10), e47009, https://doi.org/10.1371/journal.pone.0047009. [PubMed]
Mégevand, P., Groppe, D. M., Goldfinger, M. S., Hwang, S. T., Kingsley, P. B., Davidesco, I., … Mehta, A. D. (2014). Seeing scenes: Topographic visual hallucinations evoked by direct electrical stimulation of the parahippocampal place area. Journal of Neuroscience, 34(16), 5399–5405, https://doi.org/10.1523/JNEUROSCI.5202-13.2014.
Minami, T., Azuma, K., & Nakauchi, S. (2020). Steady-state visually evoked potential is modulated by the difference of recognition condition. PLoS One, 15(7), e0235309, https://doi.org/10.1371/journal.pone.0235309. [PubMed]
Morgan, S. T., Hansen, J. C., & Hillyard, S. A. (1996). Selective attention to stimulus location modulates the steady-state visual evoked potential. Proceedings of the National Academy of Sciences, USA, 93(10), 4770–4774, https://doi.org/10.1073/pnas.93.10.4770.
Nauhaus, I., Nielsen, K. J., Disney, A. A., & Callaway, E. M. (2012). Orthogonal micro-organization of orientation and spatial frequency in primate primary visual cortex. Nature Neuroscience, 15(12), 1683–1690, https://doi.org/10.1038/nn.3255. [PubMed]
Norcia, A. M., Appelbaum, L. G., Ales, J. M., Cottereau, B. R., & Rossion, B. (2015). The steady-state visual evoked potential in vision research: A review. Journal of Vision, 15(6):4, 1–46, https://doi.org/10.1167/15.6.4.
O'Craven, K. M., Downing, P. E., & Kanwisher, N. (1999). FMRI evidence for objects as the units of attentional selection. Nature, 401(6753), 584–587, https://doi.org/10.1038/44134. [PubMed]
Palmisano, A., Chiarantoni, G., Bossi, F., Conti, A., D'Elia, V., Tagliente, S., … Rivolta, D. (2023). Face pareidolia is enhanced by 40 Hz transcranial alternating current stimulation (tACS) of the face perception network. Scientific Reports, 13(1), 2035, https://doi.org/10.1038/s41598-023-29124-8. [PubMed]
Perona, P., & Helle, K. (2000). Pasadena Houses 2000 [Data set]. CaltechDATA, https://doi.org/10.22002/xfh27-6t426.
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., & R Core Team (2021). nlme: Linear and nonlinear mixed effects models. Retrieved from https://CRAN.R-project.org/package=nlme.
Quek, G., Nemrodov, D., Rossion, B., & Liu-Shuang, J. (2017). Selective attention to faces in a rapid visual stream: Hemispheric differences in enhancement and suppression of category-selective neural activity. Journal of Cognitive Neuroscience, 30(3), 393–410, https://doi.org/10.1162/jocn_a_01220. [PubMed]
Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87, https://doi.org/10.1038/4580. [PubMed]
Regan, D. (1966). Some characteristics of average steady-state and transient responses evoked by modulated light. Electroencephalography and Clinical Neurophysiology, 20(3), 238–248, https://doi.org/10.1016/0013-4694(66)90088-5. [PubMed]
Rekow, D., Baudouin, J. Y., Brochard, R., Rossion, B., & Leleu, A. (2022). Rapid neural categorization of facelike objects predicts the perceptual awareness of a face (face pareidolia). Cognition, 222, 105016, https://doi.org/10.1016/j.cognition.2022.105016. [PubMed]
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025, https://doi.org/10.1038/14819. [PubMed]
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439, https://doi.org/10.1016/0010-0285(76)90013-X.
Rossion, B. (2014). Understanding face perception by means of human electrophysiology. Trends in Cognitive Sciences, 18(6), 310–318, https://doi.org/10.1016/j.tics.2014.02.013. [PubMed]
Rossion, B., Torfs, K., Jacques, C., & Liu-Shuang, J. (2015). Fast periodic presentation of natural images reveals a robust face-selective electrophysiological response in the human brain. Journal of Vision, 15(1):18, 1–18, https://doi.org/10.1167/15.1.18.
Saalmann, Y. B., Pigarev, I. N., & Vidyasagar, T. R. (2007). Neural mechanisms of visual attention: How top-down feedback highlights relevant locations. Science, 316(5831), 1612–1615, https://doi.org/10.1126/science.1139140. [PubMed]
Saenz, M., Buracas, G. T., & Boynton, G. M. (2002). Global effects of feature-based attention in human visual cortex. Nature Neuroscience, 5(7), 631–632, https://doi.org/10.1038/nn876. [PubMed]
Scheipl, F., Greven, S., & Kuechenhoff, H. (2008). Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Computational Statistics & Data Analysis, 52(7), 3283–3299, https://doi.org/10.1016/j.csda.2007.10.022.
Sclar, G., Maunsell, J. H. R., & Lennie, P. (1990). Coding of image contrast in central visual pathways of the macaque monkey. Vision Research, 30(1), 1–10, https://doi.org/10.1016/0042-6989(90)90123-3. [PubMed]
Serences, J. T., & Boynton, G. M. (2007). Feature-based attentional modulations in the absence of direct visual stimulation. Neuron, 55(2), 301–312, https://doi.org/10.1016/j.neuron.2007.06.015. [PubMed]
Smout, C. A., & Mattingley, J. B. (2018). Spatial attention enhances the neural representation of invisible signals embedded in noise. Journal of Cognitive Neuroscience, 30(8), 1119–1129, https://doi.org/10.1162/jocn_a_01283. [PubMed]
Störmer, V. S., & Alvarez, G. A. (2014). Feature-based attention elicits surround suppression in feature space. Current Biology, 24(17), 1985–1988, https://doi.org/10.1016/j.cub.2014.07.030.
Störmer, V. S., Cohen, M. A., & Alvarez, G. A. (2019). Tuning attention to object categories: Spatially global effects of attention to faces in visual processing. Journal of Cognitive Neuroscience, 31(7), 937–947, https://doi.org/10.1162/jocn_a_01400. [PubMed]
Sutoyo, D., & Srinivasan, R. (2009). Nonlinear SSVEP responses are sensitive to the perceptual binding of visual hemifields during conventional ‘eye’ rivalry and interocular ‘percept’ rivalry. Brain Research, 1251, 245–255, https://doi.org/10.1016/j.brainres.2008.09.086. [PubMed]
Thorat, S., & Peelen, M. V. (2022). Body shape as a visual feature: Evidence from spatially-global attentional modulation in human visual cortex. NeuroImage, 255, 119207, https://doi.org/10.1016/j.neuroimage.2022.119207. [PubMed]
Tipper, S. P., & Cranston, M. (1985). Selective attention and priming: Inhibitory and facilitatory effects of ignored primes. The Quarterly Journal of Experimental Psychology Section A, 37(4), 591–611, https://doi.org/10.1080/14640748508400921.
Toffanin, P., de Jong, R., Johnson, A., & Martens, S. (2009). Using frequency tagging to quantify attentional deployment in a visual divided attention task. International Journal of Psychophysiology, 72(3), 289–298, https://doi.org/10.1016/j.ijpsycho.2009.01.006.
Tong, F., Nakayama, K., Vaughan, J. T., & Kanwisher, N. (1998). Binocular rivalry and visual awareness in human extrastriate cortex. Neuron, 21(4), 753–759, https://doi.org/10.1016/S0896-6273(00)80592-9. [PubMed]
Tononi, G., Srinivasan, R., Russell, D. P., & Edelman, G. M. (1998). Investigating neural correlates of conscious perception by frequency-tagged neuromagnetic responses. Proceedings of the National Academy of Sciences, USA, 95(6), 3198–3203, https://doi.org/10.1073/pnas.95.6.3198.
Treue, S., & Martínez-Trujillo, J. C. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399(6736), 575–579, https://doi.org/10.1038/21176. [PubMed]
Treue, S., & Martinez-Trujillo, J. C. (2007). Attending to features inside and outside the spotlight of attention. Neuron, 55(2), 174–176, https://doi.org/10.1016/j.neuron.2007.07.005. [PubMed]
Wada, Y., & Yamamoto, T. (2001). Selective impairment of facial recognition due to a haematoma restricted to the right fusiform and lateral occipital region. Journal of Neurology, Neurosurgery, and Psychiatry, 71(2), 254–257, https://doi.org/10.1136/jnnp.71.2.254. [PubMed]
Wang, J., Clementz, B. A., & Keil, A. (2007). The neural correlates of feature-based selective attention when viewing spatially and temporally overlapping images. Neuropsychologia, 45(7), 1393–1399, https://doi.org/10.1016/j.neuropsychologia.2006.10.019. [PubMed]
Willenbockel, V., Sadr, J., Fiset, D., Horne, G. O., Gosselin, F., & Tanaka, J. W. (2010). Controlling low-level image properties: The SHINE toolbox. Behavior Research Methods, 42(3), 671–684, https://doi.org/10.3758/BRM.42.3.671. [PubMed]
Wu, X., Liu, X., & Fu, S. (2016). Feature- and category-specific attentional control settings are differently affected by attentional engagement in contingent attentional capture. Biological Psychology, 118, 8–16, https://doi.org/10.1016/j.biopsycho.2016.04.065. [PubMed]
Yovel, G., Sadeh, B., Podlipsky, I., Hendler, T., & Zhdanov, A. (2008). The face-selective ERP component (N170) is correlated with the face-selective areas in the fusiform gyrus (FFA) and the superior temporal sulcus (fSTS) but not the occipital face area (OFA): A simultaneous fMRI-EEG study. Journal of Vision, 8(6), 401, https://doi.org/10.1167/8.6.401.
Yu, Z., Guindani, M., Grieco, S. F., Chen, L., Holmes, T. C., & Xu, X. (2022). Beyond t test and ANOVA: Applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron, 110(1), 21–35, https://doi.org/10.1016/j.neuron.2021.10.030. [PubMed]
Zhang, P., Jamison, K., Engel, S., He, B., & He, S. (2011). Binocular rivalry requires visual attention. Neuron, 71(2), 362–369, https://doi.org/10.1016/j.neuron.2011.05.035. [PubMed]
Figure 1.
 
Illustration of trial sequence creation. Two SWIFT sequences are pictured as a series of individual frames and combined via alpha blending into the final trial sequence. Colored frames indicate the original stimulus (red = face; green = house). The face image is a private photograph and is used here for illustrative purposes only; the original stimuli are available from the authors on request. The house image is taken from the Pasadena Houses 2000 dataset (Perona & Helle, 2000).
Figure 2.
 
Example stimulus pairings across trial sequences for two exemplary participants, with counterbalancing of stimulus identity, frequency, congruence, and attentional role. In each trial, a 1-Hz SWIFT sequence and a 0.8-Hz SWIFT sequence were superimposed with alpha blending (juxtaposed in the figure for illustration), one serving as the target and one as the distractor. Each stimulus had one fixed partner in the congruent condition and one in the incongruent condition to ensure visual recognizability. The target was defined before each trial. Target and distractor roles of individual stimuli were fixed for each participant to prevent negative priming, and attentional roles were counterbalanced across participants; for example, participant A was only presented with the man as a target, whereas participant B was only presented with the woman as a target. Congruent and incongruent trials were presented in randomized order. Face images are private photographs and are used here for illustrative purposes only; the original stimuli are available from the authors on request. The house images are taken from the Pasadena Houses 2000 dataset (Perona & Helle, 2000) and the Pasadena Buildings dataset (Aly et al., 2009).
Figure 3.
 
Data on behavioral performance. Mean proportional deviation from actual number of targets in the counting task as a function of target category and congruence. Error bars indicate 99% confidence intervals. Category congruence led to higher error scores compared with category incongruence for faces and houses (**p < 0.001).
Figure 4.
 
Evoked power for SWIFT (blue), IM (magenta), and SSVEP (red) frequencies averaged across all conditions, posterior channels, and participants (log2 transformed for better scaling). SNRs for other frequencies (spaced by 0.03 Hz) within the frequency range are drawn in gray. For all frequencies, filled and open circles indicate SNRs that are significant and non-significant, respectively (see Table 1 and Supplementary Table S1 for FDR-adjusted t-tests).
Figure 5.
 
Scalp topography of the averaged SSVEP (f3 = 12.5 Hz and 2f3 = 25 Hz), averaged SWIFT (f1 = 0.8 Hz, f2 = 1 Hz, 2f1 = 1.6 Hz, and 2f2 = 2 Hz), and averaged IM signal (f3 − 2f2, f3 − 2f1, f3 − f2, f3 − f1, f3 + f1, f3 + f2, f3 + 2f1, and f3 + 2f2). SNR was log2 transformed for better scaling. See Supplementary Figure S1 for the topography of all individual frequencies.
Figure 6.
 
SWIFT and IM SNR as a function of attention, congruence, and category. Error bars indicate 99% confidence intervals. Significant slopes are indicated by *p < 0.01 and **p < 0.001.
Table 1.
 
Results of one-tailed t-tests for SNRs of frequencies of interest. All frequency bins from 0.30 Hz to 28 Hz (832 bins) were tested, and p values were FDR adjusted. See Supplementary Table S1 for further tested frequencies.