Open Access
Article  |   January 2025
Modality-, feature-, and strategy-dependent organization of low-level working memory
Author Affiliations
  • Vivien Chopurian
    Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
    Bernstein Center for Computational Neuroscience Berlin and Berlin Center for Advanced Neuroimaging, Charité Universitätsmedizin Berlin, Corporate member of the Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
    [email protected]
  • Anni Kienke
    Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
    Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
    Faculty 2 Biology and Chemistry, Universität Bremen, Bremen, Germany
    [email protected]
  • Christoph Bledowski
    Institute of Medical Psychology, Medical Faculty, Goethe University, Frankfurt am Main, Germany
    Brain Imaging Center, Medical Faculty, Goethe University, Frankfurt am Main, Germany
    [email protected]
  • Thomas B. Christophel
    Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
    Bernstein Center for Computational Neuroscience Berlin and Berlin Center for Advanced Neuroimaging, Charité Universitätsmedizin Berlin, Corporate member of the Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
    [email protected]
Journal of Vision, January 2025, Vol. 25, 16. https://doi.org/10.1167/jov.25.1.16
Abstract

Previous research has shown that, when multiple similar items are maintained in working memory, recall precision declines. Less is known about how heterogeneous sets of items across different features within and between modalities impact recall precision. In two experiments, we investigated modality-specific (Experiment 1, n = 79) and feature-specific (Experiment 2, n = 154) load effects on working memory performance. First, we found a cross-modal advantage in continuous recall: Orientations that are memorized together with a pitch are recalled more precisely than orientations that are memorized together with another orientation. The results of our second experiment, however, suggest that this is not a pure effect of sensory modality but rather a feature-dependent effect. We combined orientations, pitches, and colors in pairs. We found that memorizing orientations together with a color benefits orientation recall to a similar extent as the cross-modal benefit. To investigate this absence of interference between orientations and colors held in working memory, we analyzed subjective reports of strategies used for the different features. We found that, although orientations and pitches rely almost exclusively on sensory strategies, colors are memorized not only visually but also with abstract and verbal strategies. Thus, although color stimuli are also visually presented, they might be represented by independent neural circuits. Our results suggest that working memory storage is organized in a modality-, feature-, and strategy-dependent way.

Introduction
Working memory, the ability to maintain and manipulate information in mind, has limited capacity (Baddeley, 1986; Luck & Vogel, 1997; Ma, Husain, & Bays, 2014). This limitation could arise due to interference between concurrently memorized items (Cohen, Konkle, Rhee, Nakayama, & Alvarez, 2014; Cowan, Saults, & Blume, 2014; Franconeri, Alvarez, & Cavanagh, 2013; Kiyonaga & D'Esposito, 2020; Oberauer et al., 2018; Schurgin, Wixted, & Brady, 2020; Wennberg & Serences, 2023). In this study, we investigated whether the load-based decline in behavioral precision is due to sensory- or feature-based interference. 
If a visual working memory task requires the precise recall of a feature, such as an orientation, visual cortex is recruited (Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009). If working memory load is increased and multiple features must be maintained, neural populations storing similar representations can be expected to interfere with each other (Baddeley & Hitch, 1974; Chopurian, Weber, & Christophel, 2024; Kiyonaga & D'Esposito, 2020). This interference would lead to a reduction in recall performance. Memorizing multiple low-level features from different sensory modalities, represented in separate sensory regions, is expected to reduce such interference. This reduced interference in sensory stores should thus increase recall performance and lead to a cross-modal advantage.
However, previous work investigating such low-level interference in working memory has presented multiple spatially separated visual and auditory features simultaneously and found no cross-modal advantage (Salmela, Moisala, & Alho, 2014). Auditory features interfered with visual features to the same degree as other visual features. Participants were cued before stimulus onset as to how many and which features they needed to attend to. The following four features were distributed across two stimuli: Gabor patches contained the features orientation and spatial frequency, and tones contained the features pitch and duration. Similarly, testing spatial location recall by presenting auditory, visual, or cross-modal object locations did not show a significant cross-modal advantage (Lehnert & Zimmer, 2006). However, a cross-feature advantage was observed for simultaneously presented visual low-level features, such as colors, luminance, and orientations (Cai, Fulvio, Samaha, & Postle, 2022; Wennberg & Serences, 2023). These results indicate that the cross-modal benefit depends on the task relevancy of the features and how many discrete features have to be maintained. When multiple visual objects were presented simultaneously, recall was improved for objects from different categories compared to objects from the same category (Cohen et al., 2014). The extent of this cross-category advantage was predicted by the neural overlap between the category representations. Together, previous studies show that the advantage of maintaining multiple features depends on the overall load and the overlap of neural representations. 
Previous studies suggest that interference occurs in part during encoding, where similar items must be differentiated (Cohen, Rhee, & Alvarez, 2016; Wennberg & Serences, 2023). Although behavioral precision drops as working memory load increases, it improves with longer item presentation durations (Bays, Wu, & Husain, 2011; Chunharas, Rademaker, Brady, & Serences, 2022), suggesting that longer presentation times can minimize interference during the encoding stage. Another way to minimize interference and facilitate the memorization of multiple items could be to employ different strategies, such as categorization and verbal labeling (Baddeley, 2003; Bae, Olkkonen, Allred, & Flombaum, 2015; Brown & Wesley, 2013; Gonthier, 2021; Hardman, Vergauwe, & Ricker, 2017; Overkott & Souza, 2022; Pereira Seabra, Chopurian, Souza, & Christophel, 2024; Zhou, Lorist, & Mathôt, 2022a; Zhou, Lorist, & Mathôt, 2022b) or binding items to the task-relevant context, such as the serial order position (Hurlstone, Hitch, & Baddeley, 2014; Manohar, Pertzov, & Husain, 2017). Categorization could serve as a pointer toward a more detailed representation of the stimulus, but it can also bias those representations toward those categories (Bae et al., 2015; Bae & Luck, 2019; Chunharas et al., 2022; Overkott & Souza, 2023; Thyer, Adam, Diaz, Velázquez Sánchez, Vogel, & Awh, 2022). Similarly, memorizing items in sequence can lead to serial order biases, such as primacy or recency effects (Gorgoraptis, Catalao, Bays, & Husain, 2011; Manohar et al., 2017). Although we did not instruct different strategies in this study, categorization and binding to task-relevant contexts occur naturally. We asked participants after the experiment about the different strategies employed to memorize different features. 
In this study, we sought to investigate cross-modal and cross-feature advantages for sequentially presented low-level visual and auditory features. We compared how the response to each feature changes due to being presented in sets with either the same feature or a feature from a different modality. Across two experiments, the overall working memory load was kept constant at two items, but each item could be an exemplar of the same feature, of a feature from a different sensory modality (orientation, pitch), or of a different feature within the same modality (orientation, color). In Experiment 1, we compared continuous recall precision for orientations and pitches between cross-modal and within-modal sets to test for an advantage for cross-modal sets. We also varied the presentation duration of the samples to gain insight into whether interference was occurring during encoding or working-memory maintenance. 
In Experiment 2, we compared behavioral performance across within-feature, cross-modal, and cross-feature sets to test whether cross-feature advantages are comparable to the benefits of cross-modal sets. If memorized items share similar cortical representations, we expect that the overlap of those representations reduces behavioral recall precision (e.g., for two orientations) compared to memorizing two items from different sensory modalities, whose representations might overlap less (e.g., an orientation and a pitch). When memorizing sets from the same sensory modality but a different feature (e.g., a color and an orientation), we expect both features to be represented by neural populations in visual cortex. Thus, these populations would still overlap in cortical space, although to a smaller extent. Similarly, we would still expect a behavioral advantage, but less than the cross-modal benefit. However, low-level items can be memorized with different strategies, which might be represented differently on a neural level as well and thus decrease overlap further. We analyzed strategy questionnaires to explore how different memorization strategies and subjectively perceived formats for different items modulate the working memory performance. 
Experiment 1
Methods
Participants
We recruited 80 English-speaking participants via Prolific. All participants completed the experiment. One dataset was removed from the analysis due to a below-chance recall error. The final sample consisted of 79 participants (23 female; median age, 29 ± 5.87 years; range, 19–40 years). Participants gave their informed consent and were reimbursed £9 per hour. The experiment was approved by the Ethics Committee of the Department of Psychology from the Humboldt Universität zu Berlin (Application 2022-40) and conducted according to the tenets of the Declaration of Helsinki. 
Procedure
Participants took part in 288 experimental trials of the delayed-estimation task, divided into 24 blocks with short breaks in between (see Figure 1A). On each trial, participants had to memorize two stimuli: two orientations, two pitches, or one orientation and one pitch. This meant that the overall working memory load was two in all trials, but the sensory load varied between one and two items. We masked the respective other sensory modality to keep the sensory input at a similar level (see Stimuli section). A trial started with the sequential presentation of these two items. Visual items and masks were presented in the center of the screen, and auditory stimuli were played via the participants' audio system; each item was presented for 0.1 second, 0.4 second, or 1 second, with an interstimulus interval of 1 second. We counterbalanced target feature (orientation or pitch), non-target feature (orientation or pitch), and stimulus presentation duration (0.1, 0.4, or 1 second). We counterbalanced the number of trials across each recall modality to ensure an equal number of responses for each modality. Participants completed 24 trials for each of the 12 experimental conditions, which were presented in a randomized order. We randomized the 12 items (see Stimuli section) for each target and non-target feature within each experimental condition to avoid systematic correlations between these stimuli. The delay was 2 seconds long, during which only the fixation dot was shown. The cue, indicating whether participants had to recall the first or second item, was shown for 1.2 seconds and randomized within each experimental condition, so that each cueing condition appeared six times within the experimental condition. During the visual recall, participants had to adjust a random probe orientation by pressing the left (counterclockwise) and right (clockwise) arrow keys on their keyboard. For auditory recall, participants adjusted a random probe pitch by pressing the left (lower) and right (higher) arrow keys. The recall was followed by a random intertrial interval of 1, 1.5, or 2 seconds. 
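The design amounts to a fully crossed, shuffled trial list. As a minimal sketch (in R, the language used for the analyses, not the JavaScript/jsPsych code that ran the experiment; variable names are illustrative), the 12 conditions and 288 trials could be built as follows:

```r
# 2 target features x 2 non-target features x 3 durations = 12 conditions,
# each repeated 24 times (288 trials), presented in randomized order.
conditions <- expand.grid(
  target    = c("orientation", "pitch"),
  nontarget = c("orientation", "pitch"),
  duration  = c(0.1, 0.4, 1)   # stimulus presentation duration in seconds
)
trials <- conditions[rep(seq_len(nrow(conditions)), each = 24), ]
trials <- trials[sample(nrow(trials)), ]   # randomize presentation order
```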
Figure 1.
 
(A) Experimental setup. Example trial for a cross-modal trial with continuous pitch recall. The two stimuli (here, one pitch and one orientation) are presented sequentially with an interstimulus interval of 1 second. After a 2-second delay, during which participants had to keep both items in mind, a serial order cue indicated whether they had to recall the first or second item. Here, the first item was cued, so they had to recall the pitch. The continuous recall allowed them to select any pitch between 246.94 Hz (B3) and 523.25 Hz (C5). (B) Experimental conditions with all combinations of target and non-target features. The trials for the experimental conditions were presented in a randomized order. (C, D) Main effects of each feature set for orientations (C) and pitches (D). Precision is the inverse of the SD in radians or semitones, respectively. **p < 0.01 and ***p < 0.001, uncorrected. Error bars show SEM.
Before starting the experiment, participants were screened for working audio output. During the audio test, participants had to press a series of letters on the keyboard indicated by spoken audio input, and they could adjust the volume to their preference. After they passed, they read the instructions and completed the training trials. During the training trials, participants received feedback about their performance (as average absolute error in degrees or semitones). To motivate participants, they were informed at the start of the experiment that, at the very end, they would learn on what proportion of trials they had performed better than author TBC on the task. During the main experiment, they received feedback for each feature regarding their accuracy after each block, during the short breaks. The training was followed by a short quiz about the task instructions. If they passed this quiz by responding correctly to all questions, they could continue with the experiment. If they did not pass the quiz, they were forwarded to the instruction page and had to complete the training trials again. With this procedure, we wanted to make sure that the participants understood the instructions. After the main experiment, participants responded to a short questionnaire about the strategies they used in the experiment (for results, see Supplementary Figure S3). 
Stimuli
Stimuli were generated with HTML, CSS, JavaScript, and MATLAB (MathWorks, Natick, MA). Background color was set to gray (#808080 hex color). A white fixation dot was centered on the canvas and had a size of 12 pixels, with a 1-pixel-wide fade to gray. To indicate the recall period, the fixation dot turned blue. Orientation stimuli and masks were presented as "donut" stimuli with a total diameter of 400 pixels and an inner annulus of 30 pixels. Twelve orientations were chosen (7.5°–172.5°), spaced 15° apart and excluding the cardinal axes, with fades to gray on all edges. The spatial frequency was set to 0.12 cycles/pixel. Visual masks were generated by overlaying two Gabor patches, resulting in a plaid-like pattern. During the recall period, a random orientation was presented, and participants could select any orientation from 0° to 180° to match the target orientation. Twelve pure tones in an interval of one semitone, from 261.63 Hz (corresponding to scientific pitch C4) to 493.88 Hz (corresponding to scientific pitch B4), were chosen as pitch stimuli. White noise was presented as auditory masks. We generated the auditory items and masks in MATLAB with custom functions and played them as .wav files via jsPsych to ensure stability of the auditory percepts. During the recall period, a random pitch was played, and participants could continuously select the target pitch within a range of 246.94 Hz (B3) to 523.25 Hz (C5). 
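For concreteness, the stimulus values described above can be reconstructed as follows. This is a sketch in R rather than the MATLAB used to generate the actual stimuli, and the 44.1-kHz sampling rate is our assumption:

```r
# 12 orientations (7.5-172.5 deg, 15 deg apart, avoiding cardinal axes) and
# 12 pure-tone pitches one equal-tempered semitone apart, from C4 up to B4.
orientations <- seq(7.5, 172.5, by = 15)
pitches <- 261.63 * 2^((0:11) / 12)   # 261.63 Hz (C4) ... 493.88 Hz (B4)

# Pure-tone generator returning a unit-amplitude sample vector for a .wav file.
pure_tone <- function(freq_hz, dur_s, sr = 44100) {
  t <- seq(0, dur_s, length.out = round(dur_s * sr))
  sin(2 * pi * freq_hz * t)
}
```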
Analyses
All analyses and plotting were performed in R (R Foundation for Statistical Computing, Vienna, Austria) (Wickham et al., 2019). We removed one outlier dataset based on the overall absolute error for each feature (see Participants section), but otherwise performed no further data cleaning. For precision, we calculated the reciprocal of the standard deviation of the error across trials for each participant and experimental condition. Error was the difference between the response and the target item. For orientations, we calculated the circular difference in radians; for pitches, we calculated the difference between the target and response in semitones. For the final orientation error calculation, we followed the precision calculation from Gorgoraptis et al. (2011) by subtracting the chance level (45°), so that a precision of 0 corresponded to responding at random, and used the circular SD as implemented in the R package circular (Lund et al., 2024). To calculate the cross-modal effect, we subtracted the within-modal precision from the cross-modal precision for each participant, so that positive values indicate a precision benefit and negative values a disadvantage of cross-modal trials. We never compared responses across features, as feature-specific continuous recall operates on different error units: semitones versus degrees (0°–180° for orientation recall and 0°–360° for color recall in Experiment 2). See Supplementary Table S1 for all absolute errors and SEMs for each feature. 
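A minimal sketch of this error and precision computation (assuming per-trial responses and targets in degrees; function and variable names are illustrative, and the chance-level correction described above is omitted for brevity):

```r
library(circular)

# Circular recall error for orientations: double the 180-deg space onto a
# full circle, wrap to (-pi, pi], and halve back, yielding (-pi/2, pi/2].
orientation_error <- function(response_deg, target_deg) {
  d <- (response_deg - target_deg) * pi / 90   # doubled angle, in radians
  atan2(sin(d), cos(d)) / 2
}

# Precision = 1 / circular SD of errors in one participant-by-condition cell.
precision <- function(err_rad) {
  1 / sd.circular(circular(err_rad, units = "radians"))
}

# Cross-modal effect: positive values indicate a cross-modal benefit.
cross_modal_effect <- function(prec_cross, prec_within) prec_cross - prec_within
```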
For differences in precision depending on the non-target feature, we tested conservatively with two-sided t-tests. For each stimulus presentation duration, we tested with two-sided t-tests whether the cross-modal effect differed from 0. We tested for main and interaction effects of non-target feature and stimulus presentation duration with repeated-measures analysis of variance (rmANOVA) (Girden, 1992), with participant number as the within-subject identifier. If the sphericity assumption was violated, the rmANOVA result was corrected with the Greenhouse–Geisser correction, as implemented in the rstatix package (Kassambara, 2023). Where applicable, post hoc tests were Bonferroni-corrected for multiple comparisons. For effect sizes of these paired t-tests, we report Cohen's d, where a d value of 0.2 can be considered a small effect, 0.5 a moderate effect, and 0.8 a large effect (Cohen, 2013). Experimental data can be found at https://osf.io/yfpjc/. 
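As a sketch, the rmANOVA described above could be run with rstatix as follows (assuming a long-format data frame `dat` with one precision value per participant and condition; the column names are illustrative):

```r
library(rstatix)

res <- anova_test(
  data   = dat,
  dv     = precision,      # 1/SD per participant and condition
  wid    = participant,    # within-subject identifier
  within = c(nontarget, duration)
)
# correction = "auto" applies the Greenhouse-Geisser correction only when
# Mauchly's test indicates a sphericity violation, as described above.
get_anova_table(res, correction = "auto")
```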
Results
Cross-modal advantage for orientations and pitches
Our results show a cross-modal advantage for orientations and pitches. Participants were more precise in recalling the target orientation when this orientation was paired with a pitch (absolute error = 11.22°; SEM = 0.53°) compared to another orientation (absolute error = 12.31°; SEM = 0.55°; t(78) = 3.76, p < 0.001, Cohen's d = 0.43) (see Figure 1C). Similarly, when recalling a target pitch, participants were more precise when it was paired with an orientation (absolute error = 2.16 semitones; SEM = 0.13 semitone) compared with another pitch (absolute error = 2.26 semitones; SEM = 0.12 semitone; t(78) = 2.92, p = 0.005, Cohen's d = 0.33) (see Figure 1D). 
Encoding duration affects overall precision but not cross-modal advantage
Next, we analyzed whether the encoding duration had an effect on this cross-modal benefit. For orientation, we found main effects of non-target item modality, F(1, 78) = 15.34, p < 0.001, and encoding duration, F(2, 156) = 14.22, p < 0.001 (Figure 2B), but no interaction effect (p = 0.75) (Figure 2A). Post hoc tests indicated that participants were more precise at recalling the target orientation when they heard or viewed the items for 0.4 second (p < 0.001, Cohen's d = 0.39) or 1 second (p < 0.001, Cohen's d = 0.45) compared with 0.1 second. There was no further significant increase in precision between 0.4 second and 1 second (p = 1, all Bonferroni corrected). Although recall performance increased with stimulus presentation duration, this did not affect the degree to which the cross-modal advantage was expressed. 
Figure 2.
 
(A, C) Interaction effects between encoding duration and cross-modal benefit for orientations (A) and pitches (C). Cross-modal advantage was calculated by subtracting the precision for within-modal conditions from cross-modal conditions. Thus, positive values suggest an advantage for cross-modal pairings, and negative values suggest a disadvantage. Asterisks above each point indicate a significant cross-modal advantage for the respective stimulus presentation condition (uncorrected t-test). Average shows the overall cross-modal advantage across all three encoding durations for each modality. (B, D) Main effects of stimulus presentation duration for orientations (B) and pitches (D). Precision is 1/SD in radians for orientations and semitones for pitches. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
We found the same pattern of results for target pitches. Although there were main effects of non-target item modality, F(1, 78) = 9.86, p = 0.002, and stimulus presentation duration, F(1.82, 141.97) = 11.01, p < 0.001 (see Figure 2D), there was no significant interaction (p = 0.19) (see Figure 2C). When participants heard the pitch for 1 second, they were more precise at recalling the target pitch than when they heard it for 0.1 second (p < 0.001, Cohen's d = 0.41) or 0.4 second (p = 0.016, Cohen's d = 0.22). There was no difference between 0.1 second and 0.4 second (p = 0.65, all Bonferroni corrected). Although overall participants benefited from memorizing the pitch jointly with an orientation and were more precise when they had 1 second to encode the pitch, the cross-modal advantage did not differ between the encoding durations. See Supplementary Figure S1 for the error histograms for each condition. 
Faster response times in cross-modal trials
For orientation recall, reaction time analysis revealed main effects of non-target modality, F(1, 78) = 14.47, p < 0.001, and stimulus presentation duration, F(2, 156) = 16.80, p < 0.001, but no interaction effect (p > 0.3). Post hoc paired t-tests show that participants responded faster in cross-modal trials than within-modal trials (p < 0.001, Cohen's d = 0.26). When participants had 1 second to view the item, they responded more slowly compared with stimulus presentation times of 0.1 second (p < 0.001, Cohen's d = 0.46) and 0.4 second (p = 0.002, Cohen's d = 0.28, all Bonferroni corrected). Results are shown in Supplementary Figure S2. 
For pitches, we found main effects of non-target modality, F(1, 78) = 6.62, p = 0.012, and stimulus presentation duration, F(1.77, 137.87) = 5.36, p = 0.008, as well as an interaction effect, F(1.80, 140.54) = 3.57, p = 0.035. We again found a cross-modal advantage for reaction times: Participants responded faster in cross-modal trials compared with within-modal trials (p = 0.01, Cohen's d = 0.21). Participants responded faster when perceiving the items for 0.4 second than for 0.1 second (p = 0.002, Cohen's d = 0.25) and 1 second (p < 0.001, Cohen's d = 0.30). The reaction times for 0.1 second and 1 second were not significantly different from each other (p = 1, all Bonferroni corrected) (see Supplementary Figure S2). Although the interaction term was significant, Bonferroni-corrected pairwise t-tests did not show any significant difference in cross-modal reaction time advantages across stimulus presentation durations (all p > 0.09). 
Interim discussion
As expected, we found a cross-modal advantage for orientations as well as pitches. When two items were sequentially presented and both were maintained in memory, the recall precision for cross-modal sets (orientation and pitch) was higher than for within-modal sets (two orientations, two pitches). Furthermore, we also found an effect of stimulus presentation duration on overall precision. Participants were more precise when they had more time to view or hear the items. However, there was no significant interaction effect between stimulus presentation duration and the cross-modal advantage. The second experiment then sought to extend this cross-modal advantage to a cross-feature effect. 
Experiment 2
Methods
Participants
We recruited 162 participants via Prolific who were fluent in English and reported no colorblindness. After removing six incomplete datasets due to technical issues, one dataset because the participant reported using their finger on the screen to memorize the orientation, and one dataset due to a below-chance recall error, the final sample consisted of 154 participants (50 female; median age = 28 ± 5.81 years; range, 18–40 years). Participants gave their informed consent and were reimbursed £9 per hour. The experiment was approved by the Ethics Committee of the Department of Psychology from the Humboldt Universität zu Berlin (Application 2022-03) and was conducted according to the tenets of the Declaration of Helsinki. 
Procedure
As in the first experiment, participants completed the audio check to confirm that they heard the audio output. After reading the instructions and performing training trials, they completed a quiz with questions regarding the instructions. Participants took part in a total of 324 trials of the delayed estimation experiment (Figure 3). These trials were divided into 18 short blocks with breaks in between, with the same feedback procedure as in Experiment 1. Because we were interested in whether our cross-modal advantage was truly a modality effect or was modulated by feature similarity, we combined three features: orientation, pitch, and color. Thus, for each target feature, participants completed 36 trials for each combination (e.g., for orientation as the target feature, orientation–pitch, orientation–orientation, and orientation–color). As in Experiment 1, we decided on this experimental design to balance the number of responses for each feature. The target position was counterbalanced across the first and second item positions. For each experimental condition, samples from 12 stimulus bins (see Stimuli section) were randomized across the target and non-target feature, respectively, so that each target and non-target sample combination was possible. The two items were presented sequentially for 0.4 second each, with a 1-second interstimulus interval. After the 2-second delay, a retro-cue (either a "1" or a "2") was shown for 1.2 seconds. Then, the random probe for the target feature was presented. Participants continuously recalled the target feature by pressing the left (counterclockwise) and right (clockwise) arrow keys on their keyboard to rotate the probe orientation or probe color or by pressing the left (lower) and right (higher) arrow keys to adjust the probe pitch. 
Figure 3.
 
(A) Experiment 2 trial. Two features were shown sequentially for 0.4 second, with an interstimulus interval of 1 second. The delay was 2 seconds long, and, after the cue, participants had 4 seconds to recall the target feature. The intertrial interval was again randomized among 1, 1.5, and 2 seconds. (B) Experimental conditions. Each feature (orientation, pitch, color) was paired with another feature (orientation, color, pitch), so that each target feature had two cross-feature conditions (e.g., target orientation was paired with a pitch and a color) and one within-feature condition (e.g., target orientation was paired with another orientation). (C–E) Recall precision for each target feature: orientation (C), color in radians⁻¹ (D), and pitch in semitones⁻¹ (E), as well as each within- and cross-feature set. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
Stimuli
Orientation and pitch samples were selected as described in Experiment 1. For color stimuli, we selected 12 hues from 7.5° to 337.5°, equally spaced 30° apart in a circular hue–chroma–luminance (HCL) color space, with a luminance of 70 and a chroma of 38. We added a jitter in 2°, 4°, or 1-semitone increments to each orientation, color, and pitch, respectively. The jitter was confined to bins of ±6°, ±12°, or ±3 semitones around the items, without overlap, and was randomly assigned on each trial (see Supplementary Figure S5 for the distribution of target samples and response patterns). For visual masks, we generated 12 dynamic white-noise patterns with colored pixels (Rademaker, Chunharas, & Serences, 2019; Schneegans & Bays, 2018). As in Experiment 1, a random probe was chosen as a starting point for the continuous orientation and pitch recall. For the color recall, participants saw the full color wheel, and we presented a patch of the currently selected color in the center of the wheel to make the probe more similar to the memoranda. For each trial, the color wheel was rotated by a random angle, and a random color was selected as the starting probe to prevent participants from associating colors with a spatial location. 
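As a sketch, the color set and jitter can be reproduced with base R's HCL support (the stimuli themselves were generated with the web stack described in Experiment 1; the jitter sampling below is an illustrative simplification):

```r
# 12 hue bins (7.5-337.5 deg, 30 deg apart) at luminance 70 and chroma 38.
hues <- seq(7.5, 337.5, by = 30)
base_colors <- hcl(h = hues, c = 38, l = 70)   # hex strings via grDevices::hcl

# Per-trial color jitter: 4-deg increments within +/-12 deg of the bin
# center, keeping neighboring bins (30 deg apart) non-overlapping.
jitter_hue <- function(hue) hue + sample(seq(-12, 12, by = 4), 1)
```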
Questionnaire
After the experimental task, participants completed a short questionnaire about their strategies. The prompt for each of orientation, color, and pitch stated, "I memorized the orientation during the delay period …," and was followed by 11 possible strategies, such as "by using words to describe them," "by giving them some name, code, or number," "through an associated action," "by how they looked," or "by how they might sound" (see Supplemental Material S1). We included strategies that were expected for our stimuli, such as visual or auditory strategies, as well as different verbal coding strategies, but also unexpected strategies (e.g., smell) to screen the validity of each participant's responses to the questionnaire. Participants were asked to respond on a Likert scale from 0 (strongly disagree) to 7 (strongly agree). Furthermore, participants responded to general questions about their focus during the study, subjective task difficulty, whether they experienced any technical difficulties, and their experience in playing an instrument or performing visual arts. Participants could also respond to open-text questions about memorization strategies we did not include in the questionnaire or leave general comments about the study. These open-field questions were included to screen the online experiment for potential technical problems but were not analyzed in this study. 
Analyses
For recall precision and reaction time, we used the same analysis procedure as described in Experiment 1. For the color error, we first calculated the circular difference in radians between the presented color hue and the selected color response. As for the orientations, we subtracted the chance level (90°), so that a precision of 0 corresponded to responding at random. We calculated the inverse of the circular SD for color and orientation recall. 
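A sketch of the color error computation, complementing the orientation version given for Experiment 1 (the chance-level correction is again omitted, and names are illustrative):

```r
# Circular recall error on the full 360-deg hue circle, wrapped to (-pi, pi].
color_error <- function(response_deg, target_deg) {
  d <- (response_deg - target_deg) * pi / 180
  atan2(sin(d), cos(d))
}
```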
Results
No difference between cross-modal and cross-feature advantages for orientations and pitches
Our analyses replicate our initial result and demonstrate a cross-modal advantage for orientations, F(2, 306) = 6.64, p = 0.002, and pitches, F(1.51, 231) = 17.19, p < 0.001. Furthermore, we showed a cross-feature benefit when orientations and pitches were paired with colors. We did not find these effects reliably for colors (p = 0.10) (see Figures 3C to 3E). Post hoc t-tests showed that participants were more precise when recalling orientations when the target orientation was paired with a color (p = 0.009, Cohen's d = 0.24) or a pitch (p = 0.002, Cohen's d = 0.28) compared with another orientation. The precision between the color and pitch non-target conditions did not differ significantly (p = 1, all Bonferroni corrected). Similarly, pitches were recalled more precisely when they were paired with either a color (p < 0.001, Cohen's d = 0.37) or an orientation (p < 0.001, Cohen's d = 0.35) compared with another pitch. There was no difference in precision whether the pitch was paired with an orientation or color (p = 1, all Bonferroni corrected). However, for colors, participants showed neither a cross-feature advantage when a color was paired with a non-target orientation (p = 0.09, uncorrected p = 0.03) nor a cross-modality benefit when the target color was paired with a pitch (p = 0.53) compared with memorizing two colors. The precision for these two cross-feature conditions did not differ significantly from each other (p = 1, all Bonferroni corrected). See Supplementary Figure S4 for error histograms. 
Feature-specific reaction time advantage for colors
For orientations, reaction time analysis revealed a main effect of non-target feature, F(2, 306) = 8.72, p = 0.0002. Participants responded faster during orientation recall when the other item was a pitch compared to another orientation (p < 0.001, Cohen's d = 0.34; all other p > 0.1). We also found a main effect of non-target feature for pitch recall, F(1.8, 276.1) = 21.55, p < 0.001. Here, participants responded faster when the non-target feature was an orientation (p < 0.001, Cohen's d = 0.45) or a color (p < 0.001, Cohen's d = 0.39), and reaction times for the two visual non-target features were similar (p = 1). Color recall reaction times were also affected by the non-target feature, F(2, 306) = 10.96, p < 0.001. Participants responded faster when the target color was paired with an orientation (p < 0.001, Cohen's d = 0.31) or a pitch (p < 0.001, Cohen's d = 0.34) compared to trials where they had to memorize two colors. This reaction time benefit was similar for both non-target features (p = 1, all Bonferroni corrected). See Supplementary Figure S6 for the reaction time results. 
Cross-feature pitch recall benefits from recency effect
On each trial, we presented items sequentially and cued the target with a serial order cue. Strictly speaking, however, the cue is only necessary in within-feature trials (for a comparison to these effects in Experiment 1, see Supplementary Table S2). To see whether serial order plays a role in cross-feature advantages, we included the target item position in our analysis. For orientation recall, we again found a main effect of non-target feature, F(2, 306) = 6.72, p = 0.001, and also a main effect of target position, F(1, 153) = 6.60, p = 0.011, but no interaction (p > 0.6). Although orientation recall precision is higher when the orientation is presented as the second compared to the first item (p = 0.008), this effect is negligible (Cohen's d = 0.12), and item order did not influence the cross-feature benefit. Color recall was influenced neither by non-target feature nor by target position (p > 0.14). Only for pitch recall did we find a significant main effect of non-target feature, F(1.53, 234.23) = 22.45, p < 0.001, and target position, F(1, 153) = 24.60, p < 0.001, as well as a significant interaction, F(2, 306) = 6.58, p = 0.001. Overall, participants were more precise when recalling a pitch if the pitch was presented as the second item (p < 0.001, Cohen's d = 0.25). Uncorrected t-tests show that the cross-feature benefit was higher when an orientation (p = 0.001, Cohen's d = 0.27) or color (p = 0.003, Cohen's d = 0.24) was presented first and the pitch was presented as the second item (see Figure 4). 
Figure 4.
 
(A, C, E) Overall cross-feature advantage for target orientations (A), colors (C), and pitches (E). (B, D, F) Cross-feature advantages were analyzed for each item position and non-target feature separately for target orientations (B), colors (D), and pitches (F). Asterisks above the error bars indicate a significant cross-feature effect. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
Differential strategy use for colors and orientations
The similarity of the cross-modal and cross-feature advantages for orientation recall is a curious finding given that, if represented visually and recalled in a visual continuous recall task, color and orientation would rely on the visual processing stream and thus share neural resources to a larger extent than orientation and pitch, which are likely to rely on more distinct cortical regions. To understand this effect better, we analyzed questionnaire data about the strategies used to memorize each feature. We included strategies expected for orientation or color stimuli, such as relying on the visual percept or verbal labels (Gonthier, 2021; Overkott & Souza, 2022), as well as strategies such as relying on their intensity, which could also mean the perception of spatial frequency for orientation stimuli (Chua, 1990) and luminance for color (Hansen & Gegenfurtner, 2006). Similarly, we expected colors to be associated with temperatures (Ho, Iwai, Yoshikawa, Watanabe, & Nishida, 2014), but not orientations. Specifically, we were interested in whether our result indicated a difference in memorization strategies applied to the two visual features. 
Our results show that the participants recruited different strategies to memorize the items (see Figure 5A). We found that both orientation and pitch relied more on visual and auditory strategies, respectively, compared with naming or numbering the items. There was a significant difference in visual strategy use (see Figure 5) between features, F(1.66, 253.95) = 640.45, p < 0.001. It was used more for color (p < 0.001, Cohen's d = 2.10) and orientation (p < 0.001, Cohen's d = 2.48) than pitch, but not more for orientation than color (p = 0.49, all Bonferroni corrected). Memorizing the items by how they sounded (see Figure 5) differed between features, F(1.65, 251.78) = 1087.08, p < 0.001. This strategy was used more for pitch than orientation (p < 0.001, Cohen's d = 2.69) and color (p < 0.001, Cohen's d = 3.27), and curiously more for orientation than color (p = 0.012, Cohen's d = 0.24, all Bonferroni corrected), although it was not highly rated for either of the two visual features. 
Figure 5.
 
Strategy use per feature in Experiment 2. Likert scale from 0 (strongly disagree) to 7 (strongly agree). An example statement about strategy use is “I memorized the orientations/pitches/colors during the delay period by how they looked.” Error bars indicate SEM.
Assigning words to describe the items (words) also differed among features, F(2, 306) = 73.81, p < 0.001: It was used more for color than for orientation (p < 0.001, Cohen's d = 0.79) and pitch (p < 0.001, Cohen's d = 0.84), but not differently for pitch and orientation (p = 0.26, all Bonferroni corrected). Similarly, giving items a number, code, or name (number) differed between features, F(1.73, 265.1) = 65.33, p < 0.001. Using numbers was rated higher for color than for orientation (p < 0.001, Cohen's d = 0.65) and pitch (p < 0.001, Cohen's d = 0.79), but not differently for pitch and orientation (p = 0.13, all Bonferroni corrected). 
We also asked about other sensory or cognitive strategies used to memorize the content. For example, an intensity-based strategy, F(2, 306) = 117.05, p < 0.001, was reported more for color (p < 0.001, Cohen's d = 1.15) and pitch (p < 0.001, Cohen's d = 0.95) than for orientations. The difference between color and pitch in intensity-based strategies was not significant (p = 0.13, all Bonferroni corrected). Associating a temperature with the memoranda, F(1.68, 257.13) = 59.08, p < 0.001, was used more for color than for orientation (p < 0.001, Cohen's d = 0.82) and pitch (p < 0.001, Cohen's d = 0.42), but was also used to a higher degree for pitch than orientation (p < 0.001, Cohen's d = 0.53, all Bonferroni corrected). The features also differed in how often they were associated with an action, F(1.54, 235.27) = 35.74, p < 0.001. This could reflect memorizing left and right orientation tilts via the corresponding button presses, or associating "high" and "low" pitches with the left and right button presses. The action strategy was used to a higher extent for orientation than color (p < 0.001, Cohen's d = 0.63) and pitch (p < 0.001, Cohen's d = 0.33), but to a higher degree for pitch than color (p < 0.001, Cohen's d = 0.44, all Bonferroni corrected). 
The three features differed in how they were memorized by associating an emotion, F(2, 306) = 13.51, p < 0.001. Pitch (p < 0.001, Cohen's d = 0.39) and color (p < 0.001, Cohen's d = 0.37) recruited the emotion strategy more than orientations, but there was no difference between color and pitch (p = 1, all Bonferroni corrected). Few people assigned a meaning to the items, and there was no difference in meaning strategy use (p = 0.08). There was a difference in associating touch with the items, F(1.79, 273.36) = 9.63, p < 0.001. This strategy was used more for pitch than color (p < 0.001, Cohen's d = 0.37), but not differently for the other features (all other p > 0.07, all Bonferroni corrected). One of the least frequently used strategies was assigning a smell or taste to the items, F(1.79, 273.55) = 3.66, p = 0.032. When the olfactory strategy was applied, it was used more for pitch than orientation (p = 0.009, Cohen's d = 0.24); there was no difference between the other features (p > 0.1, all Bonferroni corrected). 
We also asked participants about their subjective experience of difficulty for the different memorized features and modalities using Likert scale ratings. We found no significant differences between the modalities in the first experiment (n = 79; orientation, 3.54 ± 0.21 SEM; pitch, 3.27 ± 0.24 SEM; t(78) = 0.97, p = 0.34), but we did find significant differences among all features in the second experiment (n = 154; orientation, 2.79 ± 0.14 SEM; pitch, 3.32 ± 1.8 SEM; color, 4.48 ± 0.16 SEM; all t(153) ≥ 2.31, all p ≤ 0.02). 
Interim discussion
In Experiment 2, we replicated the cross-modal benefit for orientations and pitches from Experiment 1. However, we did not find the expected interference between visual items with different features. Memorizing one orientation and one color confers a similar advantage to memorizing one orientation and one pitch. Similarly, continuous color recall does not improve when the target color is memorized jointly with a pitch compared to an orientation or another color. Previous studies suggest that colors could be uniquely different in how they are memorized compared to other visual features (Bae, Olkkonen, Allred, Wilson, & Flombaum, 2014; Overkott & Souza, 2023). Color categories seem to guide color perception and memory in a manner that is more verbal and abstract than low-level visual. By analyzing our strategy questionnaire, we found evidence for differential strategy usage for each feature. Whereas memory for orientations and pitches relies mostly on visual and auditory strategies, memory for colors seems to rely on verbal strategies, such as associating colors with words or numbers, as well as abstract strategies, such as associating colors with temperatures. 
General discussion
In this work, we sought to test cross-modal and cross-feature effects of low-level features in working memory. In both experiments, the set size was always two, but the sensory modality and within-modal feature of each item were manipulated. In Experiment 1, we found a cross-modal advantage for the continuous recall precision of orientations and pitches. Although there was an effect of stimulus presentation duration on overall precision, it did not affect the cross-modal benefit. In Experiment 2, we tested whether we could replicate the cross-modal benefit with another visual feature (color) and observe interference when two distinct visual features are maintained in working memory (color and orientation). Here, we did replicate the cross-modal benefit for orientation and pitch, but we did not find a cross-modal benefit for color when color was paired with a pitch, nor did sets of orientations and colors show interference. Rather, recall for orientation was improved when the target orientation was paired with a color compared with another orientation, to the same magnitude as when the target orientation was paired with a pitch. Participants did respond significantly faster when the target orientation was memorized jointly with a pitch compared to another orientation or a color, indicating a cross-modal and cross-feature advantage for response speed only for the orientation recall. Furthermore, our analysis of the feature-specific strategies shows that orientation and pitch relied more on visual and auditory strategies, respectively, whereas color was maintained with verbal or other cognitive strategies in addition to the visual strategy. 
These cross-modal advantages are comparable to visual load effects except for the drastic difference between load 1 and load 2 trials (Gorgoraptis et al., 2011), where the necessity to represent additional visual information is paired with additional requirements to represent and process context and cue information to select the to-be-reported item. Another way of describing these cross-modality and cross-feature advantages is to say that memorizing two items from different modalities or features is easier than memorizing two items from the same modality or feature. We found such a cross-modal advantage even for orientation–pitch trials compared to two-orientation trials in Experiment 2, where participants subjectively reported that memorizing pitch items was more difficult than memorizing orientations. These first results fit in the context of early working memory models (Baddeley & Hitch, 1974) and the sensory recruitment hypothesis (Serences et al., 2009). Our previous work provided neural evidence that not load alone but specifically visual load decreases the decoding accuracy of mnemonic information of orientations in visual cortex but not in parietal or frontal regions (Chopurian et al., 2024). Thus, we assume that working memory information of pitches and orientations is encoded and maintained in separable neural populations, which decreases interference between items and increases continuous recall precision. We found that, although overall recall precision increased, the cross-modal advantage did not change significantly with stimulus presentation duration. If longer encoding durations lead to more precise item representations, then the absence of a modulation of the cross-modal benefit could indicate that the interference between two similar items occurs during working memory maintenance or that interference during encoding does not vary with presentation length. However, it is also possible that the interstimulus interval, which was part of our design due to the sequential as opposed to simultaneous item presentation (for comparisons between sequential and simultaneous sets, see Blalock & Clegg, 2010; Chung, Brady, & Störmer, 2024), provided enough time to distinguish two orientations or two pitches to the extent that interference occurring at encoding was reduced. The decreased precision in the within-modal sets could stem from an accumulation of different working-memory biases that arise specifically in sets with the same feature—for orientations, for example, attraction–repulsion bias or swap errors (Bays, 2016; Chunharas et al., 2022). Visual inspection of the response distributions (see Supplementary Figure S5) suggests no difference in cardinal or categorical biases or central tendency for orientation, color, and pitch recall in the different feature sets. 
For color recall, we did not find a significant cross-feature advantage in precision after correction for multiple comparisons, but only in reaction times. This could indicate that the benefit is generally smaller for color recall, possibly due to the different memorization strategies used for color working memory. Colors seem to be encoded and maintained in a more categorical format, even in visual cortex (Yan, Christophel, Allefeld, & Haynes, 2023). It is thus possible that, even though our task required precise recall, the two visual features (colors and orientations) were still maintained in non-overlapping neural populations and potentially in different representational formats and thus interfered less at this low level. Participants reported applying different strategies to memorize colors. Previous work has shown that memorizing colors with explicit verbal labels increases recall precision (Forsberg, Johnson, & Logie, 2020; Overkott & Souza, 2022; Overkott & Souza, 2023; Souza & Skóra, 2017) and that colors may be perceived as inherently categorical, shaped by how we communicate about colors in day-to-day life and by regularities in our environment (Witzel & Gegenfurtner, 2018). 
Although our strategy results could indicate some differences in representational format and participants' subjective insight into these formats, they should be interpreted with caution. In this study, we did not experimentally manipulate the labeling strategy. Thus, we cannot be sure how exactly participants used these strategies, for example, as internalized speech or by speaking out loud. Verbal information could interfere with pitch memory, as both are processed in auditory cortex and rely on similar mechanisms (Chan, Ho, & Cheung, 1998; Schulze & Koelsch, 2012). However, previous studies have shown that the interference between pitch and verbal information depends on musical training (Pechmann & Mohr, 1992; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011) and on the similarity of the frequencies of the maintained auditory information (Deutsch, 1970; Salamé & Baddeley, 1989). Future studies could investigate how explicit verbal labeling of visually presented items influences pitch memory, how different strategies interact or change in cross-feature trials, and how interindividual differences in, for example, artistic training influence the maintenance of low-level sensory features. Our strategy questionnaire indicates that, on average, participants reported employing different strategies for different features, which could influence how these features are maintained. However, we do not have information about how these strategies might change with different feature combinations. Previous work suggests that individual differences in cognitive strategies or item identity itself are crucial to understanding how capacity can be influenced (Brady, Störmer, & Alvarez, 2016; Overkott & Souza, 2022; Reeder, Pounder, Figueroa, Jüllig, & Azañón, 2024). As a further limitation, our experiments were online studies, so we could not ensure that all participants received the same auditory and visual input. Pitches and colors can be perceived differently depending on the participants' audio output or screens. 
Our study provides evidence for differential interference between low-level features and a possible cortical specialization for the maintenance of pitches and orientations. Exploratory analyses of feature-specific strategies suggest that different visual features might be encoded and maintained in a different representational format. 
Acknowledgments
The authors thank Zhiqi Kang for his support with study preparations. VC and TBC were supported by a DFG Emmy Noether Research Group Grant (CH 1674/2-1). 
Commercial relationships: none. 
Corresponding authors: Vivien Chopurian; Thomas B. Christophel. 
Address: Department of Psychology, Humboldt-Universität zu Berlin, Berlin 10117, Germany. 
References
Baddeley, A. (1986). Working memory, Oxford: Oxford University Press.
Baddeley, A. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36(3), 189–208, https://doi.org/10.1016/S0021-9924(03)00019-4. [CrossRef] [PubMed]
Baddeley, A., & Hitch, G. (1974). Working memory. In Bower, G. H. (Ed.), Psychology of learning and motivation (Vol. 8, pp. 47–89). Amsterdam: Elsevier, https://doi.org/10.1016/S0079-7421(08)60452-1.
Bae, G., & Luck, S. J. (2019). What happens to an individual visual working memory representation when it is interrupted? British Journal of Psychology, 110(2), 268–287, https://doi.org/10.1111/bjop.12339. [CrossRef]
Bae, G., Olkkonen, M., Allred, S. R., & Flombaum, J. I. (2015). Why some colors appear more memorable than others: A model combining categories and particulars in color working memory. Journal of Experimental Psychology: General, 144(4), 744–763, https://doi.org/10.1037/xge0000076. [CrossRef] [PubMed]
Bae, G., Olkkonen, M., Allred, S. R., Wilson, C., & Flombaum, J. I. (2014). Stimulus-specific variability in color working memory with delayed estimation. Journal of Vision, 14(4):7, 1–23, https://doi.org/10.1167/14.4.7. [CrossRef]
Bays, P. M. (2016). Evaluating and excluding swap errors in analogue tests of working memory. Scientific Reports, 6(1), 19203, https://doi.org/10.1038/srep19203. [CrossRef] [PubMed]
Bays, P. M., Wu, E. Y., & Husain, M. (2011). Storage and binding of object features in visual working memory. Neuropsychologia, 49(6), 1622–1631, https://doi.org/10.1016/j.neuropsychologia.2010.12.023. [CrossRef] [PubMed]
Blalock, L. D., & Clegg, B. A. (2010). Encoding and representation of simultaneous and sequential arrays in visuospatial working memory. Quarterly Journal of Experimental Psychology, 63(5), 856–862, https://doi.org/10.1080/17470211003690680. [CrossRef]
Brady, T. F., Störmer, V. S., & Alvarez, G. A. (2016). Working memory is not fixed-capacity: More active storage capacity for real-world objects than for simple stimuli. Proceedings of the National Academy of Sciences, USA, 113(27), 7459–7464, https://doi.org/10.1073/pnas.1520027113. [CrossRef]
Brown, L. A., & Wesley, R. W. (2013). Visual working memory is enhanced by mixed strategy use and semantic coding. Journal of Cognitive Psychology, 25(3), 328–338, https://doi.org/10.1080/20445911.2013.773004. [CrossRef]
Cai, Y., Fulvio, J. M., Samaha, J., & Postle, B. R. (2022). Context binding in visual working memory is reflected in bilateral event-related potentials, but not in contralateral delay activity. eNeuro, 9(6), ENEURO.0207-22.2022, https://doi.org/10.1523/ENEURO.0207-22.2022. [CrossRef] [PubMed]
Chan, A. S., Ho, Y.-C., & Cheung, M.-C. (1998). Music training improves verbal memory. Nature, 396(6707), 128, https://doi.org/10.1038/24075. [PubMed]
Chopurian, V., Weber, S., & Christophel, T. (2024). Distinct functional roles of distributed cortical representations for working memory storage. bioRxiv, https://doi.org/10.1101/2024.02.02.578618.
Chua, F. K. (1990). The processing of spatial frequency and orientation information. Perception & Psychophysics, 47(1), 79–86, https://doi.org/10.3758/BF03208168. [PubMed]
Chung, Y. H., Brady, T. F., & Störmer, V. S. (2024). Sequential encoding aids working memory for meaningful objects’ identities but not for their colors. Memory & Cognition, 52(8), 2119–2131, https://doi.org/10.3758/s13421-023-01486-4. [PubMed]
Chunharas, C., Rademaker, R. L., Brady, T. F., & Serences, J. T. (2022). An adaptive perspective on visual working memory distortions. Journal of Experimental Psychology: General, 151(10), 2300–2323, https://doi.org/10.1037/xge0001191. [PubMed]
Cohen, J. (2013). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Routledge.
Cohen, M. A., Konkle, T., Rhee, J. Y., Nakayama, K., & Alvarez, G. A. (2014). Processing multiple visual objects is limited by overlap in neural channels. Proceedings of the National Academy of Sciences, USA, 111(24), 8955–8960, https://doi.org/10.1073/pnas.1317860111.
Cohen, M. A., Rhee, J. Y., & Alvarez, G. A. (2016). Limits on perceptual encoding can be predicted from known receptive field properties of human visual cortex. Journal of Experimental Psychology: Human Perception and Performance, 42(1), 67–77, https://doi.org/10.1037/xhp0000108. [PubMed]
Cowan, N., Saults, J. S., & Blume, C. L. (2014). Central and peripheral components of working memory storage. Journal of Experimental Psychology: General, 143(5), 1806–1836, https://doi.org/10.1037/a0036814. [PubMed]
Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science, 168(3939), 1604–1605, https://doi.org/10.1126/science.168.3939.1604. [PubMed]
Forsberg, A., Johnson, W., & Logie, R. H. (2020). Cognitive aging and verbal labeling in continuous visual memory. Memory & Cognition, 48(7), 1196–1213, https://doi.org/10.3758/s13421-020-01043-3. [PubMed]
Franconeri, S. L., Alvarez, G. A., & Cavanagh, P. (2013). Flexible cognitive resources: Competitive content maps for attention and memory. Trends in Cognitive Sciences, 17(3), 134–141, https://doi.org/10.1016/j.tics.2013.01.010. [PubMed]
Girden, E. (1992). ANOVA: Repeated measures. London: SAGE Publications.
Gonthier, C. (2021). Charting the diversity of strategic processes in visuospatial short-term memory. Perspectives on Psychological Science, 16(2), 294–318, https://doi.org/10.1177/1745691620950697.
Gorgoraptis, N., Catalao, R. F., Bays, P. M., & Husain, M. (2011). Dynamic updating of working memory resources for visual objects. The Journal of Neuroscience, 31(23), 8502–8511, https://doi.org/10.1523/JNEUROSCI.0208-11.2011.
Hansen, T., & Gegenfurtner, K. R. (2006). Color scaling of discs and natural objects at different luminance levels. Visual Neuroscience, 23(3–4), 603–610, https://doi.org/10.1017/S0952523806233121. [PubMed]
Hardman, K. O., Vergauwe, E., & Ricker, T. J. (2017). Categorical working memory representations are used in delayed estimation of continuous colors. Journal of Experimental Psychology: Human Perception and Performance, 43(1), 30–54, https://doi.org/10.1037/xhp0000290. [PubMed]
Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632–635, https://doi.org/10.1038/nature07832. [PubMed]
Ho, H.-N., Iwai, D., Yoshikawa, Y., Watanabe, J., & Nishida, S. (2014). Combining colour and temperature: A blue object is more likely to be judged as warm than a red object. Scientific Reports, 4(1), 5527, https://doi.org/10.1038/srep05527. [PubMed]
Hurlstone, M. J., Hitch, G. J., & Baddeley, A. D. (2014). Memory for serial order across domains: An overview of the literature and directions for future research. Psychological Bulletin, 140(2), 339–373, https://doi.org/10.1037/a0034221. [PubMed]
Kassambara, A. (2023). rstatix: Pipe-friendly framework for basic statistical tests. R package version 0.7.2. Retrieved from https://cran.r-project.org/web/packages/rstatix/index.html.
Kiyonaga, A., & D'Esposito, M. (2020). Competition and control during working memory. Cambridge, UK: Cambridge University Press.
Lehnert, G., & Zimmer, H. D. (2006). Auditory and visual spatial working memory. Memory & Cognition, 34(5), 1080–1090, https://doi.org/10.3758/BF03193254. [PubMed]
Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390(6657), 279–281, https://doi.org/10.1038/36846. [PubMed]
Lund, U., Agostinelli, C., Arai, H., Gagliardi, A., García-Portugués, E., Giunchi, D., ... Rotolo, F. (2024). Package ‘circular.’ Retrieved from https://cran.r-project.org/web/packages/circular/circular.pdf.
Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17(3), 347–356, https://doi.org/10.1038/nn.3655. [PubMed]
Manohar, S. G., Pertzov, Y., & Husain, M. (2017). Short-term memory for spatial, sequential and duration information. Current Opinion in Behavioral Sciences, 17, 20–26, https://doi.org/10.1016/j.cobeha.2017.05.023. [PubMed]
Oberauer, K., Lewandowsky, S., Awh, E., Brown, G. D. A., Conway, A., Cowan, N., ... Ward, G. (2018). Benchmarks for models of short-term and working memory. Psychological Bulletin, 144(9), 885–958, https://doi.org/10.1037/bul0000153. [PubMed]
Overkott, C., & Souza, A. S. (2022). Verbal descriptions improve visual working memory but have limited impact on visual long-term memory. Journal of Experimental Psychology: General, 151(2), 321–347, https://doi.org/10.1037/xge0001084. [PubMed]
Overkott, C., & Souza, A. S. (2023). The fate of labeled and nonlabeled visual features in working memory. Journal of Experimental Psychology: Human Perception and Performance, 49(3), 384–407, https://doi.org/10.1037/xhp0001089. [PubMed]
Pechmann, T., & Mohr, G. (1992). Interference in memory for tonal pitch: Implications for a working-memory model. Memory & Cognition, 20(3), 314–320, https://doi.org/10.3758/BF03199668. [PubMed]
Pereira Seabra, J., Chopurian, V., Souza, A. S., & Christophel, T. (2024). Verbal encoding strategies in visuo-spatial working memory. Journal of Cognition, 8(1), 2, https://doi.org/10.5334/joc.406.
Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336–1344, https://doi.org/10.1038/s41593-019-0428-x. [PubMed]
Reeder, R. R., Pounder, Z., Figueroa, A., Jüllig, A., & Azañón, E. (2024). Non-visual spatial strategies are effective for maintaining precise information in visual working memory. Cognition, 251, 105907, https://doi.org/10.1016/j.cognition.2024.105907. [PubMed]
Salamé, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. The Quarterly Journal of Experimental Psychology, 41(1), 107–122, https://doi.org/10.1080/14640748908402355.
Salmela, V. R., Moisala, M., & Alho, K. (2014). Working memory resources are shared across sensory modalities. Attention, Perception, & Psychophysics, 76(7), 1962–1974, https://doi.org/10.3758/s13414-014-0714-3. [PubMed]
Schneegans, S., & Bays, P. M. (2018). Drift in neural population activity causes working memory to deteriorate over time. The Journal of Neuroscience, 38(21), 4859–4869, https://doi.org/10.1523/JNEUROSCI.3440-17.2018.
Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York Academy of Sciences, 1252(1), 229–236, https://doi.org/10.1111/j.1749-6632.2012.06447.x. [PubMed]
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping, 32(5), 771–783, https://doi.org/10.1002/hbm.21060. [PubMed]
Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2020). Psychophysical scaling reveals a unified theory of visual memory strength. Nature Human Behaviour, 4(11), 1156–1172, https://doi.org/10.1038/s41562-020-00938-0. [PubMed]
Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207–214, https://doi.org/10.1111/j.1467-9280.2009.02276.x. [PubMed]
Souza, A. S., & Skóra, Z. (2017). The interplay of language and visual perception in working memory. Cognition, 166, 277–297, https://doi.org/10.1016/j.cognition.2017.05.038. [PubMed]
Thyer, W., Adam, K. C. S., Diaz, G. K., Velázquez Sánchez, I. N., Vogel, E. K., & Awh, E. (2022). Storage in visual working memory recruits a content-independent pointer system. Psychological Science, 33(10), 1680–1694, https://doi.org/10.1177/09567976221090923. [PubMed]
Wennberg, J., & Serences, J. (2023). Mixing and mingling: Inter-item competition in visual working memory is both feature-general and feature-specific. Attention, Perception, & Psychophysics, 86(6), 1846–1860, https://doi.org/10.3758/s13414-024-02933-3.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., ... Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686.
Witzel, C., & Gegenfurtner, K. R. (2018). Color perception: Objects, constancy, and categories. Annual Review of Vision Science, 4(1), 475–499, https://doi.org/10.1146/annurev-vision-091517-034231. [PubMed]
Yan, C., Christophel, T. B., Allefeld, C., & Haynes, J.-D. (2023). Categorical working memory codes in human visual cortex. NeuroImage, 274, 120149, https://doi.org/10.1016/j.neuroimage.2023.120149. [PubMed]
Zhou, C., Lorist, M. M., & Mathôt, S. (2022a). Categorical bias as a crucial parameter in visual working memory: The effect of memory load and retention interval. Cortex, 154, 311–321, https://doi.org/10.1016/j.cortex.2022.05.007. [PubMed]
Zhou, C., Lorist, M. M., & Mathôt, S. (2022b). Is categorization in visual working memory a way to reduce mental effort? A pupillometry study. Cognitive Science, 46(9), e13194, https://doi.org/10.1111/cogs.13194. [PubMed]
Figure 1. (A) Experimental setup. Example of a cross-modal trial with continuous pitch recall. The two stimuli (here, one pitch and one orientation) are presented sequentially with an interstimulus interval of 1 second. After a 2-second delay, during which participants had to keep both items in mind, a serial order cue indicated whether they had to recall the first or second item. Here, the first item was cued, so they had to recall the pitch. The continuous recall allowed them to select any pitch between 246.94 Hz (B3) and 523.54 Hz (C5). (B) Experimental conditions with all combinations of target and non-target features. The trials for the experimental conditions were presented in a randomized order. (C, D) Main effects of each feature set for orientations (C) and pitches (D). Precision is the inverse of the SD in radians or semitones, respectively. **p < 0.01 and ***p < 0.001, uncorrected. Error bars show SEM.
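For readers who want to make the precision measure concrete, the following is a minimal sketch under stated assumptions; it is not the authors' documented analysis pipeline. The error values, frequencies, and variable names (ori_errors, f_target, f_response) are hypothetical, the doubling step assumes orientations are treated as 180-degree periodic, and SciPy is used here simply as one standard implementation of circular statistics.

import numpy as np
from scipy.stats import circstd

# Orientation: hypothetical recall errors in degrees, converted to radians.
# Doubling maps 180-degree-periodic orientation errors onto the full circle
# before computing the circular SD (an assumption, not the authors' code).
ori_errors = np.deg2rad(np.array([4.0, -10.0, 2.5, 15.0, -6.0]))
ori_sd = circstd(2 * ori_errors) / 2     # circular SD in radians
ori_precision = 1.0 / ori_sd             # precision in radians^-1

# Pitch: distance in semitones is 12 * log2(response / target), with targets
# and responses in Hz; an ordinary SD then applies on the semitone scale.
f_target = np.array([311.13, 392.00, 440.00])    # hypothetical trials
f_response = np.array([320.00, 385.00, 452.00])
pitch_errors = 12 * np.log2(f_response / f_target)
pitch_precision = 1.0 / np.std(pitch_errors, ddof=1)   # semitones^-1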
Figure 2. (A, C) Interaction effects between encoding duration and cross-modal benefit for orientations (A) and pitches (C). The cross-modal advantage was calculated by subtracting the precision for within-modal conditions from that for cross-modal conditions. Thus, positive values suggest an advantage for cross-modal pairings, and negative values suggest a disadvantage. Asterisks above each point indicate a significant cross-modal advantage for the respective stimulus presentation condition (uncorrected t-test). “Average” shows the overall cross-modal advantage across all three encoding durations for each modality. (B, D) Main effects of stimulus presentation duration for orientations (B) and pitches (D). Precision is 1/SD in radians for orientations and semitones for pitches. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
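To illustrate the quantity plotted in panels A and C, here is a minimal sketch of the subtraction and the uncorrected one-sample t-test described in the caption. The per-participant precision values are invented for illustration, and the SciPy call is one plausible implementation rather than the authors' own.

import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical per-participant recall precision (radians^-1) for orientations
# in cross-modal (paired with a pitch) and within-modal (paired with another
# orientation) conditions.
precision_cross = np.array([3.1, 2.8, 3.6, 2.2, 3.0])
precision_within = np.array([2.7, 2.9, 3.1, 1.9, 2.6])

advantage = precision_cross - precision_within     # positive = cross-modal benefit
t_stat, p_val = ttest_1samp(advantage, popmean=0)  # uncorrected, as in the figure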
Figure 3. (A) Experiment 2 trial. Two features were shown sequentially for 0.4 second, with an interstimulus interval of 1 second. The delay was 2 seconds long, and, after the cue, participants had 4 seconds to recall the target feature. The intertrial interval was again randomized among 1, 1.5, and 2 seconds. (B) Experimental conditions. Each feature (orientation, pitch, color) was paired with another feature (orientation, color, pitch), so that each target feature had two cross-feature conditions (e.g., target orientation was paired with a pitch and an orientation) and one within-feature condition (e.g., target orientation was paired with another orientation). (C–E) Recall precision for each target feature: orientation (C), color in radians−1 (D), and pitch in semitones−1 (E), as well as each within- and cross-feature set. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
Figure 4. (A, C, E) Overall cross-feature advantage for target orientations (A), colors (C), and pitches (E). (B, D, F) Cross-feature advantages were analyzed for each item position and non-target feature separately for target orientations (B), colors (D), and pitches (F). Asterisks above the error bars indicate a significant cross-feature effect. *p < 0.05, **p < 0.01, and ***p < 0.001. Error bars show the SEM.
Figure 5. Strategy use per feature in Experiment 2. Likert scale from 0 (strongly disagree) to 7 (strongly agree). An example statement about strategy use is “I memorized the orientations/pitches/colors during the delay period by how they looked.” Error bars indicate SEM.