Hierarchical stimuli have been widely used to study global and local processing. Two classic phenomena have been observed using these stimuli: the global advantage effect (we identify the global shape faster) and an interference effect (we identify shapes more slowly when the global and local shapes are different). Because these phenomena have been observed during shape categorization tasks, it is unclear whether they reflect the categorical judgment or the underlying shape representation. Understanding the underlying shape representation is also critical because both global and local processing are modulated by stimulus properties.

We performed two experiments to investigate these issues. In Experiment 1, we show that these phenomena can be observed in a same-different task, and that participants show systematic variations in response times across image pairs. We show that the response times to any pair of images can be accurately predicted using two factors: their dissimilarity and their distinctiveness relative to other images. In Experiment 2, we show that these phenomena can also be observed in a visual search task in which participants did not have to make any categorical shape judgments. Here too, participants showed highly systematic variations in response time that could be explained as a linear sum of shape comparisons across global and local scales. Finally, the dissimilarity and distinctiveness factors estimated from the same-different task were systematically related to the search dissimilarities observed during visual search.

In sum, our results show that global and local processing phenomena are properties of a systematic shape representation governed by simple rules.

*Participants*. There were 16 human participants (11 men, aged 20–30 years) in this experiment. We chose this number of participants based on our previous studies of object categorization in which this sample size yielded consistent responses (Mohan & Arun, 2012).

*Stimuli*. We created hierarchical stimuli by placing eight local shapes uniformly along the perimeter of a global shape. All local shapes had the same area (0.77 square degrees of visual angle), and all global shapes occupied an area 25 times larger. We used seven distinct shapes at the global and local levels to create 49 hierarchical stimuli (all stimuli can be seen in Supplementary Section S5). Stimuli were shown in white against a black background.

*Procedure*. Participants were seated approximately 60 cm from a computer monitor controlled by custom programs written in MATLAB with routines from PsychToolbox (Brainard, 1997). Participants performed two blocks of the same-different task, corresponding to global or local shape matching. In both blocks, a pair of hierarchical stimuli was shown, and the participant had to respond whether the stimuli contained the same or different shapes at the relevant level, global or local (key “Z” for same, and “M” for different). Each block started with eight practice trials involving hierarchical stimuli made of shapes that were not used in the main experiment. Participants were given feedback after each trial during the practice block.

*Stimulus pairs*. To avoid any response bias, we selected stimulus pairs in each block such that the proportions of same and different responses were equal. Each block consisted of 588 stimulus pairs. These pairs were divided equally into four groups of 147 pairs (Figure 2A): (1) pairs with both global and local shapes different (GDLD); (2) pairs with the same global shape but different local shapes (GSLD); (3) pairs with different global shapes but the same local shape (GDLS); and (4) pairs with the same global and local shapes (GSLS; i.e., identical stimuli). Because the total number of possible pairs differed across categories, we selected pairs as follows. For GSLS pairs, there are 49 unique stimuli and therefore 49 pairs, so we repeated each pair three times to obtain 147 pairs. For GSLD and GDLS pairs, there are 147 unique pairs, so each pair was used exactly once. For GDLD pairs, there are 882 possible pairs, so we selected 147 pairs consisting of 21 congruent pairs (i.e., each stimulus containing identical global and local shapes), 21 incongruent pairs (in which the global shape of one stimulus was the local shape of the other, and vice versa), and 105 randomly chosen other pairs. The full set of 588 stimulus pairs was fixed across all participants. Each stimulus pair was shown twice, so each block consisted of 588 × 2 = 1176 trials. Error trials were repeated after a random number of other trials.

*p* < 0.05, sign-rank test on participant-wise accuracy in the two blocks).

*isoutlier* function, MATLAB 2018). We pooled the reaction times across participants for each image pair, and all response times greater than three scaled median absolute deviations away from the median were removed. In practice this procedure removed approximately 8% of the total responses.
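
The criterion above can be mirrored outside MATLAB. Below is a minimal Python sketch (a hypothetical re-implementation, not the study's code) of the same rule: discard response times more than three scaled median absolute deviations from the median.

```python
import numpy as np

def remove_rt_outliers(rts, n_mads=3.0):
    """Drop response times more than n_mads scaled median absolute
    deviations from the median (mirrors MATLAB isoutlier's default rule)."""
    rts = np.asarray(rts, dtype=float)
    med = np.median(rts)
    # The ~1.4826 factor makes the MAD a consistent estimator of the
    # standard deviation for normally distributed data.
    smad = 1.4826 * np.median(np.abs(rts - med))
    keep = np.abs(rts - med) <= n_mads * smad
    return rts[keep]

# Example: one extreme RT (in seconds) is removed, the rest are kept.
rts = [0.61, 0.58, 0.64, 0.59, 0.62, 0.60, 4.80]
cleaned = remove_rt_outliers(rts)
```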

*Estimating data reliability*. To estimate an upper limit on the performance of any model, we reasoned that the performance of any model cannot exceed the reliability of the data itself. To estimate this reliability, we first calculated the average correlation between two halves of the data. However, doing so underestimates the true reliability because the correlation is based on two halves of the data rather than the entire dataset. We therefore applied a Spearman-Brown correction to the split-half correlation. This Spearman-Brown corrected correlation (\(r_c\)) is given by \(r_c = 2r/(1+r)\), where \(r\) is the correlation between the two halves. This data reliability is denoted as \(r_c\) throughout the text to distinguish it from the standard Pearson's correlation coefficient (denoted as \(r\)).
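
As a concrete illustration, the split-half procedure with Spearman-Brown correction can be sketched in Python as follows (hypothetical variable names; `trials` holds one row of repeated RTs per image pair):

```python
import numpy as np

def spearman_brown(split_half_r):
    """Spearman-Brown corrected reliability: r_c = 2r / (1 + r)."""
    return 2.0 * split_half_r / (1.0 + split_half_r)

def split_half_reliability(trials, n_splits=100, seed=0):
    """Average corrected correlation between random halves of the repeats.
    `trials` is (n_items, n_repeats): per-image-pair RTs across repeats."""
    rng = np.random.default_rng(seed)
    n_rep = trials.shape[1]
    rs = []
    for _ in range(n_splits):
        idx = rng.permutation(n_rep)
        h1 = trials[:, idx[: n_rep // 2]].mean(axis=1)
        h2 = trials[:, idx[n_rep // 2:]].mean(axis=1)
        rs.append(spearman_brown(np.corrcoef(h1, h2)[0, 1]))
    return float(np.mean(rs))

# Toy example: items share a strong common signal across repeats,
# so the corrected reliability should be high.
rng = np.random.default_rng(1)
signal = rng.normal(size=(50, 1))
trials = signal + 0.2 * rng.normal(size=(50, 8))
rc = split_half_reliability(trials)
```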

*t*-test. However, such a test ignores the complexity of the data and might hide many confounding effects: for instance, a faster response in the global block might vary by subject or with image pair. A more appropriate statistical test would be an analysis of variance (ANOVA), but it is based on four assumptions: independence of observations, normality of errors, homogeneous variance across all conditions, and balanced data across all conditions. Violations of these assumptions lead to incorrect estimates of effect sizes and their statistical significance (Glass, Peckham, & Sanders, 1972; Lix, Keselman, & Keselman, 1996). In our case, the experimental design violates the assumption of independence because the same participants performed both global and local blocks, so any systematic variations due to participants are not independent across blocks. Second, the residuals of an ANOVA performed on RT data are not normally distributed (Supplementary Figure S1). Third, due to the removal of excessively long response times, the data can become unbalanced (i.e., have unequal numbers of observations in each experimental condition). This can make model interpretation ambiguous and the ANOVA inoperable (Shaw & Mitchell-Olds, 1993). A potential solution is a repeated measures ANOVA, but this is typically applied to average response times, which ignores the trial-to-trial variability present in the data and continues to assume normally distributed residuals.

*p* < 0.00005 for GDLD pairs; F(1,8647) = 413.06, *p* < 0.00005, \(\eta _p^2 = 0.046\) for GSLS pairs; see Supplementary Section S1 for details). We conclude that participants show a robust global advantage effect in the same-different task.

*p* < 0.000005, \(\eta _p^2 = 0.047\); see Supplementary Section S1).

*p* < 0.00005, \(\eta _p^2 = 0.039\); see Supplementary Section S1).

*p* < 0.00005, \(\eta _p^2 = 0.029\) in the global block; F(1,1206) = 31.95, *p* < 0.00005, \(\eta _p^2 = 0.026\) in the local block; see Supplementary Section S1).

*p* < 0.0005, \(\eta _p^2 = 0.005\) in the global block; F(1,4269) = 38.85, *p* < 0.0005, \(\eta _p^2 = 0.009\) in the local block; see Supplementary Section S1).

*r* = 0.16, *p* = 0.26), suggesting that they are qualitatively different. Interestingly, distinctiveness estimated from GSLS pairs is correlated with both SAME and DIFFERENT response times in both blocks, and also explained the faster responses to congruent stimuli (Supplementary Section S3).

\(k_G\) and \(k_L\) are constants that specify the contribution of GD and LD toward the response time, and \(L_{BC}\) denotes the dissimilarity between local shapes B and C. Because there are seven possible local shapes, there are only \(^{7}C_2 = 21\) possible local shape terms. When this equation is written down for each GSLD pair, we get a system of linear equations of the form y = Xb, where y is a 147 × 1 vector containing the GSLD response times, X is a 147 × 23 matrix containing the net global distinctiveness and net local distinctiveness as the first two columns and 0/1 in the remaining columns indicating whether a given local shape pair is present in that image pair, and b is a 23 × 1 vector of unknowns containing the weights \(k_G\), \(k_L\), and the 21 estimated local dissimilarities. Because there are 147 equations and only 23 unknowns, we can estimate the unknown vector b using linear regression.

*r* = 0.86, *n* = 147, and *p* < 0.00005; Figure 3C). These model fits were close to the reliability of the data (\(r_c\) = 0.83 ± 0.02; see Methods), suggesting that the model explained nearly all the explainable variance in the data. However, the model fits do not elucidate which factor contributes more toward response times. To do so, we performed a partial correlation analysis in which we calculated the correlation between observed response times and each factor after regressing out the contributions of the other two factors. For example, to estimate the contribution of global distinctiveness, we calculated the correlation between observed response times and global distinctiveness after regressing out the contributions of local distinctiveness and the estimated local dissimilarity values corresponding to each image pair. This revealed a significant negative correlation (*r* = −0.81, *n* = 147, and *p* < 0.00005; see Figure 3C, inset). Likewise, we obtained a significant positive partial correlation between local dissimilarities and observed response times after regressing out the other factors (*r* = 0.69, *n* = 147, and *p* < 0.00005; see Figure 3C, inset). However, local distinctiveness showed a positive partial correlation (*r* = 0.30, *n* = 147, and *p* < 0.0005), suggesting that locally distinctive shapes slow down responses in the global block. Thus, response times are faster for more globally distinctive image pairs, and slower for more dissimilar image pairs.
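
The partial correlation analysis described above amounts to correlating residuals. A minimal Python sketch, using simulated data in which x and y are related only through a shared covariate z, is:

```python
import numpy as np

def partial_corr(y, x, covariates):
    """Correlation between y and x after regressing the covariates
    out of both (the residual-based definition of partial correlation)."""
    C = np.column_stack([np.ones(len(y)), covariates])
    resid = lambda v: v - C @ np.linalg.lstsq(C, v, rcond=None)[0]
    ry, rx = resid(np.asarray(y, float)), resid(np.asarray(x, float))
    return float(np.corrcoef(ry, rx)[0, 1])

# Sanity check: x and y depend on each other only through z, so the
# raw correlation is high but the partial correlation is near zero.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)
y = z + 0.1 * rng.normal(size=500)
r_raw = np.corrcoef(x, y)[0, 1]
r_part = partial_corr(y, x, z.reshape(-1, 1))
```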

\(k_G\) and \(k_L\) are unknown constants that specify the contribution of the net global and local distinctiveness, and \(G_{AC}\) is the dissimilarity between the global shapes A and C. As before, this model is applicable to all the GDLS pairs (*n* = 147), has 23 free parameters, and can be solved using straightforward linear regression.

*r* = 0.72, *n* = 147, and *p* < 0.00005; see Figure 3D). This correlation was close to the reliability of the data itself (\(r_c\) = 0.80 ± 0.03), suggesting that the model explains nearly all the explainable variance in the response times. To estimate the unique contributions of distinctiveness and dissimilarity, we performed a partial correlation analysis as before. We obtained a significant negative partial correlation between observed response times and local distinctiveness after regressing out global distinctiveness and global dissimilarity (*r* = −0.70, *n* = 147, and *p* < 0.00005; see Figure 3D, inset). We also obtained a significant positive partial correlation between observed response times and global dissimilarity after factoring out both distinctiveness terms (*r* = 0.47, *n* = 147, and *p* < 0.00005; see Figure 3D, inset). Finally, as before, global distinctiveness showed a positive partial correlation with local “SAME” responses after accounting for the other factors (*r* = 0.36, *n* = 147, and *p* < 0.00005; see Figure 3D, inset).

\(k_G\) and \(k_L\) are unknown constants that specify their contributions, \(G_{AC}\) is the dissimilarity between the global shapes A and C, and \(L_{BD}\) is the dissimilarity between the local shapes B and D. Note that, unlike the “SAME” response model, the signs of \(G_{AC}\) and \(L_{BD}\) are negative because large global or local dissimilarity should speed up “DIFFERENT” responses. The resulting model, which applies to both GDLS and GDLD pairs, consists of 44 free parameters: the two constants specifying the contributions of global and local distinctiveness, and 21 terms each for the pairwise dissimilarities at the global and local levels. As before, this is a linear model whose free parameters can be estimated using straightforward linear regression.

*r* = 0.82, *n* = 294, and *p* < 0.00005; see Figure 3E). This correlation was close to the data reliability itself (\(r_c\) = 0.84 ± 0.02), implying that the model explained nearly all the explainable variance in the data. To estimate the unique contributions of each term, we performed a partial correlation analysis as before. We obtained a significant negative partial correlation between observed response times and global distinctiveness after regressing out all other factors (*r* = −0.21, *n* = 294, and *p* < 0.0005; see Figure 3E, inset). We also obtained significant negative partial correlations between observed response times and both dissimilarity terms (*r* = −0.76, *n* = 294, and *p* < 0.00005 for global terms; and *r* = −0.33, *n* = 294, and *p* < 0.00005 for local terms; see Figure 3E, inset). However, we note that the contribution of the global terms was larger than that of the local terms. As before, local distinctiveness did not contribute significantly to “DIFFERENT” responses in the global block (*r* = −0.06, *p* = 0.34, and *n* = 294; see Figure 3E, inset). We conclude that “DIFFERENT” responses in the global block are faster for globally distinctive image pairs, and for dissimilar image pairs.

*r* = 0.87, *n* = 294, and *p* < 0.00005; see Figure 3F). This correlation was close to the data reliability (\(r_c\) = 0.85 ± 0.01), suggesting that the model explained nearly all the explainable variance in the response times. A partial correlation analysis revealed a significant negative partial correlation for all terms except global distinctiveness (correlation between observed RT and each factor after accounting for all others: *r* = −0.26, *n* = 294, and *p* < 0.00005 for local distinctiveness; *r* = −0.04, *n* = 294, and *p* = 0.55 for global distinctiveness; *r* = −0.32, *n* = 294, and *p* < 0.00005 for global terms; and *r* = −0.86, *n* = 294, and *p* < 0.00005 for local terms; see Figure 3F). In contrast to the global block, the contribution of the global terms was smaller than that of the local terms. We conclude that “DIFFERENT” responses in the local block are faster for locally distinctive image pairs and for dissimilar image pairs.

*p* < 0.01, sign-rank test). Taken together, these positive correlations imply that the dissimilarities underlying the “SAME” and “DIFFERENT” responses at both global and local levels arise from a common shape representation.

*Participants*. Eight right-handed participants (6 men, aged 23–30 years) participated in the study. We selected this number of participants here and in subsequent experiments because similar sample sizes have yielded extremely consistent visual search data in our previous studies (Mohan & Arun, 2012; Vighneshvel & Arun, 2013; Pramod & Arun, 2016).

*Stimuli*. We used the same set of 49 stimuli as in Experiment 1, created by combining seven possible shapes at the global level with seven possible shapes at the local level in all possible combinations. The full stimulus set can be seen in Supplementary Section S5.

*Procedure*. Participants were seated approximately 60 cm from a computer monitor. Each participant performed a baseline motor block, a practice block, and then the main visual search block. In the baseline block, on each trial, a white circle appeared on either side of the screen and participants had to indicate the side on which the circle appeared. We included this block so that participants would become familiar with the key press associated with each side of the screen, and to estimate a baseline motor response time for each participant. In the practice block, participants performed 20 correct trials of visual search involving unrelated objects to become familiar with the main task.

\(^{49}C_2\) = 1176 unique searches and 2352 total trials. Trials in which the participant made an error or did not respond within 10 seconds were repeated randomly later. In practice, these repeated trials were very few in number because participants' accuracy was extremely high (mean ± SD accuracy: 98.4% ± 0.7% across participants).

\(G_{AC}\) is the dissimilarity between the global shapes, \(L_{BD}\) is the dissimilarity between the local shapes, \(X_{AD}\) and \(X_{BC}\) are the across-object dissimilarities between the global shape of one stimulus and the local shape of the other, and \(W_{AB}\) and \(W_{CD}\) are the dissimilarities between the global and local shapes within each object. Thus, there are four sets of unknown parameters in the model, corresponding to global, local, across-object, and within-object terms. Each set contains the pairwise dissimilarities among the seven shapes used to create the stimuli. Note that model terms repeat across image pairs: for instance, the term \(G_{AC}\) is present for every image pair in which A is the global shape of one stimulus and C is the global shape of the other. Writing this equation for each of the 1176 image pairs results in 1176 equations, but with only 21 shape pairs × 4 types (global, local, across, and within) + 1 = 85 free parameters. The advantage of this model is that it allows each set of model terms to behave independently, thereby allowing potentially different shape representations to emerge for each term type through the course of model fitting.

*regress* function, MATLAB).

*r* = 0.85 ± 0.01, *p* < 0.00005 in all cases), indicating that the model is not overfitting to the data.

\(^{49}C_2\) = 1176 pairs). Participants were highly accurate in the task (mean ± SD accuracy: 98.4% ± 0.7% across participants).

*p* = 0.48, sign-rank test across participant-wise accuracy). However, they were faster on GDLS searches compared with GSLD searches (search times, mean ± SD: 1.90 ± 0.40 seconds across 147 GDLS pairs, 2.11 ± 0.56 seconds across 147 GSLD pairs; Figure 4C). This difference was statistically significant, as evidenced by a main effect of the scale of change in a linear mixed effects model analysis performed on inverse RT (F(1,4696) = 163.24, *p* < 0.00005, \(\eta _p^2 = 0.034\); for details see Supplementary Section S4). We conclude that searching for a target differing in global shape is easier than searching for a target differing in local shape. Thus, there is a robust global advantage effect in visual search.

*p* < 0.00005, \(\eta _p^2 = 0.051\) for the main effect of congruence; Supplementary Section S4).

*n* = 2 × \(^{7}C_2\) × 2 × \(^{5}P_2\) = 1680). Target-congruent searches were slightly faster than target-incongruent searches (mean ± SD of RT: 1095 ± 399 ms and 1117 ± 401 ms for congruent and incongruent targets; Figure 4G, *top panel*). However, this difference was not statistically significant, as evidenced by the lack of a main effect of congruence in a linear mixed effects model analysis performed on inverse response times (F(1,328) = 3.73, *p* = 0.54; see Supplementary Section S4).

*n* = \(^{7}C_2\) × 2 × \(^{5}P_2\) = 840). Searches with congruent distractors were faster than those with incongruent distractors (mean ± SD of RT: 1047 ± 96 ms and 1117 ± 116 ms for congruent and incongruent distractors; see Figure 4G, *bottom panel*). This difference was statistically significant, as evidenced by a main effect of congruence in a linear mixed effects model applied to inverse response times (F(1,328) = 34.85, *p* < 0.00005, \(\eta _p^2 = 0.096\); see Supplementary Section S4).

*r* = 0.83, *n* = 1176, and *p* < 0.00005).

\(G_{AC}\) is the dissimilarity between the global shapes, \(L_{BD}\) is the dissimilarity between the local shapes, \(X_{AD}\) and \(X_{BC}\) are the across-object dissimilarities between the global shape of one stimulus and the local shape of the other, and \(W_{AB}\) and \(W_{CD}\) are the dissimilarities between the global and local shapes within each object. Because there are seven possible global shapes, there are \(^{7}C_2 = 21\) pairwise global-global dissimilarities corresponding to \(G_{AB}\), \(G_{AC}\), \(G_{AD}\), etc., and likewise for the L, X, and W terms. Thus, in all, the model has 21 part-part relations × 4 types + 1 constant = 85 free parameters. Importantly, the multiscale part sum model allows for completely independent shape representations at the global and local levels, and even for comparisons across objects and within each object. The model works because the same global dissimilarity term \(G_{AC}\) occurs in many image pairs in which the same pair of global shapes A and C is combined with various other local shapes.

*r* = 0.88, *n* = 1176, and *p* < 0.00005; see Figure 5B). This high degree of fit matches the reliability of the data (mean ± SD reliability: \(r_c\) = 0.84 ± 0.01; see Methods).

*r* = 0.60, *p* < 0.005 for L terms; *r* = 0.75, *p* < 0.00005 for X terms; *r* = −0.60, *p* < 0.005 for W terms; Figure 5C). This is consistent with the finding that hierarchical stimuli and large/small stimuli are driven by a common representation at the neural level (Sripati & Olson, 2009).

*p* < 0.005, sign-rank test on 21 within-object terms). The effect of within-object dissimilarity is akin to the effect of distracter heterogeneity in visual search. Just as similar distracters make search easier, similar shapes at the global and local levels within a stimulus make search easier. We have made a similar observation previously with two-part objects (Pramod & Arun, 2016).

\(^{-1}\) for global, 0.30 ± 0.11 s\(^{-1}\) for local, *p* < 0.005, sign-rank test).

*n* = 42: *r* = 0.87 for the model with across-object terms; *r* = 0.88 for the model with within-object terms, *p* < 0.00005 in both cases). Thus, the incongruence effect arises from both factors.

*r* = 0.55, *p* < 0.00005 for global search distinctiveness; and *r* = 0.036, *p* = 0.55 for local search distinctiveness; Figure 7A). Likewise, local distinctiveness estimated in the same-different task was correlated only with local search distinctiveness and not with global search distinctiveness (*r* = 0.35, *p* < 0.05 for local search distinctiveness; and *r* = 0.05, *p* = 0.76 for global search distinctiveness; Figure 7B).

*Vision Research,* 54, 20–30.

*Journal of Mathematical Psychology,* 38, 423–466.

*Proceedings. Biological Sciences,* 282, 20142384.

*Journal of Memory and Language,* 59, 390–412.

*Neuropsychologia,* 44, 110–129.

*Brain and Cognition,* 11, 37–49.

*Journal of Experimental Psychology: Animal Behavior Processes,* 27, 3–16.

*Psychological Review,* 96, 433–458.

*Nature,* 382, 626–628.

*Scientific Reports,* 7, 17462.

*Neuroscience and Biobehavioral Reviews,* 32, 311–329.

*Scientific Reports,* 8, 324.

*Psychonomic Bulletin & Review,* 25, 1365–1372.

*Review of Educational Research,* 42, 237–288.

*Neuron,* 36, 299–308.

*Human Brain Mapping,* 22, 321–328.

*Neuroimage,* 17, 1290–1299.

*Perception & Psychophysics,* 43, 189–198.

*Psychological Bulletin,* 112, 24–38.

*Journal of Experimental Psychology: Human Perception and Performance,* 24, 1105–1118.

*Psychological Science,* 16, 282–290.

*Cognitive Neuropsychology,* 25, 730–744.

*Frontiers in Psychology,* 4, 863.

*Perception & Psychophysics,* 47, 489–496.

*Nature,* 442, 572–575.

*Journal of Vision,* 19, 12.

*Review of Educational Research,* 66, 579–619.

*Frontiers in Psychology,* 6, 1171.

*Experimental Brain Research,* 144, 136–139.

*The Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology,* 55, 289–310.

*Journal of Vision,* 12, 19.

*Psychonomic Bulletin & Review,* 8, 454–469.

*Cognitive Psychology,* 9, 353–383.

*Journal of Experimental Psychology: Human Perception and Performance,* 9, 955–965.

*Cognitive Psychology,* 34, 72–107.

*Animal Cognition,* 17, 869–877.

*Acta Psychologica (Amsterdam),* 127, 1–11.

*Journal of Vision,* 14, 1–20.

*Journal of Vision,* 16, 8.

*Psychological Science,* 29, 95–109.

*Educational Research Review,* 6, 135–147.

*Cognitive Psychology,* 23, 299–330.

*Current Biology,* 21, 334–337.

*Ecology,* 74, 1638–1645.

*Journal of Cognitive Neuroscience,* 14, 187–198.

*Neuropsychologia,* 40, 1173–1186.

*Neuropsychology,* 29, 888–894.

*Journal of Neuroscience,* 29, 7788–7796.

*Journal of Vision,* 16, 3.

*Neuroreport,* 11, 2881–2884.

*Nature Neuroscience,* 5, 682–687.

*Journal of Vision,* 13, 1–24.

*Vision Research,* 51, 1741–1750.

*Journal of Experimental Psychology: Human Perception and Performance,* 15, 419–433.