**Abstract**
When multiple objects are in close proximity, observers have difficulty identifying them individually. Two classes of theories aim to account for this crowding phenomenon: spatial pooling and spatial substitution. Variations of these accounts predict different patterns of errors in crowded displays. Here we aim to characterize the kinds of errors that people make during crowding by comparing a number of error models across three experiments in which we manipulate flanker spacing, display eccentricity, and precueing duration. We find that both spatial intrusions and individual letter confusions play a considerable role in errors. Moreover, we find no evidence that a naïve pooling model that predicts errors based on a nonadditive combination of target and flankers explains errors better than an independent intrusion model (indeed, in our data, an independent intrusion model is slightly, but significantly, better). Finally, we find that manipulating trial difficulty in any way (spacing, eccentricity, or precueing) produces homogeneous changes in error distributions. Together, these results provide quantitative baselines for predictive models of crowding errors, suggest that pooling and spatial substitution models are difficult to tease apart, and imply that manipulations of crowding all influence a common mechanism that impacts subject performance.

where *w* is the width of the letter in degrees visual angle, and *E* is the eccentricity in degrees visual angle. The parameters for this equation were obtained by averaging the superior and inferior visual field expressions reported in Carrasco and Frieder (1997) (which were in turn obtained by averaging estimates from Rovamo & Virsu, 1979, and Virsu & Rovamo, 1979). This yielded letter widths of roughly 0.53°, 0.61°, 0.68°, 0.75°, 0.83°, and 0.90° visual angle for eccentricities of 5°, 6°, 7°, 8°, 9°, and 10° visual angle, respectively. Thus, further in the periphery the letters were larger, and their spacing in degrees of visual angle increased (13.125° of arc for eccentricities of 5°, 6°, 7°, 8°, 9°, and 10° visual angle corresponds to arc lengths of 1.2°, 1.4°, 1.6°, 1.8°, 2.1°, and 2.3° of visual angle, respectively). The increasing spacing and increasing letter size roughly canceled out, such that letters subtended roughly 40% of the center-to-center distance between letters.
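As a concrete check on this geometry, the sketch below recomputes the arc lengths from the 13.125° polar-angle spacing (the letter widths are the values listed above; small rounding differences from the reported arc lengths are expected):

```python
import math

# Display geometry from the Methods: letters sit on a circle around fixation,
# separated by a fixed polar angle; arc length therefore grows with eccentricity.
POLAR_SPACING_DEG = 13.125          # center-to-center spacing around fixation
LETTER_WIDTHS = {5: 0.53, 6: 0.61, 7: 0.68, 8: 0.75, 9: 0.83, 10: 0.90}

def arc_spacing(eccentricity_deg):
    """Arc length (degrees visual angle) between letter centers."""
    return eccentricity_deg * math.radians(POLAR_SPACING_DEG)

for ecc, width in LETTER_WIDTHS.items():
    arc = arc_spacing(ecc)
    print(f"{ecc} deg: spacing ~{arc:.1f} deg, letter fills ~{width / arc:.0%}")
```

This reproduces the roughly constant width-to-spacing ratio described in the text.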

*spatial weighting function*, describing how much flankers at different distances from the target influence target reports, and the *letter confusion matrix*, describing which letters tend to be more or less orthographically confusable with which other letters. Once we have the spatial weighting function and letter confusion matrix, we consider different response models that combine spatial and orthographic confusion differently to yield responses. We end up with eight error models that capture different possible crowding mechanisms.

^{1}The Laplacian distribution is analogous to a Gaussian probability distribution, but it uses the absolute value rather than the square of distance. As we use it, this weighting function has one parameter: the Laplacian scale parameter, **b**, which determines how steeply the weighting function drops off with distance (we write **b** in bold to indicate that it will be a free parameter for a number of error models).

We write the weight assigned to the letter at position *x* (with zero being the target, negative one and one being the two immediate flankers, and so on) as $P_L(x \mid \mathbf{b})$, where **b** is the Laplacian scale parameter. Because we are interested in discrete positions, this weighting function is defined in terms of the Laplacian cumulative density function ($\Psi_L$), assigning each letter position a weight based on the probability mass in the interval around that letter position:

$$P_L(x \mid \mathbf{b}) = \frac{\Psi_L(x + 0.5 \mid \mathbf{b}) - \Psi_L(x - 0.5 \mid \mathbf{b})}{\sum_{x' = -4}^{4} \left[ \Psi_L(x' + 0.5 \mid \mathbf{b}) - \Psi_L(x' - 0.5 \mid \mathbf{b}) \right]}.$$

The Laplacian cumulative density function (with a fixed mean of zero, centered on the target) is given by:

$$\Psi_L(x \mid \mathbf{b}) = \begin{cases} \tfrac{1}{2} \exp(x / \mathbf{b}) & x < 0 \\ 1 - \tfrac{1}{2} \exp(-x / \mathbf{b}) & x \geq 0. \end{cases}$$

If the scale parameter is particularly small (**b** ≤ 0.1), the Laplacian spatial weighting function will place all weight on the target ($P_L(0 \mid \mathbf{b} \leq 0.1) > 0.99$). If the scale parameter is particularly large (**b** ≥ 10), this function distributes weight nearly uniformly over the presented items ($P_L(0 \mid \mathbf{b} \geq 10) < 0.135$; for a perfectly uniform distribution, $P(0) = 1/9 = 0.\overline{1}$).
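A minimal sketch of this discrete weighting function (our reading of the definitions above, with normalization over the nine presented positions):

```python
import math

def laplace_cdf(x, b):
    """CDF of a zero-mean Laplacian distribution with scale b."""
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)

def spatial_weights(b, positions=range(-4, 5)):
    """Probability mass in the unit interval around each letter position,
    renormalized over the presented items (position 0 is the target)."""
    raw = [laplace_cdf(x + 0.5, b) - laplace_cdf(x - 0.5, b) for x in positions]
    total = sum(raw)
    return [w / total for w in raw]

# Small scale: essentially all weight on the target (index 4 = position 0).
print(spatial_weights(0.1)[4])
# Large scale: near-uniform weight over the nine presented positions.
print(spatial_weights(10)[4])
```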

how likely is a presented letter *i* to be confused with an alternate letter *j*? Every time the target was letter *i* and the response was letter *j*, we incremented by one the (*i*, *j*)th entry of the confusion matrix. Thus, the *i*th row of the empirical confusion matrix corresponds to the empirical counts of intrusions every time the target was letter *i*. This procedure yielded a total of 17,571 observations, which were roughly equally distributed per target letter (mean count per target letter: 676; standard deviation: 49; minimum count: 605; maximum count: 776). These raw counts were adjusted by adding one to every cell (thus eliminating zeros from the probabilities); in practice only two cells contained zero observations: reporting the letter “I” when the target was a “Q,” and reporting “I” when the target was a “Y.” We normalized these counts so that each row summed to one, thus obtaining a confusion matrix in which each row reflects the conditional probability of reporting each of 26 letters given a specific target letter. This confusion matrix is shown in Figure 5 and is available online.
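The smoothing and normalization steps can be sketched as follows (the raw counts here are simulated stand-ins, not the published data):

```python
import string
import random

letters = string.ascii_uppercase  # the 26 possible targets/responses

# Hypothetical raw counts standing in for the 17,571 observations described
# above; the real counts come from the behavioral data.
random.seed(0)
counts = [[random.randint(0, 40) for _ in letters] for _ in letters]

# Add-one smoothing eliminates zero cells, then each row is normalized so
# that row i gives P(report = j | target = i).
Q = []
for row in counts:
    smoothed = [c + 1 for c in row]
    total = sum(smoothed)
    Q.append([c / total for c in smoothed])
```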

^{2}(*χ*^{2}[25] = 670, *p* < 0.001). Second, when the correct target is not reported, observers do not guess randomly—some letters tend to be confused with specific other letters. For instance, when the correct target is a “C,” then “G” and “O” are reported about 13% and 12% of the time, as compared to “I,” which is only reported 0.5% of the time. For every target letter, the frequency with which the 25 other letters were erroneously reported was significantly different from a uniform distribution (smallest *χ*^{2}[24] = 58.6, *p* < 0.001—for “A”; all other letters had larger chi-squared values and correspondingly smaller *p* values).

(a) *direct spatial substitution*: each letter is reported with a frequency proportional to its weight; (b) *letter confusion*: each letter may be confused with similar-looking letters; and (c) *multiplicative combination*: reported letters tend to be those that are similar to multiple adjacent letters (a naïve pooling model). For each class of response models, we write out a probability model describing the probability that a letter *j* will be reported given the presented letter array $\ell$.

*j* (indexed from one to 26) on a given trial can be written:

$$P(j \mid \ell, \mathbf{p}, \mathbf{b}) = \frac{\mathbf{p}}{26} + (1 - \mathbf{p}) \sum_{x = -4}^{4} P_L(x \mid \mathbf{b}) \, \delta(\ell(x) \equiv j),$$

where $\ell(x)$ is the letter in location *x* (*x* = 0 for the target, and the presented array spans *x* = −4 to *x* = 4), and $\delta(\ell(x) \equiv j)$ is one when $\ell(x)$ is the letter *j* (that is, when position *x* contains the letter *j*) and zero otherwise.
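A sketch of this response rule (using the normalized Laplacian weighting defined earlier; the example array, parameter values, and helper names are illustrative):

```python
import math

def laplace_cdf(x, b):
    return 0.5 * math.exp(x / b) if x < 0 else 1.0 - 0.5 * math.exp(-x / b)

def spatial_weights(b, positions=range(-4, 5)):
    raw = [laplace_cdf(x + 0.5, b) - laplace_cdf(x - 0.5, b) for x in positions]
    total = sum(raw)
    return [w / total for w in raw]

def p_report(j, array, p, b):
    """Direct spatial substitution: P(report j) = p/26 plus (1 - p) times the
    summed spatial weight of positions containing letter j."""
    weights = spatial_weights(b)
    spatial = sum(w for letter, w in zip(array, weights) if letter == j)
    return p / 26 + (1 - p) * spatial

array = list("ABCDEFGHI")  # hypothetical trial: target "E" at position 0
dist = {j: p_report(j, array, p=0.2, b=0.5) for j in array}
```

Because the spatial weights sum to one, the 26 response probabilities also sum to one.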

This model has two free parameters: (a) the probability of random guessing (**p**) and (b) the spread of the spatial weighting function (**b**). This implicit model is adopted by proponents and opponents of spatial substitution accounts of crowding. When errors are made, they are likely to be simple substitutions of flankers for the target.

Two limiting cases are worth noting. First, when the probability of random guessing is one (**p** = 1), this model reduces to completely random guesses. We treat this pure random guessing case as the baseline for all model fits: It predicts a constant likelihood of 1/26 for every trial. Second, when the spread of the spatial weighting function approaches zero (**b** < 0.01, i.e., spread is very small), this model reduces to a simple mixture of **p** proportion of random guesses and (1 − **p**) proportion of correct answers. We consider this mixture of correct and random to be our second baseline, which captures variation in difficulty/accuracy across conditions/subjects but assigns equal probability to every error.

*Q*[*i*, *j*], described in the previous section), which describes how often a presented letter *i* is confused with letter *j*. Under this model, the spatial weighting function determines the linear combination of presented letters, and the response distribution is given by a weighted mixture of the corresponding rows from the confusion matrix.

Formally, this amounts to replacing the $\delta(\ell(x) \equiv j)$ expression (saying that only the target and flankers themselves may be reported) with $Q(\ell(x), j)$, which says that not only may flankers be reported but also letters similar to the target and flankers may be reported.

with probability **p**; otherwise, with probability (1 − **p**), it is a response drawn from an additive mixture of the rows corresponding to the nine presented letters (weighted by the spatial weighting function parameterized by **b**). The rows of the letter confusion matrix are raised to an exponent **q** (between 0.3 and three). This exponent **q** can be thought of as the Luce-choice exponent for the confusion matrix (Luce, 1959). When it is large, predicted confusions are minimal and subjects tend to report the intended letter. When it is small, the probability of similar letters being reported increases. Note that if a given row from the letter confusion matrix is exponentiated, then this conditional probability must be renormalized by dividing it by the sum over all possible letters that might have been reported (*j*′).
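Assembled from the pieces above, the substitution-plus-confusion response probability plausibly takes the form (this is our reading of the model; the renormalization over *j*′ is as described in the text):

```latex
P(j \mid \ell, \mathbf{p}, \mathbf{b}, \mathbf{q})
  = \frac{\mathbf{p}}{26}
  + (1 - \mathbf{p}) \sum_{x = -4}^{4} P_L(x \mid \mathbf{b})\,
    \frac{Q(\ell(x), j)^{\mathbf{q}}}{\sum_{j'} Q(\ell(x), j')^{\mathbf{q}}}
```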

If **p** = 1, then we again obtain the purely random guessing model. The same random guessing model may be recovered if the confusion matrix exponent is particularly low, thus making the confusion matrix effectively uniform (e.g., **q** < 0.01; even for **q** < 0.3 deviations from uniformity are insubstantial). As **q** becomes particularly large (larger than three or so), this model approximates the direct spatial substitution model described above, because in this case only the diagonal of the confusion matrix will have any substantial report probability.

If **b** < 0.01 (the scale of the spatial weighting function is small) and **p** = 0 (the proportion of random guessing is zero), then we obtain a simple target-letter confusion model in which flankers play no role in determining errors, and all errors correspond only to orthographic confusions with the target letter. Thus, the effect of crowding under such a model is to exponentiate the independent target-letter confusion matrix. We consider this the simple target-letter confusion error model. We may also consider a model where **b** < 0.1 (so flankers have no influence) but **p** > 0 (so there is some chance of uniform random guessing). This yields a model that is a mixture of random guesses and confusions with the target letter itself. We find that there is no noticeable advantage (in terms of fit to our data) gained by including this random guessing parameter, so we prefer the simple target-letter confusion model. In general we find that the random guessing parameter may be dropped from the full substitution/confusion model because a model with **p** = 0 does as well as models where **p** is variable. We suspect this is because a confusion matrix raised to a low exponent captures uniform random guesses as effectively as a separate random guessing parameter; thus, there is no need for the additional complication.

This model has three free parameters: the probability of random guessing (**p**), the spread of the spatial weighting function (**b**), and the exponent of the confusion matrix (**q**). As in all other models, as the probability of random guessing goes to one, this model produces simple random guessing. This is also the limit reached if the confusion matrix exponent (**q**) falls below 0.01. As the spatial spread parameter falls below about 0.1, this model is indistinguishable from the target-letter confusion model because the nontarget letters have effectively zero weight and so they do not contribute to the multiplicative mixture. As with the substitution/confusion model, we also consider a variant with no random guessing (**p** = 0).

The correct/random mixture model predicts that the response is the correct target with probability (1 − **p**) and a random (uniform) guess with probability **p**. Although several response models collapse to this pure mixture of correct and random guesses, we define this error model as the limiting case of the direct spatial substitution model (e.g., Equation 4) when the scale of the spatial weighting function is near zero (**b** = 0.01).

The direct spatial substitution model predicts that the response is a uniform random guess with probability **p** and otherwise is an exact report of one of the presented letters, with probability proportional to the spatial weighting function. We define this model as Equation 4 with **p** < 1 and **b** > 0.1, but note that either of the response models using the letter confusion matrix reduces to direct spatial substitution when **q** > 3.

The target-letter confusion model predicts that errors reflect only orthographic confusions with the target, governed by the confusion matrix exponent **q**. We define this as the limiting case of the spatial substitution plus letter confusion model (Equation 5) when there is no random guessing (**p** = 0) and the spatial scale is near zero (**b** = 0.01) (note that this is also the limiting case of the multiplicative combination model when **p** = 0 and **b** < 0.01).

The spatial substitution plus letter confusion model is defined by Equation 5 with **p** < 1, **b** > 0.1, and 3 > **q** > 0.3; we also fit a variant without random guessing (**p** = 0, **b** > 0.1, and 3 > **q** > 0.3). The multiplicative combination model is likewise fit with **p** < 1, **b** > 0.1, and 3 > **q** > 0.3, along with a variant without random guessing (**p** = 0, **b** > 0.1, and 3 > **q** > 0.3).

For example, the likelihood of a particular trial's response given the parameters **p** = 0.2, **b** = 0.5, and **q** = 1.2 would be 0.2535 (see Appendix A for a more thorough description of how this number is calculated). We multiply all trial likelihoods obtained this way for each trial for a given subject in a given condition to obtain the net likelihood of a given set of parameters for a given subject in a given condition. We then find (using numerical optimization) the maximum likelihood set of parameters for a given subject:

$$\{\hat{\mathbf{p}}, \hat{\mathbf{b}}, \hat{\mathbf{q}}\} = \underset{\mathbf{p}, \mathbf{b}, \mathbf{q}}{\arg\max} \prod_{t \in T} P(j^{(t)} \mid \ell^{(t)}, \mathbf{p}, \mathbf{b}, \mathbf{q}),$$

where *T* is the set of all trials in the condition of interest for the particular subject, *t* is a specific trial, $\ell^{(t)}$ is the set of letters presented on that specific trial, and $j^{(t)}$ is the response given on that trial.
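The fitting procedure can be sketched for the simplest case, the correct/random mixture model, with a grid search standing in for the numerical optimizer (trial data simulated; all names hypothetical):

```python
import math
import random

# Simulate (target, response) pairs from a correct/random mixture with p = 0.3:
# with probability p the response is a uniform random letter, else the target.
random.seed(1)
TRUE_P = 0.3
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
trials = [("E", "E") if random.random() > TRUE_P else ("E", random.choice(ALPHABET))
          for _ in range(2000)]

def log_likelihood(p, trials):
    ll = 0.0
    for target, response in trials:
        # Correct response arises from memory (1 - p) or a lucky guess (p/26).
        prob = (1 - p) + p / 26 if response == target else p / 26
        ll += math.log10(prob)
    return ll

# Grid search over p as a stand-in for numerical optimization.
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, trials))
```

With 2,000 simulated trials, the recovered `p_hat` lands close to the generating value of 0.3.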

For the correct/random mixture model, accuracy is simply (1 − **p**); that is, the probability that the correct letter will be reported. For the target-letter confusion model, accuracy is a function of the confusion matrix exponent (**q**). When **q** approaches 3 ($10^{0.5}$), the model describes responses that are nearly always the correct letter. When **q** approaches 0.3 ($10^{-0.5}$), the model describes responses that are effectively uniform random guessing.

*t*(107) = 15, *p* < 0.001; *r*^{2} = 0.65 for the full model repeated-measures regression including a random subject intercept. In Experiment 2, for every additional degree of eccentricity, accuracy decreases by 6.6%, 95% confidence interval on the slope: [−0.082, −0.052], *t*(74) = 8.8, *p* < 0.001; *r*^{2} = 0.47 for the full repeated-measures regression. In Experiment 3, every additional millisecond of precueing duration increases accuracy by 0.3%, 95% confidence interval on the slope: [0.0023, 0.0037], *t*(77) = 8.6, *p* < 0.001; *r*^{2} = 0.45 for the full repeated-measures regression. In short, accuracy as measured by the probability of reporting the correct answer changes as predicted, and the trends seen in Figure 6 are highly significant.

log_{10}(**q**), i.e., the log of the confusion matrix exponent from the target-letter confusion model (Equation 11).

^{3}The target-letter confusion model has an advantage over a correct/random mixture model in capturing the specific kinds of errors that observers make. For the present analysis, we focus only on the fact that the confusion matrix exponent in the target-letter confusion model effectively captures changes in accuracy: When **q** = 1 (i.e., log_{10}(**q**) = 0), accuracy is 35%, on average; when **q** = 2 (i.e., log_{10}(**q**) = 0.3), accuracy is about 80%; when **q** = 0.5 (i.e., log_{10}(**q**) = −0.3), accuracy is about 14%. In Experiment 1a, the log_{10} of the confusion matrix exponent (**q**) increased by 0.028 for every extra degree of arc in spacing, 95% confidence interval on the slope: [0.024, 0.032], *t*(107) = 14.75, *p* < 0.001; *r*^{2} = 0.63 for the full repeated-measures regression. In Experiment 2, for every additional degree of eccentricity, the log_{10} confusion matrix exponent decreases by 0.058, 95% confidence interval on the slope: [−0.072, −0.044], *t*(74) = 8.2, *p* < 0.001; *r*^{2} = 0.43 for the full repeated-measures regression. In Experiment 3, every additional millisecond of precueing duration increased the log_{10} confusion matrix exponent by 0.0024, 95% confidence interval on the slope: [0.0018, 0.0029], *t*(77) = 8.6, *p* < 0.001; *r*^{2} = 0.45 for the full repeated-measures regression. To summarize: Accuracy as measured by the confusion matrix exponent changes as predicted, and the trends seen in Figure 6 are highly significant.

*L*(*s*, *c*))—a measure of how well the model could account for the responses in all trials of a particular subject (*s*) in a given condition (*c*). Because different subject-condition combinations contained different numbers of trials (*n*), *L*(*s*, *c*) is not comparable across subjects or conditions. To obtain an interpretable measure we calculate the log likelihood per trial (log_{10} *L*(*s*, *c*)/*n*) for each subject-condition. This log likelihood per trial is invariant to the number of trials that went into a particular subject-condition combination and simply reflects the average model fit for all trials in that condition (for comparison: a random guessing model predicts a constant log likelihood per trial of −1.415, and a model that predicts the exact response on every trial has a constant log likelihood of zero). Even raw log likelihood per trial is minimally informative, because what is relevant are differences between models; thus, we consider the gain in average log likelihood per trial compared to different baselines. Figure 7 (top) uses completely random guessing as a baseline. Completely random guessing assigns each trial a log likelihood of −1.415; insofar as a model can better capture subjects' responses, the log likelihood per trial will be higher (less negative), and there will be some positive gain in log likelihood per trial. We calculate this gain above the random guessing baseline for each subject-condition combination (so as to factor out across-subject variability in performance). Figure 7 (top) shows how well the various models perform in terms of improvement in log likelihood per trial compared to a baseline of pure random guessing.
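The per-trial normalization is straightforward to make concrete (the helper name is ours):

```python
import math

# Random guessing over 26 letters assigns every trial probability 1/26,
# so its log10 likelihood per trial is a constant baseline.
baseline = math.log10(1 / 26)          # ≈ -1.415, as stated above

def gain_per_trial(trial_log10_likelihoods):
    """Average log10 likelihood per trial, relative to random guessing."""
    n = len(trial_log10_likelihoods)
    return sum(trial_log10_likelihoods) / n - baseline

print(round(baseline, 3))  # -1.415
```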

Figure 7 (bottom) instead uses the correct/random mixture model as a baseline: Under this model, the correct response is predicted with probability (1 − **p**), and every other response has probability **p**/26. This log likelihood per trial gain of a more complicated model over the correct/random mixture model indicates that the more complicated model can capture not only the changes in error rate as a function of condition but also what kinds of errors are made by participants.

- In particularly difficult conditions, all models perform poorly. The log likelihood per trial gain above a purely random guessing model is minimal (Figure 7, top). This occurs because when subjects make more mistakes, the entropy of their guesses increases, which causes all models to have relatively low likelihoods (even though they might predict specific errors a bit better than purely random guessing). As the conditions become easier, all models that can capture variation in difficulty exhibit roughly the same advantage over the random guessing baseline (including even the simple second baseline model, a mixture of correct and random guesses).
- The direct spatial substitution model (which predicts intrusions from adjacent positions) provides some benefit over the correct/random mixture model, indicating that observers tend to substitute flankers directly for the target—this can be seen by comparing the purple data points to the baseline (zero) in Figure 7 (bottom). This is seen both in the log_{10} likelihood per trial gain averaged across all conditions (mean: 0.0458, *SD*: 0.0127) and in across-subject *t* tests within each condition (across the 20 conditions, the smallest *t* value is 3.07, with a corresponding *p* = 0.009; all other conditions had higher *t* values^{4}). The overall advantage of direct spatial substitution over a simple correct/random mixture can be tested via a likelihood ratio test that aggregates likelihoods over all subjects and conditions and accounts for the extra parameters in the spatial substitution model. The spatial substitution model (total log_{e} likelihood = −43,653^{5} with 614 parameters—two per subject per condition) has a higher likelihood than the correct/random mixture (total log_{e} likelihood = −45,537, with 307 parameters), a very significant difference by a likelihood ratio test (*χ*^{2}(307) = 3768, *p* ≈ 0). This indicates that subjects' errors do indeed tend to come from adjacent letters, and a model that does not account for this cannot fit the error distributions as well as one that does.
- However, the advantage of the spatial substitution model over the correct/random mixture is quite small compared to the further improvement in fit gained by further taking the confusion matrix into account, either in the spatial substitution plus letter confusion model (average across conditions log_{10} likelihood/trial gain: 0.0766, *SD* = 0.024; likelihood ratio test: *χ*^{2}(650) = 5822, *p* ≈ 0; the comparison between purple and red points in Figure 7, bottom) or in the multiplicative letter combination model (average log_{10} likelihood/trial gain: 0.07, *SD* = 0.028; likelihood ratio test: *χ*^{2}(650) = 5822, *p* ≈ 0; the comparison between purple and blue points in Figure 7).
- Similarly, taking the letter confusion matrix into account provides considerable advantage over a simple correct/random mixture model (average log_{10} likelihood/trial gain: 0.078, *SD* = 0.0254—comparison between green points and baseline in Figure 7, bottom). But again, there is a considerable further benefit of adding spatial weighting to the model, which means that responses are not based on the target alone but incorporate influences from adjacent letters. Again, this is the case both for the spatial substitution plus letter confusion model (log_{10} likelihood/trial gain: mean = 0.044, *SD* = 0.014; likelihood ratio test: *χ*^{2}(307) = 3662, *p* ≈ 0—comparison between green and red points) and the multiplicative combination model (log_{10} likelihood/trial gain: mean = 0.038, *SD* = 0.017; likelihood ratio test: *χ*^{2}(307) = 3126, *p* ≈ 0—comparison between green and blue points).
- For models that employ both spatial substitution and letter confusion, there is no substantial difference between those with and without a random guessing parameter. Comparing the spatial substitution plus letter confusion model with a random guessing parameter to that without (comparison between the pairs of red points in each condition in Figure 7), we find little difference between them. The average log_{10} likelihood/trial gain from adding a random guessing parameter is 0.0014, *SD*: 0.0020 (although the pairwise *t* test in one condition is significant, the log-likelihood ratio test shows no effect; *χ*^{2}(307) = 101, *p* ≈ 1). Comparing the multiplicative combination model with a random guessing parameter to one without yields similar results (comparison between pairs of blue dots in each condition in Figure 7); the average log_{10} likelihood/trial gain from the random guessing parameter is 0.0034, *SD*: 0.0027, and the likelihood ratio test shows no effect (*χ*^{2}(307) = 266, *p* = 0.95). This result captures the intuition that the confusion matrix, as we have defined it, captures systematic intrusions and can capture uniform random guessing to the degree that it exists.
- There is no stable difference between the spatial substitution plus letter confusion model and the multiplicative combination model (comparison between red and blue points in Figure 7). While in some conditions in Experiment 2 the multiplicative model better describes the error data (in 3/6 conditions pairwise *t* > 2 and *p* ≤ 0.05), in most conditions of Experiment 1a the spatial substitution plus letter confusion model is a better description (in 6/7 conditions *t* > 2 and *p* ≤ 0.05). The average log-likelihood/trial gain of the spatial substitution plus letter confusion model is small and variable (across-condition mean: 0.0063, *SD*: 0.0133). That said, if we pool likelihoods over all conditions and all experiments to compare these models via a likelihood ratio test with one degree of freedom, we find a highly significant advantage for the spatial substitution plus letter confusion model (*χ*^{2}(1) = 535, *p* ≈ 0). However, because this advantage is neither large nor stable across conditions, we do not believe it should be taken seriously. These results reinforce the point made in the Introduction: When put on an equal footing, spatial substitution and pooling models are difficult to tease apart.

On these trials, the spatial substitution plus letter confusion model achieves a higher total log_{10} likelihood (−4935) and is slightly, but very significantly (based on a likelihood ratio test), better than the multiplicative model (total log_{10} likelihood: −4957; *χ*^{2}(1) = 55.48, *p* < 0.001). Note that no new parameters were fit to this subset of data. We used the parameters from the fit to all the trials for a given subject-condition but considered the likelihood of only a subset of trials.

*p* value (which should be high if variance does not differ from chance). If we can be reasonably confident that the variance of the residuals along the second principal component is indistinguishable from one, then we have evidence (albeit evidence by failing to reject the null) that all the variation in accuracy across conditions boils down to a single dimension.

The *p* value for a chi-squared test of whether this variance is greater than one is not significant, *χ*^{2}(20) = 23, *p* = 0.28. In contrast, the 95% posterior confidence interval on the variance of studentized residuals along the first principal component is [9.02, 32.93], significantly greater than one by any measure. Thus, when error distributions are summarized in terms of the breadth of the spatial weighting function and the rate of random guessing in a direct spatial substitution model, changes in these parameters, regardless of how we manipulate difficulty, fall along a single line.

**q**, the confusion matrix exponent. In all experiments, conditions that increase the difficulty of the trial also increase the breadth of the spatial weighting function and decrease the confusion matrix exponent (bringing the confusion matrix closer to uniformity). We find that, collapsing over experiments and conditions, parameters again appear to fall on one dimension: (a) the 95% interval on the variance along the second principal component spans one [0.61, 2.25], (b) the probability that this variance is greater than one is low (0.62), and (c) the chi-squared test comparing this variance to one is not significant, *χ*^{2}(20) = 22.64, *p* = 0.31. As should be clear from the graph, variance along the primary principal component is much greater than one (95% confidence interval: [7.4, 27.3]). Thus, if we use the spatial substitution plus letter confusion model, we again find that all manipulations of difficulty appear to have effects that can be described by one dimension.

*χ*^{2}(20) = 31.31, *p* = 0.051. As should be clear from the graph, variance along the primary principal component is much greater than one (95% confidence interval: [6.9, 26]). Although the results under the multiplicative combination model are less persuasive (presumably because of the correlation in parameter estimates orthogonal to the principal component), we again find little evidence that different manipulations of difficulty affect error distributions in different ways.

*Trends in Cognitive Sciences*, 15(3), 122–131.

*Vision Research*, 16(1), 71–78.

*Journal of Vision*, 9(12):13, 1–18, http://www.journalofvision.org/content/9/12/13, doi:10.1167/9.12.13.

*Journal of Vision*, 11(8):1, 1–16, http://www.journalofvision.org/content/11/8/1, doi:10.1167/11.8.1.

*Nature*, 226(4), 177–178.

*Vision Research*, 13(4), 767–782.

*Vision Research*, 37(1), 63–82.

*Journal of Vision*, 7(9):338, http://www.journalofvision.org/content/7/9/338, doi:10.1167/7.9.338.

*Psychological Research*, 44(1), 51–65.

*Archives of Ophthalmology*, 49(4), 431.

*Acta Ophthalmologica*, 14(4), 56–63.

*Journal of Experimental Psychology*, 80(2), 254–261.

*Attention, Perception, & Psychophysics*, 12(1), 5–8.

*Attention, Perception, & Psychophysics*, 16(1), 143–149.

*The American Journal of Psychology*, 83(3), 330–342.

*Attention, Perception, & Psychophysics*, 19(1), 1–15.

*Psychonomic Science*, 25(2), 77–80.

*Journal of the Optical Society of America*, 53(9), 1026–1032.

*Attention, Perception, & Psychophysics*, 74(2), 379–396.

*Nature Neuroscience*, 14(9), 1195–1201.

*Perception*, 15(2), 119.

*Sociological Methodology*, 25, 165–174.

*Journal of Experimental Psychology: Human Perception and Performance*, 10(5), 655.

*Proceedings of the National Academy of Sciences*, 106(31), 13130–13135.

*The probabilistic mind: Prospects for Bayesian cognitive science* (pp. 303–328). Oxford, UK: Oxford University Press.

*Nature*, 383(6598), 334–337.

*Visual Cognition*, 9(7), 889–910.

*Zeitschrift für Psychologie*, 93(3), 17–82.

*Attention, Perception, & Psychophysics*, 21(3), 269–279.

*Vision Research*, 48(5), 635.

*Vision Research*, 40(8), 973–988.

*Individual choice behavior: A theoretical analysis*. New York, NY: John Wiley and Sons.

*Perceptual & Motor Skills*, 36(3), 777–778.

*Acta Psychologica*, 139, 19–37, doi:10.1016/j.actpsy.2011.09.014.

*Nature Neuroscience*, 4(7), 739–744.

*Journal of Vision*, 4(12):12, 1136–1169, http://www.journalofvision.org/content/4/12/12, doi:10.1167/4.12.12.

*Nature Neuroscience*, 11(10), 1129–1135.

*Journal of Vision*, 7(2):20, 1–36, http://www.journalofvision.org/content/7/2/20, doi:10.1167/7.2.20.

*International Journal of Computer Vision*, 40(1), 49–70.

*Quarterly Journal of Experimental Psychology*, 32(1), 3–25.

*Experimental Brain Research*, 37(3), 495–510.

*Nature*, 271(5640), 54–56.

*Journal of Vision*, 5(11):8, 1024–1037, http://www.journalofvision.org/content/5/11/8, doi:10.1167/5.11.8.

*Attention, Perception, & Psychophysics*, 49(6), 495–508.

*Journal of Vision*, 13(1):24, 1–20, http://www.journalofvision.org/content/13/1/24, doi:10.1167/13.1.24.

*American Journal of Ophthalmology*, 53(471), 163–169.

*Perception & Psychophysics*, 12(1), 97–99.

*Perception & Psychophysics*, 9(1A), 40–50.

*Cognitive Psychology*, 12(1), 97–136.

*Experimental Brain Research*, 37(3), 475–494.

*Vision Research*, 46(3), 417–425.

*Journal of Experimental Psychology: General*, 138(4), 546.

*Psychological Science*, 21(8), 1168–1175.

*Vision Research*, 15(10), 1137–1141.

*Trends in Cognitive Sciences*, 15(4), 160–168.

*Journal of the Optical Society of America*, 14(9), 2057–2068.

*Psychological Review*, 82(3), 184.

*Attention, Perception, & Psychophysics*, 33(2), 129–138.

*The American Journal of Psychology*, 51(1), 83–96.

*Experimental psychology*. New York, NY: Holt.

^{1}Although it is possible to define distance with respect to the weighting function in terms of angle of arc or degrees visual angle, it is not necessary for our analysis, and it would require a few additional assumptions. Moreover, these physical measures of distance would be useful if the breadth of a spatial weighting function defined over them remained constant across conditions, but that is not the case.

^{4}Note that the error bars in Figure 7 correspond to across-subject standard errors of the mean, while the comparisons are done within subjects; hence the error bars are larger than the relevant standard errors of the difference.

**p** = 0.2, **b** = 0.5, **q** = 1.2 would be 0.2535: Using the appropriate “A”, “B”, … , “I” rows of our confusion matrix $Q$ and the Laplacian spatial weighting function described in Equation 2, we obtain the following: Thus, the likelihood of the subject reporting an “E” on this trial, with parameters **p** = 0.2, **b** = 0.5, **q** = 1.2, is 0.2535.
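A computation of this kind can be sketched in a few lines of Python. The sketch below is our illustrative reading of the model, not the paper's exact implementation: we assume a uniform random-guessing mixture at rate `p`, a Laplacian spatial weighting with breadth `b`, an exponent `q` applied to the confusion-matrix rows (with renormalization), and confusion-matrix rows that sum to one. All function and variable names are ours.

```python
import numpy as np

def laplacian_weights(positions, b):
    # Laplacian (double-exponential) spatial weighting over letter positions;
    # position 0 is the target, +/-1, +/-2, ... are flankers. The breadth b
    # plays the role of the spatial-weighting parameter in the text.
    w = np.exp(-np.abs(np.asarray(positions, dtype=float)) / b)
    return w / w.sum()

def response_distribution(display, Q, p, b, q):
    """Probability of reporting each letter for a crowded display.

    display : indices of the letters shown (target in the middle)
    Q       : letter confusion matrix (rows sum to 1)
    p       : random-guessing rate (assumed mixture form)
    b       : breadth of the Laplacian spatial weighting
    q       : exponent sharpening/flattening confusion rows (illustrative)
    """
    n = len(display)
    positions = np.arange(n) - n // 2          # e.g. [-1, 0, 1] for 3 letters
    w = laplacian_weights(positions, b)
    rows = Q[np.asarray(display)] ** q         # one confusion row per shown letter
    rows = rows / rows.sum(axis=1, keepdims=True)
    mixture = w @ rows                         # spatially weighted combination
    return p / Q.shape[1] + (1 - p) * mixture  # mix in uniform guessing
```

The likelihood of a particular report (say, “E”) is then simply the corresponding entry of the returned probability vector.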

*x* and *y* denote the first two parameters, *i* is the experimental condition (with conditions concatenated across experiments), and *j* is the index over all subjects contributing in that experimental condition. We average the estimates of *x* and *y* for each condition across subjects, thus obtaining the mean (vector) parameter estimates for the *i*th condition, denoted *μ*^{(i)}, and the across-subject covariance of parameter estimates **∑**^{(i)}. The standard error of the mean is given by dividing the covariance of parameter estimates by the number of subjects: $\Sigma_{\mu}^{(i)} = \Sigma^{(i)}/n$.
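This per-condition averaging step can be sketched as follows (a minimal sketch; the array layout and function name are our assumptions):

```python
import numpy as np

def condition_mean_and_sem(estimates):
    """Mean parameter vector and SEM covariance for one condition.

    estimates : (n_subjects, 2) array of fitted (x, y) pairs, one row
                per subject contributing to this condition.
    """
    n = estimates.shape[0]
    mu = estimates.mean(axis=0)              # mean (vector) estimate mu^(i)
    sigma = np.cov(estimates, rowvar=False)  # across-subject covariance Sigma^(i)
    sigma_mu = sigma / n                     # SEM covariance: Sigma^(i) / n
    return mu, sigma_mu
```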

We then compute the principal components (across conditions *i*) of the mean parameters for each condition (*μ*^{(i)}). This yields a vector of component loadings **C** (describing the orientation of each principal component in the original parameter space) and the mean vector within that coordinate frame (expressed as deviations from the grand mean along the two components), *ν*^{(i)}. The SEM covariance of each condition is rotated into this coordinate frame as **C**^{T} $\Sigma_{\mu}^{(i)}$ **C**. Thus, we obtain standard errors of the mean (for each condition) within the principal component coordinate frame.
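The PCA of the condition means and the rotation of each SEM covariance into component space might be sketched as follows (assuming the conventions above; variable names are ours):

```python
import numpy as np

def pca_of_condition_means(mus, sigma_mus):
    """Principal components of condition means, plus rotated SEM covariances.

    mus       : (n_conditions, 2) mean parameter vectors mu^(i)
    sigma_mus : (n_conditions, 2, 2) SEM covariances Sigma_mu^(i)
    """
    grand_mean = mus.mean(axis=0)
    centered = mus - grand_mean
    # Eigenvectors of the covariance of the means give the component loadings C.
    evals, C = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(evals)[::-1]            # sort: first PC = largest variance
    C = C[:, order]
    nus = centered @ C                         # nu^(i): means in PC coordinates
    rotated = np.array([C.T @ S @ C for S in sigma_mus])  # C^T Sigma_mu^(i) C
    return C, nus, rotated
```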

We then compute the studentized (*z*-scored) deviations of the means along the two principal components. These studentized residuals in principal component space can be analyzed in two ways: either using the Bayesian posterior distribution over their covariance (to obtain credible intervals and estimates of the magnitude of the variance along the second component) or via frequentist chi-squared tests to assess whether variation along the second component is significantly different from one. If we can be reasonably confident that the variance of the residuals along the second principal component is indistinguishable from one, then we have evidence (albeit evidence from failing to reject the null) that all the variation in accuracy across conditions boils down to a single dimension.

- (a) posterior credible intervals on the variance along the second principal component.
- (b) the posterior probability that the variance along the second principal component is greater than one.
- (c) a credible interval on the ratio of the variance along the first principal component divided by the variance along the second.

Under the null hypothesis of zero variation (other than sampling error) along the second component, the studentized residuals along that component should behave like *z* scores sampled from a standard normal distribution. Thus, we can calculate a chi-squared statistic as $\chi^2 = \sum_{i=1}^{n} \left(z_2^{(i)}\right)^2$, and under the null hypothesis this should follow a chi-squared distribution with *n* degrees of freedom (*df* = *n*). We can obtain a *p* value by assessing the upper tail probability of this chi-squared distribution (the probability of seeing a chi-squared as large as ours or larger under the null hypothesis that variation along the second principal component arises only from sampling error).
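This frequentist test can be sketched directly. In this sketch we estimate the upper-tail probability by Monte Carlo simulation rather than the analytic chi-squared survival function; the function name and simulation details are ours.

```python
import numpy as np

def second_pc_chi2_test(z2, n_sim=200_000, seed=0):
    """Test whether residuals along the second PC are standard normal.

    z2 : studentized residuals z_2^(i), one per condition.
    Returns the chi-squared statistic (sum of squared residuals, df = n)
    and a Monte Carlo estimate of the upper-tail p value.
    """
    z2 = np.asarray(z2, dtype=float)
    n = z2.size
    stat = np.sum(z2 ** 2)                       # chi^2 = sum_i (z_2^(i))^2
    rng = np.random.default_rng(seed)
    null = rng.standard_normal((n_sim, n)) ** 2  # simulated null z scores, squared
    p = np.mean(null.sum(axis=1) >= stat)        # upper-tail probability
    return stat, p
```

With an analytic tail one would instead evaluate the chi-squared survival function at `stat` with `n` degrees of freedom.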

- In particularly difficult conditions, all models perform poorly. The log-likelihood gain over the random guessing baseline is much lower for closer spacings (Figure 11 top), although in this experiment, all conditions are easier due to the longer precue.
- Unlike Experiments 1a, 2, and 3, more sophisticated error models provide a smaller and less consistent advantage over a simple correct/random mixture model. We believe this is the case because subjects' performance in Experiment 1b is much higher; therefore, there are fewer errors that can be used to inform the different error models.
- Nonetheless, the direct spatial substitution model provides a significant benefit over the correct/random mixture model, indicating that observers tend to substitute flankers directly for the target (comparison between the purple data points and the baseline (zero) in Figure 11, bottom). This is evident both in the log_{10} likelihood per trial gain averaged across all conditions (mean: 0.027, *SD*: 0.0077) and in across-subject *t* tests within 6/7 conditions. The overall advantage of direct spatial substitution over a simple correct/random mixture can be tested via a likelihood ratio test that aggregates likelihoods over all subjects and conditions and accounts for the extra parameters in the spatial substitution model. The spatial substitution model (total log_{e} likelihood = −8542 with 196 parameters) has a higher likelihood than the correct/random mixture (total log_{e} likelihood = −8936 with 98 parameters), with a very significant likelihood ratio test (*X*^{2}(98) = 787, *p* ≈ 0).
- However, models that take the letter confusion matrix into account provide a further advantage over the direct spatial substitution model. This holds for the spatial substitution plus letter confusion model (comparison between purple and red points; *X*^{2}(650) = 301, *p* ≈ 0 for the spatial substitution model with no random guessing parameter). For the multiplicative combination model, the advantage over direct spatial substitution is less reliable (comparison between purple and blue points in Figure 11). The full multiplicative combination model slightly outperforms direct spatial substitution (*G*^{2} = 53, *df* = 748, *p* ≈ 0), but the multiplicative combination model without random guessing has a significantly lower net log likelihood (−8750) than the direct substitution model (−8542).
- Similarly, taking the letter confusion matrix into account provides a considerable advantage over a simple correct/random mixture model (average log_{10} likelihood/trial gain: 0.0157, *SD* = 0.0065; comparison between green points and baseline in Figure 7, bottom; *G*^{2} = 84, *df* = 650, *p* ≈ 0). But again, there is a considerable further benefit of adding spatial weighting to the model, so that responses are not based on the target alone but incorporate influences from adjacent letters. Again, this is the case both for the spatial substitution plus letter confusion model (*G*^{2} = 1005, *df* = 98, *p* ≈ 0; comparison between green and red points) and for the multiplicative models (*G*^{2} = 287, *df* = 98, *p* ≈ 0; comparison between green and blue points).

- Consistent with the results of Experiment 1a (and the net results over Experiments 1a, 2, and 3), we see a stable and significant advantage of the spatial substitution plus letter confusion model over the multiplicative combination model (red points compared to blue). The average log-likelihood/trial gain of the spatial substitution plus letter confusion model is small but reliable (across-condition mean: 0.0248, *SD*: 0.0069 for the no random guessing models), all within-condition *t* tests are significant at *p* ≤ 0.037, and the overall comparison is highly significant (*G*^{2} = 718, *df* = 1, *p* ≈ 0). This conclusion also holds for versions of these models that include the random guessing parameter.