**Abstract**
A visual target is more difficult to recognize when it is surrounded by other, similar objects. This breakdown in object recognition is known as crowding. Despite a long history of experimental work, computational models of crowding are still sparse. Specifically, few studies have examined crowding using an ideal-observer approach. Here, we compare crowding in ideal observers with crowding in humans. We derived an ideal-observer model for target identification under conditions of position and identity uncertainty. Simulations showed that this model reproduces the hallmark of crowding, namely a critical spacing that scales with viewing eccentricity. To examine how well the model fits quantitatively to human data, we performed three experiments. In Experiments 1 and 2, we measured observers' perceptual uncertainty about stimulus positions and identities, respectively, for a target in isolation. In Experiment 3, observers identified a target that was flanked by two distractors. We found that about half of the errors in Experiment 3 could be accounted for by the perceptual uncertainty measured in Experiments 1 and 2. The remainder of the errors could be accounted for by assuming that uncertainty (i.e., the width of the internal noise distribution) about stimulus positions and identities depends on flanker proximity. Our results provide a mathematical restatement of the crowding problem and support the hypothesis that crowding behavior is a sign of optimality rather than a perceptual defect.

where **φ** = (*φ*_{T}, *φ*_{D}) represents the (one-dimensional) positions of the target and distractor, respectively, **σ** = (*σ*_{T}, *σ*_{D}) the spatial uncertainty at these locations, and **x** = (*x*_{T}, *x*_{D}) the internal representations of the stimulus positions. The probability of an IC equals the probability of reporting a "B" when the target (or, in the case of a PC, the flanker) is an "A," or vice versa. This is equal to 1 minus the probability of reporting the correct stimulus. We assume that the probability of reporting the correct stimulus follows a scaled and translated cumulative normal distribution as a function of the logarithm of stimulus contrast,

$$p_{\text{correct}}(c) = \tfrac{1}{2} + \tfrac{1}{2}\,\Phi\!\left(\frac{\log c - \mu_{IC}}{\sigma_{IC}}\right),$$

where *μ*_{IC} and *σ*_{IC} are the mean and width of the cumulative normal function, Φ denotes the standard normal cumulative distribution function, and *c* is the contrast of the encoded stimulus (which, as will be shown later, is a good description of human performance; Figure 5). Hence, the probability of an IC can be written as

$$p_{IC}(c) = 1 - p_{\text{correct}}(c) = \tfrac{1}{2} - \tfrac{1}{2}\,\Phi\!\left(\frac{\log c - \mu_{IC}}{\sigma_{IC}}\right).$$

When stimulus contrast is close to 0, the probability of reporting the wrong stimulus is equal to chance (0.5; i.e., the observer must guess); when stimulus contrast is high, the probability of an IC approaches 0.

where *c*_{T} is the contrast of the target. Responses on trials in which a PC has occurred are based on the signal from the flanker. Because there are only two possible stimulus identities, and the target and distractor always differ from each other in the experiment that we are modeling, this leads to an error only if the flanker signal is decoded correctly, that is, if no IC occurs. Hence, the error probability on PC trials is $1 - p_{IC}(c_F)$, where *c*_{F} denotes the contrast of the flanker. The total probability of an error is thus

$$p(\text{error}) = p_{PC}\left(1 - p_{IC}(c_F)\right) + \left(1 - p_{PC}\right)p_{IC}(c_T).$$
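The identity-confusion and total-error probabilities just described can be sketched in a few lines of Python. The function names, the treatment of *μ*_{IC} as a mean on the log-contrast axis, and the symbol `p_pc` for the position-confusion probability are our own notational choices, not code from the paper:

```python
import math

def p_ic(c, mu_ic, sigma_ic):
    """Identity-confusion probability: a scaled cumulative normal of log
    contrast, equal to 0.5 (chance) near zero contrast and approaching 0
    at high contrast."""
    phi = 0.5 * (1.0 + math.erf((math.log(c) - mu_ic) / (sigma_ic * math.sqrt(2.0))))
    return 0.5 - 0.5 * phi

def p_error(p_pc, c_target, c_flanker, mu_ic, sigma_ic):
    """Total error probability. On PC trials the response is based on the
    flanker, so an error occurs only if the flanker is decoded correctly;
    on the remaining trials an error occurs if the target identity is confused."""
    return (p_pc * (1.0 - p_ic(c_flanker, mu_ic, sigma_ic))
            + (1.0 - p_pc) * p_ic(c_target, mu_ic, sigma_ic))
```

Note that with `p_pc = 0` the expression reduces to the identity-confusion probability of the target alone, and with `p_pc = 1` and a high-contrast flanker the error probability approaches 1, because the flanker is reliably identified but is, by design, the wrong letter.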

The flanker contrast, *c*_{F}, was set to 20%. (All contrasts throughout the paper are specified as Weber contrast.) The width of the positional noise distribution was set to 1° for both the target and the flanker, i.e., *σ*_{T} = *σ*_{F} = 1°. The two parameters that determine the probability of an identity confusion, *μ*_{IC} and *σ*_{IC}, were set to 10% and 1%, respectively. Using Equation 1, we computed performance as a function of target contrast, *c*_{T}, for a range of target-flanker spacings. From the resulting curves, we estimated the target-contrast thresholds that yield 75% correct performance. Plotting these thresholds as a function of stimulus spacing (Figure 2c) revealed a curve qualitatively similar to those typically found when human subjects perform target identification under conditions of crowding: the identification threshold declines as spacing increases, up to a "critical" spacing beyond which it is more or less stable. A trilinear function was fit to these thresholds (Figure 2c) to estimate the critical spacing, as has been done in previous studies (Pelli et al., 2004; van den Berg et al., 2007).
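The simulation just described can be sketched as follows. The decision rule for position confusions used here (a flanker's noisy position representation landing closer to the true target location than the target's own) and all parameter values are illustrative assumptions of ours; they stand in for the paper's Equation 1 but are not taken from it:

```python
import math, random

MU_IC, SIGMA_IC = math.log(0.10), 0.5   # illustrative identity-confusion parameters
C_FLANKER = 0.20                        # flanker contrast (20%)

def p_ic(c):
    """Identity-confusion probability (cumulative normal of log contrast)."""
    phi = 0.5 * (1.0 + math.erf((math.log(c) - MU_IC) / (SIGMA_IC * math.sqrt(2.0))))
    return 0.5 - 0.5 * phi

def p_pc(spacing, sigma_t=1.0, sigma_f=1.0, n=20000):
    """Monte Carlo estimate of a position confusion: the flanker's noisy
    position representation falls closer to the true target location (0)
    than the target's own representation does."""
    random.seed(0)
    return sum(abs(random.gauss(spacing, sigma_f)) < abs(random.gauss(0.0, sigma_t))
               for _ in range(n)) / n

def contrast_threshold(spacing, criterion=0.75, lo=1e-4, hi=1.0):
    """Bisect (on a log axis) for the target contrast yielding 75% correct."""
    ppc = p_pc(spacing)
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        p_err = ppc * (1.0 - p_ic(C_FLANKER)) + (1.0 - ppc) * p_ic(mid)
        lo, hi = (mid, hi) if 1.0 - p_err < criterion else (lo, mid)
    return math.sqrt(lo * hi)
```

Plotting `contrast_threshold` over a range of spacings reproduces the qualitative shape described above: thresholds fall as spacing increases and then flatten beyond a critical spacing.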

We repeated this procedure for a range of target eccentricities, *φ*_{T}, yielding an estimate of critical spacing as a function of target eccentricity (Figure 2d). We found that the model produces a critical spacing that scales with eccentricity. Hence, the behavioral hallmark of crowding (Levi, 2008; Pelli et al., 2004) appears to be a qualitative property of any ideal observer who performs target identification in the face of visual uncertainty.

The viewing distance was 60 cm. The luminance of the (gray) background was 35 cd/m². A chin rest was used to minimize head movements.

On each trial, a positional offset, Δ*φ*, was drawn from a set of 10 positive and 10 negative values (with |Δ*φ*| logarithmically spaced). Subsequently, the eccentricities of the left and right stimuli were set to $\phi_L = \bar{\phi} - \tfrac{1}{2}\Delta\phi$ and $\phi_R = \bar{\phi} + \tfrac{1}{2}\Delta\phi$, respectively. (Hence, the left stimulus was closer to fixation on trials with Δ*φ* > 0 and the right stimulus on trials with Δ*φ* < 0.) To reduce the number of values of Δ*φ* at which subjects perform at floor or at ceiling, the boundaries of the range of |Δ*φ*| were automatically updated after every 300 trials. If an observer's mean performance over the last 300 trials was below 70%, the maximum value was increased by 10%; if it was above 80%, the minimum value was decreased by 10%. Following this change, a new set of 20 values for Δ*φ* was computed.
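A minimal sketch of this adaptive rule (function names are ours):

```python
def update_offset_range(d_min, d_max, recent_accuracy):
    """Every 300 trials: widen the range when the task is too hard,
    narrow it when the task is too easy."""
    if recent_accuracy < 0.70:
        d_max *= 1.10          # increase the largest |delta-phi| by 10%
    elif recent_accuracy > 0.80:
        d_min *= 0.90          # decrease the smallest |delta-phi| by 10%
    return d_min, d_max

def offset_set(d_min, d_max, n=10):
    """10 logarithmically spaced magnitudes, each with both signs (20 values)."""
    ratio = (d_max / d_min) ** (1.0 / (n - 1))
    mags = [d_min * ratio ** i for i in range(n)]
    return [-m for m in mags] + mags
```

The asymmetry of the rule (70% triggers easier trials, 80% triggers harder ones) keeps performance inside the informative range of the psychometric function.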

where *φ*_{L} and *φ*_{R} denote the eccentricities of the stimuli on the left and the right, respectively; *x*_{L} and *x*_{R} denote their internal representations; and *σ*_{L} and *σ*_{R} denote the trial-to-trial variability over these internal representations.

We considered four candidate functions relating positional uncertainty (*σ*) to retinal eccentricity (*φ*) and contrast (*c*) (Table 1). Bayesian model comparison (see Appendix for details) revealed that the model assuming a linear relationship between *σ* and stimulus eccentricity and a power-law relationship between *σ* and stimulus contrast provides the most likely description of the data. Specifically, this function (Function 2 in Table 1) outperforms the first, third, and fourth functions shown in Table 1 by 116 ± 28, 118 ± 28, and 4.3 ± 0.3 log-likelihood points, respectively. Note that a difference of 4.3 means that one model is exp(4.3) ≈ 74 times more likely than the other, given the dataset.

| Function number | Relationship between *σ* and *φ* | Relationship between *σ* and *c* | Function |
| --- | --- | --- | --- |
| 1 | Linear | Linear | $\sigma(\phi, c) = a\phi c$ |
| 2 | Linear | Power law | $\sigma(\phi, c) = a\phi c^t$ |
| 3 | Power law | Linear | $\sigma(\phi, c) = a\phi^s c$ |
| 4 | Power law | Power law | $\sigma(\phi, c) = a\phi^s c^t$ |
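The four candidate forms in Table 1, written out (parameter names as in the table), together with the Bayes-factor reading of a log-likelihood difference:

```python
import math

# Candidate relationships between positional uncertainty sigma,
# eccentricity phi, and contrast c (a, s, t are free parameters).
def sigma_1(phi, c, a):       return a * phi * c            # linear / linear
def sigma_2(phi, c, a, t):    return a * phi * c ** t       # linear / power law
def sigma_3(phi, c, a, s):    return a * phi ** s * c       # power law / linear
def sigma_4(phi, c, a, s, t): return a * phi ** s * c ** t  # power law / power law

# A difference of 4.3 log-likelihood points corresponds to a Bayes factor
# of exp(4.3), i.e., one model is about 74 times more likely than the other.
bayes_factor = math.exp(4.3)
```

Functions 1 and 4 are nested endpoints of this family: setting *t* = 1 (or *s* = 1) recovers the linear case, which is why the comparison penalizes the extra exponents unless they earn their keep.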

Let (*φ*_{IN}, *φ*_{T}, *φ*_{OUT}) denote the locations of the foveal (inward) flanker, the target, and the peripheral (outward) flanker, respectively; *x*_{IN}, *x*_{T}, and *x*_{OUT} their normally distributed internal representations; and **c** = (*c*_{IN}, *c*_{T}, *c*_{OUT}) their contrasts. The probability of a PC in Experiment 3 is then given by Equation 3, where **σ** = (*σ*_{IN}, *σ*_{T}, *σ*_{OUT}) is the trial-to-trial variability in the internal representations of the stimuli, as estimated from Experiment 1. Evaluating Equation 3 involves taking the integral of the error function, for which no closed-form solution exists. We therefore evaluated this equation numerically using Monte Carlo simulation.
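A Monte Carlo evaluation in this spirit can be sketched as follows; the closeness-based decision rule is our illustrative stand-in for Equation 3, whose exact form is not reproduced in this text:

```python
import random

def p_pc_monte_carlo(phi, sigma, n=50000, seed=1):
    """Estimate the probability of a position confusion (PC) for a target
    flanked by an inward and an outward flanker: count samples in which a
    flanker's noisy position representation lands closer to the true target
    location than the target's own representation."""
    random.seed(seed)
    (phi_in, phi_t, phi_out), (s_in, s_t, s_out) = phi, sigma
    count = 0
    for _ in range(n):
        x_in = random.gauss(phi_in, s_in)
        x_t = random.gauss(phi_t, s_t)
        x_out = random.gauss(phi_out, s_out)
        if min(abs(x_in - phi_t), abs(x_out - phi_t)) < abs(x_t - phi_t):
            count += 1
    return count / n
```

With stimuli at 8°, 10°, and 12° and 1° of positional noise on each, the estimated PC probability is substantially higher than with flankers at 5° and 15°, mirroring the spacing dependence of crowding.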

where **θ** = (*a*, *t*, *μ*_{IC}, *σ*_{IC}) is a vector containing all model parameters. Combining both cases yields the total probability of an error response in Experiment 3.

With the model parameters (*a*, *t*, *μ*_{IC}, *σ*_{IC}) fixed to the values estimated from Experiments 1 and 2, we computed the expected proportions of errors and flanker correspondences in Experiment 3. We found that both the proportion of errors and the proportion of flanker correspondences are strongly underestimated (Figure 7a). Specifically, this model explains 44% ± 5.2% of the errors in Experiment 3 (Figure 7b). Hence, whereas the sensory uncertainty measured in Experiments 1 and 2 can explain a major part of the crowding effect, it also leaves a large part unexplained.

We next let the parameter *a* in the equation that relates stimulus contrast and eccentricity to position uncertainty, $\sigma(\phi, c) = a\phi c^t$, depend on stimulus spacing. We did this using a power-law function, because it satisfies the following conditions: (1) for very large spacing, position uncertainty converges to the case without flankers, as measured in Experiment 1; (2) the relation between position uncertainty and stimulus spacing is monotonic, but not otherwise strongly constrained; and (3) the number of additional model parameters is small. Specifically, we let *a* be a function of two parameters, *α* and *β*, and the target-flanker spacing, *φ*_{OUT} − *φ*_{T}, where *a*_{I} is the value of *a* for isolated stimuli as estimated from Experiment 1. For *β* ≤ 0, positional uncertainty converges to *a*_{I} when spacing is large.
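One concrete power-law form with these properties (the paper's exact expression is not reproduced in this text, so the formula below is our assumption, chosen only to satisfy conditions 1–3):

```python
def a_of_spacing(spacing, a_iso, alpha, beta):
    """Spacing-dependent position-uncertainty scale: monotonic in spacing,
    and converging to the isolated-stimulus value a_iso for large spacing
    whenever beta <= 0 (and alpha >= 0)."""
    return a_iso * (1.0 + alpha * spacing ** beta)
```

Here `a_iso` plays the role of *a*_{I}, `alpha` scales the flanker-induced increase in uncertainty, and `beta` controls how quickly that increase decays with spacing.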

This model predicts error rates that do not differ significantly from those of the human observers (*F*[1, 120] = 0.81, *p* = 0.37). However, it predicts flanker-correspondence rates that are significantly different from those of the human observers (*F*[1, 120] = 48.8, *p* < 0.001). The identification thresholds following from this model (Figure 8, top row, right) are also significantly different from the human observer thresholds (*F*[1, 104] = 19.0, *p* < 0.001). These results argue against the hypothesis that the crowding behavior in Experiment 3 was caused solely by an increase in position uncertainty.

Analogously, we let stimulus spacing affect *μ*_{IC} in Equation 4, where *μ*_{I} is the value of *μ*_{IC} for isolated stimuli, as estimated from Experiment 2. We let the width of the cumulative Gaussian in Equation 4 scale with its mean by fixing the ratio between *μ*_{IC} and *σ*_{IC} to *μ*_{I}/*σ*_{I}.

This model predicts error rates that do not differ significantly from the human error rates (*F*[1, 120] = 0.47, *p* = 0.49), which suggests that it can accurately account for those rates (Figure 8, center row, left). However, it predicts flanker-correspondence rates that are significantly different from those in the human data (*F*[1, 120] = 52.4, *p* < 0.001). In addition, the identification thresholds following from this model (Figure 8, center row, right) are significantly different from the human observer thresholds (*F*[1, 104] = 15.7, *p* < 0.001). These results show that it is unlikely that the crowding behavior observed in Experiment 3 resulted solely from an increase in uncertainty about the identities of the stimuli.

For this model, we found no significant differences between model predictions and human data in error rates (*F*[1, 120] = 0.008, *p* = 0.93), contrast thresholds (*F*[1, 104] = 0.22, *p* = 0.64), or flanker-correspondence rates (*F*[1, 120] = 1.42, *p* = 0.24). These results suggest that, if observers were optimal in Experiment 3, one way to explain crowding effects is to assume that uncertainty about both stimulus positions and identities depends on flanker proximity.

*Vision Research*, 16, 71–78.

*Nature*, 226(5241), 177–178.

*Spatial Vision*, 10(4), 433–436.

*Journal of Vision*, 7(2):11, 11–13, http://www.journalofvision.org/content/7/2/11, doi:10.1167/7.2.11.

*Vision Research*, 49(15), 1948–1960.

*Journal of Vision*, 10(10):14, 1–16, http://www.journalofvision.org/content/10/10/14, doi:10.1167/10.10.14.

*Journal of Vision*, 11(9):2, 1–13, http://www.journalofvision.org/content/11/9/2, doi:10.1167/11.9.2.

*Nature Reviews Neuroscience*, 9(4), 292–303.

*Proceedings of the National Academy of Sciences of the United States of America*, 106(31), 13130–13135.

*Current Biology*, 20(6), 496–501.

*Nature*, 383(6598), 334–337.

*Vision Research*, 32(6), 1085–1097.

*Visual Cognition*, 9(7), 889–910.

*Spatial Vision*, 8(2), 255–279.

*Zeitschrift für Psychologie*, 93, 17–82.

*Vision Research*, 48(5), 635–654.

*Current Biology*, 19(23), 1988–1993.

*Journal of Vision*, 2(2):2, 167–177, http://www.journalofvision.org/content/2/2/2, doi:10.1167/2.2.2.

*Vision Research*, 50(22), 2308–2319.

*Information theory, inference, and learning algorithms*. Cambridge, UK: Cambridge University Press.

*Journal of Vision*, 11(1):18, 1–18, http://www.journalofvision.org/content/11/1/18, doi:10.1167/11.1.18.

*Journal of Vision*, 7(2):5, 1–26, http://www.journalofvision.org/content/7/2/5, doi:10.1167/7.2.5.

*Spatial Vision*, 10(4), 437–442.

*Journal of Vision*, 4(12):12, 1136–1169, http://www.journalofvision.org/content/4/12/12, doi:10.1167/4.12.12.

*Clinical Vision Sciences*, 2, 187–199.

*Nature Neuroscience*, 11(10), 1129–1135.

*Journal of Vision*, 7(2):8, 1–9, http://www.journalofvision.org/content/7/2/8, doi:10.1167/7.2.8.

*Vision Research*, 50, 2248–2260.

*Journal of Vision*, 5(11):8, 1024–1037, http://www.journalofvision.org/content/5/11/8, doi:10.1167/5.11.8.

*American Journal of Ophthalmology*, 53, 471–477.

*Journal of Vision*, 10(5):16, 1–14, http://www.journalofvision.org/content/10/5/16, doi:10.1167/10.5.16.

*Journal of Vision*, 7(2):14, 1–11, http://www.journalofvision.org/content/7/2/14, doi:10.1167/7.2.14.

*PLoS Computational Biology*, 6(1), e1000646.

*Trends in Cognitive Sciences*, 15(4), 160–168.

where *φ*_{L} and *φ*_{R} denote the eccentricities of the stimuli on the left and the right, respectively; *x*_{L} and *x*_{R} denote the internal representations of the left and right stimuli; *σ*_{L} and *σ*_{R} denote the trial-to-trial variability over these internal representations; erf(.) is the error function; and $\sigma(\phi, c) = a\phi c^t$. Under this model, and assuming independence between trials, the likelihood of the data obtained in Experiment 1 equals

$$p(D \mid a, t) = \prod_{i=1}^{M} p_i(\text{left}) \prod_{i=M+1}^{N} \left(1 - p_i(\text{left})\right),$$

in which $p_i(\text{left})$ denotes the model probability of a "left" response on trial *i*, and where we sorted the data such that the first *M* trials were those on which the subject responded "left" and the subsequent *N* − *M* trials were those on which the subject responded "right." We find the maximum-likelihood values of the parameters *a* and *t* by maximizing the log of the previous expression:

$$(\hat{a}, \hat{t}) = \operatorname*{arg\,max}_{a,\, t} \left[\sum_{i=1}^{M} \log p_i(\text{left}) + \sum_{i=M+1}^{N} \log\left(1 - p_i(\text{left})\right)\right].$$

the *average* likelihood of the data over its parameter space, instead of merely the *maximum* likelihood. As a result of the averaging, additional model parameters are automatically penalized. The intuition is that, while the maximum likelihood can never get worse when free parameters are added to a model, the average likelihood will increase only if the additional parameter adds sufficient "likelihood mass" relative to the larger volume that is then averaged over.

For each model *M*, we calculated the probability of the data *D* given that model. Denoting the model parameters by *θ*, this probability can be written as

$$p(D \mid M) = \int p(D \mid M, \theta)\, p(\theta \mid M)\, d\theta.$$

The conditional probability *p*(*D* | *M*, *θ*) is calculated by assuming that the data are conditionally independent across trials:

$$p(D \mid M, \theta) = \prod_{i=1}^{M} p_i(\text{error} \mid \boldsymbol{\phi}, \mathbf{c}, \theta) \prod_{i=M+1}^{N} \left(1 - p_i(\text{error} \mid \boldsymbol{\phi}, \mathbf{c}, \theta)\right),$$

with *p*(error | **φ**, **c**, *θ*) as in Equation 5, and trials sorted such that trials 1 … *M* are those on which the observer made an error and trials *M* + 1 … *N* those on which the observer responded correctly. (In the analysis of Experiment 1, we used *p*(data | *a*, *t*) as defined in the previous section instead of *p*(error | **φ**, **c**, *θ*).) It is convenient to take the logarithm of Equation 7, which we denote by $L_M(\theta) = \log p(D \mid M, \theta)$. We assume a uniform prior distribution over the parameters in Equation 6,

$$p(\theta \mid M) = \frac{1}{\mathrm{Vol}},$$

where Vol is the volume of the parameter space. Combining and taking the log, we find the following expression for the log likelihood of model *M* under data set *D*:

$$\log p(D \mid M) = \log\left(\frac{1}{\mathrm{Vol}} \int e^{L_M(\theta)}\, d\theta\right).$$

We approximated this integral numerically by evaluating *L*_{M}(*θ*) at a large range of (linearly spaced) parameter values:

$$\log p(D \mid M) \approx \log\left(\frac{1}{m^k} \sum_{j=1}^{m^k} e^{L_M(\theta_j)}\right),$$

where *k* is the number of free parameters of the model and *m* the number of values at which each parameter is evaluated.
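This grid approximation of the marginal likelihood can be sketched directly; the log-sum-exp shift is a standard numerical stabilization that we add, not something stated in the text:

```python
import itertools, math

def log_marginal_likelihood(log_lik, grids):
    """Approximate log p(D | M) under a uniform prior by averaging the
    likelihood over a full parameter grid:
        log p(D | M) ~ log( (1/m**k) * sum_j exp(L_M(theta_j)) ),
    with m grid values per parameter and k parameters. The maximum log
    likelihood is subtracted before exponentiating so the sum cannot
    underflow to zero."""
    lls = [log_lik(theta) for theta in itertools.product(*grids)]
    top = max(lls)
    return top + math.log(sum(math.exp(v - top) for v in lls) / len(lls))
```

Because the average rather than the maximum is taken, adding a grid dimension that contributes no likelihood mass lowers the score, which is exactly the automatic complexity penalty described above.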

where trials 1 … *M* are those on which the observer made an error, trials *M* + 1 … *N* those on which the observer made a correct response, and *θ* is a vector with the model's free parameters.