We developed a method analogous to classification images that allowed us to measure the influence that each dot in a dot cluster had on observers' estimates of the center of the cluster. In Experiment 1, we investigated whether observers employ a robust estimator when estimating the centers of dot clusters that were drawn from a single distribution. Most observers' fitted influences did not differ significantly from those predicted by a center-of-gravity (COG) estimator. Such an estimator is not robust. In Experiments 2 and 3, we considered an alternative approach to the problem of robust estimation, based on source separation, that makes use of the visual system's ability to segment visual data. The observers' task was to estimate the center of one distribution when viewing complex dot clusters that were drawn from a mixture of two distributions. We compared human performance to that of an ideal observer that separated the cluster into two sources through a maximum likelihood algorithm and based its estimate of location on the dots it assigned to just one of the two sources. The results suggest that robust methods employed by the visual system are closely tied to mechanisms of perceptual segmentation.

$p_i = (p_i^x, p_i^y)$, $i = 1, \ldots, n$, drawn from a circularly symmetric distribution centered on the point $\mu = (\mu^x, \mu^y)$. The challenge for the viewer is to estimate the location of the invisible center of the distribution given only the visible dots.

^{1} and that minimize the expected variance of the estimate (see Landy, Maloney, Johnston, & Young, 1995). If, for example, we know that the distribution is bivariate Gaussian, then the unbiased estimation rule with minimum variance is the *center-of-gravity (COG) estimator,* $\hat{\mu} = (\bar{p}^x, \bar{p}^y)$, where $\bar{p}^x$ is the average (mean) of the dots' *x*-coordinates and $\bar{p}^y$ is the average (mean) of the dots' *y*-coordinates.

*relative efficiency* of the median observer relative to the COG observer is the ratio of the variance of the latter to the variance of the former (see 1 for discussion of the relative efficiency of estimators). The median has about 36% lower efficiency than the COG (a relative efficiency of about 0.64). For Gaussian clusters, the COG estimator has the highest possible efficiency (the lowest possible variance) of any unbiased estimator.

*t*-distribution with three degrees of freedom, denoted $t_3$. The resulting cluster is superficially similar to the Gaussian in Figure 1A, but now the estimates of the median observer have about 59% higher efficiency than the estimates of the COG observer. The mean and median have exchanged roles. Part of the reason why the mean is less efficient than the median when estimating the center of $t_3$ is that the distribution tends to generate dots very far from the center of the distribution (“outliers”), which greatly affect the variance of the mean of the sample but not that of the median.
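The efficiency comparison can be checked approximately by simulation. The sketch below is our illustration, not the procedure behind the reported figures: it assumes clusters of 100 dots and compares the two estimators on the *x*-coordinate alone, which suffices if, as in Experiment 1, the two coordinates are independent. Relative efficiency does not depend on scale, so unscaled distributions are used. The function names and simulation settings are hypothetical, and Monte Carlo estimates will only approximate the values in the text.

```python
import numpy as np


def median_vs_cog_efficiency(draw, n_dots=100, n_clusters=4000, seed=0):
    """Monte Carlo estimate of Var(COG) / Var(median) for one coordinate.

    Values above 1 mean the median is the more efficient estimator of the center.
    Illustrative settings (assumed): 100 dots per cluster, 4000 simulated clusters.
    """
    rng = np.random.default_rng(seed)
    x = draw(rng, (n_clusters, n_dots))   # one row of x-coordinates per cluster
    cog = x.mean(axis=1)
    med = np.median(x, axis=1)
    return cog.var() / med.var()


def gaussian(rng, size):
    return rng.standard_normal(size)


def t3(rng, size):
    return rng.standard_t(3, size)


eff_gauss = median_vs_cog_efficiency(gaussian)
eff_t3 = median_vs_cog_efficiency(t3)
print(f"Gaussian: {eff_gauss:.2f}   t3: {eff_t3:.2f}")
```

The Gaussian ratio should land near 2/π ≈ 0.64 and the $t_3$ ratio well above 1, reproducing the exchange of roles between mean and median.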

$t_3$, he should avoid using the mean and instead use the median or some other estimator with a higher efficiency than the mean. However, we are rarely in the position of knowing for certain that the data we confront were drawn from a specific distributional family such as the Gaussian. In 1960, the statistician John Tukey published a seminal article pointing out the remarkable consequences of our ignorance.

^{2}Tukey's article triggered considerable interest (see Huber, 2001) in developing what came to be known as *robust estimators* and the mathematical theory of robustness.

*robustness in the statistician's sense*. We use the terms *outlier robust* and *outlier robustness* in discussing it. To our knowledge, outlier robustness was introduced into the vision literature by Landy et al. (1995).

*source-robust estimation* is linked to perceptual segmentation.

^{3}.

$t_3$. For the latter distribution in particular, the COG would be a poor choice of estimator, as we saw above. Even for the former, we might expect to see down-weighting of extreme dots, since the visual system has no way of knowing that the dots are drawn from an “uncontaminated” bivariate Gaussian. To anticipate our results, observers' behavior in Experiment 1 was consistent with the use of COG for both distributions. Whatever the merits of Tukey's argument, the visual system does not use a robust rule in estimating the centers of dot clusters that are perceived as arising from a single source.

*t*-distribution with three degrees of freedom, denoted $t_3$. We predicted that even if observers use a non-robust approach when estimating some property of the Gaussian cluster, they would nevertheless employ a robust approach when estimating some property of $t_3$, because it has longer tails than the Gaussian. The need for robust methods to protect one's estimate from outliers is more acute when there are very extreme dots present in the data.

*σ* = 0.89°) containing 100 dots, and each dot had a diameter of 2.4 minutes of arc. The *x* and *y* coordinates of the $t_3$ stimuli were independent, both drawn from a *t*-distribution with 3 degrees of freedom. We scaled the $t_3$ stimuli so that, on average, 90% of the dots appeared within the same region as 90% of the dots of the Gaussian stimuli. This ensured that the two types of stimuli would look roughly similar. Figure 1 shows an example of the equated stimuli.
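One way to compute the scale factor is sketched below, assuming that “the same region” means the central interval containing 90% of each coordinate's marginal distribution (±1.645*σ* for the Gaussian). This reading is our assumption, but it agrees with the analysis bins described later, where six 0.49° bins span about 2.9° ≈ 2 × 1.645 × 0.89°. The helper names are hypothetical; the $t_3$ quantile is found by bisection on the closed-form $t_3$ CDF so that only the standard library is needed.

```python
import math
from statistics import NormalDist

SIGMA = 0.89   # SD of the Gaussian clusters (degrees), from the text
Q = 0.95       # upper quantile of a central 90% interval
# Assumption: "same region" = central 90% interval of each coordinate.


def t3_cdf(t):
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def t3_ppf(q, lo=-1e3, hi=1e3, tol=1e-12):
    """Quantile of t3 by bisection on the monotone CDF."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


half_width = NormalDist().inv_cdf(Q) * SIGMA   # Gaussian: 90% of each coordinate within +/- this
t3_scale = half_width / t3_ppf(Q)              # multiply standard t3 draws by this factor
print(f"90% half-width = {half_width:.3f} deg, t3 scale = {t3_scale:.3f} deg")
```

Under this reading, the half-width is about 1.46°, and each $t_3$ coordinate would be a standard $t_3$ draw multiplied by about 0.62°.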

$t_3$ distribution would often include a dot that was not displayed. In all of our analyses, we used only the dots that were visible to the observer.

$t_3$) was aligned with the reference lines. Figure 2 shows the effect of shifting the cluster to the left and to the right of the reference lines.

$t_3$ clusters. Some observers were shown the Gaussian blocks first, while other observers were shown the $t_3$ blocks first. The observers were not informed of these details but rather told that the experiment was divided into four blocks to allow for easy breaks. The entire experiment, which comprised a short practice block followed by four experimental blocks, lasted approximately 1 h and 20 min.

*s*, and the number of dots in each bin, which we call the bin counts $B_1, B_2, \ldots, B_n$. We denote influence per dot in each bin by $\gamma_1, \gamma_2, \ldots, \gamma_n$. The GLM links an observer's response profile to a Gaussian psychometric function ($\Phi$) using the following equation, where *t* denotes the trial number:

*μ*, which is the observer's overall bias; *σ*, which is the slope of the psychometric function; and $\gamma_1$ through $\gamma_n$, which are the influences per dot on the observer's PSE for each bin.
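Because the GLM equation is not reproduced here, the sketch below assumes one plausible form: on trial *t*, $P(\text{right}) = \Phi\big((s_t - \mu + \sum_i \gamma_i D_{i,t})/\sigma\big)$, where $D_{i,t}$ is the number of dots in bin *i* to the right of the cluster's center minus the number in the mirror-image bin on the left. This signed count builds in the odd symmetry adopted in the analysis. The code simulates a hypothetical COG observer with Gaussian decision noise (seven shift values, 1400 trials) and fits the probit model by maximum likelihood using Fisher scoring. All names, shift values, and the noise level are illustrative assumptions, not the authors' settings.

```python
import math
import numpy as np

SIGMA = 0.89                                 # Gaussian cluster SD (degrees), from the text
INNER_EDGES = np.array([0.49, 0.98, 1.47])   # upper edges of bins 1-3 (degrees)


def signed_bin_counts(d):
    """Dots in bins 1-4 right of center minus dots in the mirror-image bins on the left."""
    idx = np.searchsorted(INNER_EDGES, np.abs(d), side="right")  # 0..3 -> bins 1..4
    return np.bincount(idx, weights=np.sign(d), minlength=4)


def simulate_cog_observer(n_trials=1400, noise_sd=0.1, seed=1):
    """Hypothetical COG observer: answers 'right' if shift + mean offset + noise > 0."""
    rng = np.random.default_rng(seed)
    shifts = np.tile(np.linspace(-0.24, 0.24, 7), n_trials // 7)  # 7 assumed shift values
    counts = np.empty((shifts.size, 4))
    resp = np.empty(shifts.size)
    for t, s in enumerate(shifts):
        d = rng.normal(0.0, SIGMA, 100)      # dot offsets from the true center
        counts[t] = signed_bin_counts(d)
        resp[t] = float(s + d.mean() + rng.normal(0.0, noise_sd) > 0)
    return shifts, counts, resp


_erf = np.vectorize(math.erf)


def fit_probit(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood probit regression by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = np.clip(0.5 * (1.0 + _erf(eta / math.sqrt(2.0))), 1e-12, 1 - 1e-12)
        dens = np.exp(-0.5 * eta**2) / math.sqrt(2.0 * math.pi)
        w = dens / (p * (1.0 - p))
        score = X.T @ (w * (y - p))              # gradient of the log-likelihood
        info = X.T @ (X * (w * dens)[:, None])   # expected (Fisher) information
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


shifts, counts, resp = simulate_cog_observer()
X = np.column_stack([np.ones_like(shifts), shifts, counts])
beta = fit_probit(X, resp)
sigma_hat = 1.0 / beta[1]           # spread (1 / slope) of the psychometric function
mu_hat = -beta[0] * sigma_hat       # overall bias
gamma_hat = beta[2:] * sigma_hat    # influence per dot for bins 1-4
print(f"sigma={sigma_hat:.3f}  mu={mu_hat:.4f}  gamma={np.round(gamma_hat, 4)}")
```

For a simulated COG observer, the recovered $\hat\gamma_i$ should increase with the mean eccentricity of the dots in each bin, which is the signature the real fits are compared against.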

*p* = 0.217). Based on this result, we used the GLM to fit influence functions that were odd-symmetric about 0. We report only the parameters to the right of 0; those to the left are identical in absolute value and opposite in sign.

$t_3$ data separately, and thus there were 1400 trials per data set (for each observer). We used a maximum likelihood procedure to find the best fits of the parameters, and we calculated the 95% confidence intervals of each parameter using bootstrap methods (Efron & Tibshirani, 1993).

$t_3$ stimuli so that, on average, 90% of the dots appeared within the same region as 90% of the dots of the Gaussian stimuli. In the results presented here, we divided the region that covers 90% of the dots into six bins and used two more bins, one on each side, to cover the remaining 10% of the dots. Thus, the length of each of the inner bins (labeled 1, 2, and 3), for both the Gaussian and the $t_3$ analyses, was 0.49°. There were six of these, three to the right of 0 (which are shown in the graph) and three to the left of 0 (which are not shown in the graph because they are defined symmetrically), and so the horizontal region covered by these bins was 2.9°. The length of the outermost bin (labeled 4) was 3.7°, extending from the end of bin 3 until the end of the display region, with the outermost bin on the left being defined symmetrically.
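Under the COG rule, a dot at horizontal offset *x* from the cluster center moves the estimate by *x*/100, so one way to obtain a COG-predicted influence per dot for a bin is to take the expected offset of a dot falling in that bin and divide by 100. The sketch below does this for the Gaussian condition using the truncated-normal mean. It assumes the outermost bin ends 3.7° beyond bin 3, as stated above, and it is our illustration rather than the computation behind the plotted predictions.

```python
from statistics import NormalDist

SIGMA = 0.89      # Gaussian cluster SD (degrees)
N_DOTS = 100
# Right-hand bin edges (degrees): three 0.49-degree inner bins, then the 3.7-degree outer bin.
# Assumption: Gaussian condition only; bin 4 ends 3.7 deg beyond bin 3.
EDGES = [0.0, 0.49, 0.98, 1.47, 1.47 + 3.7]


def cog_influence_per_dot(edges=EDGES, sigma=SIGMA, n=N_DOTS):
    """COG-predicted influence per dot: expected offset of a dot in each bin, divided by n."""
    std = NormalDist()
    out = []
    for a, b in zip(edges[:-1], edges[1:]):
        za, zb = a / sigma, b / sigma
        mass = std.cdf(zb) - std.cdf(za)
        mean_offset = sigma * (std.pdf(za) - std.pdf(zb)) / mass   # truncated-normal mean
        out.append(mean_offset / n)
    return out


print([round(g, 4) for g in cog_influence_per_dot()])
```

The predicted influence per dot rises with eccentricity; an outlier-robust rule would instead flatten or reduce the value for the outermost bin.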

$t_3$ stimuli first (O2, O5). We ran two other observers who saw the $t_3$ stimuli first, but their precision (measured as the slope of the psychometric function) was much lower than the precision of the observers reported here. The fits for the influences-per-dot parameters for these two observers were similar to those of the seven observers whose results are shown, but with markedly larger confidence intervals. Hence, we omit them from the graphical summary of the data.

*p* > 0.05 based on 95% bootstrapped CI). Only one observer (O2) had significantly less influence per dot for the outermost bin than that predicted by COG for both stimulus types (*p* < 0.05 based on 95% bootstrapped CI). One other observer (O6) had less influence per dot for the outermost bin than that predicted by COG only for the $t_3$ condition and not for the Gaussian condition.

*shape* implied by the dot cluster, rather than the COG of the dots themselves; see Melcher & Kowler, 1999.) These observers were evidently not using an outlier-robust method in estimating the center.

$t_3$ or vice versa). Despite these reports, we considered the possibility that observers used the same rule of combination for the Gaussian and $t_3$ distributions because they could not readily discriminate samples from the two distributions.

$t_3$ samples, interleaved at random). The distributions were presented just as they were in Experiment 1. We performed a signal detection analysis (Green & Swets, 1966/1974) on the resulting data, arbitrarily declaring that one keypress response counted as a “Gaussian” response, the other a “$t_3$” response.

*d*′ separately for each observer from that observer's responses. The resulting *d*′ could be positive or negative, depending on whether the observer chose the same assignment of distributions to keys as we did. We report the absolute values of the estimates. The observers' *d*′ values ranged between 1.5 and 2.2 in absolute magnitude, indicating that observers could readily differentiate samples from the two distributions used in Experiment 1 even when they were interleaved.
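For reference, the standard equal-variance estimate is *d*′ = z(H) − z(F), where z is the inverse standard normal CDF, H the hit rate, and F the false-alarm rate. In the sketch below, a “hit” is taken (arbitrarily, as in the text) to be a “Gaussian” response to a Gaussian sample; the function name is hypothetical and the rates in the usage line are illustrative, not observers' data. Swapping the key assignment swaps H and F and flips the sign, which is why absolute values are reported.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian sensitivity, d' = z(H) - z(F).

    hit_rate: P("Gaussian" response | Gaussian sample)   (arbitrary labeling)
    false_alarm_rate: P("Gaussian" response | t3 sample)
    Rates of exactly 0 or 1 must be corrected first; inv_cdf rejects them.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Illustrative rates only. The opposite key mapping swaps H and F and flips the sign.
print(round(d_prime(0.8, 0.2), 3), round(d_prime(0.2, 0.8), 3))
```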

$t_3$ distribution. We also found that there was no significant difference between the Gaussian condition and the $t_3$ condition across observers. Regardless of whether or not the stimuli had extreme dots, most observers' fits for the binned influences per dot on PSE were not significantly different from those predicted by the simple COG model.

$t_3$ distribution in Experiment 1 is that observers *cannot* weight a cluster's dots differentially. In Experiment 2, we explicitly asked observers to assign differential weights to the dots, and the results indicate that they are fully capable of doing so. Hence, we can rule out the possibility that observers' lack of robustness in Experiment 1 was due to some sort of inability to assign differential weights to the dots in a cluster.

*σ* = 0.89°) and a 40-dot noise cluster that was uniformly distributed across the screen (13.5° by 10.7°). Every dot in the display had a diameter of 2.4 minutes of arc, just as in the first experiment. Figure 7 shows a schematic of the stimulus.

$t_3$ distribution was not included in this experiment) and seven shift locations, resulting in 7 conditions. Each condition was repeated 200 times, resulting in 1400 total trials. The entire experiment lasted approximately 45 min.

*ignorant model* that makes its response based on the COG of all 140 dots, the *omniscient model* that makes its response based on the COG of just the 100-dot Gaussian cluster, and the maximum likelihood model developed below.

$p_1, \ldots, p_n$, where $p_i = (p_i^x, p_i^y)$ are the horizontal and vertical coordinates of a dot. Exactly 100 of these dots are drawn from an isotropic Gaussian distribution $\phi(p; \mu, \sigma^2)$ with unknown mean $\mu = (\mu^x, \mu^y)$ and known variance $\sigma^2$, and the remaining 40 dots are drawn from a Uniform distribution whose probability density function is $U(x, y) = C$, a constant. Let $\pi: \{1, \ldots, 100\}$

^{4} for any choice of free parameters *μ*, *π* is

*C* is the constant derived from the probability density function of the 40 dots that are assigned to the Uniform instead of the Gaussian. We wish to maximize Equation 2 by choice of the free parameters *μ*, *π*. The resulting estimates

^{ x }.

*μ*, the expression in Equation 2 is maximized when the function *π* assigns to the Gaussian the 100 dots that are nearest to *μ*. Thus, we need only search on *μ*, computing Equation 2 for the 100 nearest dots.
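The search on *μ* can be sketched as follows. This is a hypothetical implementation under stated assumptions: the screen is centered on the origin, candidate values of *μ* are evaluated on a 0.1° grid, and the best grid point is then refined by alternating between re-centering on the assigned dots and reassigning the nearest dots (a refinement we added; neither step can lower the likelihood). Because the Uniform density *C* is the same for every dot, only the Gaussian term varies with *μ*. The parameter `n_gauss` plays the role of *n*, the number of dots assigned to the Gaussian, so a variant with *n* = 108 is `ml_center(dots, n_gauss=108)`. The example stimulus position is illustrative.

```python
import numpy as np

SIGMA = 0.89                  # known SD of the Gaussian cluster (degrees)
HALF_W, HALF_H = 6.75, 5.35   # half-extent of the 13.5 x 10.7 degree screen (assumed centered)


def ml_center(dots, n_gauss=100, grid_step=0.1, max_refine=100):
    """Hypothetical source-separating ML estimate of the Gaussian center."""
    xs = np.arange(-HALF_W, HALF_W, grid_step)
    ys = np.arange(-HALF_H, HALF_H, grid_step)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every candidate mu to every dot: shape (n_grid, n_dots).
    d2 = (grid**2).sum(1)[:, None] - 2.0 * grid @ dots.T + (dots**2).sum(1)[None, :]
    # For a fixed mu, giving the n_gauss nearest dots to the Gaussian maximizes the
    # likelihood; the Uniform term (constant C per dot) does not depend on mu.
    loglik = -np.sort(d2, axis=1)[:, :n_gauss].sum(axis=1) / (2.0 * SIGMA**2)
    mu = grid[np.argmax(loglik)]
    # Refine: re-center on the assigned dots, then reassign; neither step lowers the likelihood.
    for _ in range(max_refine):
        nearest = np.argsort(((dots - mu) ** 2).sum(axis=1))[:n_gauss]
        new_mu = dots[nearest].mean(axis=0)
        if np.allclose(new_mu, mu, rtol=0.0, atol=1e-12):
            break
        mu = new_mu
    return mu


# Illustrative stimulus: 100 Gaussian dots centered at (1, 0) plus 40 uniform noise dots.
rng = np.random.default_rng(3)
gauss = np.array([1.0, 0.0]) + SIGMA * rng.standard_normal((100, 2))
noise = rng.uniform([-HALF_W, -HALF_H], [HALF_W, HALF_H], size=(40, 2))
dots = np.vstack([gauss, noise])

print("ignorant   (COG of all 140 dots):", dots.mean(axis=0))
print("omniscient (COG of 100 Gaussian):", gauss.mean(axis=0))
print("ML, n = 100                     :", ml_center(dots))
print("ML, n = 108                     :", ml_center(dots, n_gauss=108))
```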

*n* = 100” because it assigns 100 dots to the Gaussian.

*p* = 0.287). Hence, we report only the binned influences to the right of 0; those to the left are identical in absolute value and opposite in sign. The outermost bin (labeled 9) was 2.85° long, extending from the end of bin 8 until the edge of the screen, with the outermost bin on the left being defined symmetrically. Remember that there were dots present throughout the screen, and thus it is easier to see the influence pattern if we fit influences at more eccentricities than the small number we fit in Experiment 1.

*n* is the number of dots assigned to the Gaussian cluster (with the remaining 140 − *n* dots assigned to the uniform noise). For each value of *n*, ranging from 94 to 116, we repeated the simulation of the ML model's performance and compared it against the mean influence function exhibited by the observers. We found that the ML model with *n* = 108 best fitted the influence function derived from observers' responses. The simulated performance of this ML model is plotted in Figure 9 as a red solid curve and labeled “ML, *n* = 108” because it assigns 108 dots to the Gaussian.

*σ* = 0.89°) and a small 15-dot contamination cluster uniformly distributed across a small square region (0.89° by 0.89°) that was placed at one of several offsets from the center of the main cluster. Every dot in the display had a diameter of 2.4 minutes of arc, just as in the first two experiments.

*σ* = 0.89°). The side length of the small contamination cluster (whose dots were drawn from a uniform distribution over a 0.89° by 0.89° square) was equal to the standard deviation of the main cluster. Hence, by simply stating how many standard deviations (*SD*s) away the small cluster's center was from the main cluster's center, one could quickly surmise how much overlap in *SD* units there was between the two clusters. For example, placing the small cluster 1.5 *SD*s away meant that the 15-dot cluster extended from 1 *SD* to 2 *SD*s on the main cluster. When the small cluster was placed 0.5 *SD* away, it fell in a dense region of the main cluster. On the other hand, when the small cluster was 4 *SD*s away, the two clusters were readily separable.

*SD* = 0). The eccentricities to the left were the same as the eccentricities to the right, and for the analysis and results, we collapsed the data after testing and failing to reject our null hypothesis that the influence of the small cluster on observers' PSE was odd-symmetric (lowest observer's *p* = 0.57). For example, we do not distinguish between offsetting the small cluster 2 *SD*s to the left and offsetting it 2 *SD*s to the right. Hence, for simplicity, we describe the eccentricities of only 10 offsets, keeping in mind that each one (save for *SD* = 0) corresponds to both an offset to the left and an offset to the right of the main cluster. The 10 eccentricities (in *SD* units) were: 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5.

*SD* = 0 as a measure of the observer's overall bias. We then compared the observer's PSEs at the other nine eccentricities (i.e., *SD* ≥ 0.5) to the observer's overall bias to measure the influence of the small cluster on the observer's PSE at each eccentricity. Figure 12 shows the mean influence of the small cluster on observers' PSE as a function of its eccentricity from the center of the main cluster. These means are plotted in red together with the 95% CI of the mean. They were compared to the influence function predicted by an ignorant model (blue dashed diagonal) and the influence function predicted by an omniscient model (black dashed horizontal) as in Experiment 2.
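The two reference predictions are simple. Assuming the 15 contamination dots average to the center of their square, the ignorant model's estimate (the COG of all 115 dots) shifts by 15/115 of the square's offset, which gives the diagonal; the omniscient model uses only the 100 Gaussian dots and predicts no influence. A minimal sketch, with influence expressed in degrees (an assumption about units) and a hypothetical function name:

```python
SIGMA = 0.89               # main-cluster SD (degrees)
N_MAIN, N_SMALL = 100, 15


def ignorant_influence(offset_sd, sigma=SIGMA):
    """Expected shift (degrees) of the COG of all 115 dots when the small square is
    centered offset_sd SDs from the main cluster's center.

    Assumption: the contamination dots average to the center of their square.
    """
    return N_SMALL * offset_sd * sigma / (N_MAIN + N_SMALL)


for k in (0.5, 2.0, 4.5):
    print(f"{k:.1f} SD: ignorant {ignorant_influence(k):.3f} deg, omniscient 0.000 deg")
```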

$p_1, \ldots, p_n$ (notation as in Experiment 2). Exactly 100 of these dots are drawn from an isotropic Gaussian distribution $\phi(p; \mu, \sigma^2)$ with unknown mean *μ* and known variance $\sigma^2$, and the remaining 15 dots are drawn from a Uniform distribution within a small square region with known dimensions whose probability density function is $U(x, y) = C$, a constant. However, the center of the small square region, denoted *α*, is unknown. To simplify the simulation of the ML estimator, we assumed that observers had no trouble identifying that the small contamination cluster was only shifted horizontally, not vertically, with respect to the large Gaussian cluster (as was made evident to the observers during training). Hence, the ML estimator knows that the vertical coordinates of $\mu = (\mu^x, \mu^y)$ and $\alpha = (\alpha^x, \alpha^y)$ are $\mu^y = 0$ and $\alpha^y = 0$, respectively, and so only the horizontal coordinates $\mu^x$ and $\alpha^x$ are unknown to the model.

$n(\alpha^x)$ denote the number of dots that lie inside the small square region for any choice of $\alpha^x$, with $115 - n(\alpha^x)$ being the number of dots that lie outside the square region. The ML estimator must assign exactly 100 dots to the Gaussian cluster and exactly 15 dots to the contamination cluster. Furthermore, it is not allowed to assign to the contamination cluster any dots that lie outside the square region. Hence, we rule out from the solution space all locations $\alpha^x$ where $n(\alpha^x) < 15$ (by assigning a likelihood of 0, i.e., a log-likelihood of $-\infty$).

$\alpha^x$ where $n(\alpha^x) = 15$, the ML estimator assigns the 15 dots that lie inside the square region to the contamination cluster and the 100 dots that lie outside the square region to the Gaussian cluster. For all locations $\alpha^x$ where $n(\alpha^x) > 15$, the ML estimator must assign some of the dots that lie inside the square region ($n(\alpha^x) - 15$ of them, to be precise) to the Gaussian cluster, because there are fewer than 100 dots lying outside the square region. In the ML model derived for Experiment 2, we noted that the likelihood is maximized by assigning to the Gaussian cluster the dots that are nearest to *μ*. So, from all the dots lying inside the square region, the ML estimator assigns to the Gaussian the $n(\alpha^x) - 15$ dots that are nearest to $\mu = (\mu^x, 0)$ so that, together with the dots lying outside the square region, there are exactly 100 dots assigned to the Gaussian. This assignment maximizes likelihood for any given choice of $\mu^x$ and $\alpha^x$.

$\pi: \{1, \ldots, 100\}$

$\alpha^x$, $\mu^x$ is

*C* is the constant derived from the probability density function of the 15 dots that are assigned to the contamination cluster instead of the Gaussian. We wish to maximize Equation 4 by choice of the free parameters $\alpha^x$ and $\mu^x$. The resulting estimates $\hat{\alpha}^x$ and $\hat{\mu}^x$ are the maximum likelihood estimates. The estimate of concern to us is $\hat{\mu}^x$.
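A brute-force sketch of this estimator follows, assuming a grid search over both free parameters (the grid ranges and steps are our choices). For each candidate $\alpha^x$, it discards placements with fewer than 15 dots in the square, assigns all outside dots to the Gaussian, and adds the $n(\alpha^x) - 15$ inside dots nearest to $\mu = (\mu^x, 0)$. Because the numbers of Gaussian and contamination dots are fixed, the normalizing constants and the $15 \log C$ term are identical for every candidate and can be dropped. The function name and the example stimulus (small cluster 2 *SD*s to the right) are hypothetical.

```python
import numpy as np

SIGMA = 0.89           # SD of the main Gaussian cluster (degrees)
HALF = 0.89 / 2        # half the side of the contamination square (degrees)
N_GAUSS, N_CONT = 100, 15


def ml_mu_x(dots, alpha_grid, mu_grid):
    """Grid-search ML estimate of mu^x, taking mu^y = alpha^y = 0 as known."""
    best_ll, best_mu = -np.inf, np.nan
    for a in alpha_grid:
        inside = (np.abs(dots[:, 0] - a) <= HALF) & (np.abs(dots[:, 1]) <= HALF)
        n_in = int(inside.sum())
        if n_in < N_CONT:
            continue  # likelihood 0: 15 contamination dots cannot fit in this square
        in_pts, out_pts = dots[inside], dots[~inside]
        for m in mu_grid:
            mu = np.array([m, 0.0])
            # All outside dots, plus the n_in - 15 inside dots nearest to mu, are Gaussian.
            d2_in = np.sort(((in_pts - mu) ** 2).sum(axis=1))[: n_in - N_CONT]
            d2_out = ((out_pts - mu) ** 2).sum(axis=1)
            # Constant terms (normalizers and 15 * log C) are the same for all candidates.
            ll = -(d2_in.sum() + d2_out.sum()) / (2.0 * SIGMA**2)
            if ll > best_ll:
                best_ll, best_mu = ll, m
    return best_mu


# Illustrative stimulus: main cluster at the origin, small cluster 2 SDs to the right.
rng = np.random.default_rng(7)
offset = 2.0 * SIGMA
gauss = SIGMA * rng.standard_normal((N_GAUSS, 2))
cont = rng.uniform([offset - HALF, -HALF], [offset + HALF, HALF], size=(N_CONT, 2))
dots = np.vstack([gauss, cont])

mu_x = ml_mu_x(dots, np.arange(-4.0, 4.001, 0.02), np.arange(-1.0, 1.001, 0.02))
print(f"ignorant COG: {dots[:, 0].mean():.3f}  ML estimate of mu^x: {mu_x:.3f}")
```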

*SD* ≥ 3.5), its mean influence on observers' PSE is not significantly different from 0 and the prediction of the omniscient model. This is hardly surprising. We included these conditions to verify that the influence of the small cluster disappears if it is very far away from the main cluster. On the other hand, when the small cluster partially overlaps the main cluster (*SD* = 1.5 to *SD* = 3), its mean influence on observers' PSE is less than that predicted by the ignorant model but greater than that predicted by the omniscient and ML models. This indicates that, although observers do down-weight the influence of dots that are likely to belong to the small contamination cluster, their ability to separate sources falls short of that of the ML observer. Human observers separated sources less well than the ML observer when the small cluster was offset about 1.5 to 2.5 *SD*s away from the center of the Gaussian.

*SD* and 1.0 *SD*), its mean influence on observers' PSE is not significantly different from 0 and the prediction of the omniscient model. The dots in the small cluster are evidently not outliers when they are so close to the center of the Gaussian, and an outlier-robust estimator would not reduce their influence. Yet human observers do so. This pattern of performance demonstrates that human observers are not simply detecting outliers and reducing their influence. They behave as we would expect a source-robust estimator to behave.

*SD*s. To determine why, we examined examples of stimuli with the small cluster at 0.5 *SD*, 2.0 *SD*s, and 3.5 *SD*s.

*SD*s, the ML observer and the human observer can readily segment the small cluster from the large and discount its effect. The ML observer, omniscient observer, and human observer are in good agreement. When the eccentricity is 0.5 *SD*, the presence of the small cluster is betrayed by a marked increase in density and a concomitant decrease in dot spacing near the small cluster. The human observer can discount the effect of the small cluster almost perfectly.

*SD* = 2.0), the small cluster falls in a sparser region of the Gaussian. In objective terms, adding the small cluster increases local dot density well above what could be expected from the Gaussian alone. As a consequence, the ML observer has little difficulty in locating the small cluster and reducing its influence almost completely. However, we found that the small cluster was almost invisible when it was offset by 2.0 *SD*s. We could identify whether it fell on the right or left side of the center of the Gaussian, but to the unalerted observer there was little to indicate that it was there at all. We conjecture that the observed deviations between human performance and the performance of the ML observer are due to difficulties in detecting the small cluster at intermediate offsets. This masking effect deserves further study given the marked discrepancy between human and ML performance.

*t*-distribution with 3 degrees of freedom, denoted $t_3$. An outlier-robust rule would assign less weight to extreme dots generated by either distribution. We found that observers did not give less weight (influence) to potential outliers for either choice of distribution. Their influence estimates were indistinguishable from those of the center-of-gravity (COG) rule.

$t_3$, however, observers' estimates were also not outlier robust, and for this distribution, the COG has markedly higher variance than other unbiased rules of estimation such as the median. The COG is not a “good” estimation rule for the $t_3$, at least in the sense of minimizing variance.

$t_3$ distribution but did not find.

$\theta_1, \ldots, \theta_m$ of a parameterized distribution with probability density function $f(x; \theta_1, \ldots, \theta_m)$ given a sample $X_1, \ldots, X_n$ from that distribution. A familiar distribution is the Gaussian $f(x; \mu, \sigma^2)$ with parameters *μ* and $\sigma^2$. Any function of the sample $T(X_1, \ldots, X_n)$ is an estimator, and since it is a function of random variables, any estimator is a random variable itself. A familiar example is the *mean*

*unbiased estimator* of a parameter $\theta_i$ has expected value $E[T(X_1, \ldots, X_n)] = \theta_i$. Intuitively, it gets the right answer “on average.” For the Gaussian distribution, the sample mean is an unbiased estimator of the parameter *μ*, also called the population mean. Of course, the population mean is also the population median and the population mode, and ultimately, it just marks the center of the distribution. The *median* of the sample is a second example of an unbiased estimator of the center of the Gaussian.

*μ*.

*relative efficiency* of two estimators is the ratio of the variance of the second to the variance of the first (note the order). If the second has a lower variance, then the efficiency is less than 1. Efficiency is a measure of the loss incurred by using the first estimator in place of the second for a particular sample size from a particular distribution. Tukey (1960) considered the relative efficiency of the median relative to the mean when estimating the center of contaminated Gaussian distributions. Most of the dots in a sample are drawn from a Gaussian with unknown mean *μ* and known variance $\sigma^2$, but a small proportion *p* are drawn from a distribution with the same mean but higher variance. In the following example, we use a variance of $16\sigma^2$ (i.e., a standard deviation of $4\sigma$).
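This comparison can be made concrete with the standard large-sample formulas, $n\,\mathrm{Var}(\text{mean}) \approx \sigma^2_{\text{mix}}$ and $n\,\mathrm{Var}(\text{median}) \approx 1/(4 f(\mu)^2)$, where *f* is the contaminated density. The sketch below applies them to the mixture described above with the $4\sigma$ contaminant; it is our illustration, not Tukey's original calculation, and like his analysis (footnote 2) it describes asymptotic behavior only. The function names are hypothetical.

```python
import math


def median_efficiency(p, k=4.0):
    """Asymptotic efficiency of the median relative to the mean, Var(mean) / Var(median),
    for the mixture (1 - p) N(mu, sigma^2) + p N(mu, (k sigma)^2); sigma cancels, so sigma = 1."""
    var_mean = (1.0 - p) + p * k**2                        # n * Var(sample mean)
    f0 = ((1.0 - p) + p / k) / math.sqrt(2.0 * math.pi)    # mixture density at its center
    var_median = 1.0 / (4.0 * f0**2)                       # n * Var(sample median)
    return var_mean / var_median


def break_even(k=4.0, lo=0.0, hi=0.5, tol=1e-10):
    """Contamination proportion at which median and mean are equally efficient (bisection).
    Valid for k = 4: the efficiency is below 1 at p = 0 and crosses 1 once on [0, 0.5]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if median_efficiency(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.0, 0.01, 0.05, 0.10):
    print(f"p = {p:.2f}: efficiency of median = {median_efficiency(p):.3f}")
print(f"break-even p = {break_even():.4f}")
```

With no contamination the efficiency is 2/π ≈ 0.64, matching the 36% loss quoted earlier, and the median overtakes the mean once roughly 4.6% of the dots come from the wider component.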

*robust to failures of distributional assumptions*. Since we can never be certain that empirical data are drawn from an uncontaminated Gaussian, we can never justify the use of the mean instead of its robust cousins when analyzing data.

^{2}Tukey (1960) only considered large-sample, asymptotic behavior. We have also verified his conclusions for samples as small as 5.

^{3}Cohen et al. (2008) use the term *part-based robustness*. In this paper, we consider the more general notion of *source robustness*.