Research Article  |   December 2010
Robust visual estimation as source separation
Author Affiliations
Journal of Vision December 2010, Vol.10, 2. doi:10.1167/10.14.2
Mordechai Z. Juni, Manish Singh, Laurence T. Maloney; Robust visual estimation as source separation. Journal of Vision 2010;10(14):2. doi: 10.1167/10.14.2.


We developed a method analogous to classification images that allowed us to measure the influence that each dot in a dot cluster had on observers' estimates of the center of the cluster. In Experiment 1, we investigated whether observers employ a robust estimator when estimating the centers of dot clusters that were drawn from a single distribution. Most observers' fitted influences did not differ significantly from those predicted by a center-of-gravity (COG) estimator. Such an estimator is not robust. In Experiments 2 and 3, we considered an alternative approach to the problem of robust estimation, based on source separation, that makes use of the visual system's ability to segment visual data. The observers' task was to estimate the center of one distribution when viewing complex dot clusters that were drawn from a mixture of two distributions. We compared human performance to that of an ideal observer that separated the cluster into two sources through a maximum likelihood algorithm and based its estimates of location on only the dots it assigned to one of the two sources. The results suggest that robust methods employed by the visual system are closely tied to mechanisms of perceptual segmentation.

Introduction
Visual tasks are often modeled as statistical estimation tasks. In Figure 1A, for example, the viewer is confronted with the scatter plot of a sample of dots p_i = (p_{ix}, p_{iy}), i = 1, …, n, drawn from a circularly symmetric distribution centered on the point μ = (μ_x, μ_y). The challenge for the viewer is to estimate the location of the invisible center of the distribution given only the visible dots.
Figure 1
 
Experiment 1—Example of the two stimulus types. (A) A Gaussian dot cluster: The horizontal and vertical coordinates of the dots in the dot cluster are independent, identically distributed random variables drawn from a Gaussian distribution. (B) A t 3 dot cluster: The horizontal and vertical coordinates of the dots in the dot cluster are independent, identically distributed random variables drawn from a t-distribution with 3 degrees of freedom. A dashed vertical line marks the horizontal center of the distributions.
 
Each dot is an independent piece of information about the unknown center, and the resulting estimation task is an example of a cue combination problem. Cue combination models typically make strong assumptions about the distributions involved, focusing on estimators that are unbiased 1 and that minimize the expected variance of the estimate (see Landy, Maloney, Johnston, & Young, 1995). If, for example, we know that the distribution is bivariate Gaussian, then the unbiased estimation rule with minimum variance is the center-of-gravity (COG) estimator, μ̂ = (μ̂_x, μ̂_y), where μ̂_x is the average (mean) of the dots' x-coordinates and μ̂_y is the average (mean) of the dots' y-coordinates. 
There are previous reports that COG is employed in visual localization tasks. Whitaker and Walker (1988) found that comparison of the horizontal offset of two dot clusters depended on COG. Morgan, Hole, and Glennerster (1990) reported that the perceived distance between two target dots located within separate dot clusters is biased toward the distance between the two clusters' respective COGs. Morgan et al. concluded that the COG acts as a reference point for each cluster as a whole. McGowan, Kowler, Sharma, and Chubb (1998) reported that saccades to dot clusters land near the COG. 
The potential pitfall of using COG is that its variance can inflate markedly if the distribution from which the sample is drawn is not Gaussian. Consider first a Gaussian case such as the 100-dot cluster in Figure 1A. If instead of COG we estimate the center of the cluster by taking the median values of the horizontal and vertical coordinates of the dots, the resulting horizontal and vertical estimates of the center of the cluster have about 56% higher variance. The efficiency of the median observer relative to the COG observer is the ratio of the variance of the latter to the variance of the former (see 1 for discussion of the relative efficiency of estimators); by this measure, the median has 36% lower efficiency than COG. For Gaussian clusters, the COG estimator has the highest possible efficiency (the lowest possible variance) of any estimator. 
However, now consider the 100-dot cluster in Figure 1B. Its horizontal and vertical coordinates are independent samples from the Student's t-distribution with three degrees of freedom, denoted t 3. The resulting cluster is superficially similar to the Gaussian in Figure 1A, but now the estimates of the median observer have about 59% higher efficiency than the estimates of the COG observer. The mean and median have exchanged roles. Part of the reason why the mean is less efficient than the median when estimating the center of t 3 is that the distribution tends to generate dots very far from the center of the distribution (“outliers”), which greatly affect the variance of the mean of the sample but not that of the median. 
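The efficiency reversal described above can be checked by simulation. The sketch below is our illustration, not part of the original study: it draws many 100-dot samples from a standard Gaussian and from t 3 and compares the variance of the mean-based (COG) and median-based estimates of the center. The 100-dot cluster size follows the text; the number of simulated clusters is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dots, n_samples = 100, 20000  # 100 dots per cluster, as in the text

# One coordinate suffices: horizontal and vertical coordinates are
# independent draws from the same marginal distribution.
gauss = rng.standard_normal((n_samples, n_dots))
t3 = rng.standard_t(df=3, size=(n_samples, n_dots))

def estimator_variances(samples):
    """Variance of the COG (mean) estimate and of the median estimate
    of the center, across many simulated clusters."""
    return samples.mean(axis=1).var(), np.median(samples, axis=1).var()

v_mean_g, v_med_g = estimator_variances(gauss)
v_mean_t, v_med_t = estimator_variances(t3)

# Gaussian: the median's variance exceeds the mean's.
print(f"Gaussian: var(median)/var(mean) = {v_med_g / v_mean_g:.2f}")
# t3: the roles reverse; the mean's variance exceeds the median's.
print(f"t3:       var(mean)/var(median) = {v_mean_t / v_med_t:.2f}")
```

With a seed fixed as above, the Gaussian ratio comes out near the roughly 1.56 figure quoted in the text, and the t 3 ratio shows the reversal in the median's favor.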
The statistician's choice seems clear. If he knows that the sample is taken from a Gaussian, he should use the mean of the sample as an estimate of the center of the population. If he knows that the sample is taken from t 3, he should avoid using the mean and instead use the median or some other estimator with a higher efficiency than the mean. However, we are rarely in the position of knowing for certain that the data we confront were drawn from a specific distributional family such as the Gaussian. In 1960, the statistician John Tukey published a seminal article pointing out the remarkable consequences of our ignorance. 
He showed that “… nearly imperceptible non-normalities may make conventional relative efficiencies of estimates of scale and location entirely useless…. If contamination is a real possibility (and when is it not?), neither [sample] mean nor [sample] variance is likely to be a wisely chosen basis for making estimates …” (Tukey, 1960, p. 474). 2 Tukey's article triggered considerable interest (see Huber, 2001) in developing what came to be known as robust estimators and the mathematical theory of robustness. 
Robustness in statistics is often presented as the process of down-weighting the influence of distant “outliers” in the data. To that end, robust methods were developed to arrive at nearly optimal estimates both in the presence and in the absence of outliers (Hampel, 1974; Huber, 1981). We refer to this definition of robustness as robustness in the statistician's sense. We use the terms outlier robust and outlier robustness in discussing it. To our knowledge, outlier robustness was introduced into the vision literature by Landy et al. (1995). 
However, there is a second way to think about robustness that also has its origins in Tukey's (1960) work. We can imagine that the stimulus is composed of dots drawn from multiple sources. We wish to ignore all but one source and estimate parameters belonging to that one source alone. With this interpretation, source-robust estimation is linked to perceptual segmentation. 
Cohen, Singh, and Maloney (2008), for example, presented observers with stimuli that consisted of a large, elliptical cluster and a small, circular cluster that was placed, at least partially, beyond the perimeter of the ellipse. The observers were asked to rotate a probe pattern to match the perceived orientation of the cluster, and they were not explicitly informed of the complex nature of the stimuli. The study found that, with increased lateral separation between the main, elliptical cluster and the secondary, small cluster, observers' orientation settings approached the principal axis of the main, elliptical cluster and moved away from the principal axis of the entire cluster. This indicates that with greater separation, the small secondary cluster lends itself more and more to being segmented away from the main cluster into its own separate part and thus to being treated as an independent source. This down-weighting mechanism relies on detecting that there is more than one source generating the stimulus. It is a form of source robustness. 3
Outlier robustness does not use segmentation cues to find suspect dots, but rather down-weights the most extreme dots indiscriminately, solely because they fall at the extremes of the data set. Such a process ensures that extreme outliers will not have undue effects on one's estimate of the parameter in question. This, however, is different from source-robust estimation in which the influence of dots is reduced to the extent that they are assigned to a second, distinct source that is not relevant to the estimation task. 
In Experiment 1, we asked observers to estimate the centers of dot clusters drawn from single distributions and tested whether the visual system systematically assigns lower weights to dots farther from the cluster's center. We did so for two kinds of distributions, Gaussian and t 3. For the latter distribution in particular, the COG would be a poor choice of estimator, as we saw above. Even for the former, we might expect to see down-weighting of extreme dots, since the visual system has no way of knowing that the dots are drawn from an "uncontaminated" bivariate Gaussian. To anticipate our results, observers' behavior in Experiment 1 was consistent with the use of COG for both distributions. Whatever the merits of Tukey's argument, the visual system does not use a robust rule in estimating the centers of dot clusters that are perceived as arising from a single source. 
In Experiment 2, we addressed Tukey's (1960) source separation problem by examining how observers estimate the center of dots drawn from one distribution in the presence of dots drawn from a second distribution. In Experiment 3, we report an analogue of the task in Cohen et al. (2008), using a localization task rather than an orientation task. 
Before we report our experimental results, we consider one last question. Why should the visual system choose rules of combination that minimize variance? Minimum variance is a standard cost function commonly employed both in the statistics literature and in the visual cue combination literature (Landy et al., 1995; Oruç, Maloney, & Landy, 2003). However, why should an observer in our tasks minimize variance rather than some other criterion more closely aligned with the demands of everyday tasks? If the observer uses his visual estimate as the aim point in rapidly reaching to touch a small circular disk or in throwing a ball at a basket, should he not adopt a visual rule of combination that reflects the task and maximizes the probability of success? 
We conjecture that minimizing variance is precisely the criterion that is aligned with the demands of many everyday tasks. If the observer combines a large number of dots in a dot cluster by a weighted average of their spatial locations, then it is plausible that the resulting visual estimate will be close to Gaussian in form. We also know that motor error in many tasks is close to Gaussian and there are theoretical reasons why it should be (Harris & Wolpert, 1998). Combined visual and motor error is typically Gaussian in form (Trommershäuser, Maloney, & Landy, 2008) as it must be if it is simply the sum of independent Gaussian visual and motor errors. 
If the goal of the observer in rapidly reaching to touch a circular target is to maximize his probability of success, then he achieves this goal by placing as much of the probability density of his visuomotor error as possible within the target area. If his visuomotor error is Gaussian, he should (a) aim for the center of the target and (b) minimize his visuomotor variance by minimizing his visual and motor variances separately. In making this argument, we ignore the effect of visual feedback on movements. However, so long as visual feedback does not alter the Gaussian form of visuomotor error, that error is still minimized by minimizing the variance of the initial visual estimate. So long as visual error is Gaussian in form, minimizing its variance automatically optimizes performance in a wide class of everyday tasks. 
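The link between variance and hit probability follows from the form of the isotropic bivariate Gaussian: the probability that an error with per-axis standard deviation σ lands within radius r of the aim point is 1 − exp(−r²/2σ²), which increases monotonically as σ shrinks. A minimal numerical check (ours, not from the article):

```python
import math

def p_hit(r, sigma):
    """P(isotropic bivariate Gaussian error lands within radius r of
    the aim point) = 1 - exp(-r^2 / (2 * sigma^2))."""
    return 1.0 - math.exp(-r**2 / (2.0 * sigma**2))

# Shrinking the visuomotor standard deviation monotonically raises
# the probability of landing on a target of fixed radius.
for sigma in (2.0, 1.0, 0.5):
    print(f"sigma = {sigma}: P(hit within r = 1) = {p_hit(1.0, sigma):.3f}")
```

This is why, for Gaussian visuomotor error, maximizing the probability of success and minimizing variance are the same criterion.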
Experiment 1
To investigate whether observers use estimators that are outlier robust, we had observers estimate the centers of dot clusters drawn from isotropic, bivariate distributions. Each observer participated in two different conditions. The only difference between the two conditions was the underlying distribution from which the dots were drawn. In one condition, the dots were drawn from a Gaussian, and in the other condition, they were drawn from a t-distribution with three degrees of freedom, denoted t 3. We predicted that even if observers use a non-robust approach when estimating the center of the Gaussian cluster, they would nevertheless employ a robust approach when estimating the center of the t 3 cluster, because t 3 has longer tails than the Gaussian. The need for robust methods to protect one's estimate from outliers is more acute when very extreme dots are present in the data. 
Methods
Observers
Seven observers at New York University participated in the experiment. All were naive as to the purpose of the experiment and were paid for their participation. 
Apparatus
Stimuli were displayed on a 34 cm by 27 cm LCD monitor (Dell 1708FP) with a 60-Hz refresh rate. Observers viewed the stimuli from a distance of 144 cm, and we used a chin rest to maintain the viewing distance. Thus, the screen subtended 13.5 by 10.7 degrees of visual angle. We restricted the display region to an invisible square (subtending 10.3° by 10.3°) to ensure that each stimulus was isotropic and not horizontally elongated. The experiment was programmed and run using MATLAB and the Psychtoolbox libraries (Brainard, 1997; Pelli, 1997). Responses were recorded using a keyboard. 
Stimuli
Each Gaussian stimulus consisted of a single isotropic, bivariate Gaussian cluster (σ = 0.89°) containing 100 dots, and each dot had a diameter of 2.4 minutes of arc. The x and y coordinates of the t 3 stimuli were independent, both drawn from a t-distribution with 3 degrees of freedom. We scaled the t 3 stimuli so that, on average, 90% of the dots appeared within the same region as 90% of the dots of the Gaussian stimuli. This ensured that the two types of stimuli would look roughly similar. Figure 1 shows an example of the equated stimuli. 
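The article does not spell out the scaling computation, but one plausible reconstruction equates the central 90% intervals of the marginal distributions: scale the unit-scale t 3 draws so that their 5th–95th percentile span matches that of a Gaussian marginal with σ = 0.89°. A sketch under that assumption:

```python
from scipy.stats import norm, t

sigma = 0.89  # deg; Gaussian standard deviation from the Methods

# 95th percentiles of the standard marginals: the central 90% of each
# distribution lies within +/- this many scale units of the center.
q_gauss = norm.ppf(0.95)      # about 1.645
q_t3 = t.ppf(0.95, df=3)      # about 2.353

# Multiplier for standard t3 draws so that their central 90% spans the
# same horizontal region as the central 90% of the Gaussian dots.
scale_t3 = sigma * q_gauss / q_t3
print(f"t3 scale factor: {scale_t3:.3f} deg")
```

Because the t 3 quantile is larger, the t 3 draws must be shrunk relative to the Gaussian σ to look "roughly similar," while still producing occasional far-out dots from the long tails.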
An important concern was the possibility that dots drawn would fall beyond the edges of the display region. For draws from the Gaussian distribution, dots were unlikely to fall outside the display region over the course of the experiment. However, draws from the t 3 distribution would often include a dot that was not displayed. In all of our analyses, we used only the dots that were visible to the observer. 
Design
We used the method of constant stimuli. On each trial, we shifted the cluster horizontally to one of seven locations with respect to two collinear, vertical reference lines placed above and below the stimulus. The seven horizontal locations were: −0.378°, −0.252°, −0.126°, 0°, 0.126°, 0.252°, 0.378°. In the 0° condition, the mean of the underlying distribution (be it Gaussian or t 3) was aligned with the reference lines. Figure 2 shows the effect of shifting the cluster to the left and to the right of the reference lines.
Figure 2
 
Experiment 1—Experimental task. Observers judged whether the center of the cluster was to the left or to the right of the invisible line marked by two vertical line segments shown in blue. The center of the underlying distribution of the stimulus was shifted horizontally from trial to trial as shown (three examples). Using a method of constant stimuli design, we measured how this shift affected the probability of responding “Right.” We show the same random cluster in the three examples for illustrative purpose only. In the actual experiment, the stimulus was regenerated on each trial by drawing random coordinates from the appropriate distribution.
 
There were two stimulus types and seven shift locations resulting in 14 conditions. Each condition was repeated 200 times resulting in 2800 total trials. We divided the experimental session into four blocks of 700 trials each. In two blocks, we showed observers the Gaussian clusters, and in the other two blocks, we showed them the t 3 clusters. Some observers were shown the Gaussian blocks first, while other observers were shown the t 3 blocks first. The observers were not informed of these details but rather told that the experiment was divided into four blocks to allow for easy breaks. The entire experiment, which comprised a short practice block followed by four experimental blocks, lasted approximately 1 h and 20 min. 
Procedure
The task was two-alternative forced choice. On each trial, observers responded whether the cluster's center was to the left or to the right of the reference lines. The instructions were given as follows: “Press the ‘z’ key with your left hand if you think that the center of the cluster is to the left of the reference lines, and press the ‘?’ key with your right hand if you think that the center of the cluster is to the right of the reference lines.” 
The reference lines, which were located at the horizontal center of the screen, were displayed continuously throughout the experiment (except during the breaks between blocks). On each trial, we displayed the stimulus for 250 ms at one of the seven shift locations. Observers were allowed to respond as soon as the stimulus appeared, but the stimulus remained on the screen for 250 ms even when the observers responded while the stimulus was still on the screen. The next trial started 750 ms after the observer's response (or 750 ms after the stimulus disappeared, if the observer responded while the stimulus was still on the screen). Figure 3 shows a schematic of the trial sequence.
Figure 3
 
Trial sequence for all three experiments. The experimental stimulus was presented only briefly (250 ms). The next trial did not begin until the observer responded “Left” or “Right.” The inter-stimulus interval was 750 ms.
 
Analysis
Generalized linear model (GLM)
We used a Generalized Linear Model (McCullagh & Nelder, 1989) to estimate the influence of the dots on observers' PSE as a function of the dots' horizontal eccentricity from the population mean. These estimates could be compared to the influences predicted by a center-of-gravity (COG) model. If observers use an estimator that is outlier robust, then, with increasing dot eccentricity, we would expect to see a decreasing influence of the dot on PSE compared to the influence predicted by the COG model. Figure 4 compares a hypothetical robust model to the simple COG model.
Figure 4
 
Influence. The solid blue line is the influence that a single dot has on the center-of-gravity (COG) estimator as a function of the dot's horizontal displacement from the center of the cluster. The absolute magnitude of the influence of dots increases without bounds with increasing distance from the center of the cluster, indicative of the large influence that outliers have on the COG estimator. The dashed red line is the influence that a single dot has on a hypothetical robust estimator. Characteristically, the influence of dots is bounded in absolute magnitude and decreases to zero with increasing distance from the center.
 
We used the GLM to measure the influences of the dots grouped into bins. The GLM analysis treated every dot within a bin as if it had the same influence on PSE, regardless of the dot's precise location within the bin. While our model is limited in resolution, fitting enough bins yields an influence pattern detailed enough to test whether observers are employing a robust strategy. Since we apply the same binning procedure in computing the influence assigned by the COG ideal observer, any effect of binning affects data and prediction identically, and a comparison of actual and ideal performance is therefore unaffected by binning. 
As shown in the four examples in Figure 5, the number of dots in each bin changes naturally across trials. The GLM analysis uses this natural sampling variation of the stimuli from trial to trial to estimate the influences of the dots on PSE. It is similar in spirit to classification image approaches (Ahumada, 2002; Ahumada & Lovell, 1971; see Knoblauch & Maloney, 2008). The information that the GLM uses from each trial is the following: the observer's response (Right or Left), the horizontally shifted location of the stimulus s, and the number of dots in each bin, which we call the bin counts B_1, B_2, …, B_n. We denote the influence per dot in each bin by γ_1, γ_2, …, γ_n. The GLM links an observer's response profile to a Gaussian psychometric function (Φ) using the following equation, where t denotes the trial number: 
P[Right | t] = Φ_{μ,σ}(s_t + γ_1 B_{1t} + γ_2 B_{2t} + … + γ_n B_{nt}).    (1)
The fitting routine returns estimates for the following parameters: μ, the observer's overall bias; σ, the slope of the psychometric function; and γ_1 through γ_n, the influences per dot on the observer's PSE for each bin.
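The fit in Equation 1 can be reproduced in outline as a probit-style maximum likelihood fit. The sketch below simulates trials from hypothetical parameter values (all numbers invented for illustration) and recovers μ, σ, and the per-bin influences γ by direct minimization of the negative log-likelihood; the original analysis presumably used standard GLM machinery rather than this hand-rolled optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n_trials, n_bins = 2000, 4

# Simulated design: stimulus shift s_t (method of constant stimuli)
# plus per-bin dot counts B_kt that vary naturally across trials.
s = rng.choice(np.linspace(-0.378, 0.378, 7), size=n_trials)
B = rng.poisson(lam=10.0, size=(n_trials, n_bins)).astype(float)

# Hypothetical generative parameters, used only to fabricate data.
mu_true, sigma_true = 0.0, 0.2
gamma_true = np.array([0.02, 0.015, 0.01, 0.005])
p_right = norm.cdf((s + B @ gamma_true - mu_true) / sigma_true)
resp = rng.random(n_trials) < p_right

def neg_log_lik(theta):
    """Negative log-likelihood of Equation 1 with parameters
    theta = (mu, log sigma, gamma_1..gamma_n)."""
    mu, gamma = theta[0], theta[2:]
    p = norm.cdf((s + B @ gamma - mu) / np.exp(theta[1]))
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    return -np.sum(np.where(resp, np.log(p), np.log1p(-p)))

theta0 = np.zeros(2 + n_bins)
fit = minimize(neg_log_lik, theta0, method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-8})
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
gamma_hat = fit.x[2:]
print("fitted influences per dot:", np.round(gamma_hat, 3))
```

Parameterizing σ on the log scale keeps the slope positive without constrained optimization; the fitted γ values are then directly comparable to the influences a COG observer would assign.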
Figure 5
 
Random variation in bin count. We used the natural variation in the number of dots in each bin across different stimuli to estimate the influence of any one dot in that bin on observers' PSE. The number of dots in bin B 2, for example, varies from twelve dots in stimulus (A), to eight dots in stimulus (B), to sixteen dots in stimulus (C), to eighteen dots in stimulus (D).
 
Left–right symmetry
Preliminary analyses on pilot data led us to surmise that the observers' influence functions were odd-symmetric about 0. If so, we could get more accurate estimates of influence by forcing the GLM to return odd-symmetric influence functions. Accordingly, we first tested each observer's data using a nested hypothesis test (Mood, Graybill, & Boes, 1974, p. 440 ff.), to see if we could reject the null hypothesis that the influence functions were odd-symmetric. We did this for different choices of number of bins and we were not able to reject the null hypothesis in any of them (lowest p = 0.217). Based on this conclusion, we used GLM to fit influence functions that were odd-symmetric about 0. We report only the parameters to the right of 0; those to the left are identical in absolute value and opposite in sign. 
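The nested hypothesis test compares the full (unconstrained) influence fit to the odd-symmetric fit via a likelihood ratio: twice the log-likelihood gap is asymptotically χ² with degrees of freedom equal to the number of parameters the symmetry constraint removes. A generic sketch with invented log-likelihood values:

```python
from scipy.stats import chi2

def lrt_p(ll_full, ll_reduced, df):
    """Nested-model likelihood-ratio test. Twice the log-likelihood
    gap is asymptotically chi-square with df = number of parameters
    fixed by the reduced (here, odd-symmetric) model."""
    deviance = 2.0 * (ll_full - ll_reduced)
    return chi2.sf(deviance, df)

# Hypothetical numbers: an 8-parameter unconstrained fit vs. a
# 4-parameter odd-symmetric fit (df = 4).
print(f"p = {lrt_p(-612.4, -615.1, df=4):.3f}")
```

A large p here means the unconstrained fit buys no reliable improvement, licensing the symmetric model, as in the analysis above.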
Results
We fitted the Gaussian data and the t 3 data separately and thus there were 1400 trials per data set (for each observer). We used a maximum likelihood procedure to find the best fits of the parameters, and we calculated the 95% confidence intervals of each parameter using bootstrap methods (Efron & Tibshirani, 1993). 
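The bootstrap confidence intervals can be illustrated with a generic percentile bootstrap. This is a sketch of the general technique applied to a simple statistic, not the authors' exact procedure, which resampled at the level of their GLM fit:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for an arbitrary
    estimator applied to a 1-D sample."""
    n = len(data)
    stats = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    return tuple(np.percentile(stats, [100 * alpha / 2,
                                       100 * (1 - alpha / 2)]))

sample = rng.standard_normal(200)
lo, hi = bootstrap_ci(sample, np.mean)
print(f"95% CI for the sample mean: [{lo:.3f}, {hi:.3f}]")
```

The same resample-and-refit loop, with the GLM fit in place of `np.mean`, yields per-parameter intervals like those shown in Figure 6.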
The results of interest are the fits for the binned influences per dot on PSE for each of the two stimulus types. Figure 6 shows the fits for each observer for each of the two stimulus types with bootstrapped 95% CI. The blue dashed diagonals connect the fits for the binned influences per dot on PSE for the simple center-of-gravity (COG) model. The figure also shows the mean influence of each bin across all observers (with 95% CI), for each of the two stimulus types.
Figure 6
 
Experiment 1—Results. We estimated the influence of a dot in each of eight bins symmetric about the center of the cluster. We collapsed data from pairs of bins symmetric around the center of the cluster (B_{−4} with B_4, B_{−3} with B_3, etc.) and plot the resulting influence-per-dot estimates for just the bins on the right of the cluster (labeled 1, 2, 3, and 4). The influence estimates for the bins on the left are identical in absolute magnitude but opposite in sign. The blue dashed diagonals connect the fits for the binned influences per dot on PSE for the simple center-of-gravity (COG) model. The error bars on the individual estimates mark a 95% confidence interval, computed using a bootstrap method (Efron & Tibshirani, 1993). The error bars on the means of each bin across all observers mark a 95% CI of the mean. (A) Results for seven observers for the Gaussian stimuli. The respective means of each bin across all observers are shown in the bottom right plot. For most observers and bins, the measured influence is not significantly different from that of the COG model. Only one of the seven observers (O2) shows a robust pattern, and only for the outermost bin, while two observers (O5, O7) show an "anti-robust" pattern for the outermost bin. (B) Results for the same seven observers for the t 3 stimuli. The respective means of each bin across all observers are shown in the bottom right plot. For most observers and bins, the measured influence is not significantly different from that of the COG model. Only two of the seven observers (O2, O6) show a robust pattern, and only for the outermost bin.
 
In the results presented here, we divided the stimulus space into eight bins (four to the right of 0 and four to the left of 0), which, as noted above, means that we fit and display only four influence-per-dot parameters because of the symmetry assumption. As for the length of each bin, recall that we scaled the t 3 stimuli so that, on average, 90% of the dots appeared within the same region as 90% of the dots of the Gaussian stimuli. We divided the region that covers 90% of the dots into six bins and used two more bins, one on each side, to cover the remaining 10% of the dots. Thus, the length of each of the inner bins (labeled 1, 2, and 3), for both the Gaussian and the t 3 analyses, was 0.49°. There were six of these, three to the right of 0 (shown in the graph) and three to the left of 0 (not shown, because they are defined symmetrically), so the horizontal region covered by these bins was 2.9°. The length of the outermost bin (labeled 4) was 3.7°, extending from the end of bin 3 to the end of the display region, with the outermost bin on the left defined symmetrically. 
No difference was evident between observers who saw the Gaussian stimuli first (O1, O3, O4, O6, O7) and observers who saw the t 3 stimuli first (O2, O5). We ran two other observers who saw the t 3 stimuli first, but their precision (measured as the slope of the psychometric function) was much lower than the precision of the observers reported here. The fits for the influences-per-dot parameters for these two observers were similar to those of the seven observers whose results are shown, but with markedly larger confidence intervals. Hence, we omit them from the graphical summary of data. 
Most observers' fits for the influence-per-dot parameters were not significantly different from those predicted by the COG model for either of the two stimulus types (p > 0.05 based on 95% bootstrapped CIs). Only one observer (O2) had significantly less influence per dot for the outermost bin than predicted by COG for both stimulus types (p < 0.05 based on 95% bootstrapped CIs). One other observer (O6) had less influence per dot for the outermost bin than predicted by COG for the t 3 condition only, not for the Gaussian condition. 
An interesting result that we did not expect was the over-weighting of the outermost bin for the Gaussian stimuli by two observers (O5 and O7). One possibility is that these observers might be basing their estimate in part on the convex hull of the cluster (the smallest convex region containing all the dots of the cluster). Such a strategy could yield very high influences for the outermost bins when fitting the data with our GLM. (It has previously been shown that saccades to highly non-uniform dot clusters tend to land at the COG of the shape implied by the dot cluster, rather than the COG of the dots themselves; see Melcher & Kowler, 1999.) These observers were evidently not using an outlier-robust method in estimating the center. 
A second unexpected result was the down-weighting of the innermost dots (bin 1) for many of the observers. We suspect that this occurred because the cluster is very dense in the innermost bins; a decrease in influence per dot could thus arise from crowding (McGowan et al., 1998). 
Stimulus discriminability
When asked after the experiment, all observers reported that they did notice that the stimuli in Experiment 1 changed between the second and third blocks (which is when the stimuli switched from Gaussian to t 3 or vice versa). Despite these reports, we considered the possibility that observers used the same rule of combination for the Gaussian and t 3 distributions because they could not readily discriminate samples from the two distributions. 
We ran four naive observers in a control experiment. We told them that they would see a series of samples of dot clusters of two different types and that they should attempt to identify the two types and classify them by pressing one of two keys on the computer keyboard. No further instructions or training were given. 
We presented 210 examples (105 Gaussian samples and 105 t 3 samples, interleaved at random). The distributions were presented just as they were in Experiment 1. We performed a signal detection analysis (Green & Swets, 1966/1974) on the resulting data, arbitrarily declaring that one keypress response counted as a “Gaussian” response, the other a “t 3” response. 
We estimated the sensitivity parameter d′ separately for each observer based on each observer's responses. The resulting d′ could be positive or negative, depending on whether the observer chose the same assignment of distributions to keys as we did. We report the absolute values of the estimated d′ values. The observers' d′ values ranged between 1.5 and 2.2 in absolute magnitude, indicating that observers could readily differentiate samples from the two distributions used in Experiment 1 even when they were interleaved. 
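The d′ computation for this yes/no classification can be sketched as follows (Green & Swets). The function name and the response counts are ours, fabricated for illustration; they are not the observers' actual data.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate) for a yes/no task."""
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return z(hit_rate) - z(fa_rate)

# Treat "Gaussian" responses to the 105 Gaussian samples as hits and
# "Gaussian" responses to the 105 t3 samples as false alarms. The sign
# of d' depends on the observer's key assignment, so we take the
# absolute value, as in the text. Counts here are made up.
dp = abs(d_prime(85, 20, 25, 80))
```

An observer who merely guessed would produce hit and false-alarm rates near 0.5 and a d′ near 0, well below the 1.5 to 2.2 range reported above.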
Discussion
This experiment investigated whether the visual system employs estimators that are outlier robust in estimating the center of simple, uncontaminated dot clusters drawn from a single distribution. We found that only one observer deviated from the minimum variance COG rule for the Gaussian, and only that observer and a second observer showed any evidence of robustness for the t 3 distribution. We also found no significant difference between the Gaussian condition and the t 3 condition across observers. Regardless of whether or not the stimuli had extreme dots, most observers' fits for the binned influences per dot on PSE were not significantly different from those predicted by the simple COG model. 
One possible explanation for observers' lack of robustness when dealing with the t 3 distribution in Experiment 1 is that observers cannot weight a cluster's dots differentially. In Experiment 2, we explicitly asked observers to assign differential weights to the dots, and the results indicate that they are fully capable of doing so. Hence, we can rule out the possibility that observers' lack of robustness in Experiment 1 was due to some sort of inability to assign differential weights to the dots in a cluster. 
In the Introduction section, we broached the idea that robustness could be construed as source separation. So far we have only found observers consistently employing robust methods when the stimulus has evident spatial part structure (Cohen & Singh, 2006; Cohen et al., 2008; Denisova, Singh, & Kowler, 2006). The results of Experiment 1 do not conflict with this hypothesis since the stimuli had no evident spatial part structure. Observers had no reason to expect that the visible dots were generated by more than one process. However, what if we showed them how a complex stimulus was drawn from more than one source? Would they then employ a robust method based on source separation even in the absence of evident spatial part structure? 
Experiment 2
Experiment 1 revealed that most observers do not employ visual estimators that are outlier robust when estimating the center of simple, uncontaminated dot clusters drawn from a single distribution. In Experiment 2, we tested whether observers employ a source-robust rule when estimating the center of a cluster that is drawn from a mixture of two distributions: a bivariate Gaussian distribution and a bivariate Uniform distribution. The observer's task was to estimate the center of the dots drawn from the Gaussian source and ignore the dots from the Uniform source. 
Each stimulus had 140 dots total. One hundred of them were drawn from a Gaussian distribution, identical to the one we used in Experiment 1, and the remaining 40 were drawn from a Uniform distribution that covered the entire display. Observers were shown during training how the stimuli were drawn from two different sources, and they were tasked with estimating the center of the Gaussian cluster. 
We predicted that observers would employ a source-robust rule in down-weighting the influence of dots as a function of their increased eccentricity. The motivation for this down-weighting is that with increased eccentricity there is less of a chance that the dots arose from the Gaussian distribution. For the analysis, we used the same GLM and fitting procedure to estimate the binned influences per dot on PSE at different eccentricities. We compare human performance to that of a maximum likelihood model of an ideal source-robust estimator described below. 
Methods
Observers
Eight paid observers at New York University participated in the experiment. All were naive as to the purpose of the experiment and had not participated in the first experiment. 
Apparatus
The apparatus was the same as in the first experiment. However, the display region included the entire screen and not just a square region centered on the screen as in Experiment 1. 
Stimulus
Each 140-dot complex stimulus consisted of a 100-dot isotropic, bivariate Gaussian cluster (σ = 0.89°) and a 40-dot noise cluster that was uniformly distributed across the screen (13.5° by 10.7°). Every dot in the display had a diameter of 2.4 minutes of arc just as in the first experiment. Figure 7 shows a schematic of the stimulus.
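One trial's stimulus could be generated as sketched below. This is a minimal illustration under the parameters quoted above, not the authors' stimulus code; the coordinate convention (degrees, origin at screen center) is our assumption.

```python
import random

SIGMA = 0.89                       # deg, SD of the Gaussian cluster
DISPLAY_W, DISPLAY_H = 13.5, 10.7  # deg, full display

def make_stimulus(mu_x, mu_y=0.0, rng=random):
    """100 Gaussian dots centered on (mu_x, mu_y) plus 40 uniform noise
    dots covering the display, returned unmarked, as the observer sees
    them."""
    gaussian = [(rng.gauss(mu_x, SIGMA), rng.gauss(mu_y, SIGMA))
                for _ in range(100)]
    noise = [(rng.uniform(-DISPLAY_W / 2, DISPLAY_W / 2),
              rng.uniform(-DISPLAY_H / 2, DISPLAY_H / 2))
             for _ in range(40)]
    return gaussian + noise

dots = make_stimulus(mu_x=0.126)  # one of the seven shift locations
```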
Figure 7
 
Experiment 2—Construction of the stimuli as a mixture of sources. On each trial, we generated 100 dots from a bivariate Gaussian distribution and 40 (“noise”) dots from a Uniform distribution and superimposed them.
 
Design
We used the method of constant stimuli, randomly shifting the 100-dot Gaussian cluster to one of seven locations with respect to two collinear, vertical reference lines placed at the horizontal center of the screen. The seven locations were: −0.378°, −0.252°, −0.126°, 0°, 0.126°, 0.252°, 0.378°. In the 0° condition, the mean of the underlying Gaussian distribution was aligned with the reference lines. 
The experimental session was divided into five blocks to allow for easy breaks. There was only one stimulus type (the t 3 distribution was not included in this experiment) and seven shift locations, resulting in seven conditions. Each condition was repeated 200 times, resulting in 1400 total trials. The entire experiment lasted approximately 45 min. 
Training
At the start of the experiment, observers learned how each stimulus was drawn from two different sources. The 100-dot Gaussian cluster was colored red and the 40 noise dots were colored green. The 100-dot Gaussian cluster appeared randomly at one of the seven shift locations. During training, the stimulus remained on the screen until the observer responded. At the start of the first practice trial, observers were told that there were two clusters on the screen: “the stimulus you are seeing is a combination of two different clusters which we will call ‘cluster-1,’ colored in red, and ‘cluster-2,’ colored in green” (see Figure 8A). The observers were then asked to respond whether the center of cluster 1, the red one, was to the left (“z” key) or to the right (“?” key) of the reference lines. Observers did this for 10 to 15 trials while the experimenter asked if they had any trouble doing the task. All reported that it was straightforward.
Figure 8
 
Experiment 2—Training and experimental task. (A) During training, observers judged the location of the center of the red (Gaussian) cluster. They were told to ignore the green (noise) dots. (B) In the actual experiment, observers continued to judge the location of the center of the Gaussian cluster, but now the noise dots were also red. They were instructed to ignore the noise dots.
 
The experimenter then showed a typical stimulus to the observer and switched the color of the noise dots back and forth between green and red to emphasize that the stimulus was comprised of the same two kinds of clusters (cluster 1 and cluster 2) that they were seeing before, but that now, and during the actual experiment, both clusters would have the same color (see Figure 8B). The observers were told that the task remained the same: “respond whether the center of cluster 1 is to the left or to the right of the reference lines.” After a few trials, all observers reported that they understood the task, that it was difficult, but that they did not have any trouble making a response. The observers were then allowed to practice for a few more trials. Finally, observers were shown the stimuli again under color-coding for a few trials and they were told to pay attention to the two distributions. They were again informed that during the actual experiment all the dots would be colored in red. 
Procedure
This was the same as in the first experiment, except that observers viewed a complex stimulus rather than a simple one and the instructions given during training (described above) were correspondingly different. A 2AFC task was employed whereby observers responded whether the center of the 100-dot Gaussian cluster was to the left or to the right of the reference lines. The stimulus also contained 40 noise dots (uniformly distributed throughout the display) that the observers were instructed to ignore. The reference lines, located at the horizontal center of the screen, were displayed continuously throughout the experiment (except during the breaks between blocks). On each trial, we placed the 100-dot Gaussian cluster at one of the seven shift locations with respect to the reference lines, and the entire 140-dot stimulus remained on the screen for 250 ms. Observers were allowed to respond as soon as the stimulus appeared, but the stimulus remained on the screen for the full 250 ms regardless. The next trial started 750 ms after the observer's response (or 750 ms after the stimulus disappeared, if the observer responded while the stimulus was still on the screen). 
Analysis
We used the same GLM as in Experiment 1 to get the fits for the binned influences per dot on PSE. The bin count for each trial is the total number of dots in a bin irrespective of whether the dot belongs to the 100-dot Gaussian cluster or to the 40 noise dots. Importantly, this means that the expected ratio of Gaussian dots to noise dots within a bin decreases with increased bin eccentricity. We compared the fits returned from the observers' data to the fits predicted by three different models: the ignorant model that makes its response based on the COG of all 140 dots, the omniscient model that makes its response based on the COG of just the 100-dot Gaussian cluster, and the maximum likelihood model developed below. 
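The trial-by-trial predictors described here, bin counts pooled over both sources, can be sketched as below. The signed left/right pooling reflects the odd-symmetry assumption carried over from Experiment 1, and the bin edges (eight inner bins of 0.49° plus an outer bin to the screen edge) and names are illustrative assumptions, not the authors' code.

```python
def signed_bin_counts(dots, edges):
    """Per-bin dot counts, pooled over sources: a dot right of 0 adds +1
    to the bin containing |x|, a dot left of 0 adds -1 (odd symmetry)."""
    counts = [0] * (len(edges) - 1)
    for x, _y in dots:
        r = abs(x)
        for b in range(len(counts)):
            if edges[b] <= r < edges[b + 1]:
                counts[b] += 1 if x >= 0 else -1
                break
    return counts

# Nine bins per side: eight inner bins of 0.49 deg plus an outer bin
# reaching the screen edge (assumed values).
edges = [0.49 * i for i in range(9)] + [0.49 * 8 + 2.85]
row = signed_bin_counts([(0.2, 0.0), (-0.3, 0.1), (1.0, -0.2)], edges)
```

One such row of nine signed counts per trial, together with the observer's left/right response, forms the input to the GLM fit.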
The influence for the ignorant model (plotted in Figure 9 as a blue dashed diagonal) increases linearly with increased eccentricity (just as in Experiment 1). However, the initial increase in influence for the omniscient model (plotted in Figure 9 as a black dashed curve) tapers off with increased eccentricity and eventually goes to zero for very high eccentricities. This is exactly what we expect because with increased bin eccentricity there is a decrease in the ratio of Gaussian dots to noise dots, and with very high eccentricities, there are unlikely to be any Gaussian dots within the bin. Hence, as the probability of Gaussian dots in the bin decreases, the influence per dot for a source-robust estimator should decrease as well.
Figure 9
 
Experiment 2—Results. We estimated the mean influence across all observers in each of eighteen bins symmetric about the center of the cluster. As in Experiment 1, we collapsed data from pairs of bins symmetric around the center of the cluster and plot the resulting influence per dot on PSE estimates for just the bins on the right of the cluster (labeled 1, 2, …, 9). These estimates are shown in red together with the 95% confidence interval of the mean. The estimates for the bins on the left are identical in absolute magnitude but opposite in sign. The blue dashed diagonal is the predicted influence function of the “ignorant” observer who computes COG indiscriminately using all 140 dots. The black dashed curve is the predicted influence function of the “omniscient” observer who excludes all the noise dots and computes COG using only the dots in the Gaussian cluster. The red dashed curve is the predicted influence function of the maximum likelihood (ML) observer for n = 100, and the red solid curve is the predicted influence function for n = 108 (see text).
 
If observers treated all dots equally, then their fits should not be significantly different from the fits predicted by the ignorant model. If, on the other hand, they were perfectly able to discriminate the two sources, as if the stimuli were presented with the color-coding from the training session, then their fits should not be significantly different from the fits predicted by the omniscient model. If they employ a source-robust rule by down-weighting dots as a function of eccentricity simply because of the decreased chances of the dot belonging to the Gaussian cluster, then their fits should lie between the two ideal models. 
ML model
We derived a maximum likelihood (ML) estimator for the visual estimation task (Duda, Hart, & Stork, 2000, Chap. 2). The stimulus consists of 140 dots p_1, …, p_n, where p_i = (p_i^x, p_i^y) are the horizontal and vertical coordinates of a dot. Exactly 100 of these dots are drawn from an isotropic Gaussian distribution φ(p; μ, σ²) with unknown mean μ = (μ_x, μ_y) and known variance σ², and the remaining 40 dots are drawn from a Uniform distribution whose probability density function is U(x, y) = C, a constant. Let π: {1, …, 100} → {1, …, 140} be a 1–1 function that maps the integers from 1 to 100 to integers in the range 1 to 140. This function only serves to select the 100 dots that will be assigned to the Gaussian in computing likelihood. The log-likelihood⁴ for any choice of free parameters μ, π is 

λ(μ, π) = ∑_{i=1}^{100} log φ(p_{π[i]}; μ, σ²) + C,  (2)

where C is the constant derived from the probability density function of the 40 dots that are assigned to the Uniform instead of the Gaussian. We wish to maximize Equation 2 by choice of the free parameters μ, π. The resulting estimates μ̂, π̂ are the maximum likelihood estimates. The estimate of concern to us is the horizontal coordinate of μ̂, denoted μ̂_x. 
The maximization is made easier by the following observation. For any choice of μ, the expression in Equation 2 is maximized when the function π assigns to the Gaussian the 100 dots that are nearest to μ. Thus, we need only search on μ, computing Equation 2 for the 100 nearest dots. 
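This observation yields a simple search procedure: for each candidate μ, assign the nearest dots to the Gaussian and evaluate Equation 2. A hedged sketch follows, using a plain grid search over μ_x (the authors do not state their optimizer); the grid settings and the synthetic demo trial are ours.

```python
import math
import random

def ml_center(dots, sigma=0.89, n=100, step=0.02, half_range=2.0):
    """Maximize Equation 2 over mu_x: for each candidate, the n dots
    nearest to (mu_x, 0) are assigned to the Gaussian. The uniform
    term is a constant and drops out of the comparison, as does the
    per-dot log-normalization of the Gaussian density."""
    def loglik(mu_x):
        d2 = sorted((x - mu_x) ** 2 + y ** 2 for x, y in dots)
        return -sum(d2[:n]) / (2 * sigma ** 2)
    candidates = [i * step - half_range
                  for i in range(int(2 * half_range / step) + 1)]
    return max(candidates, key=loglik)

# Demo on a synthetic trial with the true center at mu_x = 0.3 deg.
random.seed(1)
dots = ([(random.gauss(0.3, 0.89), random.gauss(0.0, 0.89))
         for _ in range(100)] +
        [(random.uniform(-6.75, 6.75), random.uniform(-5.35, 5.35))
         for _ in range(40)])
estimate = ml_center(dots)
```

Leaving the count n as a parameter rather than fixing it at 100 gives the generalized model of Equation 3.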
We simulated the performance of this ML observer in Experiment 2 (using 2000 trials for each of the seven shift locations) and computed the resulting influence function. It is plotted in Figure 9 as a red dashed curve and labeled “ML, n = 100” because it assigns 100 dots to the Gaussian. 
Results and discussion
Figure 9 shows observers' mean influence per dot on PSE for each of the nine bins, together with the 95% CI of each mean. The figure also shows the predictions of the ignorant, omniscient, and ML models. The inner bins (labeled 1, 2, …, 8) were the same length as the inner bins in the first experiment, i.e., 0.49° each. There were 16 of these (eight on each side), so the horizontal region covered by these bins was 7.8°. Just as in Experiment 1, we were unable to reject our null hypothesis that the influence functions were odd-symmetric (lowest p = 0.287). Hence, we report only the binned influences to the right of 0; those to the left are identical in absolute value and opposite in sign. The outermost bin (labeled 9) was 2.85° long, extending from the end of bin 8 to the edge of the screen, with the outermost bin on the left defined symmetrically. Remember that dots were present throughout the screen, so the influence pattern is easier to see when we fit more eccentricities than the small number we fit in Experiment 1. 
As shown in Figure 9, the mean of the observers' fits lies between the ignorant and the omniscient predictions. Observers did assign a reduced influence to points in the outermost bins, suggesting that observers do employ a source-robust rule. 
It is apparent from Figure 9, however, that the ML model developed above—which assigns 100 dots to the Gaussian cluster, and the remaining 40 dots to the uniform noise—does not capture the estimated influences exhibited by the observers. 
A natural possibility we considered is that the restriction of assigning exactly 100 dots to the Gaussian cluster was too severe. After all, the observers likely have only a rough estimate of the number of dots in the Gaussian cluster. Therefore, we generalized the ML model in Equation 2 to 
λ(μ, π; n) = ∑_{i=1}^{n} log φ(p_{π[i]}; μ, σ²) + C,  (3)
where n is the number of dots assigned to the Gaussian cluster (with the remaining 140 − n dots assigned to the uniform noise). For each value of n, ranging from 94 to 116, we repeated the simulation of the ML model's performance and compared it against the mean influence function exhibited by the observers. We found that the ML model with n = 108 best fitted the influence function derived from observers' responses. The simulated performance of this ML model is plotted in Figure 9 as a red solid curve and labeled “ML, n = 108” because it assigns 108 dots to the Gaussian. 
This analysis indicates that the ML model of an ideal source-robust estimator does a good job of capturing observers' performance; observers, however, seem to assign more dots to the Gaussian cluster than it actually contains. 
Experiment 3
In Experiment 2, we found that the visual system is source robust when estimating the center of a Gaussian dot cluster presented together with noise dots: it down-weights the influence of dots likely to have come from a second distribution. In Experiment 3, we presented observers with two-part stimuli that were similar to those used by Cohen et al. (2008). We asked observers to estimate the center of the same 100-dot Gaussian cluster that we used in Experiments 1 and 2, and we contaminated each stimulus by adding a small 15-dot cluster at one of 19 offsets from the center of the main cluster. Observers were shown during training how the stimuli were drawn from two different sources, and they were tasked with estimating the center of the Gaussian cluster. One crucial difference between this experiment and Cohen et al.’s experiment is that in this experiment the small cluster could be placed at or near the center of the larger cluster. In their experiment, however, the small cluster always fell near or outside the apparent boundary of the larger cluster. We return to this point in the discussion. 
Methods
Observers
Five paid observers at New York University participated in the experiment. All were naive as to the purpose of the experiment and had not participated in either of the first two experiments. 
Apparatus
The apparatus was the same as in Experiment 2. 
Stimulus
Each 115-dot complex stimulus was comprised of a 100-dot isotropic, bivariate Gaussian cluster (σ = 0.89°) and a small 15-dot contamination cluster uniformly distributed across a small square region (0.89° by 0.89°) that was placed at one of several offsets from the center of the main cluster. Every dot in the display had a diameter of 2.4 minutes of arc just as in the first two experiments. 
Design
Because the focus of this experiment was the effect of varying degrees of overlap between the stimulus' two clusters, we express the location of the small cluster as its distance, in standard deviations, from the center of the main cluster. The main cluster (i.e., the 100-dot Gaussian) was identical to the one we used in the first two experiments (σ = 0.89°). The side length of the small contamination cluster (which was drawn from a square Uniform region of 0.89° by 0.89°) was equal to one standard deviation of the main cluster. Hence, simply stating how many standard deviations (SDs) the small cluster's center was from the main cluster's center conveys how much overlap, in SD units, there was between the two clusters. For example, placing the small cluster 1.5 SDs away meant that the 15-dot cluster extended from 1 SD to 2 SDs from the center of the main cluster. When the small cluster was placed 0.5 SD away, it fell in a dense region of the main cluster; when it was 4 SDs away, the two clusters were readily separable. 
As shown in Figure 10, the small cluster was placed at one of 19 eccentricities from the center of the main cluster. Nine of these eccentricities were to the left of the main cluster, nine to the right, and one at the center of the main cluster (SD = 0). The eccentricities to the left were the same as the eccentricities to the right, and for the analysis and results, we collapsed the data after testing and failing to reject our null hypothesis that the influence of the small cluster on observers' PSE was odd-symmetric (lowest observer's p = 0.57). For example, we do not distinguish between offsetting the small cluster 2 SDs to the left and offsetting it 2 SDs to the right. Hence, for simplicity, we describe the eccentricities of only 10 offsets keeping in mind that each one (save for SD = 0) corresponds to both an offset to the left and an offset to the right of the main cluster. The 10 eccentricities (in SD units) were: 0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5.
Figure 10
 
Experiment 3—Construction of the complex stimuli. Each stimulus consisted of a large Gaussian cluster presented together with a small contamination cluster. The large circle marks the approximate area of the large 100-dot Gaussian cluster. The horizontally aligned black squares mark the centers of the 19 locations where the small 15-dot contamination cluster could be placed. On each trial, the contamination cluster was placed at only one of the 19 eccentricities. In this figure, the contamination cluster is offset 2.5 SDs away from the center of the large cluster. The dashed square surrounding the contamination cluster marks the boundaries of the small Uniform distribution from which the contamination cluster was drawn at that offset.
 
We used the method of constant stimuli by shifting each stimulus type to one of 13 locations with respect to two collinear, vertical reference lines that were placed above and below the stimulus. The 13 locations were: −0.942°, −0.785°, −0.628°, −0.471°, −0.314°, −0.157°, 0°, 0.157°, 0.314°, 0.471°, 0.628°, 0.785°, 0.942°. In the 0° condition, the mean of the underlying Gaussian distribution was aligned with the reference lines. 
The experimental session was divided into 10 blocks to allow for easy breaks. There were 10 stimulus types (i.e., the 10 offsets of the small cluster with respect to the center of the main cluster) and 13 shift locations (i.e., the 13 shifts of the entire stimulus with respect to the reference lines) resulting in 130 conditions. Each condition was repeated 20 times resulting in 2600 total trials. The entire experiment lasted approximately 1 h and 20 min. 
Training
Training was done in the same way as the training in Experiment 2, except that for this experiment the observers were tasked with ignoring the small color-coded contamination cluster. Figure 11 shows an example of a stimulus with and without color-coding.
Figure 11
 
Experiment 3—Training and experimental task. (A) During training, observers judged the location of the center of the large, red (Gaussian) cluster. They were told to ignore the small, green (contamination) cluster. (B) In the actual experiment, observers continued to judge the location of the center of the large Gaussian cluster, but now the contamination dots were also red. They were instructed to ignore the contamination dots.
 
Procedure
This was the same as in the second experiment. A 2AFC task was employed whereby observers responded whether the center of the 100-dot Gaussian cluster was to the left or to the right of the reference lines. The stimulus also contained a 15-dot contamination cluster that the observers were instructed to ignore. The reference lines, located at the horizontal center of the screen, were displayed continuously throughout the experiment (except during the breaks between blocks). On each trial, we placed the 100-dot Gaussian cluster at one of the 13 shift locations with respect to the reference lines, and the 15-dot contamination cluster at one of 10 eccentricities from the center of the main cluster (see Design section for details). The entire 115-dot stimulus remained on the screen for 250 ms. Observers were allowed to respond as soon as the stimulus appeared, but the stimulus remained on the screen for the full 250 ms regardless. The next trial started 750 ms after the observer's response (or 750 ms after the stimulus disappeared, if the observer responded while the stimulus was still on the screen). 
Analysis
Each of the 10 stimulus types was analyzed separately to compute each observer's PSE at each of the 10 eccentricities. To that end, we recorded the percentage of trials that the observer responded “Right” for each of the 13 shift locations and fit this data to a Gaussian psychometric function using a maximum likelihood procedure. 
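The psychometric fit can be sketched as maximizing the Bernoulli log-likelihood of a cumulative-Gaussian psychometric function. Here a plain grid search over PSE and slope stands in for whatever optimizer was actually used, and the response counts are fabricated for illustration.

```python
import math
from statistics import NormalDist

def fit_pse(shifts, n_right, n_trials):
    """PSE of a cumulative-Gaussian psychometric function, fit by
    maximum likelihood over a grid of PSE and slope values."""
    Phi = NormalDist().cdf
    def nll(pse, slope):
        total = 0.0
        for x, r, n in zip(shifts, n_right, n_trials):
            # Clip p away from 0/1 so the log-likelihood stays finite.
            p = min(max(Phi((x - pse) / slope), 1e-6), 1 - 1e-6)
            total -= r * math.log(p) + (n - r) * math.log(1 - p)
        return total
    grid = [(nll(pse, slope), pse)
            for pse in (i * 0.01 - 0.5 for i in range(101))
            for slope in (0.05 * (j + 1) for j in range(40))]
    return min(grid)[1]

shifts = [-0.942, -0.628, -0.314, 0.0, 0.314, 0.628, 0.942]
n_right = [1, 4, 7, 10, 13, 16, 19]       # fabricated "Right" counts
pse = fit_pse(shifts, n_right, [20] * 7)  # ~0 for these symmetric data
```

Fitting this separately for each of the 10 stimulus types yields one PSE per eccentricity of the small cluster.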
We used each observer's PSE at SD = 0 as a measure of the observer's overall bias. We then compared the observer's PSEs at the other nine eccentricities (i.e., SD ≥ 0.5) to the observer's overall bias to measure the influence of the small cluster on observer's PSE at each eccentricity. Figure 12 shows the mean influence of the small cluster on observers' PSE as a function of its eccentricity from the center of the main cluster. These means are plotted in red together with the 95% CI of the mean. They were compared to the influence function predicted by an ignorant model (blue dashed diagonal) and the influence function predicted by an omniscient model (black dashed horizontal) as in Experiment 2.
Figure 12
 
Experiment 3—Results. We estimated the mean influence of the small, contamination cluster on observers' PSE as a function of its distance from the center of the Gaussian cluster. These estimates are shown in red together with the 95% confidence interval of the mean. The blue dashed diagonal is the predicted influence function of the “ignorant” observer who computes COG indiscriminately using all 115 dots. The black dashed horizontal is the predicted influence function of the “omniscient” observer who excludes all the contamination dots and computes COG using only the dots in the Gaussian cluster. The red dashed horizontal is the predicted influence function of the maximum likelihood (ML) observer (see text).
 
ML model
We derive a maximum likelihood (ML) estimator for the visual estimation task in Experiment 3. The stimulus consists of 115 dots p 1, …, p n (notation as in Experiment 2). Exactly 100 of these dots are drawn from an isotropic Gaussian distribution φ(p; μ, σ 2) with unknown mean μ and known variance σ 2, and the remaining 15 dots are drawn from a Uniform distribution within a small square region with known dimensions whose probability density function is U(x, y) = C, a constant. However, the center of the small square region, denoted α, is unknown. To simplify the simulation of the ML estimator, we assumed that observers had no trouble identifying that the small contamination cluster was only shifted horizontally, not vertically, with respect to the large Gaussian cluster (as was made evident to the observers during training). Hence, the ML estimator knows that the vertical coordinates of μ = (μ x , μ y ) and α = (α x , α y ) are μ y = 0 and α y = 0, respectively, and so only the horizontal coordinates μ x and α x are unknown to the model. 
Let n(α x ) denote the number of dots that lie inside the small square region for any choice of α x , with 115 − n(α x ) being the number of dots that lie outside the square region. The ML estimator must assign exactly 100 dots to the Gaussian cluster and exactly 15 dots to the contamination cluster. Furthermore, it is not allowed to assign to the contamination cluster any dots that lie outside the square region. Hence, we rule out from the solution space all locations α x where n(α x ) < 15 (by assigning a likelihood of 0, i.e., a log-likelihood of −∞). 
For all locations α_x where n(α_x) = 15, the ML estimator assigns the 15 dots that lie inside the square region to the contamination cluster and the 100 dots that lie outside the square region to the Gaussian cluster. For all locations α_x where n(α_x) > 15, the ML estimator must assign some of the dots (n(α_x) − 15 to be precise) that lie inside the square region to the Gaussian cluster, because there are fewer than 100 dots lying outside the square region. In the ML model derived for Experiment 2, we noted that the likelihood is maximized by assigning to the Gaussian cluster the dots that are nearest to μ_x. So, from all the dots lying inside the square region, the ML estimator assigns to the Gaussian the n(α_x) − 15 dots that are nearest to μ_x so that, together with the dots lying outside the square region, there are exactly 100 dots assigned to the Gaussian. This assignment maximizes likelihood for any given choice of μ_x and α_x. 
Let π: {1, …, 100} → {1, …, 115} be a one-to-one function that maps the integers from 1 to 100 to integers in the range 1 to 115. This function serves only to select the 100 dots that will be assigned to the Gaussian in computing likelihood. The log-likelihood for any choice of the free parameters α_x and μ_x is 
λ(α_x, μ_x) = Σ_{i=1}^{100} log φ(p_{π[i]}; (μ_x, 0), σ²) + C,  (4)

where C is the constant derived from the probability density function of the 15 dots that are assigned to the contamination cluster instead of the Gaussian. We wish to maximize Equation 4 by choice of the free parameters α_x and μ_x. The resulting estimates α̂_x and μ̂_x are the maximum likelihood estimates. The estimate of concern to us is μ̂_x. 
We simulated the performance of this ML observer in Experiment 3 (using 200 trials for each of the 130 conditions) and computed the resulting influence function. The influence predicted by this ML model is very close to that predicted by the omniscient model and it is plotted in Figure 12 as a red dashed horizontal. 
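The grid-search ML estimator described above can be sketched in code. This is a minimal illustration under assumed values, not the authors' implementation: σ, the square's side length, the grid range and resolution, and the simulated cluster offset are all hypothetical choices for the sketch.

```python
import numpy as np

# Illustrative parameters (assumptions, not the experiment's actual values)
SIGMA = 1.0               # SD of the Gaussian cluster
SIDE = 1.0                # side length of the Uniform square
N_GAUSS, N_CONTAM = 100, 15

def gaussian_loglik(dots, mu_x, sigma=SIGMA):
    """Log-likelihood of dots under an isotropic Gaussian centered at (mu_x, 0),
    dropping additive constants shared by every candidate assignment."""
    d2 = (dots[:, 0] - mu_x) ** 2 + dots[:, 1] ** 2
    return -d2.sum() / (2 * sigma ** 2)

def ml_estimate(dots, grid):
    """Grid search over (alpha_x, mu_x). For each alpha_x, dots inside the square
    may belong to either source; the Gaussian keeps the inside dots nearest to
    mu_x so that exactly 100 dots are assigned to it, as described in the text."""
    best_ll, best_mu = -np.inf, np.nan
    for alpha_x in grid:
        inside = (np.abs(dots[:, 0] - alpha_x) <= SIDE / 2) & \
                 (np.abs(dots[:, 1]) <= SIDE / 2)
        n_in = int(inside.sum())
        if n_in < N_CONTAM:
            continue  # cannot place 15 dots in the square: log-likelihood is -inf
        outside_dots, inside_dots = dots[~inside], dots[inside]
        for mu_x in grid:
            # keep the n_in - 15 inside dots nearest to (mu_x, 0) for the Gaussian
            d2 = (inside_dots[:, 0] - mu_x) ** 2 + inside_dots[:, 1] ** 2
            keep = inside_dots[np.argsort(d2)[: n_in - N_CONTAM]]
            ll = gaussian_loglik(outside_dots, mu_x) + gaussian_loglik(keep, mu_x)
            if ll > best_ll:
                best_ll, best_mu = ll, mu_x
    return best_mu  # the estimate of interest: mu_x

# One simulated trial: Gaussian centered at x = 0, contamination square at x = 2
rng = np.random.default_rng(0)
gauss = rng.normal([0.0, 0.0], SIGMA, size=(N_GAUSS, 2))
unif = rng.uniform(-SIDE / 2, SIDE / 2, size=(N_CONTAM, 2)) + [2.0, 0.0]
stimulus = np.vstack([gauss, unif])
print(ml_estimate(stimulus, np.linspace(-4, 4, 81)))  # close to 0, the true center
```

The exhaustive double loop over the grid is deliberately simple; it makes the assignment rule explicit at the cost of efficiency, which is acceptable at this problem size.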
Results and discussion
Figure 12 allows us to compare the predictions of the ignorant, omniscient, and ML models to the experimentally measured influence of the small cluster on observers' PSE as a function of eccentricity. The observers' data lie between the ignorant and omniscient predictions, indicating that observers do partially discount the effect of the small contamination cluster; that is, they exhibit source robustness. When the small cluster is very far away from the main cluster (SD ≥ 3.5), its mean influence on observers' PSE is not significantly different from 0, matching the prediction of the omniscient model. This is hardly surprising; we included these conditions to verify that the influence of the small cluster disappears when it is very far from the main cluster. When the small cluster partially overlaps the main cluster (SD = 1.5 to SD = 3), on the other hand, its mean influence on observers' PSE is less than that predicted by the ignorant model but greater than that predicted by the omniscient and ML models. This indicates that, although observers down-weight the influence of dots that are likely to belong to the small contamination cluster, their ability to separate sources falls short of that of the ML observer. Human observers did not separate sources well when the small cluster was offset about 1.5 to 2.5 SDs from the center of the Gaussian. 
When the small cluster is near the center of the Gaussian (0.5 SD and 1.0 SD), its mean influence on observers' PSE is not significantly different from 0, again matching the prediction of the omniscient model. The dots in the small cluster are evidently not outliers when they are so close to the center of the Gaussian, and an outlier-robust estimator would not reduce their influence. Yet human observers do so. This pattern of performance demonstrates that human observers are not simply detecting outliers and reducing their influence; they behave as we would expect a source-robust estimator to behave. 
The ML model predictions for all conditions were indistinguishable from the omniscient model. In effect, the ML model can readily locate the small cluster and remove its effect almost perfectly at every offset. In contrast, human observers have difficulty doing so for offsets from 1.5 to 2.5 SDs. To determine why, we examined examples of stimuli with the small cluster at 0.5 SD, 2.0 SDs, and 3.5 SDs. 
When the eccentricity of the small cluster is 3.5 SDs, the ML observer and the human observer can readily segment the small cluster from the large and discount its effect. The ML observer, omniscient observer, and human observer are in good agreement. When the eccentricity is 0.5 SD, the presence of the small cluster is betrayed by a marked increase in density and a concomitant decrease in dot spacing near the small cluster. The human observer can discount the effect of the small cluster almost perfectly. 
In the remaining case (SD = 2.0), the small cluster falls in a sparser region of the Gaussian. In objective terms, adding the small cluster increases local dot density well above what could be expected from the Gaussian alone. As a consequence, the ML observer has little difficulty locating the small cluster and reducing its influence almost completely. However, we found that the small cluster was almost invisible when it was offset 2.0 SDs. We could identify whether it fell on the right or left side of the center of the Gaussian, but to the unalerted observer there was little to indicate that it was there at all. We conjecture that the observed deviations between human performance and that of the ML observer are due to difficulties in detecting the small cluster at intermediate offsets. This masking effect deserves further study given the marked discrepancy between observed human and ML performance. 
Conclusions
There are previous studies that suggest that the visual system employs robust estimators in depth cue combination (Girshick & Banks, 2009). Our goal was to evaluate whether the visual system's estimates of the centers of dot clusters (a form of cue combination where each dot is a cue) are based on detection of outliers (outlier robust) and/or segmentation of dots by source (source robust). 
We reported three experiments. In all of them, we used a method analogous to classification images to estimate the influence of each visible dot on the observer's rule of combination. 
In Experiment 1, we sought to determine whether the rule of combination used by human observers systematically assigned less weight to dots far from the center of the sample, that is, whether the rule of combination is outlier robust. We asked observers to estimate the centers of dot clusters drawn from one of two distributions, a bivariate Gaussian and a t-distribution with 3 degrees of freedom, denoted t 3. An outlier-robust rule would assign less weight to extreme dots generated by either distribution. We found that observers did not give less weight (influence) to potential outliers for either choice of distribution. Their influence estimates were indistinguishable from those of the center-of-gravity (COG) rule. 
For the Gaussian, the COG is not outlier robust, but it is the unbiased, minimum variance estimator (see Appendix A). We could argue that the observer, although not outlier robust, has chosen a “good” estimation rule. For the t 3, however, observers' estimates were also not outlier robust, and for this distribution, the COG has markedly higher variance than other unbiased rules of estimation such as the median. The COG is not a “good” estimation rule for the t 3, at least in the sense of minimizing variance. 
We then considered a different way of framing the problem of robust estimation, namely, as source separation. A key step in perceptual organization is to segment visual data according to source and make visual estimates concerning the properties of each. In Experiment 1, dot clusters were drawn from a single source and observers combined all available information with equal weight into an estimate of the center of the single source. What if instead we made it clear that there were two sources of visual data and asked observers to separate them, returning estimates of the center of just one of the sources? 
The kinds of stimuli and task that we used are evidently artificial, but they are standard in studies of localization (see, for example, Battaglia & Schrater, 2007; McGowan et al., 1998; Tassinari, Hudson, & Landy, 2006). Yet, while Experiment 1 (and many previous studies) only measured human ability to combine visual information from multiple sources, Experiments 2 and 3 measured human ability to combine information from multiple sources while suppressing information from other sources (cf. Cohen, Schnitzer, Gersch, Singh, & Kowler, 2007). Given the typical clutter in everyday scenes, the ability to segment sources of visual information in this way is a great advantage to the organism. 
In Experiment 2, we presented dot clusters drawn from a mixture of two distributions: a bivariate Gaussian as in Experiment 1 and a large bivariate Uniform distributed uniformly across the entire display. The construction of the stimuli as a mixture of two distributions was explained and demonstrated to observers during training before the experiment. The mixture distribution was inspired by the contamination mixtures used by Tukey (1960). 
Observers were asked to estimate the center of the Gaussian while ignoring dots from the Uniform. We found that observers now gave less weight to potential outliers that were more likely to be from the Uniform. Observers' performance in segmenting the distributions matched that of an ideal maximum likelihood (ML) observer. The overall pattern of influence resembled that of a robust estimator, what we might have expected to find in Experiment 1 at least for the t 3 distribution but did not find. 
In Experiment 3, we presented dot clusters drawn from a mixture of two distributions: a bivariate Gaussian as in Experiment 1 and a small bivariate Uniform placed on the horizontal axis bisecting the Gaussian. The resulting stimulus was a mixture of a large cluster (Gaussian) and a small cluster (Uniform) similar to the complex stimuli of Cohen et al. (2008). The construction of the stimuli as a mixture of two distributions was explained and demonstrated to observers during training before the experiment. 
The ideal ML observer for our task could almost perfectly discount the presence of the small cluster at any offset from the large cluster. Observers could match this ideal performance when (i) the small cluster was near the center of the large cluster and (ii) when the small cluster was far from the center of the large cluster. In between, however, observers failed to fully discount the small cluster, suggesting that they may have difficulty detecting or localizing it. If so, this segmentation failure deserves further investigation. 
Human ability to discount the influence of the small cluster when it was far from the center of the large cluster is consistent with outlier robustness or source robustness. However, human ability to discount the influence of the small cluster when it was embedded near the center of the large cluster indicates that observers were source robust as these dots were not outliers. 
In summary, we find little indication that the visual system uses robust estimators in estimating the centers of dot clusters that are perceived as arising from a single source (Experiment 1). However, in Experiments 2 and 3 we found that observers can segment scenes by source and discount the effect of data from one source on estimates of the properties of a second source. In Experiment 2, we obtained influence functions that looked like those for an outlier-robust estimator. Dots further from the center of the Gaussian cluster received less weight. However, we showed that human performance also mimicked that of a source-robust rule based on an ML model that classified dots by source. 
Our results indicate that, at least for estimates of centers of dot clusters, the visual system does not employ outlier-robust estimators. Instead, it segments dots by source with the recognition that contamination of the data corresponding to one source is just data belonging to some other source. We conclude that robust methods employed by the visual system are closely tied to mechanisms of perceptual segmentation. These mechanisms themselves cannot be captured by robust statistics in large part because standard robust methods rely primarily on standardized residuals. The visual system, in contrast, uses more sophisticated generative models for “objects” in performing segmentation (see Cohen et al., 2008). The robust behavior exhibited by the visual system appears to rely on the prior operation of mechanisms of perceptual segmentation. 
We have focused on separation of sources, but we can also consider a further question. Once observers have learned to interpret dot clusters as a superposition of clusters from two sources, can they ignore that interpretation? For example, if we asked them to estimate the center of the stimuli in Experiment 2 or 3, but now to take into account all the dots, not just those from Source 1, could they do so, or would they continue to down-weight those that plausibly arise from the second source? This question is worthy of future investigation. 
Appendix A
Statistical terminology
In parametric estimation, the typical goal is to estimate one or more parameters θ 1, …, θ m of a parameterized distribution with probability density function f(x; θ 1, …, θ m ) given a sample X 1, …, X n from that distribution. A familiar distribution is the Gaussian f(x; μ, σ 2) with parameters μ and σ 2. Any function of the sample T(X 1, …, X n ) is an estimator, and since it is a function of random variables, any estimator is a random variable itself. A familiar example is the mean  
T(X_1, …, X_n) = (X_1 + ⋯ + X_n)/n.  (A1)
Unbiased estimator
An unbiased estimator of a parameter θ i has expected value E[T(X 1, …, X n )] = θ i . Intuitively, it gets the right answer “on average.” For the Gaussian distribution, the sample mean is an unbiased estimator of the parameter μ also called the population mean. Of course, the population mean is also the population median and the population mode, and ultimately, it just marks the center of the distribution. The median of the sample is a second example of an unbiased estimator of the center of the Gaussian. 
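The claim that both the sample mean and the sample median are unbiased for the center of a Gaussian is easy to check by simulation. The particular μ, σ, sample size, and replication count below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# 50,000 samples of size 25 from a Gaussian with center mu = 5 and sigma = 2
mu, sigma = 5.0, 2.0
samples = rng.normal(mu, sigma, size=(50_000, 25))

# Averaging each estimator over many samples recovers the true center: E[T] = mu
print(samples.mean(axis=1).mean())        # close to 5.0
print(np.median(samples, axis=1).mean())  # also close to 5.0
```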
Minimum variance unbiased estimator
Among unbiased estimators of a parameter, there may be one whose variance is no greater than that of any other unbiased estimator. For samples drawn from a Gaussian distribution, for example, the mean is the minimum variance unbiased estimator of the parameter μ. 
Relative efficiency
The relative efficiency of two estimators is the ratio of the variance of the second to the variance of the first (note the order). If the second has a lower variance, then the relative efficiency is less than 1. Efficiency is a measure of the loss incurred by using the first estimator in place of the second for a particular sample size from a particular distribution. Tukey (1960) considered the efficiency of the median relative to the mean when estimating the center of contaminated Gaussian distributions: most of the dots in a sample are drawn from a Gaussian with unknown mean μ and known variance σ², but a small proportion p are drawn from a distribution with the same mean but higher variance. In the following example, we use a contaminating variance of 16σ² (i.e., a standard deviation of 4σ). 
The mean is not robust
While the mean is the minimum variance unbiased estimator for the Gaussian case, even small amounts of contamination can inflate its variance to the point that other unbiased estimators such as the median have lower variance. To illustrate this point, we computed the efficiency of the median relative to the mean for a sample of size 100 taken from the contaminated Gaussian just described. Figure A1 shows the relative efficiency of the median as a function of the proportion of contamination in the sample. The relative efficiency of the median is about 0.64 when no contamination is present. That is, the mean has about 36% lower variance than the median. With 10% contamination, however, the mean and median exchange roles, the relative efficiency is about 1.39, and now it is the median that has about 28% lower variance than the mean.
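These efficiency values can be reproduced by Monte Carlo. The sample size of 100 and the 4σ contamination follow the text; the number of replications and the random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def relative_efficiency(p, n=100, reps=20_000):
    """Efficiency of the median relative to the mean, var(mean) / var(median),
    for samples of size n from a Gaussian contaminated with proportion p of
    draws at 16x the variance (i.e., standard deviation 4 sigma)."""
    x = rng.normal(0.0, 1.0, size=(reps, n))
    # Scaling a standard normal draw by 4 turns it into an SD-4 contaminant
    x = np.where(rng.random((reps, n)) < p, 4.0 * x, x)
    return x.mean(axis=1).var() / np.median(x, axis=1).var()

print(relative_efficiency(0.0))   # about 0.64: the mean wins for a pure Gaussian
print(relative_efficiency(0.10))  # well above 1: the median wins at 10% contamination
```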
Figure A1
 
Relative efficiency. The efficiency of the median relative to the mean for a contaminated Gaussian is plotted as a function of the proportion of contamination in the sample (see text for details). When the relative efficiency of the median is 1, the mean and the median have the same variance. When the relative efficiency is less than 1, the median has a higher variance than the mean. When the relative efficiency is greater than 1, the median has a lower variance than the mean.
 
Tukey's paradox
Tukey (1960) argued that other estimators such as trimmed means would outperform the mean even when the degree of contamination was so slight that its presence or absence could not be detected in the sample. These alternative estimators can outperform the mean (i.e., relative efficiency > 1) when dealing with samples drawn from contaminated Gaussians, while maintaining a relative efficiency that is only slightly less than 1 when dealing with samples drawn from a true Gaussian. They are robust to failures of distributional assumptions. Since we can never be certain that empirical data are drawn from an uncontaminated Gaussian, we can never justify the use of the mean instead of its robust cousins when analyzing data. 
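Tukey's point can be illustrated in the same Monte Carlo framework. The 10% trimming proportion and the 5% contamination level below are illustrative choices, not Tukey's exact figures.

```python
import numpy as np

rng = np.random.default_rng(2)

def trimmed_mean(x, prop=0.1):
    """Discard the lowest and highest `prop` fraction of each sample, then average."""
    n = x.shape[1]
    k = int(n * prop)
    return np.sort(x, axis=1)[:, k:n - k].mean(axis=1)

def efficiency_vs_mean(p, n=100, reps=20_000):
    """var(mean) / var(10% trimmed mean) for contaminated-Gaussian samples,
    with proportion p of draws scaled to standard deviation 4 sigma."""
    x = rng.normal(0.0, 1.0, size=(reps, n))
    x = np.where(rng.random((reps, n)) < p, 4.0 * x, x)
    return x.mean(axis=1).var() / trimmed_mean(x).var()

print(efficiency_vs_mean(0.0))   # slightly below 1: a small price on a true Gaussian
print(efficiency_vs_mean(0.05))  # above 1: the trimmed mean wins under contamination
```

Trimming discards exactly the draws most likely to be contaminants, which is why the estimator pays so little for the insurance when no contamination is present.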
Acknowledgments
This research was funded in part by CCF-0541185 (MS) and a Humboldt Research Award (LTM). We thank Dylan Simon for helpful discussions and suggestions. 
Commercial relationships: none. 
Corresponding author: Mordechai Z. Juni. 
Email: mjuni@nyu.edu. 
Address: Department of Psychology, New York University, 6 Washington Place, Room 275, New York, NY 10003, USA. 
References
Ahumada, A. J., Jr. (2002). Classification image weights and internal noise level estimation. Journal of Vision, 2(1):8, 121–131, http://www.journalofvision.org/content/2/1/8, doi:10.1167/2.1.8.
Ahumada, A. J., Jr., & Lovell, J. (1971). Stimulus features in signal detection. Journal of the Acoustical Society of America, 49, 1751–1756.
Battaglia, P. W., & Schrater, P. R. (2007). Humans trade off viewing time and movement duration to improve visuomotor accuracy in a fast reaching task. Journal of Neuroscience, 27, 6984–6994.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Cohen, E. H., Schnitzer, B. S., Gersch, T. M., Singh, M., & Kowler, E. (2007). The relationship between spatial pooling and attention in saccadic and perceptual tasks. Vision Research, 47, 1907–1923.
Cohen, E. H., & Singh, M. (2006). Perceived orientation of complex shape reflects graded part decomposition. Journal of Vision, 6(8):4, 805–821, http://www.journalofvision.org/content/6/8/4, doi:10.1167/6.8.4.
Cohen, E. H., Singh, M., & Maloney, L. T. (2008). Perceptual segmentation and the perceived orientation of dot clusters: The role of robust statistics. Journal of Vision, 8(7):6, 1–13, http://www.journalofvision.org/content/8/7/6, doi:10.1167/8.7.6.
Denisova, K., Singh, M., & Kowler, E. (2006). The role of part structure in the perceptual localization of a shape. Perception, 35, 1073–1087.
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Girshick, A. R., & Banks, M. S. (2009). Probabilistic combination of slant information: Weighted averaging and robustness as optimal percepts. Journal of Vision, 9(9):8, 1–20, http://www.journalofvision.org/content/9/9/8, doi:10.1167/9.9.8.
Green, D. M., & Swets, J. A. (1966/1974). Signal detection theory and psychophysics (A reprint, with corrections, of the original 1966 edition). Huntington, NY: Robert E. Krieger Publishing.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 383–393.
Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394, 780–784.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Huber, P. J. (2001). John W. Tukey's contributions to robust statistics. The Annals of Statistics, 30, 1640–1648.
Knoblauch, K., & Maloney, L. T. (2008). Estimating classification images with generalized additive models. Journal of Vision, 8(16):10, 1–19, http://www.journalofvision.org/content/8/16/10, doi:10.1167/8.16.10.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35, 389–412.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). Boca Raton, FL: Chapman & Hall.
McGowan, J. W., Kowler, E., Sharma, A., & Chubb, C. (1998). Saccadic localization of random dot targets. Vision Research, 38, 895–909.
Melcher, D., & Kowler, E. (1999). Shapes, surfaces and saccades. Vision Research, 39, 2929–2946.
Mood, A., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.
Morgan, M. J., Hole, G., & Glennerster, A. (1990). Biases and sensitivities in geometrical illusions. Vision Research, 30, 1793–1990.
Oruç, I., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with possibly correlated error. Vision Research, 43, 2451–2468.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.
Tassinari, H., Hudson, T. E., & Landy, M. S. (2006). Combining priors and noisy visual cues in a rapid pointing task. Journal of Neuroscience, 26, 10154–10163.
Trommershäuser, J., Maloney, L. T., & Landy, M. S. (2008). Decision making, movement planning, and statistical decision theory. Trends in Cognitive Sciences, 12, 291–297.
Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In I. Olkin (Ed.), Contributions to probability and statistics (pp. 448–485). Stanford, CA: Stanford University Press.
Whitaker, D., & Walker, H. (1988). Centroid evaluation in the vernier alignment of random dot clusters. Vision Research, 7, 777–784.