Article  |   April 2013
Regional effects of clutter on human target detection performance
Journal of Vision April 2013, Vol.13, 25. doi:https://doi.org/10.1167/13.5.25
      Matthew F. Asher, David J. Tolhurst, Tom Troscianko, Iain D. Gilchrist; Regional effects of clutter on human target detection performance. Journal of Vision 2013;13(5):25. https://doi.org/10.1167/13.5.25.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Clutter is something that is encountered in everyday life, from a messy desk to a crowded street. Such clutter may interfere with our ability to search for objects in such environments, like our car keys or the person we are trying to meet. A number of computational models of clutter have been proposed and shown to work well for artificial and other simplified scene search tasks. In this paper, we correlate the performance of different models of visual clutter to human performance in a visual search task using natural scenes. The models we evaluate are Feature Congestion (Rosenholtz, Li, & Nakano, 2007), Sub-band Entropy (Rosenholtz et al., 2007), Segmentation (Bravo & Farid, 2008), and Edge Density (Mack & Oliva, 2004) measures. The correlations were performed across a range of target-centered subregions to produce a correlation profile, indicating the scale at which clutter was affecting search performance. Overall clutter was rather weakly correlated with performance (r ≈ 0.2). However, different measures of clutter appear to reflect different aspects of the search task: correlations with Feature Congestion are greatest for the actual target patch, whereas the Sub-band Entropy is most highly correlated in a region 12° × 12° centered on the target.

Introduction
Visual search is a ubiquitous, everyday, and much-researched task. Whether it's finding your car in a parking lot or detecting a flaw in a diamond, the task involves detecting a target in the presence of distracting visual information. Some search tasks are harder than others, and so a central question arises: “Can we predict how hard a particular search task will be?”
A large number of factors have been identified which predict task difficulty in search (Wolfe, 1994, 1998), including the similarity of the target to the distractor items and the heterogeneity of the distractors themselves (Duncan & Humphreys, 1989). Many of these studies of search have depended on the relationship between performance and set size, measured as the number of punctuated items in the display. Typically, an increase in the number of distractors increases the search time in a linear way, and it is the slope of these response functions that indicates the difficulty of a particular search. Whereas the objects used in these search tasks have evolved from simple features, such as lines, colors, and Gabors, to complex shapes, natural objects, and even texture samples (Treisman & Gelade, 1980; Wolfe, 1994, 1998; Lovell, Gilchrist, Tolhurst, & Troscianko, 2009; Eckstein, 2011), this approach still fundamentally depends on the relationship between the number of objects in the display and search performance.
The problem with this approach is that it does not easily extend to dealing with more complex natural scenes. Within a natural scene, it is often less obvious what would constitute a distinct and separable object: Natural scenes often contain complex objects that have distinct subparts and areas of textures which themselves are made up of a field of small objects. What constitutes an object may also depend on the task in hand: When searching for your car, it may be reasonable to assume that the car is the object; but when reaching to open the door, then the handle on the car door may “become” the object. Note here that the visual input hasn't changed, but rather the task has changed the unit of analysis. 
As a result, there is a need to develop and refine new measures that predict search task difficulty, ones that do not depend on being able to independently define the number of objects in the scene. A number of groups have developed measures of clutter as a possible solution to this problem. 
Rosenholtz, Li, Mansfield, and Jin (2005) define clutter and suggest a metric for evaluating the clutter content of an image. They evaluate this metric using map and chart stimuli, which contain a mixed density of iconic representations and complex arrangements of distinctive shapes. Although these are not natural scenes, they are a significant advance on the simple style of search task, in which there is a set of distinct and matched objects. Rosenholtz et al. (2005) show that there is a high correlation between human clutter assessment and their own metric of clutter, Feature Congestion, when applied to these types of images, and Rosenholtz et al. (2007) show a strong correlation with search times for small Gabor patches placed at random locations within the charts. They also show a strong relationship between human performance and two other proposed clutter metrics: Sub-band Entropy and Edge Density.
Neider and Zelinsky (2011) have extended the application of these clutter metrics to searches in artificially generated cityscapes, created using the simulation package Simcity4™ (EA Games, 2003). This approach produces an even more complex and potentially cluttered scene, but with well-defined control over the number of objects in the scene, along with the object density. They confirmed the strong connection between human assessments of clutter in these images and the Feature Congestion metric, as well as the Edge Density, and show a relationship between search time and these metrics. 
Henderson, Chanceaux, and Smith (2009) apply these same metrics to real scenes, testing the metrics against participants' performance when searching for an “L” or “T” embedded in the scene. They found reliable but weaker relationships between these metrics and performance (r = 0.42–0.53).
These studies demonstrate the utility of clutter metrics as a substitute for set size in more complex search displays, including naturalistic scenes. However, only the study of Henderson et al. (2009) uses scenes that are fully natural (i.e., photographic), and all of the studies use an artificial search target that would not occur in the actual scene. 
Here we use natural scenes and target patches that come from the actual scene (target present) or just outside it (target absent). This allows us to judge the effect of the target on performance while moving closer to more realistic task demands and to evaluate the performance of these clutter metrics under these conditions. Some of these results have been presented briefly before (Asher, Gilchrist, Troscianko, & Tolhurst, 2012). 
Methods
Participants
Twenty-five members of the University of Bristol, School of Experimental Psychology, student body participated in the study (seven male; Age = 27.4 years, range 22–37 years). Each participant had normal or corrected to normal vision. A non-financial reward was provided for attendance. The experiment was in compliance with the Declaration of Helsinki, and ethical approval was given by the local university ethics committee. 
Stimuli
The experimental stimuli were derived from 120 natural images, which were each cropped into a 1024 × 1280 scene, a 90 pixel diameter circular Target Present Patch (Present-Patch) taken from inside the border of the scene and a 90 pixel diameter circular Target Absent Patch (Absent-Patch) selected from outside the scene (Figure 1). 
Figure 1
 
Example stimuli: From a natural image (left), three sub-images are taken: a scene of 1024 × 1280 pixels (white outline), and a Present-Patch (green) and an Absent-Patch (red), both of which are 90 pixel diameter circles. Present-Patches are always from inside the scene, while Absent-Patches are always from outside the scene. (See Supplementary Materials for further examples.) Original photo copyright of Trey Ratcliff and Stuck in Customs, used under creative commons license, details at http://creativecommons.org/licenses/by-nc-sa/2.0/. The A′ for this scene task was 0.90, across 25 observers.
Figure 2
 
Trial sequence for a single trial.
The natural images were taken from three sources: 44 from the TNO Search_2 Database (http://opticalengineering.spiedigitallibrary.org/article.aspx?articleid=1098382) (Toet, Bijl, & Valeton, 2001); 62 from the Flickr open access image website (Flickr.com; http://www.flickr.com); and 14 were photographed specifically for this study (see Appendix A in Supplementary Material for more information). The images were chosen to have a mixture of levels of clutter (subjectively and objectively assessed) as well as to depict a variety of scenarios (e.g., domestic, countryside, street scene, etc.).
The majority of the images were down-sampled before scenes and targets were selected in order to match the display resolution of the experimental apparatus. 
Both the Present-Patches and Absent-Patches were chosen to be uniquely distinguishable from any other part of the scene, and the Present-Patches were also selected so that the distribution of their locations was uniform around the center in both the X and Y dimensions across the 120 scenes. As a result, there were equal numbers of Present-Patches in all four quadrants, and there were more Present-Patches closer to the centre of the scene than at the edges. One of the 120 scene-target sets was discarded at a later stage, because the Absent-Patch was also present within the scene. 
Design
All participants were shown all 120 scenes in four blocks of 30 trials, presented in a computer-generated random order. In 80% of the trials for each participant, search was for a target which was present in the scene (the target was the Present-Patch) while in the remaining trials the target was absent (the Absent-Patch). These target-absent trials were counterbalanced across the 25 participants so that all 120 scenes had an equal number of target-absent trials, i.e., five trials. 
Participants sat at a fixed distance of 55 cm from a CRT monitor, where the scenes subtended a visual angle of 37.6° × 30.5° and the targets of 2.7° visual diameter. All eye movements were recorded during each trial using an EYELINK© 1000 eyetracker (SR Research, Canada), though these data are not presented in this paper. 
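The relationship between pixel sizes and visual angles quoted in this paper can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the pixel pitch is not reported, so it is inferred here (as an assumption) from the stated scene width of 1280 pixels subtending 37.6° at 55 cm, and the function names are illustrative.

```python
import math

VIEW_DIST_CM = 55.0       # viewing distance stated in the text
SCENE_WIDTH_PX = 1280     # scene width in pixels
SCENE_WIDTH_DEG = 37.6    # reported horizontal extent of the scene


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an extent viewed head-on."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_cm_from_angle(angle_deg, distance_cm):
    """Inverse of visual_angle_deg: physical extent for a given angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Assumption: pixel pitch inferred from the reported scene width,
# because the monitor's physical dimensions are not given.
PIXEL_PITCH_CM = size_cm_from_angle(SCENE_WIDTH_DEG, VIEW_DIST_CM) / SCENE_WIDTH_PX


def pixels_to_degrees(n_pixels):
    """Approximate visual angle of an extent of n_pixels centered on the line of sight."""
    return visual_angle_deg(n_pixels * PIXEL_PITCH_CM, VIEW_DIST_CM)


if __name__ == "__main__":
    for px in (90, 1024, 1280):
        print(px, "px ->", round(pixels_to_degrees(px), 2), "deg")
```

Under this assumption the 90 pixel target and the 1024 pixel scene height come out close to the reported 2.7° and 30.5°, which suggests the inferred pitch is reasonable.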
Procedure
Participants were first briefed verbally, and then given written task instructions on the screen. Participants were given a short practice block of six trials (three target-present trials) to introduce them to the eye tracker calibration procedure and the task. After the practice trials, the participant was given the opportunity to ask questions. A full calibration and a validation of the eye-tracking equipment were performed at the beginning of each block of 30 trials. 
Each trial had the following sequence of events: A fixation cross was displayed for as long as it took the experimenter to perform a drift correction on the participant's fixation. The target for the trial was then displayed for 1000 ms in the centre of the screen, followed by a blank, mid-grey screen for 250 ms. The scene itself was then presented for a maximum of 30 s, or until the participant responded by pressing the left or right hand buttons on a control pad connected to the control PC to indicate whether they thought the target was present or not. The scene was retained on screen for an extra 700 ms (after Zelinsky, 2008), and eye movements were recorded from the start of the fixation cross to the removal of the scene from the screen. There was then a 250 ms blank screen before the fixation cross for next trial appeared. 
Participants were told to search as quickly as possible, but to be reasonably certain that the target was present or absent. They were told of the 30 s timeout at the beginning of the experiment. Each scene was only shown once to each participant but if the timeout was reached, which was a very infrequent occurrence, participants retook the failed trial at the end of the block. Whereas no feedback was given at the end of each trial, participants were informed of their score at the end of the experiment. 
Results
General performance
We used the non-parametric A′, a measure closely related to the standard d′ from signal detection theory, as a measure of performance in the task (Pollack & Norman, 1964). A′ has a range from 0, indicating all responses have been incorrect; to 1, which is perfect performance, while a value of 0.5 represents performance at chance. 
The A′ measure is a nonparametric estimate of the area under the ROC (Receiver Operating Characteristic) curve. The ROC plots the true-positive rate (TPR) against the false-positive rate (FPR), where the true-positive rate is defined as the number of correct target-present responses divided by the total number of target-present trials, whereas the false-positive rate is the number of incorrect target-present responses divided by the total number of target-absent trials (i.e., the proportion of target-absent trials where the participant responded target-present). The A′ uses a single point observation along with two other points at (0,0) and (1,1). In this experiment, this single point could be the value for one participant looking at 119 scenes, or it could be the value for all 25 observers looking at one scene. The A′ is a robust measure of performance in a discrimination task which also accounts for any imbalance in the relative numbers of trials for the two conditions. See Macmillan and Creelman (2005) for more details on using A′.
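As an illustration of how a single (FPR, TPR) point yields an A′ value, the sketch below uses the widely used computing formula for A′, extended symmetrically for below-chance points. It is an assumption that this matches the exact computation used in the study; the function names and the example counts are hypothetical.

```python
def rates_from_counts(hits, n_present, false_alarms, n_absent):
    """True-positive and false-positive rates from raw response counts."""
    return hits / n_present, false_alarms / n_absent


def a_prime(tpr, fpr):
    """Nonparametric A' from one (FPR, TPR) point.

    Returns 0.5 at chance, 1.0 for perfect performance, and values
    below 0.5 when the false-positive rate exceeds the true-positive rate.
    """
    if tpr == fpr:
        return 0.5
    if tpr > fpr:
        return 0.5 + ((tpr - fpr) * (1.0 + tpr - fpr)) / (4.0 * tpr * (1.0 - fpr))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((fpr - tpr) * (1.0 + fpr - tpr)) / (4.0 * fpr * (1.0 - tpr))


if __name__ == "__main__":
    # Hypothetical scene: 20 target-present and 5 target-absent trials.
    tpr, fpr = rates_from_counts(hits=18, n_present=20, false_alarms=1, n_absent=5)
    print(round(a_prime(tpr, fpr), 3))
```

Because the measure is built from rates rather than counts, the unequal numbers of target-present and target-absent trials per scene do not by themselves shift the value.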
Overall, participants' performance was relatively good (mean A′ = 0.80, range = 0.70–0.90). No participants achieved a maximum A′ = 1, and every participant was significantly above chance (A′ = 0.5). On only eight trials out of the total 2,975 was the timeout reached, and these trials have been excluded from the results.
The primary focus of this paper is in understanding what makes one particular search task difficult when compared to another. We therefore calculated A′ for each scene, across participants, to assess the difficulty of each search task. Figure 3 shows the distribution of A′ for all the scenes. For five of the scenes, performance was numerically at, or below, chance, with the other images distributed across the above chance levels. All of these 119 scenes have been included in the on-going analysis.
Figure 3
 
Distribution of performance for individual scenes (mean A′ = 0.80, SD = 0.16). Note that some of the results are at or below chance, suggesting some search tasks are so hard that people's performance is worse than a guess. This poor performance could be because of visual properties of the search task, such as the Present-Patch being very indistinct, or the Absent-Patch being very similar to something else in the scene. Data are shown for 119 scenes; there were 120 scenes in the experiments, but one was discarded from analysis because of ambiguity in the Absent-Patch.
Of immediate interest is the overall ability to detect targets in natural scenes, and so we do not report response times in detail; however, note that there is a reasonable level of correlation between the A′ measure and the mean response times, r(117) = −0.566, p = 0.001, indicating that a high A′, which indicates an easier task, is likely to lead to a faster response.
Assessing clutter
We have chosen four previously published measures of clutter for comparison, which are summarized below. 
Feature Congestion (Rosenholtz et al., 2007)
The Feature Congestion measure extracts a set of feature maps and then evaluates the amount of feature variation, including the variance of locally grouped colors in CIELab space and the oriented opponent energy (Bergen & Landy, 1991). If there are many different colors, orientations, or luminance contrast changes in a given local area, that area has a lot of features and is therefore cluttered. The final clutter measure for the scene is the average across the scene's Feature Congestion heatmap. An example heatmap showing the amount of clutter across an image measured in this way is shown in Figure 4.
Figure 4
 
Feature congestion heatmap. Black areas have very few conflicting features whereas light patches are relatively feature intensive.
Sub-band Entropy (Rosenholtz et al., 2007)
Sub-band Entropy is a measure of the amount of information in the scene, evaluated as the Shannon Entropy of a set of wavelet transformed sub-bands, such as those used in JPEG compression algorithms. It is inversely related to the amount of redundant information in the scene. 
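The idea of scoring an image by the Shannon entropy of its sub-band coefficients can be sketched as follows. This is a deliberately simplified illustration, not the published measure: it assumes a grayscale image stored as a list of rows with even dimensions, uses a single-level Haar split in place of the decomposition used by Rosenholtz et al. (2007), and averages the four sub-band entropies with equal weight. The bin width and all function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy (bits) of values after quantizing them into bins."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def haar_subbands(img):
    """Single-level 2D Haar-style split of a grayscale image (list of rows)."""
    avg, horiz, vert, diag = [], [], [], []
    for r in range(0, len(img) - 1, 2):
        for c in range(0, len(img[0]) - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            avg.append((a + b + d + e) / 4.0)    # local mean
            horiz.append((a - b + d - e) / 4.0)  # left-right difference
            vert.append((a + b - d - e) / 4.0)   # top-bottom difference
            diag.append((a - b - d + e) / 4.0)   # diagonal difference
    return [avg, horiz, vert, diag]


def subband_entropy_sketch(img, bin_width=1.0):
    """Equal-weight mean of the sub-band entropies (an illustrative choice)."""
    bands = haar_subbands(img)
    return sum(shannon_entropy(band, bin_width) for band in bands) / len(bands)


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    busy = [[0, 50, 3, 200], [90, 7, 120, 33], [15, 220, 60, 1], [180, 40, 5, 99]]
    print(subband_entropy_sketch(flat), subband_entropy_sketch(busy))
```

A uniform image is fully redundant and scores zero, whereas an image with many distinct local averages and differences scores higher, which is the sense in which the measure is inversely related to redundancy.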
Multi scale segmentation (Bravo & Farid, 2008)
In this method, the image is separated into object segments, at a series of “scales,” using a fast graph segmentation algorithm, which uses the color similarity of pixels when “growing” the segments (Figure 5). The clutter measure itself is the constant of proportionality for a fixed exponent power law curve that best describes the rate of change of the number of segments at each scale for the scene. 
Figure 5
 
Output of the segmentation algorithm (Felzenszwalb & Huttenlocher, 2004) prior to the segmentation count at different scales. The scale defines the minimum number of pixels that can contribute to an individual segment's area, shown underneath each image. As the figure shows, as this parameter increases, the smaller clusters merge into larger regions, so there are fewer segments.
Edge Density (Mack & Oliva, 2004)
This Edge Density measure is the simple ratio of pixels that lie on extracted edges to the remaining pixel area of an object, where an area with more edges in it is more cluttered. The implementation in this paper uses the Matlab® (The MathWorks, Inc., Natick, MA) implementation of the Canny edge detector (Canny, 1986) for the RGB channels separately, shown collectively in Figure 6, and takes the ratio of the number of edge pixels to the total number of pixels (after Rosenholtz et al., 2007). 
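Once an edge map is available, the ratio described above is a one-line computation. The sketch below assumes the edge detector has already produced a binary mask (a list of rows of 0/1 values); it does not reimplement the Canny detector, and the example mask is made up for illustration.

```python
def edge_density(edge_mask):
    """Fraction of pixels flagged as edges in a binary edge mask."""
    total = sum(len(row) for row in edge_mask)
    if total == 0:
        return 0.0
    edge_pixels = sum(1 for row in edge_mask for px in row if px)
    return edge_pixels / total


if __name__ == "__main__":
    # Hypothetical 3 x 4 edge mask with three edge pixels.
    mask = [
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
    ]
    print(edge_density(mask))
```

The same function can be applied to a whole scene or to any cropped subregion, since it only depends on the mask it is given.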
Figure 6
 
Output from the Canny edge detector for the Edge Density calculation. Notice the low number of edges in the uncluttered area of the image.
Modeling clutter in the whole scene
The four measures of clutter were applied to the whole of each of the scenes, and then compared against the human performance (A′) for the respective scenes, and against the constituent components of A′: the true positive rate and the false positive rate (Table 1). These last two comparisons are included because the scene contains the Present-Patch, but not the Absent-Patch, so the Present-Patch is included in the calculation of the whole scene clutter. We also show the Response Time correlation for comparison. 
Table 1
 
Correlation of human performance metrics (A′; true positive rate [TPR]; false positive rate [FPR]; reaction time [RT]) against clutter model values calculated for the whole scene (degrees of freedom = 117), showing Spearman's correlation, as well as the associated p value and the 95% confidence intervals (bootstrapped). For comparison with regression results, r² is shown alongside each r. Entries in bold are statistically significant (p < 0.05).
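Measure | Performance metric | r | r² | p | 95% CI
Sub-band Entropy | A′ | 0.21 | 0.05 | 0.02 | 0.35, 0.03
Sub-band Entropy | TPR | −0.11 | 0.01 | 0.24 | −0.26, 0.08
Sub-band Entropy | FPR | 0.2 | 0.04 | 0.03 | 0.02, 0.35
Sub-band Entropy | RT | 0.05 | 0.01 | 0.56 | −0.12, 0.21
Feature Congestion | A′ | 0.14 | 0.02 | 0.14 | −0.11, 0.31
Feature Congestion | TPR | −0.01 | 0.00 | 0.93 | −0.28, 0.18
Feature Congestion | FPR | 0.18 | 0.03 | 0.05 | 0.32, 0.01
Feature Congestion | RT | 0.01 | 0.00 | 0.88 | −0.18, 0.24
Edge Density | A′ | 0.21 | 0.05 | 0.02 | 0.37, 0.03
Edge Density | TPR | −0.12 | 0.01 | 0.21 | −0.26, 0.05
Edge Density | FPR | 0.19 | 0.04 | 0.04 | 0.00, 0.35
Edge Density | RT | 0.02 | 0.00 | 0.82 | −0.16, 0.18
Segmentation | A′ | 0.14 | 0.02 | 0.13 | −0.08, 0.32
Segmentation | TPR | 0.03 | 0.00 | 0.71 | −0.19, 0.21
Segmentation | FPR | −0.15 | 0.02 | 0.09 | −0.31, 0.04
Segmentation | RT | −0.01 | 0.00 | 0.88 | −0.20, 0.20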
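The kind of statistic reported in Table 1 (a Spearman correlation with a bootstrapped 95% confidence interval) could be computed along the following lines. This is a sketch under stated assumptions: a percentile bootstrap over scenes is assumed because the resampling scheme is not described, the number of resamples and the seed are arbitrary, and the function names are illustrative.

```python
import random


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Spearman's r, resampling pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2.0) * n_boot)]
    upper = stats[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper
```

In use, `x` would hold one clutter value per scene and `y` the matching per-scene A′, TPR, FPR, or mean RT.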
Table 1 shows that there is a small but reliable negative relationship between A' and both Sub-band Entropy and Edge Density. In both cases, as the overall scene clutter increases, the performance drops. Care should be taken when drawing conclusions from the TPR and FPR, as when viewed independently, they do not control for the bias in the number of true cases (20) against the false cases (5) for each image. However, it would appear that the effect is being carried somewhat by the tendency for participants to erroneously report the presence of a target when it is absent. 
For the A′ correlations that are significant, Table 1 shows that the TPR correlations are in the opposite direction to the FPR correlation. This reversal suggests that the properties of clutter consistently facilitate both correct identification when the target is present, and correct rejection when the target is absent. The Feature Congestion only seems to be related to the false positive rate, with almost zero correlation in the true positive condition, and the negative correlation suggests that high levels of Feature Congestion style clutter are related to low numbers of false positives (i.e., not giving the wrong response when the target is not there), and therefore potentially connected to discriminability, perhaps because it is a measure of the contrast of features within the scene. 
Unlike previous studies (Rosenholtz, Li, & Nakano, 2007; Henderson, Chanceaux, & Smith, 2009; Neider & Zelinsky, 2011), we find no significant correlations between any of the four clutter measures and response time, despite our strong correlation between mean RT and A′. However, our task is more challenging, resulting in more errors and almost certainly increasing uncertainty in responding that the target is not present, as well as some uncertainty about target-like patches. As a result, the response time here is unlikely to be an instant response to finding the stimulus, and is more likely to reflect a complex judgment. The variation in response times will also reflect a combination of a speed/accuracy trade-off (Wickelgren, 1977) with the more typical pattern of faster responses being made in easier tasks. Further work and modeling beyond the scope of this paper will be required to account for response time variability in these search tasks when error rates are high.
All four measures considered here measure visual clutter in different ways, but there are still correlations between them: Sub-band Entropy is highly correlated with Edge Density, r(117) = 0.77, p = 0.001, and Feature Congestion is also highly correlated with Segmentation, r(117) = 0.91, p = 0.001, whereas Sub-band Entropy is less correlated with Feature Congestion, r(117) = 0.25, p = 0.006.
In order to investigate the combined ability of these measures of clutter to predict performance, we carried out a stepwise linear regression using a Bayesian Information Criterion (BIC, also known as Schwarz Criterion; Schwarz, 1978). Note that this method does not take into account the relative complexity of the individual clutter models, but only penalizes the inclusion of model outputs in the regression model. Starting with a constant model, the four clutter metric results were added to the regression model one at a time based upon the BIC selection results. See Supplementary Material (Appendix G) for details. The final model, after discarding non-predictive factors, includes the following predictors: Feature Congestion and Sub-band Entropy with no interaction between predictors, with r² = 0.0846, F(1, 116) = 5.36, p = 0.006. This result is quite poor, but is actually an improvement over the simple linear regression results for these metrics separately (Sub-band Entropy: r² = 0.0461, F(117) = 5.65, p = 0.02; Feature Congestion: r² = 0.0182, F(117) = 2.17, p = 0.14).
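A forward-selection loop of the kind described above can be sketched in a few lines. This is an illustration rather than the procedure in Appendix G: it only adds predictors (no removal step), uses the Gaussian-error form BIC = n·ln(RSS/n) + k·ln(n) with k counting the intercept, and solves the least-squares problem with a small hand-written solver that will fail on perfectly collinear predictors. All names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_rss(columns, y):
    """Residual sum of squares for y ~ intercept + columns (normal equations)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bv * v for bv, v in zip(beta, row))) ** 2
               for i, row in enumerate(design))


def bic(rss, n, k):
    """BIC for a Gaussian linear model, up to an additive constant."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)


def forward_select(predictors, y):
    """Start from a constant model; add the predictor that lowers BIC most."""
    n = len(y)
    chosen = []
    best = bic(ols_rss([], y), n, 1)
    while True:
        candidates = []
        for name in predictors:
            if name not in chosen:
                cols = [predictors[c] for c in chosen + [name]]
                candidates.append((bic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not candidates:
            break
        score, name = min(candidates)
        if score >= best:  # no remaining predictor improves the criterion
            break
        best, chosen = score, chosen + [name]
    return chosen, best
```

Here `predictors` would map each clutter measure's name to its list of per-scene values, and `y` would be the per-scene A′. The ln(n) penalty per added coefficient is what stops weakly predictive measures from entering the model.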
The individual correlations between measures of clutter and performance are low, at best r(117) = 0.22, and even when these measures are combined, a good deal of the variability in performance remains unaccounted for. Figure 7 (left hand side) plots A′ for each scene against its overall Sub-band Entropy, which showed one of the highest individual correlations with performance. Interestingly, the relationship is not a simple linear one; instead there appears to be a lower-bound relationship (red line, Figure 7): When Sub-band Entropy is high, there is large variability in performance, so presumably here A′ is partially determined by some other factor. However, when Sub-band Entropy is low, search performance tends to vary less and performance is good. One way of quantifying this lower bound is shown as a red line in Figure 7, which is a plot of the regression line of the rolling lower 2 standard deviations (SD) points. These points are calculated by using densely overlapping bins and then calculating the mean and standard deviation in each bin to find the performance point 2 SDs below the mean performance for the bin. The observation that this lower bound line drops with Sub-band Entropy provides evidence for this lower bound idea. The scatter plot in Figure 7 suggests the possibility of the same separation for Feature Congestion for at least part of the data (Feature Congestion > 4), although the metric itself is not significantly correlated with A′. The lack of correlation between the true positive rate and Feature Congestion, despite a significant result for the false positive rate, potentially indicates that Feature Congestion in the scene is only useful for the correct rejection of an absent target: The negative correlation with the false positive rate (failed rejection) means that higher Feature Congestion facilitates correct rejection of an Absent-Patch.
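The rolling lower-bound construction described above can be sketched as follows. The bin half-width of 0.5 follows the value given for Sub-band Entropy in the Figure 7 caption; the step between bin centres and the minimum number of points per bin are not stated and are assumptions here, as are the function names.

```python
import statistics


def rolling_lower_bound(x, y, half_width=0.5, step=0.1, min_points=3):
    """For overlapping bins along x, return (centre, mean(y) - 2 * SD(y))."""
    lo, hi = min(x), max(x)
    n_steps = int(round((hi - lo) / step)) + 1
    points = []
    for i in range(n_steps):
        centre = lo + i * step
        in_bin = [yi for xi, yi in zip(x, y) if abs(xi - centre) <= half_width]
        if len(in_bin) >= min_points:  # need enough points for a stable SD
            lower = statistics.mean(in_bin) - 2.0 * statistics.stdev(in_bin)
            points.append((centre, lower))
    return points


def fit_line(points):
    """Ordinary least-squares line through (x, y) points: (slope, intercept).

    Requires at least two points with distinct x values.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx
```

Applied to per-scene clutter values (`x`) and A′ values (`y`), a negative slope from `fit_line(rolling_lower_bound(x, y))` corresponds to a lower bound that drops as clutter increases.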
Figure 7
 
Scatter plot of the scene A′ against Sub-band Entropy, r(117) = −0.22, p = 0.02 (left), and Feature Congestion, r(117) = 0.14, p = 0.14 (right), in the scene. Note that despite both measures being positive representations of clutter, the regressions are in opposite directions. The more convincing comparison is that Sub-band Entropy seems to have a linear cut-off, shown by the red line. This line is a regression of the red points, each of which is the lower 2 SD A′ value for all points with a Sub-band Entropy value within ±0.5 of the central value of the point (±1 for Feature Congestion). This relationship doesn't appear for Feature Congestion, or either of the other clutter metrics. This lower bound would indicate a theoretical limit on the difficulty of the search, with actual difficulty affected by the properties of both the scene and the targets. The inset images show the scenes for some of the extreme ends of the distribution.
Modeling local clutter
Having found these relationships, we began to ask, if the whole scene is only partially correlated to the search difficulty, is it the clutter directly around the Present-Patch that determines performance? Perhaps the answer is to look at how the level of clutter in the image around the Present-Patch affects correlation between search performance and the clutter metric, and to assess this effect parametrically over different areas around the Present-Patch. 
For this second stage of modeling, each scene was cropped into a series of square patches centered on the Present-Patch, starting at 1024 × 1024 pixels, down to sub-target sizes (84 × 84 and 64 × 64 pixels), in steps of 20 pixels, with a special case at the target size (90 × 90 pixels). Any pixels that fell outside the scene boundary, while inside the cropping square boundary, for example when the Present-Patch was near the edge of the scene, were set to black (i.e., zero). This black padding reflects the real display situation in the experiment, as there was no ambient light in the laboratory (see Figure 8). For each subregion, the clutter measures outlined above were applied and correlated to the human performance (A′) for all 119 scenes, at each scale, to get a series of regional correlation values for each metric. Plotting the change in this regional correlation against the size of the region indicates how the clutter changes in importance across the visual field. The blue lines in Figure 9 show the Clutter Correlation Profiles for all four measures of clutter. The four measures differ in the way that the correlation changes with region size: The correlation of A′ with Sub-band Entropy and Edge Density (Figure 9) shows a steady rise, peaking at a region edge size of around 400–500 pixels (i.e., 200–250 pixels from the center to region edge), equating to a visual angle of 12°–15°, whereas the Feature Congestion and Segmentation correlations (Figure 9) stay relatively constant (or even decline) and only rise and peak at the target size (90 × 90 pixels) and the sub-target size patches. This upswing at the target patch size in Feature Congestion and Segmentation led us to question the influence of the Present-Patch on the performance compared to the other regions in the image, so we repeated these analyses with the Present-Patch removed and replaced with a 90 pixel square of black pixels.
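The cropping scheme described above can be sketched as follows, assuming for simplicity a single-channel image stored as a list of rows and integer target-centre coordinates. The handling of even and odd crop sizes around the centre pixel is an assumption, and the function names are illustrative.

```python
def region_sizes(largest=1024, smallest=64, step=20, target=90):
    """Crop edge lengths: 1024 down to 64 in steps of 20, plus the target size."""
    sizes = set(range(largest, smallest - 1, -step))
    sizes.add(target)
    return sorted(sizes, reverse=True)


def crop_centered(img, cx, cy, size, fill=0):
    """Square crop centered on (cx, cy); pixels outside the scene become black."""
    h, w = len(img), len(img[0])
    top, left = cy - size // 2, cx - size // 2
    return [[img[r][c] if 0 <= r < h and 0 <= c < w else fill
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def blank_target(img, cx, cy, target=90, fill=0):
    """Copy of the scene with a target-sized square around (cx, cy) set to black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    top, left = cy - target // 2, cx - target // 2
    for r in range(max(top, 0), min(top + target, h)):
        for c in range(max(left, 0), min(left + target, w)):
            out[r][c] = fill
    return out


if __name__ == "__main__":
    print(len(region_sizes()), region_sizes()[:3], region_sizes()[-4:])
```

Each clutter measure would then be applied to `crop_centered(scene, cx, cy, size)` for every size, and again to crops of `blank_target(scene, cx, cy)` for the target-removed condition.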
The red lines in Figure 9 show the effect of removing the Present-Patch. For Sub-band Entropy and Edge Density there is a small reduction in the correlation without the Present-Patch, but the general profile stays the same, with a similar peak (Figure 9). The effect is most noticeable for Feature Congestion and Segmentation: the local effects around the target size (< 250 pixels, or 7.3°) are now all but eliminated (Figure 9), implying that the correlation seen at the target size is due to the target patch itself. 
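The target-removal control and the assembly of a correlation profile might be sketched as below. Several things here are assumptions rather than details from the text: the helper names are hypothetical, the region is assumed to be a target-centered square array with an even edge length, and Spearman's rank correlation is used because that is the statistic reported in Table 1 (this section does not state which coefficient was used for the profiles).

```python
import numpy as np

def blank_target(region, target_size=90):
    """Copy of a target-centered square region with the central target square set to black."""
    out = region.copy()
    size = out.shape[0]
    start = max((size - target_size) // 2, 0)
    stop = min(start + target_size, size)
    out[start:stop, start:stop] = 0
    return out

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def correlation_profile(metric_by_size, a_prime):
    """Map each subregion size to the correlation between per-scene clutter and A'."""
    return {size: spearman(values, a_prime) for size, values in metric_by_size.items()}
```

Running `correlation_profile` once on metric values from the intact crops and once on values from the `blank_target` crops would give the two curves to compare for each metric.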
Figure 8
 
Example of stepping through the image. Boxes indicate progressively smaller subareas of the scene, centered on the target (blue box), to which the clutter algorithms were applied. Green boxes indicate that the subarea was completely within the scene, while red boxes indicate that, for a square subarea centered on the target, part of the subarea extended beyond the scene edge and was filled in with black pixels. On the right-hand side are examples of what the clutter metric would see for different conditions. From the top: small and wholly within the scene; partially outside the scene; the target-blank case for the same subarea; and the maximum pixel area (1024 × 1024 pixels).
Figure 9
 
The regional correlation profiles for the four clutter metrics. The solid blue line shows the correlation profile for the metric against participant A′ for different sizes of subregion, while the red line shows the profile for the case in which the Present-Patch in the subregion was filled in with black. The dotted lines represent the correlation coefficient for the whole scene (1280 × 1024) without padding. Note that the correlations for Sub-band Entropy and Edge Density are in the negative direction and have been rectified to make the profiles easier to view.
Next we asked whether combined measures of local clutter could predict performance better than either measure alone. A linear regression using the Sub-band Entropy and Feature Congestion values at the subregion sizes of their profile peaks (484 pixels [14.2°] and 84 pixels [2.5°], respectively) produced a model that accounted for more variance than the previous full-image model, r² = 0.130, F(1, 116) = 8.69, p < 0.001, compared to r² = 0.085. To check whether these factors give the best possible result, we ran a stepwise regression using the BIC, with all metrics at all subregion sizes as candidates. This produced a model with a marginally better result, r² = 0.132, F(1, 116) = 8.85, p < 0.001, by using slightly different region sizes from those at the profile peaks. This final regression model included Sub-band Entropy at a subregion size of 484 pixels, which coincides with the maximum of the Sub-band Entropy profile in Figure 9, and Feature Congestion at a subregion size of 1024 pixels, which corresponds to the whole display and, surprisingly, lies at the opposite end of the scale from the individual peak for this measure. There were no significant interactions between terms (see Supplementary Material, Appendix H, for details). It is important to note that, although this combination of factors produces a higher r² than the model derived from the peaks of the correlation profiles, the improvement is small. Additionally, when the next factor to add was considered during the BIC regression, the target-only subregion of Feature Congestion still offered a potentially significant contribution (BIC modification = −0.73707), although the 1024-pixel subregion beat it marginally (BIC modification = −1.019). These results suggest that the improvement gained by using the largest subregion may not be significant, and that the target location is still a viable and sensible factor for modeling. 
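To make the model-selection logic concrete, the sketch below shows one common way to run a forward stepwise regression scored by BIC. It is an illustration only: the exact stepwise procedure and BIC formulation used in the study are not given in this section, so the Gaussian-error form of the criterion, the greedy forward strategy, the function names, and the synthetic data (including the predictor labels `se_484`, `fc_84` and `fc_1024`) are all assumptions.

```python
import numpy as np

def residual_sum_of_squares(columns, y):
    """Ordinary least squares with an intercept; returns the residual sum of squares."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))

def r_squared(columns, y):
    """Proportion of variance in y explained by the fitted model."""
    total = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - residual_sum_of_squares(columns, y) / total

def bic(rss, n, n_params):
    """BIC for Gaussian errors, up to an additive constant (assumed form)."""
    return n * np.log(rss / n) + n_params * np.log(n)

def forward_select(predictors, y):
    """Greedy forward selection: add the predictor that lowers BIC most; stop when none does."""
    n = len(y)
    selected = []
    best = bic(residual_sum_of_squares([], y), n, 1)  # intercept-only model
    remaining = dict(predictors)
    while remaining:
        scores = {}
        for name, column in remaining.items():
            columns = [predictors[s] for s in selected] + [column]
            scores[name] = bic(residual_sum_of_squares(columns, y), n, len(columns) + 1)
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        best = scores[name]
        selected.append(name)
        del remaining[name]
    return selected

# Synthetic stand-in data (NOT the study's data): 119 "scenes", three candidate predictors.
rng = np.random.default_rng(0)
candidates = {name: rng.normal(size=119) for name in ("se_484", "fc_84", "fc_1024")}
performance = 3.0 * candidates["se_484"] + 0.1 * rng.normal(size=119)
chosen = forward_select(candidates, performance)
```

The difference in BIC between the current model and each candidate extension plays the role of the "BIC modification" values quoted above: a candidate is worth adding only if it lowers the criterion.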
Discussion
In this paper, we have investigated the role of scene clutter in predicting search performance in a difficult natural image visual search task. In addition, we have looked at how the level of clutter in subregions of the scene surrounding the target influences the search task difficulty, and at what distance from the target this clutter has the greatest effect. 
We show that Sub-band Entropy, and to a lesser extent Edge Density, in a natural scene around the target area correlate with performance. In addition, we find that Feature Congestion and Segmentation correlate with performance only when the clutter of the target itself is measured, rather than the clutter of the whole scene. Thus, performance appears to be determined both by the clutter around the target location and by how cluttered the target itself is. 
The results reported here are in some ways inconsistent with findings reported in other papers. However, there are significant differences in the task and approach taken here, which may account for these differences. These differences are discussed in detail below. 
Segmentation model
We found no significant correlation between the Segmentation metric and measures of performance, either for the whole-scene condition or at any region size in the Correlation Profile. 
The Segmentation Model was suggested as a fast, scale-invariant predictor of search times when the “simple features are not known,” and was tested using a categorical search task by Bravo and Farid (2008), who showed significant results for the model's predictions. Although these results are in stark contrast to our low correlations, it should be noted that the tasks are different: participants in the Bravo and Farid study searched for targets in a category, with no knowledge of the specific low-level features, whereas our task is based upon matching specific low-level features without any explicit categorization. 
Our results suggest that there is a robust connection between the Segmentation model and Feature Congestion. It is possible that these two metrics measure the same underlying phenomenon: Segmentation clusters groups of pixels inside contrast edges, and Feature Congestion evaluates the amount of contrast at local edges using a Gabor methodology. 
Edge Density
The Edge Density model (Mack & Oliva, 2004) is effectively the baseline model of clutter. We found it to be significantly related to human performance for whole-scene clutter, a result echoed in a number of papers, which generally found this metric to give the highest correlation with reaction time (Rosenholtz, Li, & Nakano, 2007; Henderson, Chanceaux, & Smith, 2009). Henderson et al. (2009) also found Edge Density to be the best predictor of variability for several other metrics of performance, such as the mean fixation duration and the number of errors. 
Neider and Zelinsky (2011, in their supplementary material) concluded that Edge Density and Feature Congestion are highly reflective of each other. We have also shown that there is a relationship between Edge Density and Feature Congestion, though it is not as strong as the connection between Edge Density and Sub-band Entropy. 
Our results support the general notion that Edge Density, despite being a simple and non-biologically inspired process, is actually quite good at predicting human performance, second only to the Sub-band Entropy. 
Sub-band Entropy
We have shown that Sub-band Entropy (Rosenholtz et al., 2007) is the strongest single predictor of search performance in a natural image search task. 
In our study, Sub-band Entropy, when evaluated across the whole scene, has the highest correlation with performance of any clutter metric; this result is not generally consistent with other studies, which typically show Edge Density to have the highest correlation. However, Sub-band Entropy is typically a stronger correlate of search performance across the whole scene than is Feature Congestion. This outcome is consistent with Rosenholtz et al. (2007), who found, for their visual search task, a slightly higher correlation between Log(reaction time) and Sub-band Entropy than between Log(reaction time) and Feature Congestion, in both the target-present and target-absent conditions. 
Henderson et al. (2009) also found that, across a number of performance metrics (misses, first fixation duration, mean fixation duration), there was a stronger correlation with Sub-band Entropy than Feature Congestion, though for the standard response time, this was not the case. It is gratifying to note that the correlations in Henderson et al. are much closer to our own (i.e., low). This finding contrasts with the studies that use somewhat artificial scenes which produce higher correlations (e.g., Rosenholtz et al., 2007; Neider & Zelinsky, 2011). 
Our lack of correlation with the overall reaction time (the usual dependent measure in such studies) may well reflect differences in overall task difficulty. More difficult tasks, such as the one studied here, require two processes, i.e., both isolation and selection. For example, if there are two objects with similar characteristics (for example, the two trees in the background of Figure 1) in a relatively uncluttered scene, then isolating the two similar objects is easy, but deciding which is the target is more complicated. 
Despite these differences with other studies, we have shown two interesting results specifically about using the Sub-band Entropy as a predictor of performance: 
  •  
    Even with a significant correlation, it is apparent that the Sub-band Entropy predicts performance only by describing a lower bound on the scatter plot, suggesting that the Sub-band Entropy of the whole scene is a limiting factor of search performance. In a search task, there are two factors that influence search difficulty: the scene and the target. This result would therefore be consistent with the idea of a target of variable saliency, a feature of our experiment that is discussed further below in the target effects section.
  •  
    There is a definite peak region size, where the clutter in the region is most influential on the search performance, a region of about 7° eccentricity around the Present Patch. This is quite a large area of effect, but could be related to crowding or masking (Bouma, 1970; Levi, 2008).
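The lower bound mentioned in the first point above can be estimated in more than one way. The sketch below follows one reading of the Figure 7 caption: for each scene, take all scenes whose clutter value lies within a fixed window of it, compute the mean A′ minus two standard deviations, and fit a straight line to those points. The function names, the use of the population standard deviation, and this interpretation of the caption are all assumptions.

```python
import numpy as np

def lower_bound_points(clutter, a_prime, half_window=0.5, n_sd=2.0):
    """For each scene, mean A' minus n_sd SDs over scenes with similar clutter values."""
    clutter = np.asarray(clutter, dtype=float)
    a_prime = np.asarray(a_prime, dtype=float)
    bounds = np.empty_like(a_prime)
    for i, value in enumerate(clutter):
        nearby = np.abs(clutter - value) <= half_window
        # np.std defaults to the population SD (ddof=0); this is a choice, not a given.
        bounds[i] = a_prime[nearby].mean() - n_sd * a_prime[nearby].std()
    return bounds

def fit_lower_bound(clutter, a_prime, half_window=0.5, n_sd=2.0):
    """Slope and intercept of a straight line fitted to the lower-bound points."""
    bounds = lower_bound_points(clutter, a_prime, half_window, n_sd)
    slope, intercept = np.polyfit(np.asarray(clutter, dtype=float), bounds, 1)
    return float(slope), float(intercept)
```

A negative slope from `fit_lower_bound` would correspond to the pattern described here: the worst performance observed at a given clutter level falls as clutter rises, while the best performance is not constrained in the same way.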
Feature Congestion
Our first analysis showed that Feature Congestion is a very weak predictor of search performance in a natural image search task, although the false positive rate just achieves significance in its correlation. This negative correlation implies that the greater the Feature Congestion in the scene, the easier it is for participants to reject the Absent Patch. On the face of it, this is a counterintuitive result unless it is considered that the number of contrasting features, which is what Feature Congestion is measuring, actually helps to define the signature of a scene. The more features that are congesting, the more definite and distinctive the signature of the scene and the easier it is to recognize a target as not matching that signature. 
Further simulations suggested that this effect is a result of the amount of Feature Congestion within the Present Patch itself, implying that the more clutter there is in the target patch, the easier it is to find. There are two plausible explanations for this effect: The first is that the more visually complex the target patch is, the more likely it is to be remembered and the easier it will then be to compare this memory representation to other parts of the scene. The second possibility is that a target which has a high level of Feature Congestion will be more visually different from its surrounds than one with a low level of Feature Congestion. Either way, this result demonstrates that the visual properties of the target clearly have a part to play in determining search performance, in addition to the distracting clutter of the scene. 
Real world complexity
Across the studies reviewed here, there is a general trend across all of the metrics compared: the less “real world” the task, the better the correlation between performance and the metrics. This is somewhat expected: as the complexity of the clutter in a scene increases, simple clutter measures will become less able to quantify the multi-dimensional level of clutter perfectly, and so there will be more noise in the clutter measure as well as in any human data captured. 
Note, however, that our experiment has a different design from previous studies: we use a target present/target absent response; the patches are quite difficult to find or reject, resulting in a large number of errors; and our targets were naturally present parts of the scenes. Our primary performance metric, A′, tells us how easily people find the target, or make the rejection, as opposed to how quickly they make the decision. As we found no correlation between RT and any clutter metric, it is obvious that the time taken to decide between present and absent is not strictly related to the clutter when there is plenty of time to make a decision, despite performance (as A′) and RT being related. 
Limitation of clutter
Sub-band Entropy describes a limit on difficulty, but why only a limit? The answer of course is that clutter metrics do not account for all the variability in search performance. Instead, search performance also depends on the nature of the target and the relationship between the target and the background. This result makes a connection with classical visual search theory and the model of Duncan and Humphreys (1989). Duncan and Humphreys argue that, for search tasks with punctate artificial stimuli, two factors determine search difficulty. The first is the heterogeneity of the distractors, i.e., the more similarity between the distractors, the less difficult the search task will be. In experiments with natural scenes, discrete distractors are undefined. However, we have shown that Sub-band Entropy around the target location, a measure of visual variability in the image, is related to search difficulty. The second factor that determines search difficulty is target-distractor difference. In the current study we do not have a measure of this difference directly, but we have shown that the Feature Congestion of the target is important and have argued that the positive correlation between performance and increased Feature Congestion of the target may result from a higher value of Feature Congestion being sometimes more different from the background; this is a good analogue for target-distractor difference. 
Region of influence
The region of effect for the Sub-band Entropy is perhaps the most interesting result: the peak of the correlation appears at about 6°–7° (200–250 pixels) eccentricity from the target. This is just outside the typical limit of saccade length and the efficient search window of 5° (Bertera & Rayner, 2000), and is similar in size to the conspicuity zone of Motter and Simoni (2007), the area around a target in which a fixation should land to have a chance of finding the target on the next fixation. If the clutter is most influential within this region, it makes sense to link it to long-range interactions between the features that contribute to clutter, and these long-range interference interactions would be very similar to the phenomena of crowding (Levi, 2008) and masking. In our experiment, it is not possible to separate the effect of crowding (which serves to obscure the nature of the target) from that of lateral masking (which serves to obscure the existence of the target features), and so it is simpler to treat them as having a single masking effect, that of reducing the conspicuity (Wertheim, Hooge, Krikke, & Johnson, 2006) of the target. 
If we are to assume that the clutter influence of this masking is across the whole of the 7° of eccentricity, it would make it more likely that crowding, which is shown to occur over greater distances than lateral masking (Levi, 2008), is the prime contributor to the clutter's influence; and by Bouma's rule (Bouma, 1970), the crowding effects would be across a region this size when the target is 12°–15° away from current fixation, which is nearly 1/3 of the screen. This seems excessively large for a normal saccade, but may be enough to disrupt saccade planning. 
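The arithmetic behind these figures can be checked with a short sketch. Two assumptions are made here: the pixel-to-degree scale is inferred from the pairing of 484 pixels with 14.2° given earlier (the actual viewing geometry is not stated in this section), and Bouma's rule is taken in its usual approximate form, in which the critical spacing for crowding is about half the target's eccentricity. The function names are hypothetical.

```python
# Scale inferred from the text's pairing of 484 pixels with 14.2 degrees (an assumption).
DEG_PER_PIXEL = 14.2 / 484

# Bouma (1970): crowding extends to roughly half the target's eccentricity.
BOUMA_FRACTION = 0.5

def pixels_to_degrees(pixels):
    """Approximate visual angle (degrees) subtended by a length in pixels."""
    return pixels * DEG_PER_PIXEL

def eccentricity_for_crowding_radius(radius_deg, fraction=BOUMA_FRACTION):
    """Target eccentricity at which the crowding zone would reach radius_deg."""
    return radius_deg / fraction

peak_radius_deg = [pixels_to_degrees(p) for p in (200, 250)]          # about 5.9 and 7.3
required_eccentricity = [eccentricity_for_crowding_radius(r) for r in peak_radius_deg]
```

Under these assumptions, a crowding zone with a radius of roughly 6°–7° corresponds to a target about 12°–15° from fixation, which matches the range quoted above.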
Conclusions
In summary, by using real targets in natural scenes, we have shown that some of the existing clutter metrics are a useful alternative to set size for fully natural scenes in visual search tasks, though the correlations account for only a small percentage (<10%) of the variance, which is smaller than for less natural tasks. Clutter, as measured by Sub-band Entropy, is most influential on search performance in a region around the target. This region is relatively large and is outside the range normally reported for other interference effects in peripheral vision such as lateral masking and crowding. Feature Congestion is not a strong predictor of performance when applied to the whole scene, but identifies the important feature of the target that changes the difficulty of the task. 
These results also suggest that these different clutter metrics measure fundamentally different aspects of the image, not just from the differences in process, but from the fact that the profiles across space are qualitatively different. As a result, a combination of these metrics somewhat improves the ability to predict performance. 
We conclude that for search in a natural scene, clutter of the region around the target and the complexity of the target itself are just two determinants of search performance. 
Supplementary Materials
Acknowledgments
This work was supported by a CASE studentship to the first author funded by EPSRC and QinetiQ. The authors of this paper would like to thank the authors of the clutter metrics used in this paper for access to their code. 
Commercial relationships: none. 
Corresponding author: Matthew Asher. 
Email: psmfa@bristol.ac.uk. 
Address: School of Experimental Psychology, University of Bristol, Bristol, UK. 
References
Asher M., Gilchrist I., Troscianko T., & Tolhurst D. (2012). Predicting performance in natural scene searches. Journal of Vision, 12 (9): 266, http://www.journalofvision.org/content/12/9/266, doi:10.1167/12.9.266.
Bergen J. R., & Landy M. S. (1991). Computational modeling of visual texture segregation. In Landy M. S., & Movshon J. A. (Eds.), Computational models of visual processing (pp. 253–271). Cambridge, MA: MIT Press.
Bertera J. H., & Rayner K. (2000). Eye movements and the span of the effective stimulus in visual search. Perception & Psychophysics, 62 (3), 576–585, doi:10.3758/BF03212109.
Bouma H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226 (5241), 177–178, doi:10.1038/226177a0.
Bravo M. J., & Farid H. (2008). A scale invariant measure of clutter. Journal of Vision, 8 (1): 23, 1–9, http://journalofvision.org/8/1/23/, doi:10.1167/8.1.23.
Canny J. (1986). A computational approach to edge-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (6), 679–698.
Duncan J., & Humphreys G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96 (3), 433–458, doi:10.1037//0033-295x.96.3.433.
Eckstein M. P. (2011). Visual search: A retrospective. Journal of Vision, 11 (5): 14, 1–36, http://www.journalofvision.org/content/11/5/14, doi:10.1167/11.5.14.
Felzenszwalb P. F., & Huttenlocher D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59 (2), 167–181.
Henderson J. M., Chanceaux M., & Smith T. J. (2009). The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements. Journal of Vision, 9 (1): 32, 1–8, http://journalofvision.org/9/1/32/, doi:10.1167/9.1.32.
Levi D. M. (2008). Crowding - an essential bottleneck for object recognition: A mini-review. Vision Research, 48 (5), 635–654, doi:10.1016/j.visres.2007.12.009.
Lovell P. G., Gilchrist I. D., Tolhurst D. J., & Troscianko T. (2009). Search for gross illumination discrepancies in images of natural objects. Journal of Vision, 9 (1): 37, 1–14, http://journalofvision.org/9/1/37/, doi:10.1167/9.1.37.
Mack M. L., & Oliva A. (2004). Computational estimation of visual complexity. Paper presented at the 12th Annual Object, Perception, Attention, and Memory Conference, Minneapolis, Minnesota.
Macmillan N. A., & Creelman C. D. (2005). Detection theory: A user's guide (2nd ed.). Mahwah, NJ: Erlbaum. (Original work published 1991).
Motter B. C., & Simoni D. A. (2007). The roles of cortical image separation and size in active visual search performance. Journal of Vision, 7 (2): 6, 1–15, http://journalofvision.org/7/2/6/, doi:10.1167/7.2.6.
Neider M. B., & Zelinsky G. J. (2011). Cutting through the clutter: Searching for targets in evolving complex scenes. Journal of Vision, 11 (14): 7, 1–16, http://www.journalofvision.org/content/11/14/7, doi:10.1167/11.14.7.
Pollack I., & Norman D. A. (1964). Non-parametric analysis of recognition experiments. Psychonomic Science, 1, 125–126.
Rosenholtz R., Li Y., & Nakano L. (2007). Measuring visual clutter. Journal of Vision, 7 (2): 17, 1–22, http://journalofvision.org/7/2/17/, doi:10.1167/7.2.17.
Rosenholtz R., Li Y., Mansfield J., & Jin Z. (2005). Feature congestion, a measure of display clutter. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 761–770). ACM.
Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Toet A., Bijl P., & Valeton J. M. (2001). Image dataset for testing search and detection models. Optical Engineering, 40 (9), 1760, doi:10.1117/1.1388608.
Treisman A. M., & Gelade G. (1980). Feature-integration theory of attention. Cognitive Psychology, 12 (1), 97–136, doi:10.1016/0010-0285(80)90005-5.
Wertheim A., Hooge I., Krikke K., & Johnson A. (2006). How important is lateral masking in visual search? Experimental Brain Research, 170 (3), 387–402, doi:10.1007/s00221-005-0221-9.
Wickelgren W. A. (1977). Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41 (1), 67–85, doi:10.1016/0001-6918(77)90012-9.
Wolfe J. M. (1994). Visual-search in continuous, naturalistic stimuli. Vision Research, 34 (9), 1187–1195, doi:10.1016/0042-6989(94)90300-x.
Wolfe J. M. (1998). What can 1 million trials tell us about visual search? Psychological Science, 9 (1), 33–39.
Zelinsky G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115 (4), 787–835, doi:10.1037/a0013118.
Figure 1
 
Example stimuli: From a natural image (left), three sub-images are taken: a scene of 1024 × 1280 pixels (white outline), and a Present-Patch (green) and an Absent-Patch (red), both of which are 90 pixel diameter circles. Present-Patches are always from inside the scene, while Absent-Patches are always from outside the scene. (See Supplementary Materials for further examples.) Original photo copyright of Trey Ratcliff and Stuck in Customs, used under creative commons license, details at http://creativecommons.org/licenses/by-nc-sa/2.0/. The A′ for this scene task was 0.90, across 25 observers.
Figure 2
 
Trial sequence for a single trial.
Figure 3
 
Distribution of performance for individual scenes (mean A′ = 0.80, SD = 0.16). Note that some of the results are at or below chance, suggesting that some search tasks are so hard that people's performance is worse than a guess. This poor performance could be because of visual properties of the search task, such as the Present Patch being very indistinct or the Absent Patch being very similar to something else in the scene. Data are shown for 119 scenes; there were 120 scenes in the experiments, but one was discarded from analysis because of ambiguity in the Target-Absent Patch.
Figure 4
 
Feature congestion heatmap. Black areas have very few conflicting features whereas light patches are relatively feature intensive.
Figure 5
 
Output of the segmentation algorithm (Felzenszwalb & Huttenlocher, 2004) prior to the segmentation count at different scales. The scale defines the minimum number of pixels that can contribute to an individual segment's area, shown underneath each image. As the figure shows, as this parameter increases, the smaller clusters merge into larger regions, so there are fewer segments.
Figure 6
 
Output from Canny Edge Detector for Edge Density Calculation. Notice the low number of edges in the uncluttered area of image.
Figure 7
 
Scatter plot of the scene A′ against Sub-band Entropy, r(117) = −0.22, p = 0.02 (left), and Feature Congestion, r(117) = 0.14, p = 0.14 (right), in the scene. Note that, despite both measures being positive representations of clutter, the regressions are in opposite directions. The more convincing comparison is that Sub-band Entropy seems to have a linear cut-off, shown by the red line. This line is a regression of the red points, each of which is the lower 2 SD A′ value for all points with a Sub-band Entropy value within ±0.5 of the central value of the point (±1 for Feature Congestion). This relationship does not appear for Feature Congestion or for either of the other clutter metrics. This lower bound would indicate a theoretical limit on the difficulty of the search, with actual difficulty affected by the properties of both the scene and the targets. The inset images show the scenes for some of the extreme ends of the distribution.
Table 1
 
Correlation of human performance metrics (A′; true positive rate [TPR]; false positive rate [FPR]; reaction time [RT]) against clutter model values calculated for the whole scene (degrees of freedom = 117), showing Spearman's correlation, as well as the associated p value and the 95% confidence intervals (bootstrapped). For comparison with regression results, r² is shown below each r. Entries in bold are statistically significant (p < 0.05).
Correlation           A′                         TPR                        FPR                        RT
                      r (r²)   p        CI       r (r²)   p        CI       r (r²)   p        CI       r (r²)   p        CI
Sub-band Entropy      0.21     0.02     0.35     −0.11    0.24     −0.26    0.2      0.03     0.02     0.05     0.56     −0.12
                      (0.05)            0.03     (0.01)            0.08     (0.04)            0.35     (0.01)            0.21
Feature Congestion    0.14     0.14     −0.11    −0.01    0.93     −0.28    0.18     0.05     0.32     0.01     0.88     −0.18
                      (0.02)            0.31     (0.00)            0.18     (0.03)            0.01     (0.00)            0.24
Edge Density          0.21     0.02     0.37     −0.12    0.21     −0.26    0.19     0.04     0.00     0.02     0.82     −0.16
                      (0.05)            0.03     (0.01)            0.05     (0.04)            0.35     (0.00)            0.18
Segmentation          0.14     0.13     −0.08    0.03     0.71     −0.19    −0.15    0.09     −0.31    −0.01    0.88     −0.20
                      (0.02)            0.32     (0.00)            0.21     (0.02)            0.04     (0.00)            0.20