This study investigated the mechanisms of grouping and segregation in natural scenes of close-up foliage, an important class of scenes for human and non-human primates. Close-up foliage images were collected with a digital camera calibrated to match the responses of human *L*, *M*, and *S* cones at each pixel. The images were used to construct a database of hand-segmented leaves and branches that correctly localizes the image region subtended by each object. We considered a task where a visual system is presented with two image patches and is asked to assign a category label (either *same* or *different*) depending on whether the patches appear to lie on the *same* surface or *different* surfaces. We estimated several approximately ideal classifiers for the task, each of which used a unique set of image properties. Of the image properties considered, we found that ideal classifiers rely primarily on the difference in average intensity and color between patches, and secondarily on the differences in the contrasts between patches. In psychophysical experiments, human performance mirrored the trends predicted by the ideal classifiers. In an initial phase without corrective feedback, human accuracy was slightly below ideal. After practice with feedback, human accuracy was approximately ideal.

A visual system must infer properties of the environment $\boldsymbol{\omega}$ (distal stimuli) from properties of the retinal image $\mathbf{s}$ (proximal stimuli), where $\boldsymbol{\omega}$ and $\mathbf{s}$ represent properties in the two domains. The key statistic needed for making such inferences is the joint probability distribution of the environment and image properties, $p(\boldsymbol{\omega}, \mathbf{s})$, which gives directly the posterior probability of environment properties given observed image properties, $p(\boldsymbol{\omega} \mid \mathbf{s})$. If these posterior probabilities and the costs and benefits of the different possible environment-behavior outcomes are known, then it is possible in principle to derive the Bayesian ideal observer for the task and to determine how well that ideal observer performs (e.g., see Geisler, 2008; Geisler & Diehl, 2003; Kersten, Mamassian, & Yuille, 2004; Knill & Richards, 1996). The computations carried out by the ideal observer provide insight into what *should* be computed by the visual system when performing the task, and hence can suggest principled hypotheses for perceptual mechanisms. The performance of the ideal observer quantifies the potential usefulness of the particular image properties under consideration and provides an appropriate benchmark against which to compare human performance.

*patch grouping task* where the observer is given two equal-size image patches sampled from a natural foliage image at some spatial separation and must decide whether they belong to the *same* or *different* physical surfaces.

$p(\boldsymbol{\omega}, \mathbf{s})$. The obvious pitfall of the hand-segmentation approach is that there may be conditions where human segmentations are not reliable enough to approximate ground truth, although this is unlikely to be a problem in our case (see later).

$\boldsymbol{\omega}$ can assume one of two nominal values, which we represent with the nominal variable $\omega$, $\omega \in \{\textit{same}, \textit{different}\}$. There are many possible image properties ($\mathbf{s}$) that could be considered, but here we focus on the mean luminance and color of each patch as well as the luminance and color contrast of each patch. The specific definitions of the image properties are given in the methods. These local image properties were chosen because they are simple and because they allowed us to determine approximate ideal classifiers.

In the *texture-removed* condition, the subjects were presented with natural image patches that were modified to preserve only the difference in mean luminance and color. Comparison of human and ideal performance in this case allowed us to determine how efficiently humans use this information. In the *full* condition, the subjects were presented with the actual natural image patches. In the *texture-only* condition, the subjects were presented with natural image patches in which mean luminance and color were equated but contrast and spatial texture information were unchanged. The *full* and *texture-only* conditions are useful for determining whether humans use (in this task) image properties in addition to those we measured.

*L*), middle (*M*) and short (*S*) wavelength cones at each pixel location. In the first step of the calibration, we verified that the camera responded linearly over its dynamic range and that the f-stop and shutter speed controls functioned accurately.

*R*), green (*G*), and blue (*B*) sensors at every wavelength (see Figure 1a).

*L*, *M*, and *S* cones defined as the Stockman, MacLeod, and Johnson (1993) 2-deg cone fundamentals based on CIE 10-deg color matching functions adjusted to 2-deg (solid curves in Figure 1b). The optimal transformation matrix was estimated by minimizing the squared error between predicted and actual log *L*, log *M*, and log *S* responses to natural radiance spectra. Each natural radiance spectrum was simulated by multiplying a randomly selected natural reflectance spectrum (Krinov, 1947) with a randomly selected natural irradiance spectrum (DiCarlo & Wandell, 2000). The histograms of prediction errors are shown in Figure 1c. As can be seen, the errors are generally less than half a percent for the *L* and *M* responses and less than 1% for the *S* responses. The camera's effective *L*, *M*, and *S* sensitivity functions are given by the dots in Figure 1b. We also confirmed the accuracy of the camera's calibration by comparing camera and spectrophotometer measurements of the *L*, *M*, and *S* responses for the test patches of a Macbeth color checker illuminated by a tungsten source.

*low quality*. The remaining boundaries were labeled *high quality*. As a final quality control step, one of the authors (ADI) fixed all obvious errors in the quality labeling (this was a small percentage). All low quality leaves were then discarded, leaving 1,645 high quality segmented leaf boundaries. Although this restriction may limit the generality of the results to some extent, it guarantees that the segmentations closely approximate ground truth. Furthermore, even with this restriction, the regions of interest were densely segmented. Thus, we believe the statistics reported here are highly relevant to natural tasks in the world of close-up foliage (and perhaps in other environments as well). Figure 3 shows the distribution of the number of polygons used to segment a leaf and the distribution of leaf diameters (square root of the total number of pixels in the leaf) in the entire database. The quality of the segmentations is best appreciated by visiting the website http://www.cps.utexas.edu/FoliageDB, which contains images that illustrate the segmentations as well as a database containing the images and segmented objects.

*L*, *M*, and *S* cone responses is approximately Gaussian. They determined the three principal axes of the three-dimensional Gaussian distribution using Principal Component Analysis (PCA). These axes are convenient because the marginal distributions along each axis amount to a complete description of the full three-dimensional probability distribution of cone responses. We applied PCA to the distribution of log *L*, log *M* and log *S* responses measured for our image set and obtained the same eigenvectors reported by Ruderman et al. (see Table 1). Therefore we chose to define our intensity and contrast image properties in an *lαβ* space like that of Ruderman et al., given by the following transformations:

where *l* is a luminance-like value (which we will call “intensity”), *α* a blue-yellow color opponent value, and *β* a red-green color opponent value. The scalars on the right side of each formula convert the values to standard-deviation units (z-scores).
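A minimal sketch of such a transformation, built only from the numbers in Table 1: each log cone response is centered on the dataset mean, projected onto a principal axis, and divided by the square root of that axis's eigenvalue to obtain z-score units. The centering step, the use of the tabulated means, and the function name `log_lms_to_lab` are assumptions made for illustration; this is not necessarily the exact form of the authors' equation.

```python
import math

# Values copied from Table 1 (mean log cone responses, PCA eigenvalues, eigenvectors).
MEAN_LOG_LMS = (18.1, 18.0, 17.3)           # means of log L, log M, log S
EIGENVALUES = (1.15, 1.38e-2, 7.52e-5)      # variances along the l, alpha, beta axes
EIGENVECTORS = (
    (1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)),    # l axis
    (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)),   # alpha axis
    (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0),                # beta axis
)

def log_lms_to_lab(log_l, log_m, log_s):
    """Project centered log cone responses onto the principal axes, then z-score.

    Assumption: centering uses the dataset means from Table 1.
    """
    centered = [v - m for v, m in zip((log_l, log_m, log_s), MEAN_LOG_LMS)]
    result = []
    for axis, variance in zip(EIGENVECTORS, EIGENVALUES):
        projection = sum(a * c for a, c in zip(axis, centered))
        result.append(projection / math.sqrt(variance))
    return tuple(result)   # (l, alpha, beta) in standard-deviation units
```

A pixel whose three log responses all exceed the mean by the same amount moves only along the *l* axis, which is why *l* behaves like an intensity value.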

Statistic | log L | log M | log S |
---|---|---|---|
Mean | 18.1 | 18.0 | 17.3 |
Covariance with log L | 0.398 | 0.395 | 0.375 |
Covariance with log M | | 0.392 | 0.372 |
Covariance with log S | | | 0.373 |
Correlation with log L | 1.000 | 1.000 | 0.973 |
Correlation with log M | | 1.000 | 0.973 |
Correlation with log S | | | 1.000 |

Principal axis | l | α | β |
---|---|---|---|
Eigenvalue | 1.15E+00 | 1.38E−02 | 7.52E−05 |
Eigenvalue (% of total) | 98.81% | 1.19% | 0.01% |
Eigenvector, log L component | 1/√3 | 1/√6 | 1/√2 |
Eigenvector, log M component | 1/√3 | 1/√6 | −1/√2 |
Eigenvector, log S component | 1/√3 | −2/√6 | 0 |

*lαβ* space described above. Each pair of image patches can be described in terms of twelve image properties, defined below in Equations 2, 3, 4, and 5. To define these properties, let the means of patches “1” and “2” be labeled as $\bar{l}_1$, $\bar{l}_2$, $\bar{\alpha}_1$, $\bar{\alpha}_2$, $\bar{\beta}_1$, and $\bar{\beta}_2$, and let the standard deviations be labeled as $\sigma_{l,1}$, $\sigma_{l,2}$, $\sigma_{\alpha,1}$, $\sigma_{\alpha,2}$, $\sigma_{\beta,1}$, and $\sigma_{\beta,2}$. The labels “1” and “2” were assigned so that $\bar{l}_1 \le \bar{l}_2$ (this assignment rule is arbitrary and has no effect on analysis results).

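The set “$\mu_3$” reflects the mean intensity and color of the patches (with a subset “$\mu_2$” consisting of only the color means $\bar{\alpha}$ and $\bar{\beta}$). The set “$\Delta_3$” reflects the mean intensity and color differences ($\Delta l$, $\Delta\alpha$, $\Delta\beta$). The set “$\delta_3$” reflects the contrast difference between patches ($\Delta\sigma_l$, $\Delta\sigma_\alpha$, $\Delta\sigma_\beta$). The set “$F_3$” reflects the ratio of contrasts between patches ($F_l$, $F_\alpha$, $F_\beta$).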

$\Delta_3$) and the contrast differences ($\delta_3$) because they are intuitive and prominent in the literature. The other properties were included for completeness, because they were available to subjects in the *texture-removed* and *texture-only* stimulus conditions (and of course in the *full* condition).

*diameter* was defined as the square root of the total number of pixels contained in the leaf. (This unit makes the results relatively invariant with viewing distance.) One circular image patch inside the leaf was randomly selected. A second patch was selected so that its center was a distance of 1/4, 1/2, or 1 diameter from the first patch's center. If the second patch was inside the leaf, the pair was *same*; if the second patch was outside the leaf, the pair was *different*. The diameter of the image patches was 1/5 of the reference leaf diameter. Neither patch was allowed to intersect a polygon boundary of the reference leaf.
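The sampling rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the leaf is represented as a boolean pixel mask (a list of rows) rather than as polygons, the direction of the second patch is drawn uniformly at random, and a patch that straddles the leaf boundary or leaves the image is rejected. The names `disk_pixels` and `sample_pair` are hypothetical.

```python
import math
import random

def disk_pixels(cy, cx, radius):
    """Integer pixel coordinates covered by a disk centered at (cy, cx)."""
    r = int(math.ceil(radius))
    y0, x0 = int(round(cy)), int(round(cx))
    return [(y, x)
            for y in range(y0 - r, y0 + r + 1)
            for x in range(x0 - r, x0 + r + 1)
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2]

def sample_pair(mask, separation, rng, max_tries=10_000):
    """Sample one patch pair from a boolean leaf mask.

    separation is in leaf diameters (1/4, 1/2, or 1 in the text).
    Returns (center1, center2, label), or None if no valid pair was found.
    """
    height, width = len(mask), len(mask[0])
    leaf = [(y, x) for y in range(height) for x in range(width) if mask[y][x]]
    diameter = math.sqrt(len(leaf))    # leaf "diameter" as defined in the text
    radius = diameter / 5 / 2          # patch diameter is 1/5 of the leaf diameter

    def coverage(cy, cx):
        """Mask values under the patch, or None if the patch leaves the image."""
        pixels = disk_pixels(cy, cx, radius)
        if any(not (0 <= y < height and 0 <= x < width) for y, x in pixels):
            return None
        return [mask[y][x] for y, x in pixels]

    for _ in range(max_tries):
        c1 = rng.choice(leaf)
        first = coverage(*c1)
        if first is None or not all(first):
            continue                   # first patch must lie wholly inside the leaf
        angle = rng.uniform(0.0, 2.0 * math.pi)   # assumed: uniform random direction
        step = separation * diameter
        c2 = (c1[0] + step * math.sin(angle), c1[1] + step * math.cos(angle))
        second = coverage(*c2)
        if second is None:
            continue
        if all(second):
            return c1, c2, "same"
        if not any(second):
            return c1, c2, "different"
        # The second patch straddles the leaf boundary: reject and retry.
    return None
```

Because distances are expressed in leaf diameters, the same call works for leaves photographed at different scales.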

In the *full* condition, both patches contained all of their original properties and were essentially circular cut-outs from the image. In the *texture-removed* condition, the patches were uniform; only the average intensity and color differences ($\Delta l$, $\Delta\alpha$, and $\Delta\beta$) were preserved. In the *texture-only* condition, natural contrast and spatial structure (texture) were preserved, but the differences in average intensity and color were removed by applying a uniform addition, in *lαβ* space, so that $\Delta l = 0$, $\Delta\alpha = 0$, and $\Delta\beta = 0$. In all conditions, the value

^{2}) and the means of

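Condition | Average color: $\bar{\alpha}$ | Average color: $\bar{\beta}$ | Color difference between patches: $\Delta l$ | Color difference between patches: $\Delta\alpha$ | Color difference between patches: $\Delta\beta$ | Texture: $l^*(x,y)$ | Texture: $\alpha^*(x,y)$ | Texture: $\beta^*(x,y)$ |
---|---|---|---|---|---|---|---|---|
Full | x | x | x | x | x | x | x | x |
Texture Removed | x | x | x | x | x | | | |
Texture Only | x | x | | | | x | x | x |

“Texture” denotes all contrast and spatial structure information.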

*same* and the rest *different*, and subjects were informed of this. On every trial, two image patches were displayed on the screen. During the first 20 trials, a large portion of the original image was also displayed next to the patches. These trials served to demonstrate the nature of the task to the subjects. In the subsequent 600 trials, only the two image patches were displayed. All data analyses were performed on these 600 trials. On each trial, subjects viewed the patches for as long as they desired and pushed a button to categorize the patch pair as *same* or *different* (Figure 4). Each of the 9 conditions was run in a separate 620-trial block, on separate days, and feedback was not provided.

*same*-*different* classification decisions. The natural scene statistics measured in the current study consisted of the twelve image properties defined above ($\bar{l}$, $\bar{\alpha}$, $\bar{\beta}$, $\Delta l$, $\Delta\alpha$, $\Delta\beta$, $\Delta\sigma_l$, $\Delta\sigma_\alpha$, $\Delta\sigma_\beta$, $\ln F_l$, $\ln F_\alpha$, $\ln F_\beta$), and hence we only consider ideal classifiers that use different combinations of these image properties. Specifically, we constructed classifiers for each unique combination of the property sets defined in Equations 2, 3, 4, and 5 ($\mu_2$, $\mu_3$, $\Delta_3$, $\delta_3$, and $F_3$). For example, the classifier labeled $\mu_3\Delta_3\delta_3F_3$ used all 12 image properties in Equations 2, 3, 4, and 5, and the classifier labeled $\Delta_3\delta_3$ used the 6 image properties in Equations 3 and 4.

In the *texture-removed* conditions, the stimuli contained only five of these properties ($\bar{\alpha}$, $\bar{\beta}$, $\Delta l$, $\Delta\alpha$, $\Delta\beta$) and hence an ideal classifier that uses these properties ($\mu_2\Delta_3$) is the appropriate benchmark for comparison with human performance. Theoretically it is impossible for humans to outperform this ideal classifier in the *texture-removed* conditions, so if humans reach the ideal performance level, then they are solving the task with perfect efficiency.

In the *full* and *texture-only* conditions, there are a large number of additional stimulus properties that humans could use in performing the task, and hence humans could potentially perform better than an ideal classifier that is limited to the twelve properties that we considered. If humans outperform an ideal classifier using the twelve properties, then we know that humans are using stimulus properties not in the set. Such an outcome would indicate that ideal performance can be improved by including more stimulus properties, and hence that it would be worth considering other potentially relevant stimulus properties.

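$p(\omega = \textit{same} \mid \mathbf{s})$ and respond “*same*” if the posterior probability exceeds 0.5, which is equivalent to computing the log likelihood ratio:

$$z(\mathbf{s}) = \ln\left[\frac{p(\mathbf{s} \mid \omega = \textit{same})}{p(\mathbf{s} \mid \omega = \textit{diff})}\right] \qquad (6)$$

and responding “*same*” if $z(\mathbf{s}) > 0$, where $\mathbf{s}$ is the stimulus on the trial. The log likelihood ratio function $z(\mathbf{s})$ will also be referred to here as the optimal decision function.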
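As a toy illustration of this decision rule, the sketch below computes the log likelihood ratio for a one-dimensional stimulus and converts it to a posterior probability under equal priors. The two Gaussian category distributions are hypothetical stand-ins (a narrow one for *same*, a wide one for *different*), not distributions measured in the study.

```python
import math
from statistics import NormalDist

def log_likelihood_ratio(s, pdf_same, pdf_diff):
    """z(s) = ln p(s | same) - ln p(s | different)."""
    return math.log(pdf_same(s)) - math.log(pdf_diff(s))

def posterior_same(z):
    """Posterior probability of 'same' given z, assuming equal priors."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z):
    """Respond 'same' exactly when the posterior exceeds 0.5, i.e. when z > 0."""
    return "same" if z > 0 else "different"

# Hypothetical category distributions for a single difference feature.
same = NormalDist(mu=0.0, sigma=1.0)
diff = NormalDist(mu=0.0, sigma=3.0)

z_small = log_likelihood_ratio(0.0, same.pdf, diff.pdf)   # small difference
z_large = log_likelihood_ratio(5.0, same.pdf, diff.pdf)   # large difference
print(classify(z_small), classify(z_large))
```

The logistic mapping in `posterior_same` makes the equivalence explicit: the posterior exceeds 0.5 exactly when the log likelihood ratio is positive.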

*k*-nearest neighbor technique. A quadratic classifier is based on the assumption that the underlying probability density functions are Gaussian. Both the exemplar and quadratic classifiers specify a decision function that can be estimated from the natural scene statistics data. Under certain conditions (discussed below) these classifiers are optimal.

$D$-dimensional stimulus $\mathbf{s} = (s_1 \ldots s_D)$, the classifier “blurs” the exemplars surrounding $\mathbf{s}$ to estimate the log likelihood ratio (Equation 6) of the two categories at $\mathbf{s}$. Here, we adopted an exponential blurring kernel because it is commonly used and because preliminary simulations suggested that it would work well for our data. The exemplar classifier has a free parameter for each stimulus dimension, which determines the amount of blur along that dimension. Thus, the decision function is defined by

where $\mathbf{w} = (w_1 \ldots w_D)$ is the set of blurring parameters, $\mathbf{x}_{same,i}$ is the $i$th exemplar in the training set from the category *same*, $\mathbf{x}_{diff,i}$ is the $i$th exemplar from the category *different*, and $N$ is the number of exemplars in each training set.
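A minimal sketch of an exemplar classifier of this kind is given below. The specific kernel, $\exp(-\sum_d w_d\,|s_d - x_d|)$, is an assumption chosen to be consistent with the description (an exponential kernel with one blurring parameter per dimension, where larger $w_d$ means less blur); the exact kernel used in the study may differ. The function name `exemplar_decision` is hypothetical.

```python
import math

def exemplar_decision(s, same_exemplars, diff_exemplars, w):
    """Kernel estimate of the log likelihood ratio at stimulus s.

    Assumes equal numbers of exemplars in the two categories, so that the
    kernel normalizing constants cancel in the ratio.
    """
    def blurred_count(exemplars):
        total = 0.0
        for x in exemplars:
            distance = sum(wd * abs(sd - xd) for wd, sd, xd in zip(w, s, x))
            total += math.exp(-distance)
        return max(total, 1e-300)   # guard against log(0) far from all exemplars

    return math.log(blurred_count(same_exemplars)) - math.log(blurred_count(diff_exemplars))

# Hypothetical one-dimensional training sets: "same" differences cluster near zero.
same_set = [(-0.5,), (0.0,), (0.5,)]
diff_set = [(-4.0,), (0.0,), (4.0,)]
print(exemplar_decision((0.0,), same_set, diff_set, w=(1.0,)))   # positive: "same"
print(exemplar_decision((4.0,), same_set, diff_set, w=(1.0,)))   # negative: "different"
```

The vector `w` plays the role of the blurring parameters: small values average over many exemplars, large values rely only on the nearest ones.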

$\mathbf{w}$ approach infinity). In this case, the exemplar classifier will approach the Bayesian ideal $z(\mathbf{s})$ in Equation 6. However, under realistic conditions with finite sample sizes, the classifier performs best with some intermediate kernel size.

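$p(\mathbf{s} \mid \omega = \textit{same})$ and $p(\mathbf{s} \mid \omega = \textit{diff})$] are Gaussian, then the optimal decision function is

$$q(\mathbf{s}, \mathbf{A}, \mathbf{b}, c) = \mathbf{s}^{\mathsf{T}}\mathbf{A}\,\mathbf{s} + \mathbf{b}^{\mathsf{T}}\mathbf{s} + c \qquad (8)$$

where matrix $\mathbf{A}$, vector $\mathbf{b}$, and scalar $c$ are free parameters that define the shape of a quadric surface defined over $\mathbf{s}$ (Duda et al., 2001; Fisher, 1936). For example, if there are three stimulus dimensions, then, for a symmetric matrix $\mathbf{A}$ with elements $a_{ij}$,

$$q(\mathbf{s}, \mathbf{A}, \mathbf{b}, c) = a_{11}s_1^2 + a_{22}s_2^2 + a_{33}s_3^2 + 2a_{12}s_1s_2 + 2a_{13}s_1s_3 + 2a_{23}s_2s_3 + b_1s_1 + b_2s_2 + b_3s_3 + c$$

The quadratic decision function $q(\mathbf{s}, \mathbf{A}, \mathbf{b}, c)$ does not reference data in local neighborhoods. Instead it asserts a global (quadric) shape for the decision function, but this also means that it is less flexible than the exemplar classifier. The quadratic classifier will perform well if the underlying distributions are similar to Gaussian (e.g., generalized Gaussian distributions), but it can perform poorly for more complex distribution shapes.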
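To make the link between Gaussian category distributions and a quadratic decision function concrete, the sketch below evaluates $\mathbf{s}^{\mathsf{T}}\mathbf{A}\mathbf{s} + \mathbf{b}^{\mathsf{T}}\mathbf{s} + c$ and derives $\mathbf{A}$, $\mathbf{b}$, and $c$ analytically for the simplified case of Gaussians with diagonal covariance. That simplification, the example parameter values, and the function names are assumptions for illustration; in the study the parameters were fit to the training data rather than derived from known distributions.

```python
import math
from statistics import NormalDist

def quadratic_decision(s, A, b, c):
    """Evaluate q(s) = s^T A s + b^T s + c for plain Python lists."""
    n = len(s)
    quad = sum(s[i] * A[i][j] * s[j] for i in range(n) for j in range(n))
    lin = sum(bi * si for bi, si in zip(b, s))
    return quad + lin + c

def params_from_diagonal_gaussians(mu_same, var_same, mu_diff, var_diff):
    """A, b, c of the log likelihood ratio for Gaussians with diagonal covariance."""
    n = len(mu_same)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    c = 0.0
    for d in range(n):
        A[d][d] = 0.5 * (1.0 / var_diff[d] - 1.0 / var_same[d])
        b[d] = mu_same[d] / var_same[d] - mu_diff[d] / var_diff[d]
        c += 0.5 * (mu_diff[d] ** 2 / var_diff[d] - mu_same[d] ** 2 / var_same[d])
        c += 0.5 * math.log(var_diff[d] / var_same[d])
    return A, b, c

# Hypothetical two-dimensional example.
mu_s, var_s = [0.0, 0.0], [1.0, 0.5]
mu_d, var_d = [0.2, -0.1], [4.0, 2.0]
A, b, c = params_from_diagonal_gaussians(mu_s, var_s, mu_d, var_d)
print(quadratic_decision([0.3, -1.2], A, b, c))
```

Checking the result against a direct evaluation of the two log densities is a quick way to confirm that the quadratic form is exactly the log likelihood ratio in the Gaussian case.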

$p(\omega = \textit{same} \mid \mathbf{s}) = p[\omega = \textit{same} \mid z(\mathbf{s})]$. The appendix also shows that the probability of category membership given the value of the decision function, $p[\omega = \textit{same} \mid z(\mathbf{s})]$, is a non-decreasing function of the value of the decision function $z(\mathbf{s})$. Furthermore, this constraint holds for any decision function that is monotonic with $z(\mathbf{s})$.

$\omega = \textit{same}$ be a monotonic function of the decision variable. Let $\hat{z}(\mathbf{s})$ represent an estimate of the optimal decision function $z(\mathbf{s})$. Here, the exemplar $g(\mathbf{s}, \mathbf{w})$ and quadratic $q(\mathbf{s}, \mathbf{A}, \mathbf{b}, c)$ decision functions can be regarded as estimates of $z(\mathbf{s})$. For any given training data set consisting of $N$ samples, we compute $\hat{z}(\mathbf{s})$ for each sample and then sort these $\hat{z}(\mathbf{s})$ values into quantiles. By definition, the quantiles contain equal numbers of samples. The $j$th quantile will then contain $n_{j,same}$ samples that are actually in category *same* and $n_{j,diff}$ samples that are actually in category *different*. This provides an estimate $\hat{p}(\omega = \textit{same} \mid j)$ of the posterior probability of category *same* for each quantile of the decision variable:

$$\hat{p}(\omega = \textit{same} \mid j) = \frac{n_{j,same}}{n_{j,same} + n_{j,diff}}$$

The estimate $\hat{p}(\omega = \textit{same} \mid j)$ is a noisy estimate of the true posterior probability because it is subject to the effects of sampling noise and systematic errors in the estimated decision function $\hat{z}(\mathbf{s})$. For example, the blue curve in Figure 5 illustrates the kind of non-monotonic relationship expected due to sampling noise. The blue curve was obtained by applying the optimal quadratic decision function to 10,000 random samples from Gaussian stimulus distributions and then binning the decision values into 200 quantiles to obtain $\hat{p}(\omega = \textit{same} \mid j)$. This number of samples is representative of our training data sets. The thick black curve shows the actual posterior probabilities that would be obtained with infinite sample size. The sampling noise apparent in the blue curve can make it difficult to search the parameter space of the decision function. To reduce the effects of sampling noise we enforce the non-decreasing constraint by finding the best fitting monotonic function $f(\omega = \textit{same} \mid j)$ through $\hat{p}(\omega = \textit{same} \mid j)$ by using a monotonic regression algorithm. This is illustrated by the red curve in Figure 5, which is much closer to the black curve (the actual posterior probabilities) than the blue curve is.
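The two steps described above, quantile binning followed by monotonic regression, can be sketched as follows. The text does not name the regression algorithm; the pool-adjacent-violators procedure used here is one standard choice and is an assumption, as are the function names. Equal weights per quantile are appropriate because every quantile holds the same number of samples.

```python
def quantile_posteriors(z_values, labels, n_bins):
    """Raw estimate of P(same | quantile j) from decision values and true labels.

    Assumes len(z_values) is a multiple of n_bins, so all quantiles are equal in size.
    """
    order = sorted(range(len(z_values)), key=lambda i: z_values[i])
    size = len(order) // n_bins
    estimates = []
    for j in range(n_bins):
        members = order[j * size:(j + 1) * size]
        n_same = sum(1 for i in members if labels[i] == "same")
        estimates.append(n_same / len(members))
    return estimates

def monotonic_fit(values):
    """Least-squares non-decreasing fit by pooling adjacent violators (equal weights)."""
    blocks = []                      # each block is [mean, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the non-decreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

noisy = [0.1, 0.3, 0.2, 0.6, 0.5, 0.9]   # hypothetical noisy per-quantile estimates
print(monotonic_fit(noisy))
```

Each pooled block replaces a locally decreasing run with its mean, which is what removes the sampling-noise reversals while leaving already-monotonic stretches untouched.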

$J[\hat{z}(\mathbf{s}_{\omega,i})]$ is the quantile of the value of the decision variable for stimulus $\mathbf{s}_{\omega,i}$, and $N$ is the number of training stimuli in each category. The quantity

*same* and *different* in each bin (see the appendix). The rationale for Equation 10 is that entropy is a principled measure of the uncertainty associated with a probability distribution. At chance performance, $f(\omega = \textit{same} \mid j) = 0.5$, the entropy is 1 bit; and at perfect performance, $f(\omega = \textit{same} \mid j) = 1$ or $f(\omega = \textit{same} \mid j) = 0$, the entropy is 0 bits. Thus,

$D$ (i.e., $D$ is equal to the length of $\mathbf{s}$), then the exemplar classifier has $D$ free parameters and the quadratic classifier has $D(D + 3)/2 - 1$ free parameters. (The number of parameters for the quadratic classifier reflects the fact that, without loss of generality, the additive constant $c$ can be set to zero and the vector $\mathbf{b}$ can be scaled to a unit vector.)
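The entropy measure and the parameter counts can be checked with a few lines of code. This is a sketch under the assumption that the objective is the average, over equally populated quantiles, of the binary entropy of the fitted posterior; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-category distribution with P(same) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mean_entropy(posteriors):
    """Average entropy over equally populated quantiles (lower is better)."""
    return sum(binary_entropy(p) for p in posteriors) / len(posteriors)

def n_params_exemplar(n_dims):
    """One blurring parameter per stimulus dimension."""
    return n_dims

def n_params_quadratic(n_dims):
    """Symmetric A plus b, with c fixed at zero and b scaled to unit length."""
    return n_dims * (n_dims + 3) // 2 - 1

print(mean_entropy([0.5, 0.5]))    # chance performance: 1 bit
print(mean_entropy([0.0, 1.0]))    # perfect performance: 0 bits
print(n_params_exemplar(12), n_params_quadratic(12))
```

With all twelve image properties the quadratic classifier has far more free parameters than the exemplar classifier, which is the trade-off between a global shape assumption and local flexibility described above.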

($\bar{l}$, $\bar{\alpha}$, $\bar{\beta}$, $\Delta l$, $\Delta\alpha$, $\Delta\beta$, $\Delta\sigma_l$, $\Delta\sigma_\alpha$, $\Delta\sigma_\beta$, $F_l$, $F_\alpha$, $F_\beta$) used when making decisions in the patch grouping task. The performance of all the approximate ideal classifiers is summarized in the Supplementary Material. In general, all of the quadratic classifiers achieved levels of performance that were approximately equal to or slightly better than those of the corresponding exemplar classifiers. This suggests that the category distributions are amenable to quadratic classification and that the quadratic classifier's predictions are nearly optimal. In this subsection we present detailed results for four of these classifiers (when the distance between image patches is 1/4 of the leaf diameter): $\Delta_3$, $\mathbf{s} = (\Delta l, \Delta\alpha, \Delta\beta)$; $\delta_3$, $\mathbf{s} = (\Delta\sigma_l, \Delta\sigma_\alpha, \Delta\sigma_\beta)$; $F_3$, $\mathbf{s} = (F_l, F_\alpha, F_\beta)$; $\mu_3$, $\mathbf{s} = (\bar{l}, \bar{\alpha}, \bar{\beta})$.

The first classifier, $\Delta_3$, only considers the differences in mean intensity and color between the two patches. The three bivariate scatter plots in Figure 6a show the pair-wise distributions of intensity and color differences for patches from the same surface (green pixels) and from different surfaces (red pixels). The upper left plot shows the distributions for $\Delta l$, $\Delta\alpha$ (intensity vs. blue-yellow), the upper right plot for $\Delta l$, $\Delta\beta$ (intensity vs. red-green) and the lower plot for $\Delta\alpha$, $\Delta\beta$ (blue-yellow vs. red-green). As expected, the differences between patches from the same surface tend to be more tightly clustered than those from different surfaces. This property of the image statistics can also be seen in the 1D marginal plots arrayed along the diagonal in Figure 6a; the upper plot shows the marginal distributions for $\Delta l$ (intensity), the middle plot for $\Delta\alpha$ (blue-yellow) and the lower plot for $\Delta\beta$ (red-green). The horizontal bars show the regions where one distribution dominates. We do not plot the full three-dimensional distributions, but instead plot, in the lower left, the distributions of the posterior probabilities computed by the classifier for stimuli from the category *same* (green curve) and for stimuli from category *different* (red curve).

The second classifier, $\delta_3$, uses only the differences in intensity and color contrast between the two patches. This information is also useful, but the performance is poorer than for $\Delta_3$. The third classifier, $F_3$, uses only the ratio of the intensity and color contrasts between the two patches. This classifier performs slightly less well than $\delta_3$, which uses contrast differences. Finally, the fourth classifier, $\mu_3$, uses only the average intensity and color of the two patches. We included this case because some of this information is available to the human observers, but these stimulus dimensions provide little useful information.

1/2 and 1 leaf diameter (see Table 3). However, the rank ordering of performance across the classifiers is preserved (see Supplementary Material), and the statistical distributions for *same* and *different* image patches are similar in shape to those in Figure 6, although the overlap of the distributions increases with distance, primarily because of an increased spread in the *same* distributions.

Classifier | Accuracy (1/4 diameter) | Accuracy (1/2 diameter) | Accuracy (1 diameter) | Mean entropy (1/4 diameter) | Mean entropy (1/2 diameter) | Mean entropy (1 diameter) |
---|---|---|---|---|---|---|
$\mu_3\Delta_3\delta_3F_3$ | 0.80 | 0.75 | 0.70 | 0.68 | 0.76 | 0.85 |
$\mu_2\Delta_3\delta_3F_3$ | 0.79 | 0.75 | 0.70 | 0.68 | 0.77 | 0.84 |
$\mu_3\Delta_3\delta_3$ | 0.80 | 0.75 | 0.70 | 0.68 | 0.76 | 0.85 |
$\mu_3\Delta_3F_3$ | 0.80 | 0.75 | 0.70 | 0.70 | 0.76 | 0.85 |
$\Delta_3$ | 0.78 | 0.74 | 0.67 | 0.72 | 0.81 | 0.91 |

$\Delta l$ alone is sufficient for 74% correct performance, $\Delta\beta$ for 73% correct, $\Delta\alpha$ for 67%, and when all three dimensions are combined they yield 78% correct performance. However, if the three dimensions provided statistically independent information, then the accuracy would have reached 88% correct. Furthermore, when all of the contrast dimensions are combined with all of the intensity and color difference dimensions, performance increases to only 79% correct, and when all twelve stimulus dimensions are combined performance increases to only 80% correct (see Table 3).

$\Delta l$, $\Delta\alpha$, $\Delta\beta$) and contrast differences ($\Delta\sigma_l$, $\Delta\sigma_\alpha$, $\Delta\sigma_\beta$) are the most useful for performance of the patch grouping task in close-up foliage, but even among these there is considerable redundancy, so that good performance can be obtained with intensity differences alone (upper left plot in Figure 6a).

_{3}produces small (but significant) improvements in performance.

In the *full* condition the image patches were taken directly from the natural images (except for a scaling to bring them into the range of the monitor), in the *texture-removed* condition the image patches retained their mean color as well as their intensity and color differences, and in the *texture-only* condition the image patches retained everything in the *full* condition except for the intensity and color differences. The green symbols show the raw accuracy levels. The black symbols show the accuracy corrected for bias, based on standard signal detection analysis (Green & Swets, 1966). This corrected accuracy is given by

$$p_c = \Phi\left[\frac{\Phi^{-1}(p_s) + \Phi^{-1}(p_d)}{2}\right]$$

where $p_s$ and $p_d$ are the subject's probabilities correct for *same* and *different* stimuli, $\Phi(\cdot)$ is the standard normal integral function, and $\Phi^{-1}(\cdot)$ its inverse. As can be seen, the performance of the two observers was highly correlated ($r = 0.84$); accuracy declined as a function of the separation between the image patches, was similar for the *full* and *texture-removed* conditions, and was poorer for the *texture-only* condition.
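A bias correction of this kind can be sketched with the standard library. The form used here, the equal-variance signal detection correction in which the two z-transformed accuracies are averaged and mapped back through the normal integral, is an assumption consistent with the symbols defined above; the function name `corrected_accuracy` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1

def corrected_accuracy(p_same, p_diff):
    """Bias-free proportion correct from accuracy on 'same' and 'different' trials.

    Both inputs must lie strictly between 0 and 1, because the inverse
    normal integral is undefined at the endpoints.
    """
    d_prime = _STD_NORMAL.inv_cdf(p_same) + _STD_NORMAL.inv_cdf(p_diff)
    return _STD_NORMAL.cdf(d_prime / 2.0)

print(corrected_accuracy(0.8, 0.8))   # unbiased observer: corrected equals raw
print(corrected_accuracy(0.9, 0.6))   # biased toward "same": corrected exceeds the raw mean
```

For an observer with no response bias the correction leaves accuracy unchanged; for a biased observer it estimates the accuracy that the same sensitivity would yield without the bias.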

*full* conditions, the model classifier used intensity and color differences, contrast differences, contrast ratios and means ($\mu_2\Delta_3\delta_3F_3$); for the *texture-removed* conditions, the model classifier used only the intensity and color differences and means ($\mu_2\Delta_3$); and for the *texture-only* conditions, the model classifier used only the contrast differences, contrast ratios and the means ($\mu_2\delta_3F_3$). Note that only for the *texture-removed* conditions is the model classifier guaranteed to be close to the true ideal classifier. This cannot be guaranteed in the *full* and *texture-only* conditions because the model classifiers were restricted to using the magnitudes of patch contrast while humans were shown the exact spatial pattern of the patches.

*texture-removed* conditions the results show that human efficiency is relatively high. For the other conditions, little can be concluded about efficiency because the model classifiers do not incorporate all the potential sources of information available to the human observers.

$r = 0.92$), but the performance of both subjects improved ($p < 0.01$ by a bootstrap test). Here human performance has improved and is closer to that of the model classifiers. This result suggests that although the subjects entered the experiment with good knowledge of the local statistics of foliage images, they were able to adapt to the task with training.

*full* and *texture-only* conditions does not exceed the performance of model classifiers that were restricted from using any texture and spatial pattern information except the magnitudes of patch contrast. This suggests that perhaps high order texture and spatial pattern information is not used by human observers in this specific task. Another potential way of detecting whether humans are using such information is to compare their performance on small and large image patches. The smallest quartile of image patches in the experiment had diameters ranging from 15 to 23 minutes of arc and the largest quartile had diameters ranging from 43 to 195 minutes of arc. Presumably the larger patches contain more high order information. If humans use this information in the task, then we might expect them to perform better than model classifiers on larger patches. Figure 9 shows the raw accuracy scores for stimuli containing the smallest quartile of image patches (blue symbols) and the largest quartile (red symbols). There is little or no improvement in accuracy as a function of patch size in the *full* and *texture-removed* conditions, but in the *texture-only* conditions, human and model classifier accuracy increased by similar amounts. The curves show the performance levels of the model classifiers for large and small patch sizes (these are the same classifiers used for Figure 8). As the figure shows, these classifiers, which are restricted from using any texture properties except the magnitudes of patch contrast, can account for the change in accuracy (as a function of patch size) shown by humans. These results add further evidence that humans do not make use of texture and spatial pattern information in our patch grouping task.

*same* or *different* surfaces. We evaluated two kinds of classifier: a quadratic classifier that is guaranteed to be optimal when the underlying distributions are Gaussian, and an exemplar classifier that (with sufficient data) can outperform the quadratic classifier if the underlying distributions are not Gaussian. Both classifiers performed similarly (see Supplementary Materials), suggesting that both approached optimal performance for the local image properties that we analyzed. We found that differences in mean intensity/color ($\Delta l$, $\Delta\alpha$, $\Delta\beta$) were the most effective properties for solving the task, the contrast differences ($\Delta\sigma_l$, $\Delta\sigma_\alpha$, $\Delta\sigma_\beta$) were the next most effective, the contrast ratios ($F_l$, $F_\alpha$, $F_\beta$) were the next most effective, and the overall average intensity/color (

1/4 leaf diameter (chance = 50%). The fact that performance is better at smaller distances suggests that there may be advantages to using region-growing mechanisms when performing region grouping and segregation.

- the opponent color space used by Johnson, Kingdom, and Baker (2005),
- the CIE L*u*v* and

*lαβ* space consistently gave the best performance.

*same* as a function of the value of the optimized quadratic decision variable. The black curve shows the actual posterior probability function based on the underlying Gaussian distributions. The close agreement between the red and black curves is not surprising given that the quadratic decision function is the optimal decision function when the underlying distributions are Gaussian: the quadratic decision function is the logarithm of the likelihood ratio of the underlying Gaussian distributions (see Equations 6 and 8).

*texture-only* conditions humans do not perform better than a model classifier that uses only simple contrast information. These results (especially the good match with ideal after practice) suggest either that, like the model classifiers, the human observers only use simple intensive and contrast differences or, more likely, that the higher order spatial pattern information (e.g., surface markings, shading and shadow patterns, etc.) is not of much value in this particular task.

1/4 diameter distance. These images contain strong shadows, specular highlights and/or different colored regions within a surface. Row 2 shows portions of images for which classifier performance was approximately 80% correct. These images may contain surface markings, minor shading and lighting features and/or similar intensity and color across the surface boundaries. Finally, row 3 shows portions of images where the classifiers performed at or above 90% correct. In these images the leaves are fairly uniform and the intensity and color differences across the surface boundaries are relatively strong.

$p(\Delta l, \Delta\alpha, \Delta\beta)$, in the Ruderman et al. color space, as a function of the spatial distance between pixels within natural images, as well as across completely different images. They assumed that neighboring pixels in the same image are random samples from the same physical surface, pixels from different images are random samples from different surfaces, and that pixels at some separation within an image are random samples from a mixture of both same and different surfaces. These assumptions allowed them to determine for any pair of pixels (at any spatial separation) the posterior probability that they were drawn from the same or different surfaces. They then compared the posterior probabilities generated by this statistical model with judgments of human subjects in a task where the subjects were asked to judge whether or not each pixel in a randomly sampled patch of natural image belonged to the same surface as the pixel at the center of the patch. They found a modest correlation between the human judgments and the predictions of the statistical model, but not nearly as strong a correlation as that observed in the current study (see texture-removed conditions in Figures 8 and 9). There are at least two potential reasons for this difference. One is that the image patches they presented to subjects included more contextual information. Thus, the subjects' judgments may have been strongly influenced by natural image statistics that were not represented in their model. Another potential reason is that their model is based on the assumption that the joint distribution of color differences does not depend on the spatial distance between pixels that fall within the same surface. We found that this assumption is violated in close-up foliage images. In fact, the strong dependence of the joint distributions on distance within surfaces is the reason that the models in the current study predict performance to decline as a function of the distance between image patches (see Figures 8 and 9).

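$p(\omega = \textit{same} \mid \mathbf{s})$ and then responding “*same*” if $p(\omega = \textit{same} \mid \mathbf{s}) > 0.5$. A simple application of Bayes' formula shows that for equal prior probabilities of the two categories this rule is equivalent to computing the log likelihood ratio

$$z(\mathbf{s}) = \ln\left[\frac{p(\mathbf{s} \mid \omega = \textit{same})}{p(\mathbf{s} \mid \omega = \textit{diff})}\right] = \ln\left[\frac{p(\omega = \textit{same} \mid \mathbf{s})}{1 - p(\omega = \textit{same} \mid \mathbf{s})}\right] \qquad (A1)$$

and responding “*same*” if $z(\mathbf{s}) > 0$. It follows that $p(\omega = \textit{same} \mid z(\mathbf{s})) = p(\omega = \textit{same} \mid \mathbf{s})$, because all points $\mathbf{s}_1$ in the set defined by $z(\mathbf{s}_1) = z_1$ have the same posterior probability $p(\omega = \textit{same} \mid \mathbf{s}_1)$. Furthermore, it follows from Equation A1 that if $z(\mathbf{s}_2) > z(\mathbf{s}_1)$ then $p(\omega = \textit{same} \mid \mathbf{s}_2) > p(\omega = \textit{same} \mid \mathbf{s}_1)$ and hence $p(\omega = \textit{same} \mid z(\mathbf{s}_2)) > p(\omega = \textit{same} \mid z(\mathbf{s}_1))$.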

*m*is the number of quantiles, then the number of samples in a quantile is

*n*= 2

*N*/

*m,*and hence

*f*(

*ω*=

*same*∣

*j*) =

*f*(

*ω*=

*diff*∣

*j*) =