To understand the computations of our visual system, it is important to understand also the natural environment it evolved to interpret. Unfortunately, existing models of the visual environment are either unrealistic or too complex for mathematical description. Here we describe a naturalistic image model and present a mathematical solution for the statistical relationships between the image features and model variables. The world described by this model is composed of independent, opaque, textured objects, which occlude each other. This simple structure allows us to calculate the joint probability distribution of image values sampled at multiple arbitrarily located points, without approximation. This result can be converted into probabilistic relationships between observable image features as well as between the unobservable properties that caused these features, including object boundaries and relative depth. We show that the image model is sufficient to explain a wide range of natural scene properties. Finally, we discuss the implications of this description of natural scenes for the study of vision.

*N*pixels, for which we can now calculate the multivariate joint intensity distribution. This distribution can then be transformed into feature probabilities and related to the unobserved object properties.

*N*pixels by defining an

*object membership function,*, designating which pixels are “members” of which objects. (The symbol is the Hebrew letter

*mem,*chosen to evoke the word

*mem*bership.) In mathematical language, is a set partition of the

*N*pixels, so it is technically a set of sets: each set corresponds to a different object, and it contains the pixel locations at which that object is unobscured by any other objects. For example, one might find in a given image that pixels

**x**

_{1},

**x**

_{2}, and

**x**

_{6}fall into one object,

**x**

_{4}and

**x**

_{5}fall into a different object, and

**x**

_{3}is alone in a third object (Figure 1C). Then, the corresponding object membership function can be expressed as = {{

**x**

_{1},

**x**

_{2},

**x**

_{6}},{

**x**

_{3}},{

**x**

_{4},

**x**

_{5}}}, or abbreviated as = {126∣3∣45}.

*P*(

**I**∣) of image intensities

**I**factorizes into a product over objects: The different object textures are independent, and hence so are their respective pixels. In the above example, the probability distribution of intensities at those six pixels would be

*P*(

**I**∣) =

*P*(

*I*

_{1},

*I*

_{2},

*I*

_{6}∣)

*P*(

*I*

_{3}∣)

*P*(

*I*

_{4},

*P*

_{5}∣). In general, where ∣∣ is the number of objects,

_{n}is the set of pixels falling into the

*n*th object of , and

**I**

_{n}is a vector of intensities at those pixels. The factors

*P*(

**I**

_{n}∣) reflect the joint probabilities of intensities in a single, textured object. This formulation requires that we specify a texture model to provide these probabilities. The texture model may be a simple distribution over object brightness (as in Figure 1A) or may include extra hidden variables for each object, such as texture variance and orientation (as in Figure 1B). Note that the texture model is wholly unrelated to the geometrical aspects of the dead leaves model.

*P*(

**I**) is an average over all possible configurations. The factorized conditional distributions of Equation 1 are then combined in the weighted sum This is a mixture distribution in which each mixture component

*P*(

**I**∣) has a distinct correlation structure among pixels, induced by the different object membership functions. The weighting coefficients are

*object membership probabilities P*(), i.e., the probability of observing the corresponding memberships over all possible dead leaves images with a given shape ensemble. Figure 2 shows examples of simple mixture distributions.

*P*() represents how frequently the

*N*selected pixels are grouped into different objects according to . We calculate each probability recursively, generalizing an argument of Ruderman (1997). To do so, we must introduce some additional notation. We designate

_{\n}as the object membership function that remains after removing the

*n*th object. We also define a Boolean vector

**(,**

*σ**n*) with

*N*components

*σ*

_{i}(,

*n*) = (

**x**

_{i}∈

_{n}) that each indicate whether the pixel

**x**

_{i}is contained in the

*n*th object of . For instance,

**({126∣3∣45},3) = (0,0,0,1,1,0).**

*σ**Q*

_{σ(,n)}, that this top object includes all of the pixels in the set

_{n}, while excluding all the other pixels in

_{\n}. Such an arrangement

*partially*satisfies the membership constraint imposed by . However, for this object configuration to contribute to

*P*(), we still need to ensure that the excluded pixels are also grouped appropriately by objects “deeper” in the image. The probability that deeper objects satisfy these reduced membership constraints is

*P*(

_{\n}). Note that this probability is unaffected by whether the deeper objects would have enclosed the pixels in

_{n}: Objects at those positions are already occluded by the top object. There is also a probability

*Q*

_{σ(,0)}that the top object contains

*none*of the

*N*selected pixels. Given this event, the probability of finding objects deeper in the stack that satisfy the membership constraints is just the original factor

*P*(). Summing together all possibilities for the top object, we find

*P*() =

*Q*

_{σ(,0)}

*P*() + ∑

_{n=1}

^{∣∣}

*Q*

_{σ(,n)}

*P*(

_{\n}). Solving for

*P*() gives the recursion relation Crucially, the image that remains below the top object is yet another dead leaves image, with all the same statistical properties as before, so we can calculate

*P*(

_{\n}) by the same formula, recursively. Eventually, the recursion terminates when there are no pixels left in , with

*P*(Ø) = 1.

*Q*

_{σ(,n)}depend on the particular shape ensemble and the chosen set of pixels. In the Methods section, we derive the general form of these factors for arbitrary shapes with smooth boundaries. The Supplementary material provides mathematical details of the calculation for the scale-invariant ensemble of elliptical shapes used from here onward, as well as the software implementing the calculation.

*B*

_{N}, which unfortunately grows slightly faster than exponentially. In practice, this restricts exact calculation to around a dozen pixels. Despite this limitation, interesting insights can be gained both by using few pixels or few subsets of possible object memberships and by analyzing the general behavior in various limits. For instance, in low-clutter conditions when the maximal distance between pixels

*u*is much smaller than the minimum object size

*r*

_{−}(e.g., Figure 1A), object membership probabilities

*P*() behave as

*P*() ∼ (

*u*/

*r*

_{−})

^{∣∣−1}(Figure 2). Consequently, edges are rare (∣

_{edge}∣ = 2), T-junctions are rarer (∣

_{T-junc}∣ = 3), and every other feature is rarer still (∣∣ > 3). In the sections below, we use the solution of the dead leaves model and relevant approximations to explain complex statistical properties of natural scenes.

*f*=

*F*

**I**for some filter matrix

*F*. We set the filters of

*F*to be wavelets, local derivative operators (edge detectors) with a given orientation and scale. Choosing Haar wavelets, which weight intensities by ±1, emphasizes non-Gaussianity of natural feature distributions and thereby establishes a more stringent test for the image model (Lee et al., 2001).

*P*(Δ) ∝

*e*

^{−∣Δ∣ β }for an exponent

*β*near 1 (Lee et al., 2001; Mallat, 1989; Ruderman & Bialek, 1994; Wainwright & Simoncelli, 2000). The heavy tails in these distributions cannot be obtained from a standard correlated Gaussian model, because any projection of a multidimensional Gaussian is again Gaussian. Higher order statistical structure is required.

*f*=

*I*

_{1}−

*I*

_{2}. The resultant feature distribution is a mixture of two components, a narrow central peak and a broader tail (Figures 3B and 3E). While this is a more kurtotic distribution than the Gaussian texture, it does not closely match natural derivative histograms (Figure 3A).

*X*

_{i}, one might naively expect the total

_{i}=

*I*

_{j}to be Gaussianly distributed. However, the usual central limit theorem does not apply, because of the correlations between the variables that are induced by spatially extended objects. These correlations can be segregated by re-expressing the total intensity in an image patch as a sum of the mean intensities in visible objects, weighted by their visible areas,

_{i}= ∑

_{n=1}

^{∣∣}∣

_{n}∣

*J*

_{n}, where ∣

_{n}∣ is the number of pixels sampling from the

*n*th object, i.e., its visible area, and

*J*

_{n}is the average intensity in that area. The summands ∣

_{n}∣

*J*

_{n}are approximately independent because they correspond to different objects. (They are not strictly independent because the visible areas ∣

_{n}∣ are constrained to add up to the total area of the patch.) This way of writing

_{i}reveals two reasons why the sum does not converge to a normal distribution: the number of summands is a random variable, and the summands themselves have long-tailed distributions.

*J*

_{n}, is distributed more narrowly than this, then the distribution of visible areas ∣

_{n}∣ will dominate the tail behavior of the products ∣

_{n}∣

*J*

_{n}.

_{i}is thus a sum over a random number (one per visible object) of power-law-distributed terms. A generalized central limit theorem holds that the distribution of such random sums is a two-sided exponential distribution when summands are power-law distributed with exponents of 2 or higher (Kotz, Kozubowski, & Podgórski, 2001; Figure 3D).

^{1}Indeed, Figure 3D shows nearly straight tails on the log-probability plot. In natural images, areas of homogeneous regions are again distributed as power laws, but depending on the particular image, the exponents can be slightly below 2 (Alvarez et al., 1999). Another generalized central limit theorem shows that under such conditions the distribution of the sum has slightly heavier tails (Gnedenko, 1972), as observed (Figure 3A). While these considerations apply directly to the average pixel intensities, they pertain equally to intensity differences.

^{2}

*P*(

*f*

_{2}∣

*f*

_{1}) broadens with ∣

*f*

_{1}∣ without any change in the mean (Figure 5E). This explains why the variability in one feature amplitude increases with the amplitude of a nearby feature.

*causes*of image features—objects and their attributes—are represented explicitly. Our results relate these causes to each other as well as to the observable, pixel-based image features.

*ordered*object membership function . (This Hebrew letter,

*final mem,*appears only at the end of words and is used here to indicate that object order matters.) Whereas was a set of subsets, is an ordered set of subsets, with

_{n}representing the pixels contained by the

*n*th highest object sampled by any of the

*N*selected pixels. For instance, Figure 1C depicts the ordered set = {{

**x**

_{3}},{

**x**

_{4},

**x**

_{5}},{

**x**

_{1},

**x**

_{2},

**x**

_{6}}}, which we abbreviate as {3 > 45 > 126}. The probabilities of these ordered object membership functions can be calculated by a very similar recursion equation as that used for the unordered variant (Methods section). Figure 6D plots likelihood ratios using a new definition of shared cause, which includes depth (Table 1), in which a common cause occurs whenever a single object is

*on top*at both edges. The results show that certain edges are more likely to have a common cause, whereas other edges are more likely to be independent. The pattern closely resembles results of Geisler and Perry (2009) and Geisler et al. (2001; Figure 6A). Since those statistics were predictive of human judgments about contour completion across occluders, the dead leaves model also qualitatively predicts human inference about such ambiguous stimuli.

(A) Classification of unordered | |||
---|---|---|---|

S: Shared cause | D: Different causes | ||

1256 ∣ 34 ∣ 78 | 12 ∣ 34 ∣ 56 ∣ 78 | ||

1278 ∣ 34 ∣ 56 | |||

12 ∣ 56 ∣ 3478 | |||

12 ∣ 78 ∣ 3456 | |||

1256 ∣ 3478 | |||

1278 ∣ 3456 | |||

(B) Classification of ordered | |||

S: Shared cause | D: Different causes | ||

1256 > 34 ∣ 78 | 34 ∣ 78 > 1256 | 34 > 1256 > 78 | 78 > 1256 > 34 |

3478 > 12 ∣ 56 | 12 ∣ 56 > 3478 | 12 > 3478 > 56 | 56 > 3478 > 12 |

1278 > 34 ∣ 56 | 34 ∣ 56 > 1278 | 34 > 1278 > 56 | 56 > 1278 > 34 |

3456 > 12 ∣ 78 | 12 ∣ 78 > 3456 | 12 > 3456 > 78 | 78 > 3456 > 12 |

1256 ∣ 3478 | 12 ∣ 34 ∣ 56 ∣ 78 | ||

1278 ∣ 3456 |

*least unlikely*ones are parallel in the model but perpendicular in natural images. This detail depends on the particular shape ensemble. Elliptical objects have no edge pairs arranged in a T, whereas more complex shapes with inflection points might. Simulations with non-convex shapes (dumbbells, rings, random Bezier curves) do show such differences in details of their likelihood ratios (data not shown). However, there is no change in the basic trend, which favors common causes for edges with similar orientations and headings but otherwise favors different causes.

Assumption | Purpose |
---|---|

General | |

Occlusion | Improve realism of model images |

Object independence | Solve general dead leaves model |

Shape | |

Elliptical shapes | Find exact membership probabilities P( ) |

Inverse-cubic size distribution | Approximate scale invariance |

Minimum size bigger/smaller than image feature | Simulate low-/high-clutter images |

Texture | |

Gaussian white noise texture | Simplify conditional intensities P(I∣ ) |

Low-contrast texture | Emphasize consequences of occlusion |

_{edge}= {1∣2}. Assuming that objects have Gaussian distribution intensities and the image sensors have some additive Gaussian noise, the probability of an edge given the intensity difference Δ across space is

*P*(

_{edge}∣Δ) = [1 +

*k*exp(−

*β*Δ

^{2})]

^{−1}, where

*k*and

*β*are positive constants that depend on the spatial scale, overall image contrast, and sensor noise. This function resembles the contrast energy model of complex cells (Adelson & Bergen, 1985) with a saturating non-linearity. Thus, we might interpret complex cell activity as encoding the probability of a local edge in a world of objects. It will be interesting to explore such a model more thoroughly and to see if other neurons have properties that map nicely onto representations of still more complex features within the dead leaves model. Since synaptic connections are modified by neural correlations, and the occlusion model explains stimulus correlations, the model may also help generate predictions about cortical circuitry that has matured in the natural world.

*P*() in terms of some geometric factors

*Q*

_{σ(,n)}. These factors represent the probability that points

**x**

_{i}∈

_{n}are included in one object while the other points

**x**

_{i}∈

_{\n}are not, averaged over all object positions and shapes. For convenience, we name these

*Q*

_{σ}“inclusion probabilities.” Note that these quantities involve the geometry of single objects only; the recursion of Equation 3 converts them into the multiobject probabilities

*P*() that characterize the dead leaves model geometry. In this section, we show how the inclusion probabilities can be calculated for arbitrary objects.

*L*

_{σ}(

**x**,

*ρ*), which is an indicator function over space

**x**and shape parameter(s)

*ρ*. The function can indicate either the inside or the outside of an object centered on the origin, depending on the binary variable

*σ*∈ {0,1}:

*L*

_{σ}(

**x**,

*ρ*) equals

*σ*when pixel

**x**is inside the object and 1 −

*σ*when

**x**is outside it (Figure 9A). With this definition, is the inclusion probability that a leaf with shape

*ρ*and location

**c**includes all sample points

**x**

_{i}∈

_{n}and excludes all remaining

**x**

_{i}∈

_{\n}(Figure 9B).

*Q*

_{ σ }in Equation 3 are averages over all possible object shapes and positions. Thus, we are interested in the average of

*Q*

_{ σ }(

**c**,

*ρ*) over the distribution of leaf positions

*P*(

**c**) and shapes

*P*(

*ρ*):

**c**to obtain

*Q*

_{ σ }(

*ρ*), and subsequently calculate the average over object shape

*ρ*.

*P*(

**c**) = 1 and the

**c**-integral of binary-valued

*Q*

_{ σ }(

**c**,

*ρ*) gives the inclusion probabilities for a given

*ρ*as the areas of the regions with constant

*Q*

_{ σ }(

**c**,

*ρ*).

**V**=

**c**has divergence (in

**c**-space) of ∇ ·

**V**= 1, so integrating this divergence over the desired region gives the enclosed area. The divergence theorem says that this integral equals the flux of

**V**across the region boundary:

*C*is the region in

**c**-space where

*Q*

_{ σ }(

**c**,

*ρ*) = 1, ∂

*C*is its boundary,

*ds*is the arc length. The boundary is composed of piecewise smooth segments of the object outline centered on the sample points

**x**

_{ i }(Figure 9B). We index the relevant segments by

*m*∈

*M*and represent the curves by

**s**

_{ m }(

*t*):

*t*

_{ m }′ <

*t*<

*t*

_{ m }″ for

*t*between the cusps at which the contour changes direction abruptly. The integral along each segment is then

*Q*

_{ σ }(

*ρ*) =

*A*

_{ m }.

*Q*

_{ σ }(

*ρ*) over the shape ensemble

*P*(

*ρ*), we need to compute ∫

*Q*

_{ σ }(

*ρ*)

*P*(

*ρ*)

*dρ*. Note that the set of piecewise smooth segments composing the contour ∂

*C*may change depending on

*ρ*(Figure 9C), so the

*ρ*-integral must itself be done piecewise. We define an index ℓ specifying the regions

*R*

_{ℓ}in

*ρ*-space where a given set of segments

*M*

_{ℓ}composes the contour. Within

*R*

_{ℓ,}the integral over

*ρ*can then be carried out on each summand

*A*

_{ m }separately, yielding

*r*

_{+}= 100 and

*r*

_{−}= 1. For Figures 3–5, we used high-clutter conditions by setting the pixel spacing to 5

*r*

_{−}. For Figure 6, to replicate the relatively low-clutter conditions under which the natural image statistics were measured empirically (Geisler et al., 2001), we chose the pixel spacing to be

*r*

_{−}/5.

_{0}and Ξ

_{1}. For this texture ensemble, the conditional distribution of pixel intensities is

*P*(

**I**∣) ∝ exp(−

**I**

*C*

_{}

^{−1}

**I**), with zero mean and covariance matrix given by (

*C*

_{})

_{ij}= Ξ

_{0}∑

_{n=1}

^{∣∣}

*σ*

_{i}(,

*n*)

*σ*

_{j}(,

*n*) + Ξ

_{1}

*δ*

_{ij}. In the results shown in this paper, Ξ

_{0}= 1 and Ξ

_{1}= 0.01.

**f**=

*F*

**I**, the conditional distribution is

*P*(

**f**∣) ∝ exp(−

**f**

*FC*

_{}

*F*

^{−1}

**f**) and the joint probability is the mixture distribution

*P*(

**f**) = ∑

_{}

*P*()

*P*(

**f**∣).

*X*

_{i}, defining

_{i}over an image patch has a conditional distribution

*P*(

_{i}∣), which is Gaussian with zero mean and variance Here

*C*

_{}is the covariance matrix of all pixels in image patch

*X*

_{i}conditioned on the object membership function , and ∣

_{n}∣ is the number of sampled pixels falling into the

*n*th object. Thus, the variance increases with the square of the sampled area of each object and is maximized when only one object covers the sampling area.

*H*=

_{1}−

_{2}between sums

_{1}and

_{2}over two distinct regions (Figure 10A). The corresponding variance does not necessarily increase with the square of each object's sampled area, because some of the samples are weighted with opposite signs and thus cancel. The conditional covariance between two Haar wavelets

*H*

_{i}and

*H*

_{j}is where ∣

_{n}

^{k,i}∣ is the number of samples in region

*k*of wavelet

*I,*which fall into the

*n*th object (Figure 10A), and

*N*

_{ij}is the number of samples shared by wavelets

*H*

_{i}and

*H*

_{j}.

*θ*. The covariance of the mixture distribution conditioned on this object configuration is

*N,*this covariance matrix is nearly singular, with almost unity correlation coefficient between the variations along

*H*

_{1}and

*H*

_{2}. Contours of the corresponding bivariate Gaussian have maximum extent at feature amplitudes proportional to (±

*θ,*±(

*θ*)). The envelope of these contours produces the diamond shown in Figures 4B and 4D.

*d*

_{l}and

*d*

_{r}, then the covariance of the mixture distribution is

_{ i }= min(

*d*

_{ i },1 −

*d*

_{ i }) and the width of each lobe of the Haar wavelet is 1. These components all have a correlation coefficient of nearly ±1/2 but have different variances. By changing

*d*

_{l}and

*d*

_{r}for the configuration shown in Figure 10C, we obtain conditional distributions with the ensemble of contours seen in Figure 4I. Their envelope produces the pillow shape (Figure 4).

^{−1}(1/2.75) = 40° (Figure 11A) that matches the orientation bandwidth used in Geisler et al. (2001). Pairs of edges are described by two such bisected four-pixel clusters (Figure 11B). This definition of edge pairs restricts these eight pixels to have one of only seven possible object membership functions (Table 1A). In one of these configurations, every pixel pair is a member of a different object: = {12∣34∣56∣78}. In the remaining configurations, at least two pairs are members of the same object (Figure 6D). This latter category serves as one possible definition of a “shared cause” for the two edges.

*ordered*membership functions (Generalizations section), and we classify these according to whether a pair of pixels from each edge both falls into the same object

*and*that object is above the object present at the remaining pixels (Figure 6E). The relevant are listed in Table 1B.

*L*= ∑

_{∈S}

*P*()/∑

_{∈D}

*P*(), where

*S*and

*D*are the sets of membership functions categorized as shared or different causes, respectively. This likelihood ratio varies as a function of the positions and relative orientation of the two edge pairs (Figures 6C–6D).

*ordered*object membership function rather than an unordered membership function . The recursion in this case is even simpler than Equation 3: There is no summation here because there is only one term for which the first object is highest in the stack of objects. One may use a partial ordering if not all relative depths are of interest, and then there will be a sum over arrangements consistent with the partial ordering.

*ρ*is now a vector of

*N*shape parameters, with

*ρ*

_{ n }indicating the shape parameters for the topmost object present at pixel location

**x**

_{ n }.

*τ,*visibility decays exponentially with depth. Objects are generated until each point is covered at least ten times the extinction coefficient, 10/log

*τ*

^{−1}, which ensures that omitting deeper objects will give only a negligible error. The software generating dead leaves images is available as Supplementary material.