In his influential book, The perception of the visual world, Gibson (1950) introduced the concept of texture gradients as a potential source of optical information about the layout of surfaces in the environment. Since then, many different properties of optical texture have been identified that can be used to estimate various aspects of 3D surface structure (Gårding, 1992, 1993; Purdy, 1958), and many psychophysical experiments have been performed to determine which of these potential sources of information are most relevant to human perception. The literature on this topic contains many conflicting results, however, and there is as yet no clear consensus on how patterns of optical texture are perceptually analyzed by human observers.
One of the most popular computational models for estimating 3D shape from texture is based on the assumption that variations in reflectance on a surface are statistically isotropic (i.e., approximately equal in all directions), so that anisotropic patterns of texture within a visual image can be attributed to variations in surface slant. These distortions are often referred to as texture foreshortening. They can be measured from the aspect ratios of individual optical texture elements (Stevens, 1981a, 1984), from the distribution of edge orientations in each local image region (Aloimonos, 1988; Blake & Marinos, 1990; Blostein & Ahuja, 1989; Marinos & Blake, 1990; Witkin, 1981), or from the relative anisotropy of the local amplitude spectra (Bajcsy & Lieberman, 1976; Brown & Shvayster, 1990; Krumm & Shafer, 1992; Sakai & Finkel, 1995; Super & Bovik, 1995).
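To make the foreshortening idea concrete, the following minimal Python sketch implements only the simplest of the measures listed above, the aspect ratio of a single element (in the spirit of Stevens), under two stated assumptions: the element is circular on the surface (isotropic texture) and the image is an orthographic projection. The function name and interface are hypothetical, and none of the edge-orientation or spectral methods cited are reproduced here.

```python
import math


def slant_from_aspect_ratio(minor_axis: float, major_axis: float) -> float:
    """Slant (degrees) implied by one projected texture element.

    Assumes the element is circular on the surface (isotropic texture) and is
    viewed under orthographic projection, so its extent along the tilt
    direction is compressed by cos(slant) while the other axis is unchanged.
    """
    if major_axis <= 0 or not 0 < minor_axis <= major_axis:
        raise ValueError("expected 0 < minor_axis <= major_axis")
    return math.degrees(math.acos(minor_axis / major_axis))


# A circle on a plane slanted by 50 degrees projects to an ellipse with an
# aspect ratio of cos(50 deg), about 0.643; the estimate recovers the slant.
aspect = math.cos(math.radians(50.0))
print(round(slant_from_aspect_ratio(aspect, 1.0), 6))
```

For an orthographic image of a uniformly slanted plane every element has the same aspect ratio, so a model of this kind assigns the same slant everywhere regardless of field of view.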
Although foreshortening is probably the most popular source of information in computational analyses of shape from texture, a growing body of evidence indicates that it has little or no influence on observers' perceptions of surface slant (Backus & Saunders, 2006; Gillam, 1970; Todd, Christensen, & Guckes, 2010; Todd, Oomes, Koenderink, & Kappers, 2004; Todd, Thaler, & Dijkstra, 2005; Todd, Thaler, Dijkstra, Koenderink, & Kappers, 2007). Consider, for example, the image of a slanted plane under orthographic projection in the top panel of Figure 1. Any of the computational models cited above would correctly determine that the depicted surface has a 50° slant relative to the frontoparallel plane. When the same image is presented to human observers, however, they do not perceive any slant at all.
An alternative source of information for estimating shape from texture is based on the assumption that distributions of reflectance on a surface are statistically homogeneous (i.e., the same at all surface locations), so that systematic gradients of optical texture across different regions of an image can be attributed to variations in surface depth, slant, or curvature. The first computational analysis of texture gradients was performed by Purdy (1958), who examined the spatial variations of several different aspects of optical texture elements, including their major axes, minor axes, areas, aspect ratios, and density. He showed that any of these properties μ(ϕ, γ) at a point on a planar surface in a visual direction (ϕ, γ) can be used to determine the optical slant σ(ϕ, γ) at that point from the normalized gradient defined by the following equation:
$$\tan\sigma(\phi,\gamma) = \frac{1}{k}\,\frac{\lVert\nabla\mu(\phi,\gamma)\rVert}{\mu(\phi,\gamma)} \tag{1}$$
where k is a constant that is specific to each type of gradient. Gårding (1992) later extended this analysis to include curved surfaces. He showed that the normalized major axis gradient is the only one that can be used to estimate local surface slant independently of surface curvature. This is often referred to as a scaling gradient. It can be measured by calculating the relative lengths of texture elements in different regions of an image (Purdy, 1958), or from the affine correlations between the amplitude spectra in those regions (Clerc & Mallat, 2002; Malik & Rosenholtz, 1994, 1997).
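The relation between a normalized gradient and optical slant can be checked numerically. The sketch below assumes that the relation takes the form tan σ = (1/k)·|∇μ|/μ and uses an idealized one-dimensional perspective view of a slanted plane; in this geometry k = 1 for the major axis and k = 2 for the minor axis, which is additionally foreshortened. The function names, the unit element size, and the finite-difference step are illustrative assumptions, not part of Purdy's analysis.

```python
import math


def major_axis_size(phi: float, slant: float, distance: float = 1.0) -> float:
    """Angular major-axis length of a small unit texture element.

    Idealized 1D setup: a plane with the given slant (radians) at `distance`
    along the line of sight, viewed in perspective along visual direction
    phi (radians) within the tilt plane. The major axis is not foreshortened,
    so its angular size is inversely proportional to distance along the ray.
    """
    ray_distance = distance * math.cos(slant) / math.cos(phi - slant)
    return 1.0 / ray_distance


def minor_axis_size(phi: float, slant: float, distance: float = 1.0) -> float:
    # The minor axis is additionally foreshortened by cos(optical slant).
    return major_axis_size(phi, slant, distance) * math.cos(phi - slant)


def slant_from_normalized_gradient(mu, phi: float, k: float, h: float = 1e-5) -> float:
    """Optical slant (radians) from tan(sigma) = |d mu/d phi| / (k * mu)."""
    gradient = (mu(phi + h) - mu(phi - h)) / (2.0 * h)  # central difference
    return math.atan(abs(gradient) / (k * mu(phi)))


# Plane slanted 40 deg, viewed 10 deg off the line of sight: the optical
# slant there is 30 deg, recovered with k = 1 (major) and k = 2 (minor).
slant, phi = math.radians(40.0), math.radians(10.0)
for size, k in ((major_axis_size, 1.0), (minor_axis_size, 2.0)):
    sigma = slant_from_normalized_gradient(lambda p: size(p, slant), phi, k)
    print(round(math.degrees(sigma), 3))
```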
A recent series of studies by Todd et al. (2005, 2007, 2010) has shown that the ability to distinguish slants from texture for planar or asymptotically planar surfaces requires relatively large viewing angles, which provides strong evidence that observers' perceptions cannot be based on computational analyses within small local neighborhoods, such as spatial derivatives. In light of this finding, they proposed that perceived shape from texture for this class of surfaces is based on an alternative source of information called scaling contrast, as defined by the following equation:
where Z(ϕ, γ) is the relative depth of a surface point in a visual direction (ϕ, γ), λ(ϕ, γ) is the projected major axis of a texture element at that point, and λMax and λMin are the maximum and minimum major axis lengths. Note that this is similar to a normalized gradient, but it is designed to evaluate the variations in scaling over large regions of visual space rather than within small local neighborhoods.
The scaling contrast model predicts that slanted surfaces presented under orthographic projection should have no apparent depth at all, because there are no variations in texture scaling. For slanted surfaces under perspective projection, the model predicts that the magnitude of perceived relief should increase systematically with the depicted field of view, as has been demonstrated by Backus and Saunders (2006) and Todd et al. (2005, 2007, 2010). It can also account for differences in perceived relief between concave and convex surfaces (Todd et al., 2005, 2007), and for the perceptual distortions that occur when images of textured surfaces are observed from an inappropriate viewing distance (Backus & Saunders, 2006; Todd et al., 2007, 2010).
If scaling contrast is the primary source of information for the perception of 3D shape from texture, then the magnitude of perceived relief should be reduced to zero when a surface is rendered under orthographic projection so as to remove all variations in scaling. That is exactly what occurs with images of slanted planes, as shown in Figure 1. However, continuously curved surfaces can produce a compelling perception of shape from texture even under orthographic projection (see Figure 2), and the apparent depths of those surfaces are relatively unaffected by the addition of perspective (see Todd & Akerstrom, 1987; Todd & Oomes, 2002). These observations suggest that texture scaling may be the primary source of information for surfaces that are planar or asymptotically planar, but that some other aspect of texture must also be used in other contexts.
Could that other aspect of texture be foreshortening? The perceptual appearance of the surface in Figure 2 strongly suggests that it is not. This surface has an anisotropic texture that is elongated in the vertical direction. If any of the foreshortening models cited above were applied to this image, all of the points along a vertical cross-section through its center would have an estimated slant of 41°, and they would all be interpreted as inflection points in the depth profiles of the horizontal cross-sections. That is not how the surface is perceived, however. Although the texture elements along the central axis of the cylinder are all noticeably foreshortened, they nonetheless appear to have a frontoparallel orientation, and they are perceptually interpreted as local depth minima rather than inflection points.
Todd and Akerstrom (1987) investigated observers' perceptions for displays that were quite similar to Figure 2, and concluded that the primary source of information must be the systematic variation of texture element widths in the direction of slant. More recently, Fleming, Torralba, and Adelson (2004) have pointed out that the pattern of texture compression along an arbitrary surface cross-section has a functional form that is quite similar to the pattern of surface slant. In the discussion that follows, we extend these observations to develop a specific computational model for estimating the 3D shapes of objects like the one in Figure 2.
The basic insight underlying this model is that the optical projections of texture elements at local depth extrema along a surface cross-section can be used as anchors to estimate the local slants of all other points that are aligned in the same direction. Figure 3 shows a narrow cross-section of the image in Figure 2, together with the depth profile of the depicted surface. The projected width of a texture element at a position ϕ on the cross-section is labeled ω(ϕ), and the local width maximum that occurs at the depth extremum is labeled ωMax. Consider a vector N(ϕ) in the plane of the surface cross-section that is normal to its outer boundary at the position ϕ. If the surface is depicted with negligible perspective and its texture is homogeneous, then the angle σ(ϕ) of this vector relative to the z-axis can be determined by the following equation:
$$\sigma(\phi) = \cos^{-1}\!\left(\frac{\omega(\phi)}{\omega_{\mathrm{Max}}}\right) \tag{3}$$
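The width-based slant estimate is σ(ϕ) = arccos(ω(ϕ)/ωMax), because with negligible perspective and a homogeneous texture each projected width is the width at the depth extremum compressed by cos σ. The Python sketch below applies that relation along one cross-section and then, as an extra step of our own that is not part of the model as described, integrates tan σ to recover a relative depth profile. The convex sign convention, the function names, and the cylinder test case are assumptions chosen for illustration.

```python
import math


def slants_from_widths(widths):
    """Slant magnitude (radians) at each sample: sigma = arccos(w / w_max).

    Assumes a homogeneous texture and negligible perspective; the widest
    element is taken as the local depth extremum that anchors the estimate.
    """
    w_max = max(widths)
    return [math.acos(min(1.0, w / w_max)) for w in widths]


def depth_profile(xs, widths):
    """Relative depth along the cross-section by integrating |dZ/dx| = tan(sigma).

    Integration runs outward from the width maximum using the trapezoid rule.
    Width alone does not fix the sign of the slope, so depth is assumed to
    increase away from the anchor (a convex profile whose anchor is a depth
    minimum); a concave profile would flip the sign.
    """
    sigmas = slants_from_widths(widths)
    anchor = widths.index(max(widths))
    z = [0.0] * len(xs)
    for step in (1, -1):
        i = anchor
        while 0 <= i + step < len(xs):
            j = i + step
            slope = 0.5 * (math.tan(sigmas[i]) + math.tan(sigmas[j]))
            z[j] = z[i] + slope * abs(xs[j] - xs[i])
            i = j
    return z


# Cylinder of radius 1 under orthographic projection: widths shrink as
# cos(sigma) = sqrt(1 - x**2), and the true depth is 1 - sqrt(1 - x**2).
xs = [i / 1000 for i in range(-900, 901)]
widths = [0.05 * math.sqrt(1.0 - x * x) for x in xs]
z = depth_profile(xs, widths)
print(round(z[0], 3), round(1.0 - math.sqrt(1.0 - xs[0] ** 2), 3))
```

Samples are kept away from the occluding contour (|x| ≤ 0.9 here) because tan σ diverges as σ approaches 90°.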
It is important to recognize that the vector N(ϕ) will not in general be a surface normal. Rather, it is the component of the surface normal that lies in the plane of the cross-section. Similarly, surface points at the local depth extrema will not in general have frontoparallel orientations; they are simply the points along a cross-section that are closer to frontoparallel than their neighbors. It should also be noted that edge density along a cross-section is the reciprocal of texture width, and could therefore provide an alternative measure for calculating slant in Equation 3. Indeed, this would be the preferred approach for surfaces with finer-scale textures whose elements are not easily individuated (see Thaler, Todd, & Dijkstra, 2007).
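Because edge density is the reciprocal of width, the same slant estimate can be written as cos σ = ρMin/ρ, with the sparsest region playing the role of the width maximum. A rough sketch of that variant is shown below, under the hypothetical choice of counting black/white transitions in a fixed window of a binary scanline; this is not necessarily how density was measured by Thaler, Todd, and Dijkstra (2007), and the function names are illustrative.

```python
import math


def edge_density(scanline, center, half_window):
    """Black/white transitions per pixel in the window [center - h, center + h)."""
    lo = max(1, center - half_window)
    hi = min(len(scanline), center + half_window)
    # A transition is counted between pixels i - 1 and i.
    edges = sum(1 for i in range(lo, hi) if scanline[i] != scanline[i - 1])
    return edges / (hi - lo)


def slants_from_densities(densities):
    """Slant magnitudes (radians) from cos(sigma) = rho_min / rho.

    Edge density is the reciprocal of width, so the sparsest region plays the
    role of the width maximum (the local depth extremum).
    """
    rho_min = min(densities)
    if rho_min <= 0:
        raise ValueError("every window needs at least one edge")
    return [math.acos(min(1.0, rho_min / rho)) for rho in densities]


# Synthetic scanline: stripes with a 20-pixel period on the left and a
# 10-pixel period on the right, i.e. edges twice as dense on the right.
scanline = [(i // 10) % 2 for i in range(200)] + [(i // 5) % 2 for i in range(200)]
rho = [edge_density(scanline, c, 50) for c in (100, 300)]
print([round(math.degrees(s), 1) for s in slants_from_densities(rho)])
```

The window size trades spatial resolution against noise: it must span several texture elements to yield a stable count, which is one reason density suits fine-scale textures.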
In order to implement this model effectively using either widths or densities, it is necessary to identify the local depth extrema along arbitrary surface cross-sections. Todd et al. (2004) have shown that observers can do this quite accurately, in some cases even when the depicted surface has an inhomogeneous texture. The stimuli in that study were created using a technique called volumetric texturing, which is analogous to sculpting an object out of a solid material such as wood or marble. A schematic illustration of this process is provided in Figure 4A. This texture was created using a set of spheres that were distributed in 3D space such that their centers were constrained to lie on an ellipsoid surface. Any region of the ellipsoid that cut through a sphere was colored black, and any region that cut through the space between spheres was colored white. This produces a pattern of circular polka dots on the surface that is statistically homogeneous and isotropic. Local depth minima along surface cross-sections can be identified in this case by local maxima in the widths of the texture elements in the direction of the cross-section, or by local minima of texture foreshortening.
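The volumetric texturing procedure for spherical elements can be sketched in a few lines of Python. The text specifies only that sphere centers lie on the ellipsoid and that surface points inside a sphere are black; the area-uniform rejection sampler, the ellipsoid semi-axes, the sphere radius, the random seed, and the function names below are illustrative assumptions.

```python
import math
import random


def sample_on_ellipsoid(n, axes, rng):
    """n area-uniform points on an ellipsoid with semi-axes (a, b, c).

    Uniform directions on the unit sphere are stretched onto the ellipsoid,
    then kept with probability proportional to the local area stretch, so
    that accepted points are uniform with respect to ellipsoid surface area.
    """
    a, b, c = axes
    g_max = max(b * c, a * c, a * b)
    points = []
    while len(points) < n:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        x, y, z = x / r, y / r, z / r
        g = math.sqrt((b * c * x) ** 2 + (a * c * y) ** 2 + (a * b * z) ** 2)
        if rng.random() < g / g_max:
            points.append((a * x, b * y, c * z))
    return points


def texture_value(point, centers, radius):
    """0 (black) if a surface point lies inside any texture sphere, else 1 (white)."""
    r2 = radius * radius
    for cx, cy, cz in centers:
        dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
        if dx * dx + dy * dy + dz * dz < r2:
            return 0
    return 1


rng = random.Random(1)
axes = (1.0, 0.6, 0.4)
centers = sample_on_ellipsoid(300, axes, rng)
probes = sample_on_ellipsoid(2000, axes, rng)
black = sum(1 - texture_value(p, centers, 0.05) for p in probes) / len(probes)
print(f"fraction of surface covered by dots: {black:.2f}")
```

Sampling the centers uniformly with respect to surface area, rather than simply stretching uniform sphere directions, is what keeps the dot pattern statistically homogeneous on an elongated ellipsoid.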
The images presented in Figures 4B and 4C show the same ellipsoid surface with volume textures that were created using a set of ellipsoids rather than spheres. For the surface shown in Figure 4B, the texture ellipsoids were oriented in a horizontal direction. Note in this case that the overall pattern of perceived relief is completely incompatible with what would be expected from an analysis of foreshortening gradients, because the regions that appear closest in depth are not the ones whose optical texture elements are closest to circular. The perceptual appearance is consistent, however, with what would be expected from an analysis of directional width gradients. That is to say, along any given surface cross-section in any given direction, the apparent near point is always located in a region where the width of the texture elements in that direction is a local maximum. Even though this type of texture is inhomogeneous, an analysis of directional width gradients will produce qualitatively accurate estimates of 3D shape whenever the ellipsoids that define the texture elements are oriented either parallel or perpendicular to the surface cross-section an observer is asked to judge. These estimates will be systematically distorted, however, when the ellipsoids used to generate the texture are oriented in depth at an oblique angle to the designated cross-section. An example of this condition is shown in Figure 4C. Note in this case that the object appears perceptually to be slanted to the right, precisely as would be theoretically predicted by an analysis of its directional width gradients. The research described in the present article was designed to provide more rigorous empirical support for these anecdotal observations.
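The distortion predicted for obliquely oriented texture ellipsoids can be illustrated with a deliberately simplified 2D model of our own, not an analysis from the article: each element is an ellipse lying in the plane of the cross-section, the surface cuts every element through its center, and the projection is orthographic. The sketch finds the slant at which projected width peaks, which a directional-width analysis would take to be the local depth extremum. Function names and parameters are hypothetical.

```python
import math


def projected_width(slant, a, b, alpha):
    """Image width of one texture element along a cross-section (2D toy model).

    The element is an ellipse in the cross-section (x-z) plane with semi-axes
    a and b, its a-axis at angle alpha (radians) to the image x-axis. The
    surface cuts it through its center along a tangent at angle `slant`, and
    orthographic projection keeps the chord's x-extent, chord * cos(slant).
    """
    d = slant - alpha  # tangent direction measured from the ellipse's a-axis
    chord = 2.0 * a * b / math.sqrt((b * math.cos(d)) ** 2 + (a * math.sin(d)) ** 2)
    return chord * math.cos(slant)


def apparent_anchor(a, b, alpha, n=1801):
    """Slant (degrees) at which projected width peaks on a -90..90 deg grid.

    A directional-width analysis would treat this point as the local depth
    extremum, i.e. as locally frontoparallel within the cross-section.
    """
    slants = [math.radians(-90.0 + 180.0 * i / (n - 1)) for i in range(n)]
    best = max(slants, key=lambda s: projected_width(s, a, b, alpha))
    return math.degrees(best)


print(apparent_anchor(1.0, 1.0, 0.0))                  # isotropic elements
print(apparent_anchor(2.0, 1.0, math.radians(45.0)))   # oblique elements
print(apparent_anchor(2.0, 1.0, math.radians(-45.0)))  # mirrored obliquity
```

In this toy model the width peak stays at 0° for elements that are isotropic, parallel, or perpendicular to the cross-section, but shifts by roughly 30° toward one side when the elongated elements are tilted 45° in depth, qualitatively in line with the lateral bias described for Figure 4C.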