The perception of shape from shading (SFS) has been an active research topic for more than two decades, yet its quantitative description remains poorly specified. One obstacle is the variability typically found between observers during SFS tasks. In this study, we take a different view of these inconsistencies, attributing them to uncertainties associated with human SFS. By identifying these uncertainties, we are able to probe the computation underlying SFS in humans. We report new experimental results with implications for models of SFS. Our data favor the idea that human SFS operates in at least two distinct modes. In one mode, perceived slant is linearly related to luminance, or nearly so with some perturbation. Whether the linear relationship holds depends on the relative contrasts of the edges bounding the luminance variation. This mode of operation is consistent with collimated lighting from an oblique angle. In the other mode, recovered surface height is indicative of a surface under lighting that is either diffuse or collimated and frontal. Shape estimates under this mode are partially accounted for by the “dark-is-deep” rule (height ∝ luminance). Switching between these two modes appears to be driven by the contrast polarity (sign) of the edges at the boundaries of the stimulus. Linear shading was active when the boundary edges had the same contrast polarity; dark-is-deep was active when they had opposite contrast polarity. When both same-sign and opposite-sign edges were present, observers preferred linear shading but could adopt a combination of the two computational modes.

*only the tilt angle* of light source directions). Further support for this independence comes from studies of global shading, in which the information on light source direction added by global shading did not improve the accuracy of local shape judgments (Erens, Kappers, & Koenderink, 1993), although it can help resolve the concave/convex ambiguity (Berbaum, Bever, & Chung, 1983; Koenderink & van Doorn, 2004). This independence is interesting because the information on surface orientation conveyed by shading is an angle relative to the light source (Horn, 1975, 1977; Horn & Brooks, 1989): surface orientation cannot be uniquely determined without a reference light source, yet SFS functions in humans even without such a reference.
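Horn's point that shading conveys orientation only as an angle relative to the light can be illustrated with a small sketch. It assumes a Lambertian surface and a known, unit-length light direction; both are assumptions of this example, and `normals_with_intensity` is a hypothetical helper. Every unit normal on a cone of half-angle arccos(I) around the light produces the same intensity I, yet those normals have very different slants relative to the viewing direction. If even a known light leaves the normal undetermined, an unknown light leaves it more so.

```python
import numpy as np

def normals_with_intensity(light, intensity, n=8):
    """Sample n unit normals whose Lambertian shading n·L equals `intensity`.

    The normals lie on a cone of half-angle arccos(intensity) around L.
    """
    L = np.asarray(light, dtype=float)
    L = L / np.linalg.norm(L)
    # Build an orthonormal pair (u, v) perpendicular to L.
    helper = np.array([1.0, 0.0, 0.0]) if abs(L[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(L, helper)
    u /= np.linalg.norm(u)
    v = np.cross(L, u)
    half_angle = np.arccos(intensity)
    phis = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    return np.array([np.cos(half_angle) * L
                     + np.sin(half_angle) * (np.cos(phi) * u + np.sin(phi) * v)
                     for phi in phis])

light = np.array([1.0, 0.0, 1.0])                 # oblique light, 45 deg from the view axis
normals = normals_with_intensity(light, 0.8)
shading = normals @ (light / np.linalg.norm(light))
slants = np.degrees(np.arccos(normals[:, 2]))     # angle from the view axis (0, 0, 1)
print(np.round(shading, 6))   # identical intensities
print(np.round(slants, 1))    # widely different slants
```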

$z_1(x, y)$ and $z_2(x, y)$ are depth functions estimated by two observers, where $x$ and $y$ are coordinates in the image plane, the relationship between $z_1(x, y)$ and $z_2(x, y)$ can be described by a multiple linear regression:

$$z_1(x, y) = a\,z_2(x, y) + bx + cy + d,$$

where the constant $a$ represents a scaling factor, $b$ and $c$ control shearing transforms of the 3D surface, and $d$ sets an overall depth offset. Therefore, we argue that human SFS can be expressed by

$(x, y)$ is the 3D shape as reported by the observer, and $z(x, y)$ is the provisional 3D surface computed from shading alone. We presume it is a function of low-level processes and therefore more or less common to all observers. In other words, the 3D information computed from shading alone is not enough to recover a full representation of the 3D shape. We call this 3D information the “proto-surface.” The remaining information is normally provided by other cues. When no other cues are available, participants “make up” for the missing information by applying their “beholder's share” (Koenderink et al., 2001), resulting in large interobserver variance. Following this logic, we believe that the common proto-surface $z(x, y)$ in Equation 1 is key to understanding human SFS.
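The affine relation between two observers' depth maps can be made concrete with an ordinary least-squares fit. The minimal NumPy sketch below uses synthetic surfaces: the Gaussian bump and the coefficients are invented for illustration, and `fit_affine_relation` is a hypothetical helper, not the authors' analysis code. It recovers the coefficients $a$, $b$, $c$, and $d$ that map one depth map onto the other.

```python
import numpy as np

def fit_affine_relation(z1, z2, x, y):
    """Least-squares fit of z1 = a*z2 + b*x + c*y + d over all pixels."""
    design = np.column_stack([z2.ravel(), x.ravel(), y.ravel(), np.ones(z2.size)])
    coeffs, *_ = np.linalg.lstsq(design, z1.ravel(), rcond=None)
    a, b, c, d = coeffs
    return a, b, c, d

# Synthetic example: one "observer" reports a Gaussian bump, the other a
# scaled, sheared, and offset copy of it.
x, y = np.meshgrid(np.linspace(-1, 1, 41), np.linspace(-1, 1, 41))
z2 = np.exp(-(x**2 + y**2) / 0.2)
z1 = 1.5 * z2 + 0.3 * x - 0.2 * y + 0.1
print(np.round(fit_affine_relation(z1, z2, x, y), 6))
```

With real reports, the residual of such a fit would indicate how far two observers' shapes depart from a purely affine relation.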

*perceptual edges* made of either step changes in luminance or peaks in gradient magnitude, i.e., zero crossings of the second derivative of luminance (Georgeson, May, Freeman, & Hess, 2007; Hesse & Georgeson, 2005). Thus, in sine waves, perceptual edges are located wherever the mean luminance is reached. Along linear ramps there are none, because the second derivative is zero throughout. Further, each grating in Experiment 1 had at least two edges that were equal in magnitude and contrast polarity (edge locations and their contrast polarity are indicated on the cross sections of Figure 2). Stimuli were presented at three orientations (horizontal and ±45°).
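The following is a simplified 1D illustration of the zero-crossing rule. It assumes uniformly sampled luminance profiles and a fixed numerical tolerance, both assumptions of this sketch, and it is not the multiscale edge model of Georgeson et al. (2007). `edge_positions` is a hypothetical helper that returns the positions where the discrete second derivative of luminance changes sign.

```python
import numpy as np

def edge_positions(lum, x, tol=1e-6):
    """Positions where the second derivative of a 1D luminance profile
    changes sign (zero crossings), i.e., peaks in gradient magnitude.

    Assumes uniform sampling; |second derivative| < tol is treated as zero
    so that floating-point noise on a linear ramp is not reported as edges.
    """
    h = x[1] - x[0]
    d2 = np.gradient(np.gradient(lum, h), h)
    d2[np.abs(d2) < tol] = 0.0
    idx = np.flatnonzero(d2)                  # samples with a definite sign
    signs = np.sign(d2[idx])
    flips = np.flatnonzero(signs[1:] != signs[:-1])
    # Report the midpoint between the two samples bracketing each crossing.
    return 0.5 * (x[idx[flips]] + x[idx[flips + 1]])

x = np.linspace(0.0, 2.0, 401)                # two periods of a unit-wavelength grating
sine = 0.5 + 0.4 * np.sin(2 * np.pi * x)      # mean luminance 0.5
ramp = 0.1 + 0.4 * x                          # a linear ramp with no steps
print(edge_positions(sine, x))                # crossings at the mean-luminance points
print(edge_positions(ramp, x))                # no interior crossings
```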

Participants | Sine wave −45° | Sine wave 90° | Sine wave 45° | Sawtooth −45° | Sawtooth 90° | Sawtooth 45° | Square wave −45° | Square wave 90° | Square wave 45°
---|---|---|---|---|---|---|---|---|---
JCY | 0.98 | 0.99 | 0.90 | −0.9 | 0.99 | 0.99 | −0.61 | 0.85 | 0.86
HW | 0.9 | 0.9 | 0.96 | −0.94 | 0.97 | 0.97 | −0.25 | 0.44 | 0.06
PS | 0.2 | 0.58 | 0.67 | 0.93 | 0.99 | 0.99 | 0.66 | 0.84 | 0.75

Participants | Sine wave −45° | Sine wave 90° | Sine wave 45° | Sawtooth −45° | Sawtooth 90° | Sawtooth 45° | Square wave −45° | Square wave 90° | Square wave 45°
---|---|---|---|---|---|---|---|---|---
JCY | 0.2 | 0.23 | 0.09 | −0.23 | 0.05 | 0.04 | −0.28 | 0.002 | 0.03
HW | 0.1 | 0.004 | 0.34 | −0.29 | −0.03 | 0.17 | −0.73 | −0.66 | −0.81
PS | −0.95 | 0.62 | 0.5 | −0.05 | 0.07 | 0.03 | 0.58 | 0.04 | 0.16

Participants | Slant proportional to luminance, −45° | Slant proportional to luminance, 90° | Slant proportional to luminance, 45° | Height proportional to luminance, −45° | Height proportional to luminance, 90° | Height proportional to luminance, 45°
---|---|---|---|---|---|---
JCY | −0.97 | 0.74 | 0.70 | −0.33 | −0.17 | −0.15
HW | −0.95 | 0.53 | 0.46 | −0.19 | 0.04 | 0.04
PS | 0.7 | 0.71 | 0.7 | −0.08 | 0.02 | 0.02

$C$ is a constant and $z(x)$ is the surface height. The solution to Equation 2 is given by

$b$ in Equation 3) is left completely to the individual's “beholder's share” (Koenderink et al., 2001). When surfaces appeared convex, observers still resolved the ambiguity by assigning roughly the same surface height to the boundary positions, resulting in a proportional relationship between perceived slant and luminance. People may be applying additional constraints based on the physics of shapes (Pizlo, 2008). For example, two mounds resting on a single central valley are not stable; such a shape would tip to either side. Mounds with three points of contact at the same height are stable 3D objects. In the concave case, a central ridge can rest on flanks of any height.
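Equation 3 is not reproduced in this section. The sketch below therefore assumes that, under linear shading, slope follows luminance up to the free linear term, $z'(x) = I(x)/C + b$. Integrating once more gives height up to $bx + d$. `height_from_luminance` is a hypothetical helper that fixes $b$ with the equal-boundary-height constraint described above. The synthetic sinusoidal luminance profile is chosen only for illustration.

```python
import numpy as np

def height_from_luminance(lum, x, C=1.0, d=0.0):
    """Integrate slope z'(x) = lum/C + b, choosing b so both ends share one height.

    Returns (z, b). The offset d is an arbitrary constant.
    """
    slope = np.asarray(lum, dtype=float) / C
    # Cumulative trapezoidal integral of lum/C, starting from 0 at x[0].
    steps = 0.5 * (slope[1:] + slope[:-1]) * np.diff(x)
    z0 = np.concatenate(([0.0], np.cumsum(steps)))
    # The equal-height constraint z(x[0]) == z(x[-1]) fixes the linear term b.
    b = -(z0[-1] - z0[0]) / (x[-1] - x[0])
    z = z0 + b * (x - x[0]) + d
    return z, b

x = np.linspace(0.0, 1.0, 1001)
lum = 0.5 + 0.3 * np.sin(2 * np.pi * x)    # one period, mean luminance 0.5
z, b = height_from_luminance(lum, x)
print(round(b, 6))                          # about -0.5, i.e. minus the mean luminance over C
```

Under this constraint, the recovered slope is proportional to luminance measured about its mean. Any other choice of $b$ shears the whole profile, which is exactly the freedom left to the beholder's share.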

$z''(x)$ is a good approximation of surface curvature (for shallow surfaces, where $|z'(x)| \ll 1$), this idea is consistent with the claim that, with respect to SFS, the visual system codes surface curvature, not height (Johnston & Passmore, 1994b). Note that the LSM does not require precise knowledge of the *slant angle* component of the light source direction, although it assumes that the illumination tilt angle is aligned with the direction of the local luminance gradient (Pentland, 1982). This ability to compute shape without precise knowledge of the light source is presumably desirable, because humans readily convert shading to shape without such knowledge.
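The exact curvature of a 1D height profile is $\kappa = z''/(1 + z'^2)^{3/2}$, so $z''$ approximates curvature only where the slope is small. The quick numerical check below uses hypothetical sinusoidal profiles, with amplitudes chosen purely for illustration, and compares $\kappa$ with $z''$ for a shallow and a steep surface.

```python
import numpy as np

def curvature_and_second_derivative(z, x):
    """Return (exact curvature, z'') for a sampled 1D height profile z(x)."""
    dz = np.gradient(z, x)          # slope z'(x)
    d2z = np.gradient(dz, x)        # second derivative z''(x)
    kappa = d2z / (1.0 + dz**2) ** 1.5
    return kappa, d2z

def relative_gap(amplitude, n=2001):
    """Max |kappa - z''| relative to max |kappa| for z = amplitude * sin(2*pi*x)."""
    x = np.linspace(0.0, 1.0, n)
    kappa, d2z = curvature_and_second_derivative(amplitude * np.sin(2 * np.pi * x), x)
    inner = slice(5, -5)            # skip one-sided differences at the ends
    return np.max(np.abs(kappa[inner] - d2z[inner])) / np.max(np.abs(kappa[inner]))

print(relative_gap(0.005))  # shallow surface: well under 1%
print(relative_gap(0.2))    # steep surface: tens of percent
```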

Participants | Sine (gradient) | Sine (height) | Cropped sine (gradient) | Cropped sine (height) | Square (gradient) | Square (height) | Cropped square (gradient) | Cropped square (height)
---|---|---|---|---|---|---|---|---
TT | 0.98 | −0.44 | −0.23 | 0.96 | 0.98 | −0.32 | −0.26 | 0.76
ZXQ | 0.87 | −0.28 | −0.2 | 0.76 | 0.97 | −0.64 | −0.23 | 0.69
KL | 0.66 | 0.58 | −0.27 | 0.94 | 0.98 | −0.5 | −0.37 | 0.73

*one* source of suitable constraints.

$\sigma$ is the angle between the incident ray and the viewing direction, $(\cos\sigma, \sin\sigma)$ is the vector of the incident ray, $p$ is the slope of the surface along the image plane, i.e., $p = \tan\theta$, and

$p$) is the vector of the surface normal (Figure A1). Note that the image plane has been simplified to 1D in this expression. We have also omitted the multiplying constant associated with the light source. Taking the Taylor series expansion of Equation A1 about $p = 0$ up to its quadratic term gives

If $|p| \ll 1$ (so that the quadratic term $p^2$ is negligible) and the DC term $\cos\sigma$ is ignored, the relationship between image intensity and surface slope is linear. However, we think that omitting the DC term in Equation A2 is rather ad hoc. A more principled way to decouple the DC term from the linear term (supposing that the quadratic term $p^2$ is small enough to be ignored) is to differentiate both sides of the equation:

$C$ is a constant and $z(x)$ is the height function of the physical surface.
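Equation A1 is not reproduced in this section, so the following sketch assumes one common 1D Lambertian form, $I(p) = (\cos\sigma + p\sin\sigma)/\sqrt{1 + p^2}$; the sign convention is our assumption. The sketch checks two things numerically. First, the quadratic Taylor expansion about $p = 0$ is accurate for small slopes. Second, differentiating intensity along $x$ removes the DC term $\cos\sigma$, leaving $dI/dx \approx \sin\sigma\, z''(x)$, which corresponds to $C = \sin\sigma$ under these assumptions. The light angle and the surface are hypothetical.

```python
import numpy as np

# Assumed 1D Lambertian shading (our sign convention), light constant omitted:
#   I(p) = (cos(sigma) + p*sin(sigma)) / sqrt(1 + p**2),  with p = dz/dx
def intensity(p, sigma):
    return (np.cos(sigma) + p * np.sin(sigma)) / np.sqrt(1.0 + p**2)

def quadratic_approx(p, sigma):
    # Taylor expansion of intensity() about p = 0, up to the p**2 term
    return np.cos(sigma) + p * np.sin(sigma) - 0.5 * np.cos(sigma) * p**2

sigma = np.deg2rad(60.0)                # oblique light, 60 deg from the view axis
ps = np.linspace(-0.1, 0.1, 201)
taylor_err = np.max(np.abs(intensity(ps, sigma) - quadratic_approx(ps, sigma)))

# A shallow synthetic surface, so |p| << 1 everywhere
x = np.linspace(0.0, 1.0, 2001)
z = 0.005 * np.sin(2 * np.pi * x)
p = np.gradient(z, x)
I = intensity(p, sigma)

# Differentiating removes the DC term cos(sigma): dI/dx ~= sin(sigma) * z''(x)
dI = np.gradient(I, x)
z2 = np.gradient(p, x)
C = np.sin(sigma)
inner = slice(5, -5)                    # skip one-sided differences at the ends
rel_err = np.max(np.abs(dI[inner] - C * z2[inner])) / np.max(np.abs(C * z2[inner]))
print(f"Taylor error {taylor_err:.1e}, relative derivative error {rel_err:.1e}")
```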

$i + \theta = \sigma$. Here, we do not consider the backlit condition, so $\sigma < 90^\circ$. Let $i$ = tan

$\sigma + \theta$) = tan

$\sigma$ equals 90°, then $\tan\theta \geq \sin\theta\cos\theta$, so the slant angle should always be underestimated. As $\sigma$ varies, let $\alpha = 90^\circ - \sigma > 0$; then tan $\theta + \alpha$) > tan $\theta$ when $\theta$ is very small, but tan $\theta + \alpha$) < tan $\theta$ as $\theta$ increases, and the difference grows as $\theta$ approaches 90°. That is, perceived slant is overestimated when the Lambertian surface is only slightly slanted but underestimated when the slant is larger, and the underestimation increases with the true slant of the surface.
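For the case $\sigma = 90^\circ$, the inequality $\tan\theta \geq \sin\theta\cos\theta$ can be checked directly. For $0 \leq \theta < 90^\circ$, where $\cos\theta > 0$:

$$
\tan\theta - \sin\theta\cos\theta
= \frac{\sin\theta - \sin\theta\cos^2\theta}{\cos\theta}
= \frac{\sin\theta\,(1 - \cos^2\theta)}{\cos\theta}
= \frac{\sin^3\theta}{\cos\theta} \;\geq\; 0,
$$

with equality only at $\theta = 0$. Because the numerator increases and the positive denominator decreases over this range, the gap $\sin^3\theta/\cos\theta$ grows with $\theta$.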