We examined visual estimation of surface roughness using random, computer-generated, three-dimensional (3D) surfaces rendered under a mixture of diffuse lighting and a punctate source. The angle between the tangent to the plane containing the surface texture and the direction to the punctate source was varied from 50 to 70 deg across lighting conditions. Observers were presented with pairs of surfaces under different lighting conditions and indicated which 3D surface appeared rougher. Surfaces were viewed either in isolation or in scenes with added objects whose shading, cast shadows, and specular highlights provided information about the spatial distribution of illumination. All observers perceived surfaces to be markedly rougher with decreasing illuminant angle. Performance in scenes with added objects was no closer to constant than that in scenes without added objects. We identified four novel cues that are valid cues to roughness under any single lighting condition but that are not invariant under changes in lighting condition. We modeled observers' deviations from roughness constancy as a weighted linear combination of these “pseudocues” and found that they account for a substantial amount of observers' systematic deviations from roughness constancy with changes in lighting condition.

*not* invariant under changes in lighting conditions. There is evidence that the perception of surface relief (Belhumeur, Kriegman, & Yuille, 1999; Berbaum, Bever, & Chung, 1983; Oppel, 1856; Yonas, 1979) and shape estimated from shading (Koenderink, van Doorn, Christou, & Lappin, 1996a, 1996b; Langer & Bülthoff, 2000) are far from invariant under changes in lighting conditions.

*x*, *y*, *z*) (Figure 3). The origin (0, 0, 0) lies in the frontoparallel plane on which we present stimuli (the *stimulus plane*). The *Z*-axis lies along the observer's line of sight, the *X*-axis is horizontal, and the *Y*-axis is vertical. The punctate illuminant direction is represented as a vector **P** in spherical coordinates (*ψ*, *ϕ*, *d*). We define the elevation *ϕ* as the angle between **P** and the projection of **P** onto the *XY*-plane; it ranges from 0 to 90 deg. The azimuth *ψ* is the angle between the projection of **P** onto the *XY*-plane and the *X*-axis.

*N* × *N* grid of base points was generated in the stimulus plane with width *w*. The base point coordinates are denoted $(X_{ij}, Y_{ij}, Z_{ij})$, $0 \le i, j \le N - 1$. Let $U^{x}_{ij}$, $U^{y}_{ij}$, and $U^{z}_{ij}$ be independent random variables, uniformly distributed on the interval $[-1, 1]$. The base point coordinates were defined as

Here, *w* = 19 cm, *δ* = …, and *N* = 20 points on a side. Setting $n_{xy} = 0.49$ ensured that no facets would overlap or intersect one another in the jittered base grid. The amount of jitter in depth could take on any of eight distinct values, $r = k^2/16$ cm, depending on the *roughness level* $k \in \{1, 2, \ldots, 8\}$. Because $U^{z}_{ij}$ is uniform on $[-1, 1]$ and the depth jitter has amplitude $r$, the standard deviation of the $Z_{ij}$ coordinates in a surface with roughness $r$ was $r/\sqrt{3} = k^2/(16\sqrt{3})$ cm. A flat surface, $r = 0$, would thus have all $Z_{ij} = 0$. The $Z_{ij}$ coordinates were always 4 cm or less in absolute value. Note that the spacing of the standard deviations of successive roughness levels is quadratic in $k$. In initial testing, we found that linear spacing led to stimuli that were difficult to discriminate at high roughness levels, and we consequently adopted quadratic spacing. The 3D surface was constructed from triangular facets. Each set of four neighboring grid points $(i, j)$, $(i + 1, j)$, $(i, j + 1)$, and $(i + 1, j + 1)$ was split into two triangular facets by randomly selecting one of the two diagonals to be connected by an edge. For each combination of roughness, illuminant elevation, and context condition (see Illuminant cues section), four random surfaces were generated to minimize the possibility of observers using patterns in the distribution of facets as cues to roughness.
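The construction above can be sketched in NumPy. Because the base-point equation and the value of *δ* are not given here, the sketch assumes a grid spacing $\delta = w/(N-1)$, in-plane jitter $X_{ij} = \delta(i + n_{xy} U^{x}_{ij})$ and $Y_{ij} = \delta(j + n_{xy} U^{y}_{ij})$, and depth $Z_{ij} = r\,U^{z}_{ij}$. The function name `make_surface` and the centering of the grid on the origin are likewise illustrative; this is a hypothetical reconstruction, not the code used to make the stimuli.

```python
import numpy as np


def make_surface(k, w=19.0, N=20, n_xy=0.49, rng=None):
    """Build one jittered, triangulated surface at roughness level k (1..8).

    Hypothetical reconstruction (assumptions, not the original code):
    grid spacing delta = w / (N - 1), in-plane jitter of n_xy * delta,
    and depth Z = r * U with r = k**2 / 16 cm.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = w / (N - 1)                      # assumed grid spacing (cm)
    r = k ** 2 / 16.0                        # depth jitter amplitude (cm)
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    U = rng.uniform(-1.0, 1.0, size=(3, N, N))   # U^x, U^y, U^z on [-1, 1]
    X = delta * (i + n_xy * U[0]) - w / 2    # centering on the origin is an assumption
    Y = delta * (j + n_xy * U[1]) - w / 2
    Z = r * U[2]
    vertices = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

    def vid(a, b):
        # Flattened vertex index of grid point (a, b).
        return a * N + b

    faces = []
    for a in range(N - 1):
        for b in range(N - 1):
            p00, p10 = vid(a, b), vid(a + 1, b)
            p01, p11 = vid(a, b + 1), vid(a + 1, b + 1)
            if rng.integers(2):              # split along diagonal (a, b)-(a+1, b+1)
                faces += [(p00, p10, p11), (p00, p11, p01)]
            else:                            # split along diagonal (a+1, b)-(a, b+1)
                faces += [(p00, p10, p01), (p10, p11, p01)]
    return vertices, np.array(faces), r


verts, faces, r = make_surface(8, rng=np.random.default_rng(0))
# For uniform depth jitter, the sample SD of Z should be close to r / sqrt(3).
print(r, faces.shape, verts[:, 2].std(), r / np.sqrt(3))
```

With $n_{xy} < 0.5$, each point moves by less than half a grid step in *X* and *Y*, so the order of neighboring points along every row and column is preserved, consistent with the statement that the jittered facets do not overlap.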

*ψ* = 180 deg (light came from the left) and elevation *ϕ* = 50, 60, or 70 deg (Figure 5). It was located 80 cm from the surface, and the punctate-total ratio was 0.62.
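A small helper makes the spherical-to-Cartesian convention concrete for the three illuminant positions. It assumes that *d* is the source distance (80 cm here), that +*X* points to the observer's right, and that +*Z* points from the stimulus plane toward the observer; `light_position` is a hypothetical name.

```python
import math


def light_position(psi_deg, phi_deg, d):
    """Convert (azimuth psi, elevation phi, distance d) to Cartesian (X, Y, Z).

    Elevation is measured from the XY (stimulus) plane, so phi = 90 deg puts the
    source on the Z-axis. Assumed axes: +X to the observer's right, +Z toward
    the observer.
    """
    psi, phi = math.radians(psi_deg), math.radians(phi_deg)
    return (
        d * math.cos(phi) * math.cos(psi),
        d * math.cos(phi) * math.sin(psi),
        d * math.sin(phi),
    )


for phi in (50, 60, 70):
    x, y, z = light_position(180, phi, 80)
    print(f"phi = {phi} deg: X = {x:.1f} cm, Y = {y:.1f} cm, Z = {z:.1f} cm")
```

With *ψ* = 180 deg the *X* component is negative, i.e., the source sits to the observer's left, and lowering *ϕ* moves it toward the stimulus plane (a more grazing angle).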

². The stereoscope was contained in a box with its front face missing, in which the observer was seated with his/her chin placed in a chinrest. The interior of the box was lined with black flocked paper (Edmund Scientific, Tonawanda, NY) to absorb stray light. Only the stimuli on the screens of the monitors were visible to the observer; the casings of the monitors and any other features of the room were hidden behind the nonreflective walls of the enclosing box.

*test* surface patch and a *match* surface patch were presented, and the observer's task was to indicate which patch appeared to be rougher. The test, under one illuminant, was chosen from the intermediate range of roughness levels (0.25 ≤ *r* ≤ 3.06 cm; see Figure 7), whereas the match was under a different illuminant and could have any of the eight roughness levels.

*line of roughness constancy* (i.e., the identity line), and the measured PSEs should show no patterned deviation from the line. Almost all PSEs fell below the line of roughness constancy. This trend strongly suggests that a surface appears rougher when illuminated from a more grazing angle.

*α* level for seven tests [*α* ≅ .007] for a *z* test of difference of slopes for the three illuminant comparisons). Thus, the additional cues to the illuminant direction provided in Condition II did not improve observers' judgments of roughness across varying illumination conditions (Figure 13).
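The Bonferroni correction and a difference-of-slopes test can be sketched as follows, assuming the two slope estimates are independent and approximately normal with known standard errors (for example, from bootstrap resampling). The function names and the slopes and standard errors in the example are made up for illustration.

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the familywise error rate at alpha."""
    return alpha / n_tests


def slope_difference_z_test(b1, se1, b2, se2):
    """Two-tailed z test of H0: b1 == b2 for two independent slope estimates."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p


alpha = bonferroni_alpha(0.05, 7)            # seven observers -> about .0071
z, p = slope_difference_z_test(1.2, 0.05, 1.0, 0.05)   # hypothetical values
print(f"alpha = {alpha:.4f}, z = {z:.3f}, p = {p:.4f}, significant = {p < alpha}")
```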

*c* (i.e., the slope of a linear fit to the data in Figures 11 and 12), (2) the standard deviation *σ* of the normally distributed noise that causes the variability in observers' judgments, and (3) a noise-scaling parameter *γ*. We assume that the noise is scaled by $r^{\gamma}$; thus, $\gamma = 0$ corresponds to stimulus-independent noise, and $\gamma = 1$ corresponds to Weber's law for our roughness scale *r* (see the Appendix for details). We denote the three illuminant conditions by A, B, and C (corresponding to illuminant elevations *ϕ* = 70, 60, and 50 deg, respectively), resulting in three roughness transfer parameters $c_{AB}$, $c_{BC}$, and $c_{AC}$, plus *σ* and *γ*, for a total of five model parameters. We estimated these parameters by maximum likelihood.
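A minimal maximum-likelihood sketch follows. It assumes the psychometric function $p = \Phi\big((r_b - c_{AB} r_a)/(\sigma\sqrt{r_b^{2\gamma} + (c_{AB} r_a)^{2\gamma}})\big)$ for the probability that the second-interval surface (roughness $r_b$, Illuminant B) is judged rougher than the first (roughness $r_a$, Illuminant A); this form follows from the Appendix model if the illuminant-dependent transformation is linear in roughness, which is our assumption. To keep it short, the sketch estimates a single transfer parameter by grid search with *σ* and *γ* held fixed, and it uses simulated responses from a hypothetical observer rather than experimental data.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def p_second_rougher(r_a, r_b, c_ab, sigma, gamma):
    """P(second interval judged rougher); assumed form of the psychometric function."""
    mean = r_b - c_ab * r_a
    sd = sigma * np.sqrt(r_b ** (2 * gamma) + (c_ab * r_a) ** (2 * gamma))
    return 0.5 * (1.0 + _erf(mean / (sd * math.sqrt(2.0))))   # standard normal CDF


def neg_log_likelihood(params, r_a, r_b, resp):
    """resp[i] = 1 if the observer chose the second interval on trial i, else 0."""
    c_ab, sigma, gamma = params
    p = np.clip(p_second_rougher(r_a, r_b, c_ab, sigma, gamma), 1e-9, 1 - 1e-9)
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))


# Simulated (hypothetical) observer with c_AB = 0.78, sigma = 0.4, gamma = 1.
rng = np.random.default_rng(1)
levels = np.arange(1, 9) ** 2 / 16.0             # r = k^2 / 16 cm, k = 1..8
r_a = np.repeat(levels[1:7], 8 * 200)            # test levels k = 2..7
r_b = np.tile(np.repeat(levels, 200), 6)         # all eight match levels
resp = (rng.random(r_a.size) < p_second_rougher(r_a, r_b, 0.78, 0.4, 1.0)).astype(float)

# 1-D grid search over c_AB with sigma and gamma fixed at their simulated values.
grid = np.linspace(0.5, 1.2, 71)
nll = [neg_log_likelihood((c, 0.4, 1.0), r_a, r_b, resp) for c in grid]
c_hat = grid[int(np.argmin(nll))]
print(f"estimated c_AB = {c_hat:.2f}")
```

A full fit would estimate $c_{AB}$, $c_{BC}$, $c_{AC}$, *σ*, and *γ* jointly from all comparisons, for example by passing the summed negative log-likelihood to a general-purpose optimizer such as `scipy.optimize.minimize`.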

*α* level for seven tests, *α* ≅ .007). This outcome is not particularly surprising, because we have little reason to expect perfect roughness constancy; what is of interest is the magnitude of the failure of roughness constancy across observers. The estimates of the roughness transfer parameter *c* averaged 0.78, markedly less than 1. Additionally, all estimates of *γ* fell between 0.944 and 0.996, close to 1, suggesting that noise increases with roughness level in accordance with Weber's law (Table 1).

| Context condition | Model parameter | C.P. | J.G. | M.F. | M.S.L. | P.J.N. | T.A. | Y.X.H. |
|---|---|---|---|---|---|---|---|---|
| I | $\hat{c}_{70\,\mathrm{deg}\;60\,\mathrm{deg}}$ | 1.074 | 0.840 | 0.826 | 0.802 | 0.783 | 0.688 | 0.787 |
| | $\hat{c}_{60\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 1.037 | 0.743 | 0.828 | 0.991 | 0.779 | 0.872 | 0.757 |
| | $\hat{c}_{70\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 0.826 | 0.599 | 0.646 | 0.797 | 0.582 | 0.453 | 0.648 |
| | $\hat{\sigma}$ | 0.462 | 0.354 | 0.398 | 0.479 | 0.392 | 0.795 | 0.269 |
| | $\hat{\gamma}$ | 0.976 | 0.967 | 0.944 | 0.980 | 0.990 | 0.979 | 0.972 |
| II | $\hat{c}_{70\,\mathrm{deg}\;60\,\mathrm{deg}}$ | 0.944 | 0.848 | 0.825 | 0.912 | 0.737 | 0.808 | 0.813 |
| | $\hat{c}_{60\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 0.947 | 0.792 | 0.846 | 0.834 | 0.759 | 0.831 | 0.780 |
| | $\hat{c}_{70\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 0.800 | 0.708 | 0.700 | 0.757 | 0.559 | 0.620 | 0.635 |
| | $\hat{\sigma}$ | 0.397 | 0.438 | 0.481 | 0.365 | 0.404 | 0.698 | 0.299 |
| | $\hat{\gamma}$ | 0.980 | 0.980 | 0.960 | 0.981 | 0.996 | 0.984 | 0.974 |

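$r_A$ is the roughness of any surface under Illuminant A, and it matches a surface with roughness $r_B$ under Illuminant B, then $r_B = c_{AB}\, r_A$; likewise, a surface of roughness $r_B$ under Illuminant B matches a surface of roughness $r_C = c_{BC}\, r_B$ under Illuminant C. Transitivity therefore predicts $c_{AC} = c_{AB}\, c_{BC}$.

$\hat{c}_{70\,\mathrm{deg}\;50\,\mathrm{deg}}$ plotted against the corresponding slope predictions $\hat{c}_{60\,\mathrm{deg}\;50\,\mathrm{deg}}\,\hat{c}_{70\,\mathrm{deg}\;60\,\mathrm{deg}}$ based on transitivity. Most of the data fall along the identity line. We obtained 95% confidence intervals for both prediction and estimate by a bootstrap method (Efron & Tibshirani, 1993).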
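One way to obtain such intervals is a percentile bootstrap. The sketch below assumes each slope is a least-squares fit through the origin to (test roughness, matched roughness) pairs and resamples those pairs with replacement within each data set; the data, the function names, and the number of resamples are hypothetical.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope c of the line y = c * x (no intercept)."""
    return np.sum(x * y) / np.sum(x * x)


def bootstrap_product_ci(x1, y1, x2, y2, n_boot=2000, level=0.95, rng=None):
    """Percentile bootstrap CI for the product of two independently fitted slopes."""
    rng = np.random.default_rng() if rng is None else rng
    stats = np.empty(n_boot)
    for b in range(n_boot):
        i1 = rng.integers(0, len(x1), len(x1))   # resample pairs with replacement
        i2 = rng.integers(0, len(x2), len(x2))
        stats[b] = slope_through_origin(x1[i1], y1[i1]) * slope_through_origin(x2[i2], y2[i2])
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(stats, [tail, 100 - tail])
    estimate = slope_through_origin(x1, y1) * slope_through_origin(x2, y2)
    return estimate, lo, hi


# Hypothetical PSE data: test roughness x and matched roughness y for two comparisons.
rng = np.random.default_rng(2)
x = np.repeat(np.arange(2, 8) ** 2 / 16.0, 10)
y_ab = 0.85 * x * (1 + 0.1 * rng.standard_normal(x.size))
y_bc = 0.80 * x * (1 + 0.1 * rng.standard_normal(x.size))
print(bootstrap_product_ci(x, y_ab, x, y_bc, rng=rng))
```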

*p* values are listed in Table 2. For each observer and condition, we performed a *z* test to determine whether the measured and predicted slopes differed significantly. For six of the seven observers, measured and predicted slopes did not differ significantly in either condition (I or II) at the Bonferroni-corrected *α* level for seven tests (*α* ≅ .007), consistent with the claim that observers' judgments are transitive. The exception was C.P. in Condition I (*p* = .001).

| Context condition | Quantity | C.P. | J.G. | M.F. | M.S.L. | P.J.N. | T.A. | Y.X.H. |
|---|---|---|---|---|---|---|---|---|
| I | $\hat{c}_{70\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 0.826 | 0.599 | 0.646 | 0.797 | 0.582 | 0.453 | 0.648 |
| | $\hat{c}_{60\,\mathrm{deg}\;50\,\mathrm{deg}}\,\hat{c}_{70\,\mathrm{deg}\;60\,\mathrm{deg}}$ | 1.113 | 0.624 | 0.684 | 0.794 | 0.610 | 0.600 | 0.596 |
| | % ε | 25.770 | 3.980 | 5.630 | −0.300 | 4.640 | 24.560 | −8.750 |
| | *p* | .001 | .540 | .453 | .972 | .480 | .041 | .126 |
| II | $\hat{c}_{70\,\mathrm{deg}\;50\,\mathrm{deg}}$ | 0.800 | 0.708 | 0.700 | 0.757 | 0.559 | 0.620 | 0.635 |
| | $\hat{c}_{60\,\mathrm{deg}\;50\,\mathrm{deg}}\,\hat{c}_{70\,\mathrm{deg}\;60\,\mathrm{deg}}$ | 0.894 | 0.672 | 0.698 | 0.761 | 0.559 | 0.671 | 0.635 |
| | % ε | 10.480 | −5.380 | −0.370 | 0.490 | 0.010 | 7.610 | −0.002 |
| | *p* | .131 | .473 | .966 | .941 | .999 | .497 | ∼1 |
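The transitivity predictions for Condition I can be recomputed from the Table 1 estimates. The error definition $\%\varepsilon = 100\,(\text{predicted} - \text{measured})/\text{predicted}$ used below is inferred rather than stated: it reproduces the tabled values to within about 0.1 percentage points, and the residual differences are consistent with rounding of the three-decimal estimates.

```python
# Condition I estimates copied from Table 1 (rounded to three decimals).
observers = ["C.P.", "J.G.", "M.F.", "M.S.L.", "P.J.N.", "T.A.", "Y.X.H."]
c_70_60 = [1.074, 0.840, 0.826, 0.802, 0.783, 0.688, 0.787]
c_60_50 = [1.037, 0.743, 0.828, 0.991, 0.779, 0.872, 0.757]
c_70_50 = [0.826, 0.599, 0.646, 0.797, 0.582, 0.453, 0.648]


def transitivity_prediction(c_ab, c_bc):
    """If r_B = c_AB * r_A and r_C = c_BC * r_B, then r_C = (c_AB * c_BC) * r_A."""
    return c_ab * c_bc


def percent_error(predicted, measured):
    """Signed error relative to the prediction, in percent (inferred definition)."""
    return 100.0 * (predicted - measured) / predicted


for name, ab, bc, ac in zip(observers, c_70_60, c_60_50, c_70_50):
    pred = transitivity_prediction(ab, bc)
    print(f"{name:7s} predicted {pred:.3f}  measured {ac:.3f}  %err {percent_error(pred, ac):6.2f}")
```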

*z* tests, *p* < .001).

$R_d$ denote the estimate of roughness based on illumination-invariant cues. Note that $R_d$ may be the result of combining multiple illumination-invariant cues; for our purposes, it suffices to lump all such cues together.

$R_d$ is the true roughness of the surface: $E(R_d) = r$. That is, $R_d$ is an unbiased cue. If a visual system used only $R_d$ as its roughness estimate, it would display roughness constancy. However, if the variance of $R_d$ is large, the observer's estimates would be highly variable from trial to trial. Consequently, the observer might seek to reduce the variance by combining $R_d$ with other roughness cues. Given how we have defined $R_d$, these additional cues are necessarily affected by changes in illumination.

*r*: (1) $r_p$, the *proportion* of the image that is not directly lit by the punctate source (the proportion of the image in shadow); (2) $r_s$, the *standard deviation* of the luminance of nonshadowed pixels in the image due to differential illumination by the punctate source; (3) $r_m$, the *mean* luminance of nonshadowed pixels; and (4) $r_c$, the texture *contrast* as defined by Pont and Koenderink (2005). Texture contrast is intended to be a robust statistic for characterizing materials across lighting conditions; it is less sensitive to lighting changes than the other three measures. Each measure is a function of the true roughness of the surface *r* and the lighting condition *L* and can be written $r_s(r, L)$, $r_m(r, L)$, $r_p(r, L)$, and $r_c(r, L)$ to emphasize this dependence.

*r* when only roughness is varied while the lighting condition *L* is held constant. Increasing roughness, for example, increases the proportion of the scene $r_p(r, L)$ consisting of cast and attached shadows. Correspondingly, the mean image intensity decreases and the variation of facet illumination increases. We have verified that this is the case in our stimuli for all four measures. However, when surface roughness remains constant and lighting conditions change, the values of these measures also change. Consequently, these measures confound roughness and lighting condition.

$R_s$, $R_m$, $R_p$, and $R_c$, corresponding to the four physical measures just defined. We assume that each is an unbiased estimate of the corresponding physical measure; that is, $E[R_p] = r_p(r, L)$, and similarly for the other three measures.

*r* in lighting condition *L*, the observer forms the roughness estimate

$$R = w_d R_d + w_s R_s + w_m R_m + w_p R_p + w_c R_c, \tag{7}$$

where the $w_i$ combine the scale factors and weights and thus, unlike weights, need not sum to 1.

$r'$ viewed under a different lighting condition $L'$, $R = R'$. Subtracting Equation 7 from Equation 8 yields

$$w_d\,\Delta R_d + w_s\,\Delta R_s + w_m\,\Delta R_m + w_p\,\Delta R_p + w_c\,\Delta R_c = 0,$$

where $\Delta R_i = R'_i - R_i$. We assume that $w_d$ is nonzero (the observer is making some use of illuminant-invariant cues) and rearrange as

$$\Delta R_d = a_s\,\Delta R_s + a_m\,\Delta R_m + a_p\,\Delta R_p + a_c\,\Delta R_c,$$

where $a_s = -w_s/w_d$, and so forth. If $R_s$, $R_m$, $R_p$, and $R_c$ were unbiased cues to roughness, then the expected values of $\Delta R_s$, $\Delta R_m$, $\Delta R_p$, and $\Delta R_c$ would all be 0 and, as a consequence, $E[\Delta R_d] = r' - r = 0$. We would then expect the observer to be roughness constant on average, but that is not what we found experimentally: observers systematically matched surfaces of very different roughness ($r \neq r'$) across lighting conditions.

$R_p$ and $R_p'$ is the difference between the actual proportions of the image not directly lit by the punctate illuminant in the two scenes. If we denote this difference by $\Delta r_p = r_p(r', L') - r_p(r, L)$, then $E[\Delta R_p] = \Delta r_p$. Similarly, $E[\Delta R_m] = \Delta r_m$, $E[\Delta R_s] = \Delta r_s$, and $E[\Delta R_c] = \Delta r_c$.

$\Delta r_s$, $\Delta r_m$, $\Delta r_p$, and $\Delta r_c$ for each value of roughness and lighting condition by first computing $r_s(r, L)$, $r_m(r, L)$, $r_p(r, L)$, and $r_c(r, L)$ for each possible roughness *r* and lighting condition *L* and then taking differences. These were computed using the four stimulus images for each condition.

$r_p$, $r_m$, and $r_s$, we must determine which pixels in each image are not directly illuminated by the punctate source. To do this, we rerendered our scenes with the diffuse lighting term set to 0, surface albedo set to 1, and no interreflections among facets. We refer to these rerendered images as punctate-only images.

$r_p$) and the other terms based on nonshadowed pixels ($r_m$ and $r_s$) are easily computed once we know which pixels in the image are not directly illuminated by the punctate source. We determined the set of shadowed pixels using the left-eye images only.
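Given a stimulus image and its punctate-only counterpart, three of the pseudocues reduce to a few array operations. The sketch treats any pixel whose punctate-only luminance is at or below a threshold (zero by default) as shadowed, assumes every pixel belongs to the textured surface, and averages each cue over the images of a condition before differencing; these choices and the function names are assumptions. Texture contrast $r_c$ is omitted because its definition (Pont & Koenderink, 2005) is not spelled out here.

```python
import numpy as np


def pseudocues(image, punctate_only, eps=0.0):
    """Return (r_p, r_m, r_s) for one image.

    image:         luminance image the observer saw (2-D array)
    punctate_only: same scene rerendered with diffuse term 0, albedo 1,
                   and no interreflections
    Assumption: pixels with punctate_only <= eps are not directly lit, and
    every pixel belongs to the textured surface (no background).
    """
    shadow = punctate_only <= eps
    lit = ~shadow
    r_p = shadow.mean()          # proportion of the image in shadow
    r_m = image[lit].mean()      # mean luminance of nonshadowed pixels
    r_s = image[lit].std()       # SD of luminance of nonshadowed pixels
    return r_p, r_m, r_s


def pseudocue_differences(cond, cond_prime):
    """Delta r = r(r', L') - r(r, L), averaging each cue over a condition's images.

    cond, cond_prime: lists of (image, punctate_only) pairs, e.g. the four
    stimulus images rendered for one roughness/lighting condition.
    """
    def mean_cues(pairs):
        return np.mean([pseudocues(im, po) for im, po in pairs], axis=0)

    return mean_cues(cond_prime) - mean_cues(cond)
```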

*ε* combines all of the errors given by all terms in the model. The results of the regression fit are shown in Table 3. Note that we include a constant term $a_0$ in the regression; we return to this term in the discussion below.

| VAF (%) | C.P. | J.G. | M.F. | M.S.L. | P.J.N. | T.A. | Y.X.H. |
|---|---|---|---|---|---|---|---|
| $R^2_{c}$ | 35 | 57 | 45 | 40 | 34 | 25 | 70 |
| $R^2_{s}$ | 63 | 41 | 32 | 42 | 19 | 30 | 45 |
| $R^2_{m}$ | 2 | 17 | 19 | 16 | 15 | 8 | 18 |
| $R^2_{p}$ | 12 | 15 | 24 | 14 | 29 | 2 | 57 |
| $R^2_{s,c}$ | 71 | 58 | 47 | 43 | 41 | 30 | 80 |
| $R^2_{m,c}$ | 44 | 58 | 45 | 40 | 34 | 26 | 70 |
| $R^2_{m,s}$ | 72 | 41 | 33 | 42 | 22 | 30 | 46 |
| $R^2_{p,c}$ | 73 | 57 | 45 | 43 | 40 | 32 | 78 |
| $R^2_{p,s}$ | 73 | 48 | 37 | 42 | 37 | 30 | 72 |
| $R^2_{p,m}$ | 19 | 29 | 37 | 24 | 43 | 10 | 71 |
| $R^2_{m,s,c}$ | 74 | 59 | 47 | 44 | 41 | 30 | 81 |
| $R^2_{p,s,c}$ | 73 | 61 | 51 | 43 | 42 | 32 | 81 |
| $R^2_{p,m,c}$ | 81 | 58 | 45 | 47 | 44 | 32 | 78 |
| $R^2_{p,m,s}$ | 77 | 49 | 40 | 43 | 44 | 30 | 76 |
| $R^2_{p,m,s,c}$ | 82 | 66 | 53 | 48 | 44 | 33 | 81 |
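The variance-accounted-for analysis over all 15 predictor combinations can be sketched with ordinary least squares and an intercept $a_0$. The data below are random placeholders, not the experimental $\Delta R_d$ or pseudocue differences, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np

CUES = ("p", "m", "s", "c")


def vaf(y, X):
    """Percent of variance in y accounted for by OLS on X plus a constant a_0."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - ss_res / ss_tot)


def vaf_all_subsets(delta_Rd, delta_r):
    """delta_Rd: (n,) constancy failures; delta_r: (n, 4) columns dr_p, dr_m, dr_s, dr_c."""
    out = {}
    for size in range(1, len(CUES) + 1):
        for idx in combinations(range(len(CUES)), size):
            key = ",".join(CUES[i] for i in idx)
            out[key] = vaf(delta_Rd, delta_r[:, list(idx)])
    return out


# Placeholder data: 48 conditions, four pseudocue differences.
rng = np.random.default_rng(4)
delta_r = rng.standard_normal((48, 4))
delta_Rd = 0.1 + delta_r @ np.array([0.2, 0.0, 0.5, 0.3]) + 0.3 * rng.standard_normal(48)
for key, value in sorted(vaf_all_subsets(delta_Rd, delta_r).items(), key=lambda kv: -kv[1]):
    print(f"R^2_{{{key}}} = {value:5.1f}%")
```

For ordinary least squares, adding a predictor can never lower the VAF, so the four-cue model always does at least as well as any subset of its predictors; the values in Table 3 follow this pattern.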

*not* provide us directly with an estimate of how much weight the observers give each cue. However, we can assess how much each cue or combination of cues contributes to the observer's judgments by comparing the proportion of variance accounted for by each of the 15 possible combinations of predictors (Table 3). Together, the four predictors explain 58% of the variance in the roughness judgments on average over the seven observers (values for individual observers ranged from 33% to 82%). Figure 16 shows $\Delta R_d$ (the observer's failure of roughness constancy) plotted against the predicted values

*not* provide accurate information about roughness across lighting conditions. The visual system's reliance on pseudocues accounts for a substantial part of the systematic deviations from roughness constancy that we found in our data. In partial mitigation of the visual system's error, we note that these same pseudocues would have been valid cues to roughness had we not varied lighting conditions systematically.

$R_p$ by $R_p + R_m$ and $R_m$ by $R_p - R_m$, and left $R_s$ and $R_c$ unchanged, we would have a new set of four cues that fit the data equally well. Nonlinear transformations of the four pseudocues may better account for the data.
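The equivalence of the two cue sets is easy to check numerically. Because the substitution maps $(R_p, R_m)$ through the invertible matrix $\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$ (determinant $-2$), both sets span the same predictor space, so an ordinary least-squares fit yields identical fitted values and variance accounted for; only the coefficients change. The data and the helper name `r_squared` are placeholders.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(5)
R_p, R_m, R_s, R_c = rng.standard_normal((4, 40))     # placeholder cue differences
y = 0.4 * R_p - 0.2 * R_m + 0.5 * R_s + 0.1 * rng.standard_normal(40)

original = np.column_stack([R_p, R_m, R_s, R_c])
# Replace R_p by R_p + R_m and R_m by R_p - R_m; leave R_s and R_c unchanged.
transformed = np.column_stack([R_p + R_m, R_p - R_m, R_s, R_c])
print(r_squared(y, original), r_squared(y, transformed))
```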

$R_m'$ and $R_s'$ for $R_m$ and $R_s$. The expected values $r_m'$ and $r_s'$ were computed in the same way as the unprimed versions but using the punctate-only images described above instead of the images that the observer saw. The revised model based on this second set of pseudocues accounts for a markedly larger proportion of the variance (90% on average, ranging from 74% to 95%). We note, however, that it is not obvious how observers could compute estimates of these cues from the images actually viewed: to do so, they would have to effectively discount the effects of the diffuse illuminant and of interreflections. If observers can compute these alternative pseudocues, then we have found a parsimonious model that predicts their failures of roughness constancy remarkably well.

$R_p$, the proportion of the scene not lit directly by the punctate source.

The first time it brings its vision to bear on a sphere, the impression it gets of it stands for nothing but a flat circle, with shadow and light mixed. It does not, therefore, yet see a sphere: for its eye has not yet learned to assess the relief on a surface where shadow and light are distributed in a particular proportion. But it has touch now, and because it is learning to come to the same judgments with vision as it comes to with touch, the statue takes under its eyes the relief that it has under its hands (from Baxandall, 1995).

$r_a$ and is viewed under Illumination Condition A (first interval), and the other surface has roughness level $r_b$ and is viewed under Illumination Condition B (second interval). We first assume that the observer's roughness estimate is a transformation of actual roughness that depends on the illuminant,

$\sigma^2$ is the variance when $\rho_{aA}$ equals 1, and $\gamma$ is the exponent of the power transformation. If $\gamma = 1$, then Weber's law holds for the arbitrary roughness scale that we use. If $\gamma = 0$, then the error is invariant with roughness level.

*ε* is normally distributed with a mean of 0 and a variance of $\sigma^2\left(L_B(r_b)^{2\gamma} + L_A(r_a)^{2\gamma}\right)$. The observer responds “second interval” if Δ > 0 and otherwise responds “first interval.” Let *p* denote the probability of responding “second interval.” Then,

$c_{AB} = c_A/c_B$ and absorbing extra parameters into *σ*, yielding:

*contour of indifference* to be the set of $(r_a, r_b)$ pairs such that $L_A(r_a) = L_B(r_b)$. These pairs are predicted to appear equally rough to the observer under the corresponding illumination conditions. We refer to this contour as the *transfer function* connecting Illumination Conditions A and B,

where $c_{AB}$ is as defined above. Note that if $c_{AB} = 1$, the observer's judgments of roughness are unaffected by a change of illumination condition. That is, the observer is roughness constant, at least for this pair of illumination conditions.

$L_A(r)$ for any Illumination Condition A or estimate the constant $c_A$ in the form of $L_A(r)$ we have assumed. We can, however, estimate the transfer function parameter $c_{AB}$ from our data.
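As a worked sketch of these steps, assume the illuminant-dependent transformation is linear, $L_A(r) = c_A\, r$ and $L_B(r) = c_B\, r$ (our assumption: the simplest form with a single constant per illuminant), and take the decision variable to be the difference of the two transformed roughnesses plus noise. With $\Phi$ the standard normal cumulative distribution function,

$$\Delta = L_B(r_b) - L_A(r_a) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma^2\left[L_B(r_b)^{2\gamma} + L_A(r_a)^{2\gamma}\right]\right),$$

$$p = P(\Delta > 0) = \Phi\!\left(\frac{L_B(r_b) - L_A(r_a)}{\sigma\sqrt{L_B(r_b)^{2\gamma} + L_A(r_a)^{2\gamma}}}\right).$$

Substituting the linear forms and dividing numerator and denominator by $c_B$, with $c_{AB} = c_A/c_B$,

$$p = \Phi\!\left(\frac{r_b - c_{AB}\, r_a}{\sigma'\sqrt{r_b^{2\gamma} + (c_{AB}\, r_a)^{2\gamma}}}\right), \qquad \sigma' = \sigma\, c_B^{\gamma - 1}.$$

Setting $p = 1/2$ gives the contour of indifference $r_b = c_{AB}\, r_a$, a line through the origin with slope $c_{AB}$, which coincides with the identity line exactly when $c_{AB} = 1$.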