Abstract
We examined visual estimation of surface roughness using random, computer-generated, three-dimensional (3D) surfaces rendered under a mixture of diffuse lighting and a punctate source. The angle between the tangent to the plane containing the surface texture and the direction to the punctate source was varied from 50 to 70 deg across lighting conditions. Observers were presented with pairs of surfaces under different lighting conditions and indicated which 3D surface appeared rougher. Surfaces were viewed either in isolation or in scenes with added objects whose shading, cast shadows, and specular highlights provided information about the spatial distribution of illumination. All observers perceived surfaces to be markedly rougher with decreasing illuminant angle. Performance in scenes with added objects was no closer to constant than that in scenes without added objects. We identified four novel cues that are valid cues to roughness under any single lighting condition but that are not invariant under changes in lighting condition. We modeled observers' deviations from roughness constancy as a weighted linear combination of these “pseudocues” and found that they account for a substantial amount of observers' systematic deviations from roughness constancy with changes in lighting condition.
In an image of a 3D surface, there exist a number of visual cues that are affected by changes in surface roughness. Some of these are invariant under changes in illumination conditions and others are not. An example of the former would be a measure of the depth variance of the surface patch based on disparity estimates. So long as the visual system can accurately estimate disparity values, this information should not be affected by changes in lighting. Let R_d denote the estimate of roughness based on illumination-invariant cues. Note that R_d may be the result of combining multiple illumination-invariant cues. For our purposes, it suffices to lump all such cues together.
We assume that the expected value of R_d is the true roughness of the surface: E(R_d) = r. That is, R_d is an unbiased cue. If a visual system used only R_d as its roughness estimate, then it would display roughness constancy. However, if the variance of R_d is large, then the observer's estimates would be highly variable from trial to trial. Consequently, the observer might seek to reduce the variance by combining R_d with other roughness cues. These additional cues are necessarily affected by change in illumination, given how we have defined R_d.
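To make this variance-reduction argument concrete, the following sketch is a purely illustrative simulation (the noise levels, bias value, and weight are assumptions chosen for illustration, not quantities estimated from the experiment). It compares an estimate based on R_d alone with a weighted combination of R_d and a second, less variable but potentially biased cue.

```python
import numpy as np

# Illustrative simulation only: all numerical values below are assumptions.
rng = np.random.default_rng(0)
r_true = 0.5                 # true surface roughness (arbitrary units)
n_trials = 10_000

# R_d: unbiased but relatively noisy illumination-invariant cue (e.g., disparity-based).
R_d = rng.normal(loc=r_true, scale=0.20, size=n_trials)

# A second cue with lower trial-to-trial variance that becomes biased
# when the lighting condition changes (constant bias here for simplicity).
R_other = rng.normal(loc=r_true + 0.05, scale=0.05, size=n_trials)

# Weighted combination of the two cues.
w = 0.3
R_combined = w * R_d + (1 - w) * R_other

print("SD of R_d alone:          ", round(R_d.std(), 3))
print("SD of combined estimate:  ", round(R_combined.std(), 3))            # smaller
print("Bias of combined estimate:", round(R_combined.mean() - r_true, 3))  # nonzero
```

The combination has a lower trial-to-trial standard deviation than R_d alone, at the cost of inheriting whatever bias the additional cues carry when lighting changes.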
Inspection of the rendered images suggests four physical measures of the scene that would be affected by changes in roughness r: (1) r_p, the proportion of the image that is not directly lit by the punctate source (the proportion of the image in shadow); (2) r_s, the standard deviation in luminance of nonshadowed pixels in the image due to differential illumination by the punctate source; (3) r_m, the mean luminance of nonshadowed pixels; and (4) r_c, the texture contrast as defined by Pont and Koenderink (2005). Texture contrast is intended to be a robust statistic for characterizing materials across lighting conditions. It is less sensitive to lighting changes than the other three measures. Each measure is a function of the true roughness of the surface r and the lighting condition L and can be written r_s(r, L), r_m(r, L), r_p(r, L), and r_c(r, L) to emphasize this dependence.
Each of these measures is highly correlated with roughness r when only roughness is varied while the lighting condition L is held constant. Increasing roughness, for example, increases the proportion of the scene r_p(r, L) consisting of cast and attached shadows. Correspondingly, the mean image intensity decreases and the variation of facet illumination increases. We have verified that this is the case in our stimuli for all four of these measures. However, when surface roughness remains constant and lighting conditions change, the values of these measures also change. Consequently, the values of these measures confound roughness and lighting condition.
We assume that the visual system has four available "pseudocues" to roughness, R_s, R_m, R_p, and R_c, corresponding to the four physical measures just defined. We assume that each is an unbiased estimate of the corresponding physical measure; that is, E[R_p] = r_p(r, L) and similarly for the other three measures.
We consider the possibility that the visual system errs in using these pseudocues across changes in lighting condition. We assume that cues and pseudocues are scaled and combined by a weighted average (Landy, Maloney, Johnston, & Young, 1995). In viewing a surface of roughness r in lighting condition L, the observer forms the roughness estimate

R = w_d R_d + w_s R_s + w_m R_m + w_p R_p + w_c R_c,    (7)

where the values w_i combine the scale factors and weights and thus need not sum to 1 as weights do.
In this experiment, observers compare this roughness estimate to the roughness estimate for a second surface patch with roughness r′ viewed under a different lighting condition L′,

R′ = w_d R′_d + w_s R′_s + w_m R′_m + w_p R′_p + w_c R′_c,    (8)

to decide which surface was rougher. Consider the situation in which two surfaces are perceived to be equally rough, that is, R = R′. Subtracting Equations 7 and 8 yields

0 = w_d ΔR_d + w_s ΔR_s + w_m ΔR_m + w_p ΔR_p + w_c ΔR_c,    (9)

where ΔR_i = R′_i − R_i. We assume that w_d is nonzero (the observer is making some use of illuminant-invariant cues) and rearrange as

ΔR_d = a_s ΔR_s + a_m ΔR_m + a_p ΔR_p + a_c ΔR_c,    (10)

where a_s = −w_s/w_d, and so forth. If R_s, R_m, R_p, and R_c were unbiased cues to roughness, then the expected values of ΔR_s, ΔR_m, ΔR_p, and ΔR_c would all be 0 and, as a consequence, E[ΔR_d] = r − r′ = 0. We would expect the observer to be roughness constant on average, but that is not what we found experimentally. Observers systematically matched surfaces with very different roughness r ≠ r′ across lighting conditions.
The expected value of the difference between the pseudocues R_p and R′_p is the difference between the actual proportion of the image not directly lit by the punctate source in the two scenes. If we denote this difference by Δr_p = r_p(r′, L′) − r_p(r, L), then we have E[ΔR_p] = Δr_p. Similarly, E[ΔR_m] = Δr_m, E[ΔR_s] = Δr_s, and E[ΔR_c] = Δr_c.
We computed the expected values of each of the terms Δr_s, Δr_m, Δr_p, and Δr_c for each value of roughness and lighting condition by first computing r_s(r, L), r_m(r, L), r_p(r, L), and r_c(r, L) for each possible roughness r and lighting condition L and then taking differences. These were computed using the four stimulus images for each condition.
To compute r_p, r_m, and r_s, we must determine which pixels in each image are not directly illuminated by the punctate source. To do this, we rerendered our scenes with the diffuse lighting term set to 0, the surface albedo set to 1, and no interreflections among facets. We refer to these rerendered images as punctate-only images.
Pixels with a value of 0 in a punctate-only image correspond to surfaces that are not directly illuminated by the punctate source (i.e., in shadow). The proportion of shadowed pixels (r_p) and the other terms based on nonshadowed pixels (r_m and r_s) are easily computed once we know which pixels in the image are not directly illuminated by the punctate source. We determined the set of shadowed pixels using the left-eye images only.
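A minimal sketch of how r_p, r_m, and r_s could be computed from a rendered image and its punctate-only counterpart is given below (Python with NumPy). The array names, the grayscale luminance format, and the tiny example values are assumptions for illustration; texture contrast r_c is omitted because it follows Pont and Koenderink's separate definition.

```python
import numpy as np

def shadow_measures(image, punctate_only):
    """Compute r_p, r_m, r_s for one rendered scene (left-eye image).

    image         : 2D array of pixel luminances as shown to the observer.
    punctate_only : the same scene rerendered with the diffuse term set to 0,
                    albedo set to 1, and no interreflections among facets.
    """
    # Pixels receiving no direct punctate illumination are exactly 0
    # in the punctate-only rendering.
    in_shadow = (punctate_only == 0)

    r_p = in_shadow.mean()        # proportion of the image in shadow
    lit = image[~in_shadow]       # luminances of the nonshadowed pixels
    r_m = lit.mean()              # mean luminance of nonshadowed pixels
    r_s = lit.std()               # SD of luminance of nonshadowed pixels
    return r_p, r_m, r_s

# Tiny synthetic example (not real stimuli). In the analysis described above,
# these measures would be computed for the four stimulus images of each
# (roughness, lighting) condition and then differenced across conditions
# to obtain the Δr terms.
img = np.array([[0.10, 0.80], [0.05, 0.60]])
punct = np.array([[0.00, 1.00], [0.00, 0.90]])
print(shadow_measures(img, punct))   # r_p = 0.5, r_m = 0.7, r_s = 0.1
```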
Equation 10 posits that the difference in roughness at the PSE is a linear combination of the difference in each of the cues we have identified, each perturbed by error. To test this model, we collapsed Conditions I and II and regressed 36 PSE differences (2 context conditions × 3 illuminant comparisons × 6 test roughness levels) against the differences in the illuminant-variant cues. The resulting regression equation expresses the observers' failures of constancy in terms of the hypothetical light-variant cues,

ΔR_d = a_0 + a_s Δr_s + a_m Δr_m + a_p Δr_p + a_c Δr_c + ɛ,

where the error term ɛ combines all of the errors given by all terms in the model. The results of the regression fit are shown in Table 3. Note that we include a constant term a_0 in the regression. We will return to this term in the discussion below.
Table 3. Percentage of variance (R²) accounted for by combinations of predictors regressed to deviations in roughness (ΔR_d). For each row, we tested the null hypothesis that R² = 0 for each entry in the table, with a Bonferroni correction for multiple tests. Values in boldface indicate that R² was significantly different from 0. The overall α of the test was .05. The Bonferroni-corrected level for each was .007 = .05/7. We did not apply Bonferroni correction for all 49 tests as the results in each column are not independent.
VAF | C.P. | J.G. | M.F. | M.S.L. | P.J.N. | T.A. | Y.X.H.
R²_c | 35 | 57 | 45 | 40 | 34 | 25 | 70
R²_s | 63 | 41 | 32 | 42 | 19 | 30 | 45
R²_m | 2 | 17 | 19 | 16 | 15 | 8 | 18
R²_p | 12 | 15 | 24 | 14 | 29 | 2 | 57
R²_s,c | 71 | 58 | 47 | 43 | 41 | 30 | 80
R²_m,c | 44 | 58 | 45 | 40 | 34 | 26 | 70
R²_m,s | 72 | 41 | 33 | 42 | 22 | 30 | 46
R²_p,c | 73 | 57 | 45 | 43 | 40 | 32 | 78
R²_p,s | 73 | 48 | 37 | 42 | 37 | 30 | 72
R²_p,m | 19 | 29 | 37 | 24 | 43 | 10 | 71
R²_m,s,c | 74 | 59 | 47 | 44 | 41 | 30 | 81
R²_p,s,c | 73 | 61 | 51 | 43 | 42 | 32 | 81
R²_p,m,c | 81 | 58 | 45 | 47 | 44 | 32 | 78
R²_p,m,s | 77 | 49 | 40 | 43 | 44 | 30 | 76
R²_p,m,s,c | 82 | 66 | 53 | 48 | 44 | 33 | 81
In using the variation in the cues from trial to trial to estimate the weight assigned to each cue, we are, in effect, applying the technique used by Ahumada and Lovell (1971), which is the basis of classification image methods. Note that these coefficients do not provide us directly with an estimate of how much weight the observers give each cue. However, we can determine how much each cue or combination of cues contributes to the observer's judgments by comparing the proportion of variance accounted for by each of the 15 possible combinations of predictors (Table 3). The combination of the four predictors of roughness judgments explains 58% of the variance in the data on average over the seven observers (values for individual observers ranged from 33% to 82%).
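In outline, the subset comparison in Table 3 amounts to an ordinary least-squares fit for each of the 15 nonempty subsets of the four cue-difference predictors. The sketch below uses synthetic placeholder data in place of the actual PSE differences and Δr values; the variable names, placeholder coefficients, and fitting details are assumptions, not the authors' analysis code.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Placeholder data standing in for the experimental values (36 comparisons):
# columns of delta_cues = [Δr_s, Δr_m, Δr_p, Δr_c]; delta_Rd = observed
# deviations from roughness constancy at the PSE. Real values would come
# from the psychophysical fits and the image measurements described above.
delta_cues = rng.normal(size=(36, 4))
delta_Rd = delta_cues @ np.array([0.5, -0.2, 0.8, 0.3]) + rng.normal(scale=0.5, size=36)

cue_names = ["s", "m", "p", "c"]

def r_squared(X, y, intercept=True):
    """Proportion of variance in y accounted for by an OLS fit on the columns of X."""
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# One R^2 value for each of the 15 nonempty subsets of the four predictors,
# mirroring the rows of Table 3 (set intercept=False to force a_0 = 0).
for k in range(1, 5):
    for subset in combinations(range(4), k):
        labels = ",".join(cue_names[i] for i in subset)
        r2 = r_squared(delta_cues[:, list(subset)], delta_Rd)
        print(f"R^2_{{{labels}}} = {100 * r2:.0f}%")
```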
Figure 16 shows ΔR_d (the observer's failure of roughness constancy) plotted against the predicted values, using regression estimates of the coefficients for the four pseudocues but without the constant term a_0. Most values fall close to the identity line. Although the values of a_0 were significantly different from 0 for some observers, they were relatively small and not patterned across observers. Hence, we recomputed the regression, forcing a_0 to be 0.
To summarize, if observers relied solely on illuminant-invariant cues, such as binocular disparity, to make roughness estimates, they would have exhibited no patterned deviations from roughness constancy. Instead, it seems that observers relied on other measures in the scene such as the four pseudocues we considered. These pseudocues do not provide accurate information about roughness across lighting conditions. The visual system's reliance on pseudocues accounts for the systematic deviations away from roughness constancy that we found in our data. In partial mitigation of the visual system's error, we note that these same pseudocues would have been valid cues to roughness had we not varied lighting conditions systematically.
We do not claim that the four cues we advance are precisely the cues that the visual system uses. Any invertible matrix transformation of the four cues used here results in four alternative cues that would explain our results equally well. If, for example, we replaced R_p by R_p + R_m, R_m by R_p − R_m, and left R_s and R_c unchanged, we would have a new set of four cues that fit the data equally well. Nonlinear transformations of the four pseudocues may better account for the data.
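This equivalence is easy to verify numerically: multiplying the predictor matrix by any invertible matrix leaves the least-squares fit, and hence R², unchanged, because the transformed predictors span the same column space. A small check with synthetic data (an illustration, not the experimental data) is sketched below.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(36, 4))   # stand-in for the cue differences [Δr_s, Δr_m, Δr_p, Δr_c]
y = X @ np.array([0.4, -0.1, 0.7, 0.2]) + rng.normal(scale=0.3, size=36)

# The transformation mentioned in the text (columns ordered s, m, p, c):
# new m = r_p - r_m, new p = r_p + r_m, with s and c unchanged.
T = np.array([[1,  0, 0, 0],
              [0, -1, 1, 0],
              [0,  1, 1, 0],
              [0,  0, 0, 1]], dtype=float)

def r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

print(r2(X, y))        # fit with the original cues
print(r2(X @ T, y))    # same value: the transformed cues span the same space
```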
In particular, we have fit a second model, substituting two pseudocues R_m′ and R_s′ for R_m and R_s. The expected values r_m′ and r_s′ were computed in the same way as for the unprimed versions but using the punctate-only images described above instead of the images that the observer saw. The revised model based on this second set of pseudocues accounts for a markedly larger proportion of the variance (90% on average, ranging from 74% to 95%). We note, however, that it is not obvious how observers could compute estimates of these cues from the images actually viewed. To do so, they would have to effectively discount the effect of the diffuse illuminant on the scene, as well as interreflections. Thus, if observers can compute these alternative pseudocues, then we have found a parsimonious model that predicts their failures of roughness constancy remarkably well.
Nonetheless, there is a parallel between one of the pseudocues we found and the "blackshot mechanism" of Chubb, Landy, and Econopouly (2004). In studying 2D texture, they found evidence for a visual mechanism that was highly sensitive to very dark regions of the stimulus and that effectively computed a contrast between these regions and the brighter parts of the stimulus. It is possible that the blackshot mechanism plays a role in 3D texture perception, providing an estimate of what we referred to as R_p, the proportion of the scene not lit directly by the punctate source.
This research was supported by National Institutes of Health Grants EY08266 and EY16165. We thank Hüseyin Boyaci and Katja Doerschner for help in developing the software used in the experiments described here, which is based on code written by them, and for many helpful comments and suggestions.
Commercial relationships: none.
Corresponding author: Yun-Xian Ho.
Email: yunxian.ho@nyu.edu.
Address: Psychology Department, New York University, 6 Washington Place, Room 957, New York, NY 10003, USA.