In previous work, we examined how the apparent roughness of a textured surface changed with direction of illumination. We found that observers exhibited systematic failures of roughness constancy across illumination conditions for triangular-faceted surfaces where physical roughness was defined as the variance of facet heights. These failures could be due, in part, to cues in the scene that confound changes in surface roughness with changes in illumination. These cues include the following: (1) the proportion of the surface in shadow, (2) mean luminance of the nonshadowed portion, (3) the standard deviation of the luminance of the nonshadowed portion, and (4) texture contrast. If the visual system relied on such “pseudocues” to roughness, then it would systematically misestimate surface roughness with changes in illumination much as our observers did despite the availability of depth cues such as binocular disparity. Here, we investigate observers' judgments of roughness when illumination direction and surface orientation are fixed and the observers' viewpoint with respect to the surface changes. We find a similar pattern of results. Observers exhibited patterned failures of roughness constancy with change in viewpoint, and an appreciable part of their failures could be accounted for by the same pseudocues. While the human visual system exhibits some degree of roughness constancy, our results lead to the conclusion that it does not always select the correct cues for a given visual task.

*illuminant-variant* measures based on 2D texture: the proportion of shadows, the mean luminance of nonshadowed facets, the standard deviation of the luminance of nonshadowed facets, and a “texture contrast” measure described by Pont and Koenderink (2005) that we will define below. These measures varied systematically with changes in both physical roughness and illuminant position. Thus, these measures confounded differences in physical roughness with changes in illuminant position. Under conditions of fixed illumination, these measures are valid visual cues to surface roughness, but under the conditions of our experiment, they were not. The failure of roughness constancy we found suggests that observers, in judging mesoscale textures, made use of these “pseudocues” rather than relying solely on whatever *illuminant-invariant* cues (e.g., binocular disparity) were present in the stimuli, and as a result, they systematically misestimated roughness.

*bas-relief ambiguity*. However, we emphasize that these subjects viewed stimuli binocularly, and thus, the availability of binocular disparity cues should have decreased or eliminated bas-relief ambiguity.

*Is this a wool sweater or a plaster wall?*), potentially a very different task from roughness estimation (*Is this rough or matted-down wool?*).

($x$, $y$, $z$) to define our 3D surfaces (Figure 2). The origin was in the center of the plane on which the stimulus was presented (the *stimulus plane*). The $z$-axis was normal to the stimulus plane. It was identical to the viewing direction when the surface was viewed frontoparallel. The $x$-axis was horizontal and the $y$-axis was vertical in the stimulus plane. We described the position of the observer and the illuminant as vectors from the origin using a spherical coordinate system ($\psi$, $\phi$, $d$). The punctate illuminant was located at position ($\psi_p$, $\phi_p$, $d_p$), and the observer's viewpoint position was ($\psi_v$, $\phi_v$, $d_v$). Both the viewpoint and illuminant were always in the $xz$-plane. We used a slightly nonstandard coordinate system by fixing the azimuth $\psi$ at 180°, thus defining elevation $\phi$ as the angle between the vector and the negative $x$-axis. Thus, elevation ranged from 0° to 180°.
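The coordinate convention can be made concrete with a small sketch. It assumes that the negative $x$-axis points to the observer's left and that, with the azimuth fixed at 180°, a point at elevation $\phi$ and distance $d$ lies at $(-d\cos\phi,\ 0,\ d\sin\phi)$; the function name `position_xz` is hypothetical and not taken from the study.

```python
import math

def position_xz(phi_deg, d):
    """Cartesian (x, y, z) for elevation phi (degrees) and distance d.

    ASSUMPTION: azimuth is fixed at 180 degrees, so phi is measured from
    the negative x-axis and the point always lies in the xz-plane.
    """
    phi = math.radians(phi_deg)
    return (-d * math.cos(phi), 0.0, d * math.sin(phi))

# Frontoparallel viewpoint (phi = 90 degrees, d = 70 cm) sits on the z-axis.
viewer = position_xz(90, 70)
# Punctate illuminant at elevation 60 degrees, distance 80 cm.
light = position_xz(60, 80)
```

Under this convention, elevations below 90° place the viewpoint at negative $x$ and elevations above 90° place it at positive $x$.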

$N \times N$ grid of base points was generated in the stimulus plane with width $w$. The base point coordinates were denoted ($X_{ij}$, $Y_{ij}$, $Z_{ij}$), $0 \le i, j \le N - 1$. Let ($U_{ij}^{x}$, $U_{ij}^{y}$, $U_{ij}^{z}$) be independent, uniformly distributed random variables on the interval [−1, 1]. The base point coordinates were defined as

$w$ = 20 cm, $N$ = 20 points on a side. Setting $n_{xy}$ = 0.49 ensured that no facets would overlap or intersect one another in the jittered base grid. The amount of jitter in depth could take on one of eight distinct values $r = k^2/16$ cm depending on the roughness level $k \in \{1, 2, \ldots, 8\}$. Thus, the $Z_{ij}$ coordinates were always 4 cm or less in absolute value. The standard deviation of the $Z_{ij}$ coordinates in a surface with roughness level $r$ was

$r$ = 0, and all $Z_{ij}$ = 0.

$r$. In initial testing, we found that linear spacing led to stimuli that were difficult to discriminate at high roughness levels, and we consequently adopted the spacing used here. The 3D surface was constructed from triangular facets. Each set of four neighboring grid points $(i, j)$, $(i + 1, j)$, $(i, j + 1)$, and $(i + 1, j + 1)$ was split into two triangular facets by randomly selecting one of the two diagonals to be connected by an edge. For each value of roughness and viewing angle, four random surfaces were generated to minimize the possibility of observers using intrinsic patterns in the distribution of facets as cues to roughness. The surface patch was then centrally embedded in a smooth 30 × 30 cm surface. To preclude cues that may have resulted from an abrupt change in depths of the edges of the surface patch and the wall, we multiplied a 5-cm border around the surface patch by a raised cosine function to smooth the edges of the surface in depth.
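The construction described above can be sketched in a few lines. The defining equation for the base points does not appear in the text here, so the sketch assumes a regular grid with spacing $w/(N-1)$, in-plane jitter of $n_{xy}$ times the spacing, and depth $Z_{ij} = r\,U_{ij}^{z}$. These formulas, the function names, and the seed handling are illustrative assumptions, not the authors' implementation. The raised-cosine border and the embedding in the larger wall are omitted.

```python
import random

def roughness_levels():
    """Depth-jitter amplitudes r = k^2 / 16 (cm) for levels k = 1..8."""
    return [k * k / 16 for k in range(1, 9)]

def make_surface(r, n=20, w=20.0, n_xy=0.49, seed=0):
    """Return (points, facets) for one random faceted surface.

    ASSUMPTIONS: grid spacing w / (n - 1); x and y are jittered by
    n_xy * spacing * U; z = r * U, with U uniform on [-1, 1].
    """
    rng = random.Random(seed)
    step = w / (n - 1)
    points = {}
    for i in range(n):
        for j in range(n):
            ux, uy, uz = (rng.uniform(-1, 1) for _ in range(3))
            points[(i, j)] = ((i + n_xy * ux) * step - w / 2,
                              (j + n_xy * uy) * step - w / 2,
                              r * uz)
    facets = []
    for i in range(n - 1):
        for j in range(n - 1):
            a, b, c, d = (i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)
            if rng.random() < 0.5:
                facets += [(a, b, d), (a, d, c)]   # split along diagonal a-d
            else:
                facets += [(a, b, c), (b, d, c)]   # split along diagonal b-c
    return points, facets

levels = roughness_levels()
points, facets = make_surface(levels[-1])
```

If depth is indeed drawn as $r\,U$ with $U$ uniform on [−1, 1], the standard deviation of the $Z_{ij}$ values would be $r/\sqrt{3}$, which is the standard deviation of a uniform variable on $[-r, r]$.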

($\psi_v$, $\phi_v$, 70 cm) were tested in this study (Figure 4), where the possible values of $\phi_v$ included the surface normal (90°), three viewing angles to the left (30°, 50°, and 70°), and their reflections about the surface normal (110°, 130°, and 150°). The punctate illuminant position was fixed with respect to the test surface at position (180°, 60°, 80 cm). Each scene was rendered twice from slightly different viewpoints (±3 cm, corresponding to an interpupillary distance of 6 cm) for each tested viewing angle corresponding to the positions of the observer's eyes. The scenes were viewed binocularly in the experiment. The punctate–total ratio was 0.62. This is the ratio of the intensity of light absorbed by an infinitesimal Lambertian test patch facing the punctate light source to the intensity of all light absorbed by the patch (Boyaci, Maloney, & Hersh, 2003). It is a measure of the relative intensities of punctate and diffuse sources. A stereo pair of a typical scene is shown in Figure 5, and a representative set of stimuli is shown in Figure 6.

^{2}. The stereoscope was contained in a box measuring 124 cm on a side. The front face of the box was open, and the observer sat there in a chin/head rest. The interior of the box was coated with black flocked paper (Edmund Scientific, Tonawanda, NY) to absorb stray light. Only the stimuli on the screens of the monitors were visible to the observer. The casings of the monitors and any other features of the room were hidden behind the nonreflective walls of the enclosing box.

*test* surface presented at viewing angle $\phi_{v\text{test}}$ and a *match* surface viewed from $\phi_{v\text{match}}$ were presented sequentially. The observer's task was to indicate which patch appeared to be rougher. The match surface was always presented at a viewing angle different from the test. We note that the observer did not know which surface in any given trial was match and which was test. We use the distinction only in describing how the sequence of trials presented to the observers was affected by their judgments, which will be described next, as well as in analyzing the data as we describe below.

$\phi_{v\text{test}}$ and $r_\text{test}$, where the roughness level of the surface ($r_\text{test}$) was chosen from the intermediate range of roughness levels (0.25 ≤ $r_\text{test}$ ≤ 3.06; see Figure 6), two interleaved staircases (2-up/1-down and 1-up/2-down) varied the roughness $r_\text{match}$ of the match stimulus (with viewpoint position $\phi_{v\text{match}}$ and $r_\text{match}$ chosen from the entire range of roughness levels in the stimulus set) to determine the point of subjective equality (PSE). The PSE is the point at which the match surface was perceived rougher than the test 50% of the time. We tested viewing angles that were spaced sufficiently far apart to be easily discriminated from each other. Left and right viewing angles were supplementary to each other (angles reflected about the surface normal) to account for any left/right viewing biases. Comparisons of viewing angles $\phi_v$ ≤ 90° and $\phi_v$ ≥ 90° were separated into two sets. In each set, there were four values of $\phi_v$, resulting in six possible pairs of test and match viewpoint. For each such pair, the viewpoint closer or equal to 90° was the test and the other member of the pair was the match viewpoint. This resulted in a total of 72 test staircases (6 test roughness levels × 6 viewing angle comparisons × 2 staircase types) for each of the two sets of viewing angle comparisons. Staircases for one set of viewing angle comparisons were continued across a total of four sessions. For one of the sets, a control condition was interleaved throughout the four sessions wherein three punctate illuminant elevations (50°, 60°, and 70°) were compared with each other for a fixed viewpoint in the frontoparallel position, resulting in an additional 36 staircases (6 test roughness levels × 3 illuminant comparisons × 2 staircase types; for details, see Ho et al., 2006). Each observer completed a total of eight sessions (four sessions per viewing angle comparison set) with 20 trials per staircase.

$\phi_v$ = 90° compared with oblique viewing angles $\phi_v$ < 90°; Group II, oblique viewing angles $\phi_v$ < 90° compared with oblique viewing angles $\phi_v$ < 90°; Group III, $\phi_v$ = 90° compared with oblique viewing angles $\phi_v$ > 90°; and Group IV, oblique viewing angles $\phi_v$ > 90° compared with oblique viewing angles $\phi_v$ > 90°. The 95% confidence intervals for each PSE were obtained by a bootstrap method (Efron & Tibshirani, 1993) whereby each human observer's performance in the corresponding condition was simulated 1,000 times and the 5th and 95th percentiles were calculated.
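As an illustration of the percentile step, the sketch below shows a generic resampling bootstrap. It swaps in nonparametric resampling of the raw data for the observer simulation described above, and the helper names, the interpolation rule for percentiles, and the default bounds (set to the 5th and 95th percentiles named in the text) are assumptions made for the example.

```python
import random

def percentile(sorted_vals, p):
    """Linearly interpolated percentile p (0-100) of an ascending list."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def bootstrap_interval(data, statistic, n_boot=1000, lower=5, upper=95, seed=0):
    """Percentile interval of `statistic` over n_boot resamples of `data`."""
    rng = random.Random(seed)
    stats = sorted(
        statistic([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    return percentile(stats, lower), percentile(stats, upper)

# Hypothetical binary responses (1 = "match rougher"), half of each kind.
responses = [0, 1] * 50
low, high = bootstrap_interval(responses, lambda xs: sum(xs) / len(xs))
```

The bounds are parameters, so the same function returns other percentile intervals if different limits are wanted.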

*line of roughness constancy* (i.e., the identity line) and the measured PSEs should show no patterned deviation from the line. However, this was not the case for comparisons between different illuminant positions in the control experiment and certain comparisons between viewpoints.

$\phi_v$ > 90° were perceived to be rougher than the same surface viewed frontoparallel ($\phi_v$ = 90°), and the opposite pattern was observed for surfaces with $\phi_v$ ≤ 90°. In other words, surfaces generally appeared rougher with an increase in the amount of visible cast shadow. Results from our control experiment in which viewpoint was fixed and illuminant angle was varied were consistent with the latter observation, suggesting that observers in this study were susceptible to the same roughness judgment biases as observers in our previous study (see Figure S1 and Supplementary Tables S1 and S2 in the online supplement).

*roughness transfer function* to the data. In this model, for each viewpoint, perceived roughness was assumed to be proportional to physical roughness perturbed by Gaussian noise. Suppose that the observer compares two surfaces; one surface has roughness level $r_a$ and is viewed from viewpoint A (first interval) and the other surface has roughness level $r_b$ and is viewed from viewpoint B (second interval). We assume that the observer's roughness estimate is a transformation of actual roughness that depends on viewpoint,

$\sigma^2$ is the variance when $\rho_{aA}$ = 1, and $\gamma$ yields the power transformation. If $\gamma$ is 1, then Weber's law holds for the arbitrary roughness scale that we use. If $\gamma$ is 0, then the error is invariant with roughness level.

$\Delta$ on each trial to decide whether the rougher patch appeared in the first or second interval,

$\varepsilon$ is normal with a mean of 0 and a variance of $\sigma^2 \left( V_B(r_b)^{2\gamma} + V_A(r_a)^{2\gamma} \right)$. The observer responds “second interval” if $\Delta$ > 0; otherwise, the observer responds “first interval.”
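The decision rule implies a closed-form probability of a “second interval” response. The sketch below assumes the proportional form $V_A(r) = c_A r$ mentioned in the text and takes the decision variable to be $\Delta = V_B(r_b) - V_A(r_a) + \varepsilon$. That expression is inferred from the stated variance and response rule, because the displayed equation is not reproduced here. The function and parameter names are hypothetical.

```python
import math

def p_second_interval(r_a, r_b, c_a, c_b, sigma, gamma):
    """P(Delta > 0) for Delta = V_B(r_b) - V_A(r_a) + eps.

    ASSUMPTIONS: V_A(r) = c_a * r and V_B(r) = c_b * r, and eps is
    Gaussian with variance sigma^2 * (V_B^(2*gamma) + V_A^(2*gamma)).
    """
    v_a, v_b = c_a * r_a, c_b * r_b
    sd = sigma * math.sqrt(v_b ** (2 * gamma) + v_a ** (2 * gamma))
    z = (v_b - v_a) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# With equal scale factors, physically equal surfaces are at chance.
p_equal = p_second_interval(1.0, 1.0, 1.0, 1.0, sigma=0.3, gamma=0.9)
# With c_a = 1.2 and c_b = 1.0, the match must be 1.2 times rougher to reach 50%.
p_pse = p_second_interval(1.0, 1.2, 1.2, 1.0, sigma=0.3, gamma=0.9)
```

Under these assumptions the 50% point falls where $c_B r_b = c_A r_a$, so only the ratio of the two scale factors affects the PSE.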

*contour of indifference* to be the ($r_a$, $r_b$) pairs such that $V_B(r_b) = V_A(r_a)$. These pairs are predicted to appear equally rough to the observer under the corresponding viewing conditions. We refer to this contour as the *transfer function* connecting the two viewing conditions A and B,

$c_{A,B}$ is as defined above. Note that if $c_{A,B}$ = 1, the observer's judgments of roughness are unaffected by a change of viewpoint. That is, the observer is roughness constant, at least for this pair of viewpoints. We cannot directly observe $V_A(r)$ for any viewpoint condition A or estimate the constant $c_A$ in the form of $V_A(r)$ we have assumed. We can, however, estimate the transfer function parameter $c_{A,B}$ from our data (see the appendix in Ho et al., 2006, for details).

$c_A = c_B$ for any two viewing angles $A$ and $B$ and the value of $c_{A,B}$ should equal 1. For each group of viewpoint comparisons (i.e., Groups I–IV), we fit three roughness transfer parameters, one for each tested pair of viewpoints. Together with $\sigma$ and $\gamma$, this gave a total of 14 model parameters (4 groups × 3 roughness transfer parameters + 2 noise-scaling parameters). We estimated these parameters by maximum likelihood methods.

*z* test to test whether the value of 1 could be obtained from the bootstrapped distribution. Across conditions and observers, slightly more than two thirds of

Observer | Group I $\hat{c}_{90°,70°}$ | Group I $\hat{c}_{90°,50°}$ | Group I $\hat{c}_{90°,30°}$ | Group II $\hat{c}_{70°,50°}$ | Group II $\hat{c}_{50°,30°}$ | Group II $\hat{c}_{70°,30°}$ | Group III $\hat{c}_{90°,110°}$ | Group III $\hat{c}_{90°,130°}$ | Group III $\hat{c}_{90°,150°}$ | Group IV $\hat{c}_{110°,130°}$ | Group IV $\hat{c}_{130°,150°}$ | Group IV $\hat{c}_{110°,150°}$ | $\hat{\sigma}$ | $\hat{\gamma}$
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
JF | 1.243 | 1.221 | 1.173 | 1.133 | 1.078 | 1.028 | 0.762 | 0.716 | 0.729 | 0.886 | 1.067 | 0.947 | 0.279 | 0.873
LF | 1.491 | 1.401 | 1.373 | 1.675 | 1.160 | 1.438 | 0.706 | 0.656 | 0.662 | 0.786 | 1.105 | 0.803 | 0.689 | 0.938
SS | 1.078 | 1.049 | 1.077 | 1.055 | 1.275 | 1.139 | 0.889 | 0.926 | 1.107 | 0.974 | 1.189 | 1.110 | 0.269 | 0.955
YXH | 1.309 | 1.373 | 1.422 | 1.369 | 1.189 | 1.318 | 0.783 | 0.706 | 0.825 | 0.892 | 1.035 | 0.946 | 0.368 | 0.698

$R_d$ denote the visual estimate of roughness based on all unbiased (i.e., viewpoint-invariant) cues to physical roughness. We define $r_d = E[R_d]$ and assume that $r_d = r$: The viewpoint-invariant cues (e.g., binocular disparity) signal, on average, the physical roughness of the surface. If the observers used only these viewpoint-invariant cues, then they would exhibit no systematic deviation from roughness constancy with changes in viewpoint, unlike our observers.

$r$, but also with illumination direction: (1) $r_p$, the proportion of the image that is not directly lit by the punctate illuminant (the proportion of the image in shadow); (2) $r_m$, the mean luminance of nonshadowed regions (nonzero pixels); (3) $r_s$, the standard deviation of the luminance of nonshadowed regions (nonzero pixels) of the image due to differential illumination by the punctate illuminant; and (4) $r_c$, the texture contrast. Texture contrast (Pont & Koenderink, 2005) is a modified version of Michelson contrast. It is computed as the difference between the 95th and 5th percentiles of luminance divided by the median luminance and is intended to be a robust statistic for characterizing materials across lighting conditions. We denote these measures as $r_p(r, \phi_v)$, $r_m(r, \phi_v)$, $r_s(r, \phi_v)$, and $r_c(r, \phi_v)$ to emphasize this dependence on both roughness $r$ and viewpoint $\phi_v$.
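The texture contrast statistic as worded above is simple to compute. The following sketch operates on a flat list of pixel luminances; the linear-interpolation rule used for percentiles and the function names are assumptions, since the text does not specify them.

```python
import statistics

def percentile(values, p):
    """Linearly interpolated percentile p (0-100) of a list of numbers."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def texture_contrast(luminances):
    """(95th percentile - 5th percentile) / median luminance."""
    spread = percentile(luminances, 95) - percentile(luminances, 5)
    return spread / statistics.median(luminances)

# A hypothetical luminance ramp 0, 1, ..., 100 gives (95 - 5) / 50.
ramp_contrast = texture_contrast(list(range(101)))
```

Because the statistic uses percentiles and the median rather than the extreme values and the mean, a few very bright or very dark pixels have little effect on it.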

$r_p$, $r_m$, and $r_s$, we needed to determine which pixels in each image were not directly illuminated by the punctate source. To do this, we employed a computational trick. We rerendered our scenes with the diffuse lighting term set to 0 and surface albedo set to 1 and no interreflections among facets. We refer to these rerendered images as punctate-only images. Pixels with a value of 0 in a punctate-only image corresponded to surfaces that were not directly illuminated by the punctate source (i.e., in shadow). The proportion of the image in shadow ($r_p$) and the other terms based on nonshadowed regions ($r_m$ and $r_s$) were easily computed once we knew which regions in the image were not directly illuminated by the punctate source. We determined this set of zero pixels using the left-eye images only. To distinguish our numerical estimates from the true underlying values, we write, for example, $r_p$, for the latter, and so forth.
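The masking step can be sketched as follows. The sketch takes two flat pixel lists of equal length, a rendered stimulus image and its punctate-only counterpart, and treats a zero in the punctate-only image as shadow. Reading the luminance statistics from the rendered image at the nonshadowed positions, and using the population standard deviation, are assumptions; the function name is hypothetical.

```python
import statistics

def shadow_pseudocues(image, punctate_only):
    """Return (r_p, r_m, r_s) estimates from two aligned pixel lists.

    r_p: proportion of pixels with value 0 in the punctate-only image.
    r_m, r_s: mean and (population) SD of `image` luminance over the
    remaining, directly lit pixels. ASSUMPTION: statistics are taken
    from `image`, not from the punctate-only rendering.
    """
    lit = [lum for lum, direct in zip(image, punctate_only) if direct != 0]
    r_p = 1 - len(lit) / len(image)
    return r_p, statistics.fmean(lit), statistics.pstdev(lit)

# Hypothetical four-pixel example: two shadowed pixels, two lit pixels.
r_p, r_m, r_s = shadow_pseudocues([1.0, 1.0, 3.0, 5.0], [0.0, 0.0, 1.0, 2.0])
```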

*SD*. Measures were obtained from trapezoidal regions located centrally on the 2D image projection of the surface from each viewpoint. For any fixed viewpoint, the pseudocue $r_p$ increases monotonically whereas $r_m$ decreases monotonically with increases in roughness. The other two pseudocues are nonmonotonic functions of roughness. All of these pseudocues are markedly affected by changes in viewpoint (monotonically for $r_p$ and $r_m$ but nonmonotonically for the other two pseudocues). The pseudocues evidently confound viewpoint and surface roughness.

$R_p$, $R_m$, $R_s$, and $R_c$, corresponding to the four physical measures just defined. Each is an unbiased estimate of the corresponding physical measure: $E[R_p] = r_p(r, \phi_v)$, and so forth.

$r$ from a given viewpoint $\phi_v$, the observer forms the roughness estimate

$w_i$ combine the scale factors and weights and thus need not sum to 1 as weights usually do. In this study, observers compared this roughness estimate with that for a second surface patch with roughness $r'$ viewed from a different viewpoint, $\phi_v'$,

$R = R'$. Subtracting Equations 8 and 9 yields

$\Delta R_i = R_i' - R_i$. We assume that $w_d$ was nonzero; therefore, we can rearrange Equation 10 as

$a_p = -w_p / w_d$, and so forth. We define $\Delta r_p = E[\Delta R_p] = r_p - r_p'$, $\Delta r_m = E[\Delta R_m] = r_m - r_m'$, and so forth. Equation 11 effectively expresses the effect of the pseudocues in terms of the viewpoint-invariant cues. If we take expected values of both sides of Equation 11, we have

$\Delta r_d$ should be 0, as $r_d = r_d' = r$, the physical roughness of the surface. Otherwise, $\Delta r_d$ is the observer's systematic error in matching surfaces in roughness across viewpoints: The systematic deviations from the identity line for each condition and observer in Figure 9 are estimates of $\Delta r_d$ for that observer and condition. Consequently, we can treat Equation 12 as a regression equation,

$a_0$, yielding the regression equation

$r = r' = 0$ to be equally rough even if they are at different orientations. By including the term in the regression, we can test whether it is 0 as expected. In using the variation in the cues from trial to trial to estimate the weight assigned to each cue, we were, in effect, applying the technique used by Ahumada and Lovell (1971) that is the basis of image classification methods.
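The displayed equations of this derivation do not appear in the text here. A form consistent with the surrounding definitions (linear combination of cues with weights $w_i$, equality of estimates at the PSE, and $a_i = -w_i / w_d$) is sketched below. The equation numbers follow the in-text references, and the exact layout is an assumption rather than a quotation; the sign convention for each expected difference $\Delta r_i$ is the one stated in the text.

```latex
\begin{align}
R  &= w_d R_d  + w_p R_p  + w_m R_m  + w_s R_s  + w_c R_c   \tag{8}\\
R' &= w_d R_d' + w_p R_p' + w_m R_m' + w_s R_s' + w_c R_c'  \tag{9}\\
0  &= w_d \,\Delta R_d + w_p \,\Delta R_p + w_m \,\Delta R_m
      + w_s \,\Delta R_s + w_c \,\Delta R_c                 \tag{10}\\
\Delta R_d &= a_p \,\Delta R_p + a_m \,\Delta R_m
      + a_s \,\Delta R_s + a_c \,\Delta R_c,
      \qquad a_i = -\frac{w_i}{w_d}                         \tag{11}\\
\Delta r_d &= a_p \,\Delta r_p + a_m \,\Delta r_m
      + a_s \,\Delta r_s + a_c \,\Delta r_c                 \tag{12}\\
\Delta r_d &= a_0 + a_p \,\Delta r_p + a_m \,\Delta r_m
      + a_s \,\Delta r_s + a_c \,\Delta r_c                 \tag{13}
\end{align}
```

Equation 10 follows from setting $R = R'$ and subtracting, Equation 12 from taking expectations of Equation 11, and Equation 13 adds the constant term $a_0$ that is expected to be 0.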

$a_p$ and $a_m$ were found to be significantly different from 0. However, this does not necessarily indicate that the other two cues were not used in each pairwise roughness discrimination; it only shows that they had so little effect that we could not detect it.
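A regression of this kind can be sketched with ordinary least squares. The code below is a generic illustration, not the authors' analysis: it solves the normal equations by Gauss-Jordan elimination, treats variance accounted for (VAF) as $100 \times (1 - SS_\text{res}/SS_\text{tot})$, and uses made-up predictor values. The function names and the VAF definition are assumptions.

```python
def fit_linear(X, y):
    """Least-squares coefficients [a_0, a_1, ...] for y = a_0 + sum(a_k * x_k)."""
    rows = [[1.0] + list(x) for x in X]        # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):                        # Gauss-Jordan, partial pivoting
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def variance_accounted_for(X, y, coef):
    """VAF in percent, ASSUMED here to be 100 * (1 - SS_res / SS_tot)."""
    pred = [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 100.0 * (1.0 - ss_res / ss_tot)

# Made-up predictor differences (two cues) and a noiseless response.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [0.5 - 2.0 * x1 + 0.1 * x2 for x1, x2 in X]
coef = fit_linear(X, y)
vaf = variance_accounted_for(X, y, coef)
```

With noiseless synthetic data the fit recovers the generating coefficients and the VAF is 100%; with real PSE deviations the VAF would indicate how much of the constancy failure the pseudocues can explain.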

Observer | VAF (%) | $\hat{a}_p$ | $\hat{a}_m$ | $\hat{a}_s$ | $\hat{a}_c$
---|---|---|---|---|---
JF | 50 | −2.903 | −0.073 | 0.004 | 1.759
LF | 49 | −6.716 | −0.108 | 0.179 | 0.143
SS | 41 | −0.923 | 0.015 | 0.06 | 1.008
YXH | 74 | −2.757 | −0.067 | 0.082 | 2.125

*less* rough.

*shadow-hiding opposition effect* provides a possible explanation for the asymmetry that was observed in our data. Studies in atmospheric optics show that rough surfaces like those of the moon and fir tree bark exhibit a phenomenon known as the shadow-hiding opposition effect in which surface textural elements act as occluders to their own shadows when a surface is viewed from the same or nearly the same direction as the sun when it is located obliquely in the scene (Iaquinta & Lukianowicz, 2006). This results in the surface being brightest when most shadows are occluded. As the angle between the sun and the viewer increases, shadows become more and more visible (i.e., represented here as increasing viewpoint elevation). Viewpoint-induced changes consistent with shadow hiding are evident in our measurements of $r_p$ and $r_m$ with increasing viewing angle (Figures 12a and 12b). For example, two surfaces of the same roughness viewed from 30° and 150° contain very different proportions of shadows, although these two angles are identical with respect to the surface normal; specifically, the surface viewed from a larger angle of elevation contains more shadows. Correspondingly, our data show that when two surfaces viewed obliquely from these angles were compared with a surface of equal physical roughness viewed frontoparallel, for example, observers perceived the surface containing a greater proportion of visible shadows to be rougher (i.e., in the latter comparison, the oblique surface was perceived to be rougher, whereas in the former comparison, the oblique surface was perceived to be *less* rough).

$r_p$ increases with viewpoint angle and would lead to the failures of roughness constancy we observed. For the viewpoint angles used in Group IV, the same measure becomes noisier and less dependent on viewpoint (Figure 12a) and, hence, acts as a nearly viewpoint-invariant cue to roughness, consistent with the nearly roughness-constant results for Group IV. Although it was not the intent of this study to examine in detail the effects of shadow hiding on surface roughness judgments, our results are generally consistent with the idea that roughness judgments for obliquely illuminated surfaces depend on the visibility of shadows in the texture.

*associative learning*. That is, a new perceptual cue may be trained to elicit a reliable perceptual response if repeatedly paired with another cue that elicits that particular response. The effect of associative learning on perceptual appearance has been examined in a variety of studies (e.g., Adams, Graf, & Ernst, 2004; Haijiang, Saunders, Stone, & Backus, 2006; Jacobs & Fine, 1999; Sinha & Poggio, 1996; Wallach & Austin, 1954). One particular type of associative learning referred to as *cue recruitment* was recently described by Haijiang et al. (2006) who trained observers to disambiguate a perceptually bistable display (i.e., a rotating Necker cube) by pairing depth cues (e.g., occlusion) with an otherwise new cue. When depth cues were removed from the scene after training, the new cue presented alone with the Necker cube stimulus caused trainees to perceive rotation in the previously signaled direction.

Observer | ĉ_{70, 60} | ĉ | ĉ | ||
---|---|---|---|---|---|

JF | 0.180 | 0.981 | |||

LF | 0.871 | 0.660 | 0.932 | ||

SS | 0.464 | 0.777 | |||

YXH | 0.299 | 0.704 |

*z*-test at the Bonferroni-corrected alpha level 0.0125 are boldfaced. Notice that values of are close to one for all observers.

Observer | VAF | $\hat{a}_p$ | $\hat{a}_m$ | $\hat{a}_s$ | $\hat{a}_c$ |
---|---|---|---|---|---|

JF | 71 | 6.904 | 0.011 | −0.341 | 14.231 |

LF | 62 | −18.75 | 0.053 | −0.982 | |

SS | 56 | −2.272 | 0.028 | −0.088 | 7.082 |

YXH | 89 | 8.204 | 0.024 | 0.221 |

*α* level of 0.0125 per test). Error bars on the PSEs represent 95% confidence intervals estimated by a bootstrap method (Efron & Tibshirani, 1993).