We examine human ability to detect changes in scene lighting. Thirteen observers viewed three-dimensional rendered scenes stereoscopically. Each scene consisted of a randomly generated three-dimensional “Gaussian bump” surface rendered under a combination of collimated and diffuse light sources. During each trial, the collimated source underwent a small, quick change of position in one of four directions. The observer's task was to classify the direction of the lighting change. All observers were above chance in performing the task. We developed a model that combined two sources of information, a *shape map* and a *shading map,* to predict lighting change direction. We used this model to predict patterns of errors both across observers and across scenes differing in shape. We found that errors in estimating lighting direction were primarily the result of errors in representing surface shape. We characterized the surface features that affected performance in the classification task.

^{1} light source from non-light transformations that shared the same local scene statistics. They found that observers were accurate in discriminating light and non-light transformations and that they were more sensitive to a concomitant change in the albedo of a surface patch under a light transformation than under a non-light transformation. In Movie 1 and Figure 1, we illustrate the pattern of changes in luminance induced on a Lambertian sphere by changing the direction to a collimated source to the left, right, up, and down. In this article, we examine human ability to judge such changes in light source direction and develop a model of human performance.

*shading map* and information about the shape of the object, the *shape map*. The shape map is needed to disambiguate the shading map, as illustrated in Figure 2. The upper image in Figure 2 is the shading map of a corrugated surface varying sinusoidally in depth. The image could have been generated by either of the two surfaces shown below it, one of which is a reversal in depth of the other, with concavities flipped to convexities and vice versa. The associated estimates of light direction are opposite: in one case, the light comes from the right, and in the other, from the left.

^{2} illuminated by a combination of a collimated (directional) light source and a diffuse (non-directional) source. The collimated light source moved in one of four possible directions (left, right, up, down) during each trial, and observers classified the direction of movement by responding “left,” “right,” “up,” or “down” (Figure 3).

*p* = (*x, y, z*). The *xy*-plane is fronto-parallel at 60 cm from the viewer, and positive *z* indicates coordinates nearer to the viewer than the *xy*-plane. Positive *x* runs horizontally to the viewer's right and positive *y* points toward the ceiling of the room. The origin is the extension from the cyclopean point out to 60 cm in front of the viewer. Stimuli were centered on the origin. The second coordinate system in Figure 4 is spherical, *p* = (*ψ, φ, ρ*), with azimuth *ψ* and elevation *φ* analogous to latitude and longitude on an imaginary terrestrial sphere and *ρ* equal to distance from the Cartesian origin. The imaginary sphere is centered on the origin, and the observer's line of sight passes through what would be the North Pole and the axis of rotation of the terrestrial sphere. We will use azimuth and elevation (*ψ, φ*) to denote directions with respect to the origin. We report azimuth and elevation values using a superscript degree symbol, e.g., 5°, while we report “degrees of visual angle” with respect to the observer as “DVA.”

*display area*) within the *xy*-plane. Each Gaussian had a standard deviation of 0.5 DVA. To determine the position *μ*_{i} = (*μ*_{i}^{x}, *μ*_{i}^{y}), *i* = 1, …, 30, of the *i*th Gaussian centered on the origin, we first selected a uniformly distributed location. After assigning each Gaussian a position, we checked that all were separated by at least 1 standard deviation; those that were too close together were randomly jittered in position iteratively until the separation criterion was met. We then assigned heights *h*_{i}, *i* = 1, …, 30, to the 30 isotropic Gaussians. Heights were distributed uniformly on the union of the two intervals [−1.5, −0.5] and [0.5, 1.5].

^{3} After the positions and heights were determined, the 30 Gaussian functions were summed in *z* to form a Gaussian “bump” surface patch (an example is shown in Figure 3) with depth *z*(*v*) at any point *v* = (*x, y*) within the display area:

$$z(v) = \sum_{i=1}^{30} h_i \exp\!\left(-\frac{\lVert v - \mu_i \rVert^{2}}{2\sigma^{2}}\right), \tag{1}$$

where *σ* = 0.5 DVA. The *z* range was then normalized to be 2.625 cm, and the patch was finally translated such that all *z*-values were equal to or greater than 0. All scenes therefore had the same fixed *z* range of 2.625 cm, with the furthest point from the observer embedded in the *xy*-plane.
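The surface construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' stimulus code: it assumes each bump is an unnormalized Gaussian scaled by its height, it omits the minimum-separation jitter step, and the display-area half-width (`half_width`), grid resolution (`grid`), and `seed` are hypothetical parameters chosen only for the example.

```python
import math
import random

def make_bump_surface(n_bumps=30, sigma=0.5, half_width=5.0, grid=21,
                      z_range_cm=2.625, seed=0):
    """Sum isotropic Gaussians into a height map, then rescale and shift it."""
    rng = random.Random(seed)
    # Bump centres drawn uniformly over an assumed square display area.
    centres = [(rng.uniform(-half_width, half_width),
                rng.uniform(-half_width, half_width)) for _ in range(n_bumps)]
    # Heights uniform on the union of [-1.5, -0.5] and [0.5, 1.5]:
    # pick a sign, then a magnitude in [0.5, 1.5].
    heights = [rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 1.5)
               for _ in range(n_bumps)]
    step = 2 * half_width / (grid - 1)
    axis = [-half_width + k * step for k in range(grid)]
    # Depth at each grid point is the sum of all height-scaled Gaussians.
    z = [[sum(h * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2 * sigma ** 2))
              for h, (mx, my) in zip(heights, centres))
          for x in axis] for y in axis]
    lo = min(min(row) for row in z)
    hi = max(max(row) for row in z)
    # Normalize the z range to z_range_cm and translate so that min(z) = 0.
    scale = z_range_cm / (hi - lo)
    return [[(val - lo) * scale for val in row] for row in z]
```

Calling `make_bump_surface()` returns a 21 × 21 grid of depths whose minimum is 0 and whose range is 2.625 cm, matching the normalization described in the text.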

*z* octants, or in other words, above the scene, effectively at infinity behind the observer. The directions were fixed so that the projections of the light paths in the *xy*-plane would correspond to +*x* (right), −*x* (left), +*y* (up), and −*y* (down). Each light path consisted of 16 equally spaced positions along an arc of constant radius from the origin.

*y*-axis (right or left motion) or the *x*-axis (up or down motion).

*p*_{i} = (*ψ*_{i}, *φ*_{i}, *ρ*) of each light path was randomized: initial azimuth, *ψ*_{i}, was uniformly distributed on [0°, 360°), initial elevation, *φ*_{i}, was uniformly distributed on [70°, 90°], and *ρ* was fixed to be large and effectively infinite.

*track* the lighting direction over the course of the trial because the final position would directly indicate the change in direction (e.g., Figure 1). The randomization effectively precluded using the final position as a cue: some “down” trials could end with a final frame where the light comes from the right or above or even diagonally up and to the right. For the same reason, the initial frame provides almost no information as to which response is correct.

*p*_{i}, each of the 15 remaining positions along the trajectory of the light path was computed by rotating *p*_{i} = (*x*_{i}, *y*_{i}, *z*_{i}) about either the *x*-axis (up or down motion) or the *y*-axis (right or left motion) in steps of (10/15)°. This procedure is equivalent to the following: the collimated source can be envisioned as a fixed point of light on the celestial sphere centered on the origin, and its rotation above the scene occurs as a consequence of the rotation of the celestial sphere.
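As an illustration of the rotation procedure, the following sketch generates the 16 positions of one light path as unit direction vectors. It assumes that elevation is measured from the *xy*-plane toward +*z* (so 90° lies on the line of sight) and that azimuth is measured counterclockwise from +*x*; the signs of the rotations are chosen so that “right” moves the source toward +*x* and “up” toward +*y*. The function name `light_path` and these sign conventions are assumptions made for the example, not details taken from the source.

```python
import math

def light_path(azimuth_deg, elevation_deg, direction,
               n_positions=16, total_deg=10.0):
    """Return n_positions unit vectors along a light path of total_deg degrees."""
    psi = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    # Initial direction to the source as a unit vector (assumed convention).
    x, y, z = (math.cos(phi) * math.cos(psi),
               math.cos(phi) * math.sin(psi),
               math.sin(phi))
    step = math.radians(total_deg / (n_positions - 1))  # (10/15) degrees
    sign = 1.0 if direction in ("right", "up") else -1.0
    path = []
    for k in range(n_positions):
        a = sign * k * step
        c, s = math.cos(a), math.sin(a)
        if direction in ("right", "left"):
            # Rotation about the y-axis: positive angles carry +z toward +x.
            path.append((c * x + s * z, y, -s * x + c * z))
        else:
            # Rotation about the x-axis: positive angles carry +z toward +y.
            path.append((x, c * y + s * z, -s * y + c * z))
    return path
```

For example, `light_path(0, 90, "up")` starts with the source directly on the line of sight and ends 10° above it, while the *x* component stays fixed throughout.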

*j*th triangle with albedo *a*^{j} on frame *t* was determined by

$$a^{j}\,\bigl(L_{\mathrm{c}} \cos\theta_{j}(t) + L_{\mathrm{d}}\bigr),$$

where *L*_{c} is the intensity of the collimated light source, *L*_{d} is the intensity of the diffuse light source, and *θ*_{j}(*t*) is the angle between the *j*th triangle's surface normal and the direction of the collimated light source in frame *t*. The intensity of a collimated source is defined as the luminance of a Lambertian surface with albedo equal to 1 positioned so that its surface normal is parallel to the light source's direction. In these terms, our collimated source's intensity was 25 cd/m^{2}. We set the diffuse source's intensity at one-quarter of the collimated source's intensity to simulate a simple “sun and sky” environment. All trial frames were rendered beforehand and displayed as 8-bit portable network graphics images.
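The shading computation can be sketched as follows, assuming a standard Lambertian model in which the collimated term scales with the cosine of the angle between the surface normal and the light direction and the diffuse term is added uniformly. The clamp to zero for surfaces facing away from the source and the function name `triangle_luminance` are assumptions made for the example; the source does not say how back-facing triangles were handled.

```python
import math

L_C = 25.0       # collimated source intensity in cd/m^2 (from the text)
L_D = L_C / 4.0  # diffuse intensity, one-quarter of the collimated intensity

def triangle_luminance(albedo, normal, light_dir):
    """Luminance of one Lambertian triangle under collimated plus diffuse light."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    norm = (math.sqrt(sum(n * n for n in normal))
            * math.sqrt(sum(l * l for l in light_dir)))
    # cos(theta) between normal and light direction; clamp is an assumption.
    cos_theta = max(0.0, dot / norm)
    return albedo * (L_C * cos_theta + L_D)
```

Under these assumptions, a triangle with albedo 1 whose normal points straight at the collimated source has luminance 25 + 6.25 = 31.25 cd/m², and a triangle lit only by the diffuse source has luminance equal to 6.25 times its albedo.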


*SEM* equal to 0.65 ± 0.13, which indicates that all observers performed above chance (0.25) in the experiment, as shown in Figure 6A.

*d, R*), was the total number of trials on which the observer classified veridical motion direction *d* as motion *R*. Considering each veridical motion direction separately (300 trials total per row), observers classified the axis (up–down or right–left) of motion correctly on 77 ± 11% (mean ± 1 *SEM*) of the trials, a proportion significantly higher than chance (50%) for all observers at all motion directions, except for S01, who was at chance in discriminating the motion axis throughout the experiment. Although this may seem counterintuitive, S01 was still above chance at discriminating motion direction, as all values on the confusion matrix diagonal (range = [113, 127]) were significantly greater than chance (75) at the *α* = 0.05 level as assessed with binomial confidence intervals. As previously stated, all other observers were also significantly better than chance at classifying the motion direction.

*anisotropy*: some scenes supported horizontal light motion discrimination better than vertical or vice versa, which we term the scene's response-based *anisotropy*. For scene *i*, the response-based *anisotropy*, *A*_{R}^{i}, is computed as the general increase in performance seen on trials with horizontal light paths, as opposed to vertical light paths:

*N*_{d}^{i} is the total number of correct classifications across all subjects (maximum possible = 13) for scene *i* under light path *d*. Many scenes had response-based anisotropies close to zero, but some were more extreme. See the histogram over all 300 scenes in Figure 6B. To explore the physical features of the scenes correlated with extreme response-based anisotropies, we plot the surface contours of the twenty scenes with the most extreme response-based anisotropies in Figure 7. As we might expect, the scenes for which horizontal light movements are more accurately judged tend to have roughly vertical “ridges” and “valleys,” while those for which vertical light movements are more accurately judged tend to have elongated horizontal features.

*z* map image, stored its power spectrum, effectively throwing away phase information, and then summed over all rows for the horizontal power spectrum of the entire *z* map. We calculated the vertical power spectrum in the same fashion, working across columns instead. The physically based anisotropy of scene *i*, *A*_{f}^{i}, at each frequency *f* was then:

*p*_{f}^{d} is the summed 1D Fourier power in the *d* dimension at frequency *f*.

*F* = 23.9, *p* < 0.001, *R*^{2} = 0.45. A plot of the relative contribution of each frequency in cycles per image is shown in Figure 8. The analysis reveals that relative power differences at low spatial frequencies (1–3 cycles per image) are most predictive of anisotropies in classification performance. We will refer to this result later in the evaluation of our generative models.

*d*′ from signal detection theory, a measure that is independent of observer bias (Green & Swets, 1966/1973). For a particular motion direction, we considered signal trials as those with light paths in the correct direction, and all remaining trials as non-signal trials. Therefore, we defined the hit rate for a particular motion direction, *p*_{H}, to be the probability of a correct motion classification, and the false alarm rate, *p*_{F}, to be the probability of classifying a trial as that particular motion direction when a different motion had occurred. If Φ^{−1} is the inverse of the cumulative unit normal distribution, then

$$d' = \Phi^{-1}(p_{\mathrm{H}}) - \Phi^{-1}(p_{\mathrm{F}}).$$

A zero value of *d*′ indicates chance performance, and *d*′ increases with increased discrimination performance. In our task, *d*′ = 1 corresponds to 69% correct, *d*′ = 2 corresponds to 84% correct, and *d*′ = 3 corresponds to 93% correct if a symmetric criterion is adopted. Ninety-five percent confidence intervals for each *d*′ estimate were obtained by a non-parametric bootstrap method (Efron & Tibshirani, 1993): each observer's performance in the corresponding condition was simulated 100,000 times and the 5th and 95th percentiles were calculated and used to construct 95% confidence intervals.
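The sensitivity measure and the quoted percent-correct values can be checked with a short sketch. It assumes the standard signal detection definition, *d*′ = Φ^{−1}(*p*_{H}) − Φ^{−1}(*p*_{F}), and the usual relation that a symmetric criterion yields a proportion correct of Φ(*d*′/2). The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # unit normal: mean 0, standard deviation 1

def d_prime(p_hit, p_false_alarm):
    """Sensitivity from hit and false alarm rates (both strictly in (0, 1))."""
    return _STD_NORMAL.inv_cdf(p_hit) - _STD_NORMAL.inv_cdf(p_false_alarm)

def proportion_correct_symmetric(d):
    """Proportion correct when the criterion sits midway between distributions."""
    return _STD_NORMAL.cdf(d / 2.0)
```

With this relation, *d*′ values of 1, 2, and 3 give proportions correct of about 0.69, 0.84, and 0.93, in agreement with the figures in the text, and equal hit and false alarm rates give *d*′ = 0.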

*d*′ 95% confidence intervals that did not include zero, and was moderately high on average with *d*′ ± 1 *SEM* equal to 1.6 ± 0.56 across observers.

*d*′ = 1.5) than rightward (*d*′ = 1.2), and S10 was more sensitive to both upward (*d*′ = 2.0) and downward motions (*d*′ = 2.1) compared to leftward motion (*d*′ = 1.6). Significance was assessed with *z*-tests at the *α* = 0.05/4 level, as we corrected for the four tests on each observer's classification data.

*anisotropy* of the scenes as defined in the preceding section.

*shading intensity map, I*, of the scene, and the second is the *shape map* of the scene. The model is illustrated schematically in Figure 10. The output of the model is a three-dimensional vector, **l**_{xyz} = (*l*_{x}, *l*_{y}, *l*_{z}), pointing in the direction of the illuminant. The angle of the vector **l**_{xy} = (*l*_{x}, *l*_{y}) counterclockwise above positive *x* modulo 360° is an estimate of the azimuth component of the illuminant:

$$\hat{\psi} = \operatorname{atan2}(l_{y}, l_{x}) \bmod 360^{\circ}.$$

The angle of the vector **l**_{xyz} = (*l*_{x}, *l*_{y}, *l*_{z}) above the *xy*-plane is an estimate of the elevation of the illuminant:

$$\hat{\varphi} = \arcsin\!\left(\frac{l_{z}}{\lVert \mathbf{l}_{xyz} \rVert}\right),$$

where ∥*v*∥ denotes the usual vector norm.

*xy*-plane, Δ**l**_{xy}, from the model outputs, we simply subtract the initial estimate of direction (first frame) from the final estimate (last frame):

$$\Delta \mathbf{l}_{xy} = \mathbf{l}_{xy}^{\text{last}} - \mathbf{l}_{xy}^{\text{first}}.$$

Δ**l**_{xy} as shown in the inset to Figure 10 to classify the change in lighting direction as one of the four cardinal directions: left, right, up, down.
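The read-out stage described above can be sketched as follows. The azimuth and elevation follow directly from the components of the estimated illuminant vector. The classification rule, which assigns the change in the *xy*-plane to the nearest cardinal direction, is an assumption standing in for the inset to Figure 10, and the function names are hypothetical.

```python
import math

def azimuth_elevation(l):
    """Azimuth (counterclockwise from +x, in [0, 360)) and elevation in degrees."""
    lx, ly, lz = l
    azimuth = math.degrees(math.atan2(ly, lx)) % 360.0
    elevation = math.degrees(math.asin(lz / math.sqrt(lx * lx + ly * ly + lz * lz)))
    return azimuth, elevation

def classify_change(l_first, l_last):
    """Classify the xy change between two estimates as a cardinal direction."""
    dx = l_last[0] - l_first[0]
    dy = l_last[1] - l_first[1]
    # Assumed rule: the axis with the larger change decides; its sign picks the side.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"
```

For instance, a first-frame estimate of (0, 0, 1) followed by a last-frame estimate of (0.2, 0.05, 0.97) would be classified as “right” under this rule.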

*I* = (*dI*_{x}^{r,c}, *dI*_{y}^{r,c}), in the *x* and *y* directions at every point (*r, c*) across an image, initial estimates of the *x* and *y* components, (*l*_{x}, *l*_{y}), of the illuminant direction are computed as the mean gradient components under Pentland's derivation:

$$l_{x} = \frac{1}{RC}\sum_{r=1}^{R}\sum_{c=1}^{C} dI_{x}^{r,c}, \qquad l_{y} = \frac{1}{RC}\sum_{r=1}^{R}\sum_{c=1}^{C} dI_{y}^{r,c}.$$

*z* component of the illuminant direction requires two further calculations, which Pentland derives (but see also Chojnacki, Brooks, & Gibbons, 1994). We do not describe them here because our task required estimating only the relative magnitudes of the *x* and *y* components of the illuminant direction.

*S* (Koenderink, 1990, p. 320), which varies continuously between −1 and +1 and represents the shape (from convex elliptical to convex hyperbolic to flat to concave hyperbolic to concave elliptical) but not the degree of curvedness of the local region. It is derived from local estimates of principal curvatures and is negative for concave regions and positive for convex regions. By computing the Shape Index, *S*^{r,c}, at every point (*r, c*) in the image plane, we have a map varying from −1 to +1 that specifies the local shape of every visible point on the landscape. Taking the sign of this map, *sign*(*S*^{r,c}), simplifies the representation, allowing us to encode only concavity (−) and convexity (+).
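A minimal sketch of the Shape Index and its sign is given below. It uses the common arctangent form of Koenderink's index and assumes a sign convention in which principal curvatures are positive for convex regions, so that convex caps map to +1 and concave cups to −1, as the text requires. How the principal curvatures were estimated from the depth map is not shown, and the function names are hypothetical.

```python
import math

def shape_index(k1, k2):
    """Shape Index in [-1, 1] from two principal curvatures (convex positive)."""
    k_max, k_min = max(k1, k2), min(k1, k2)
    # atan2 handles the umbilic case k_max == k_min without dividing by zero.
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

def concavity_sign(k1, k2):
    """+1 for convex, -1 for concave, 0 for a symmetric saddle or a flat patch."""
    s = shape_index(k1, k2)
    return (s > 0) - (s < 0)
```

Under this convention, a convex cap (both curvatures equal and positive) has index +1, a ridge (one curvature zero) has index 0.5, a symmetric saddle has index 0, and a concave cup has index −1.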

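*K*^{r,c} = *sign*(*S*^{r,c}), with the map of local luminance gradients effectively flips all the gradients in the concave regions and leaves the convex regions undisturbed:

$$l_{x} = \frac{1}{RC}\sum_{r=1}^{R}\sum_{c=1}^{C} K^{r,c}\, dI_{x}^{r,c}, \qquad l_{y} = \frac{1}{RC}\sum_{r=1}^{R}\sum_{c=1}^{C} K^{r,c}\, dI_{y}^{r,c},$$

where *R* is the total number of rows in the image and *C* is the total number of columns.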
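The sign-corrected gradient average can be illustrated with a small sketch. It is not the authors' implementation: central differences stand in for the 5-tap derivative filter, the average runs over interior pixels only rather than all *R* × *C* points, and row 0 is assumed to be the top of the image so that +*y* points up. The function name `illuminant_xy` and the optional `sign_map` argument are hypothetical.

```python
def illuminant_xy(image, sign_map=None):
    """Mean x and y luminance gradients, optionally flipped by a +/-1 sign map."""
    rows, cols = len(image), len(image[0])
    sum_x = sum_y = 0.0
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences; a stand-in for the paper's derivative filter.
            d_x = (image[r][c + 1] - image[r][c - 1]) / 2.0
            # Row index grows downward, so "up" is toward smaller r.
            d_y = (image[r - 1][c] - image[r + 1][c]) / 2.0
            k = 1.0 if sign_map is None else sign_map[r][c]
            sum_x += k * d_x
            sum_y += k * d_y
            count += 1
    return sum_x / count, sum_y / count
```

For an image whose luminance increases by one unit per pixel toward the right, the uncorrected estimate is (1, 0); if the sign map marks every pixel as concave (−1), the estimate flips to (−1, 0), which is the disambiguation the shape map provides.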

*dI*_{x}^{r,c}, *dI*_{y}^{r,c}). It is reasonable that information from the most informative scale, which depends solely on the spatial variation in the stimulus, would be the scale of interest. We consider the questions of “which derivative filter?” and “which scale?” empirically less important for this analysis, which is the first of its kind, and choose instead to focus on the predictive power of the basic form of the model. We have therefore chosen a scale and a derivative filter demonstrated to have high accuracy, the 5-tap filter^{4} developed by Farid and Simoncelli (2004), which is shown in the gradient box of Figure 10. The size of the filter on our stimuli corresponds to 0.125 DVA on a side.

*f*) noise. The spectra of our stimuli were roughly 1/*f*, so we expected pink noise to be effective in disturbing the luminance patterns. We further speculate that the shape of pink noise may closely resemble internal noise if spatial frequency filter outputs are normalized and then followed with additive noise.

*f* filter and took the real part of the inverse 2D Fourier transform, which was then normalized to contain pixels with values on the interval [−1, 1]. To specify a particular level of noise in the same units as the luminance maps of the trials (cd/m^{2}), we multiplied the pixels of the noise image by a gain *g* specified in cd/m^{2}. Under sufficiently high gains, adding the noise sample to a trial's luminance map could result in negative pixel values. When this occurred, negative pixel values were clipped to zero.
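The final scaling and clipping steps can be sketched as below. The sketch takes an already generated noise sample as input and does not show the spectral filtering itself. Normalizing by the largest absolute pixel value is an assumption, since the source states only that values were brought onto [−1, 1], and the function name `add_scaled_noise` is hypothetical.

```python
def add_scaled_noise(luminance, noise, gain):
    """Add gain-scaled noise (cd/m^2) to a luminance map, clipping at zero."""
    # Assumed normalization: divide by the peak magnitude so values lie in [-1, 1].
    peak = max(abs(v) for row in noise for v in row)
    noisy = []
    for lum_row, noise_row in zip(luminance, noise):
        noisy.append([max(0.0, lum + gain * (n / peak))
                      for lum, n in zip(lum_row, noise_row)])
    return noisy
```

With a gain of 5 cd/m², a pixel of 10 cd/m² paired with a normalized noise value of 0.5 becomes 12.5 cd/m², while a pixel of 1 cd/m² paired with a normalized noise value of −1 would fall to −4 and is therefore clipped to 0.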

*t*, we added independent samples of pink noise to the first and last frames of each trial. We then processed both frames with the model and categorized the resulting change in lighting direction. This was repeated *N* times for each trial *t*. The resulting *N* simulated directions were stored in **r**_{n}, where the *n*th entry indicates the lighting change direction *d* ∈ [1: “up,” 2: “down,” 3: “right,” 4: “left”] of the *n*th simulated trial. The proportion of responses equal to *d* was stored as **P**_{g,t}(*d*), the Monte Carlo probability of classifying trial *t* as direction *d* under noise level *g*:

$$P_{g,t}(d) = \frac{1}{N}\sum_{n=1}^{N} (r_{n} \equiv d),$$

where (*r*_{n} ≡ *d*) has value 1 if left- and right-hand terms match, otherwise 0. The likelihood matrix **L** was computed by taking the natural log of **P**:

$$L_{g,t}(d) = \ln P_{g,t}(d).$$

**c**_{t}, where the *t*th entry is the direction *d* ∈ [1: “up,” 2: “down,” 3: “right,” 4: “left”], which the observer selected on trial *t*. The likelihood of the data under gain *g*, ℓ_{g}(**c**), is the sum of each classification's likelihood under gain *g*:

$$\ell_{g}(\mathbf{c}) = \sum_{t} L_{g,t}(c_{t}).$$

**P**_{g}, where the gain level *g* was the observer's MLE.
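The Monte Carlo likelihood computation and the choice of the maximum likelihood noise level can be sketched as follows. The simulated responses are passed in as plain lists, so the sketch is independent of how the model itself is run. The small probability floor that avoids taking the log of zero is an assumption not stated in the source, and all function names are hypothetical.

```python
import math
from collections import Counter

DIRECTIONS = (1, 2, 3, 4)  # 1: up, 2: down, 3: right, 4: left

def monte_carlo_probs(simulated, floor=1e-6):
    """Proportion of simulated responses equal to each direction (floored)."""
    counts = Counter(simulated)
    n = len(simulated)
    # The floor keeps log() finite for directions never simulated (assumption).
    return {d: max(counts[d] / n, floor) for d in DIRECTIONS}

def log_likelihood(observed, sims_per_trial):
    """Sum over trials of the log probability of the observer's response."""
    return sum(math.log(monte_carlo_probs(sims)[c_t])
               for c_t, sims in zip(observed, sims_per_trial))

def best_noise_level(observed, sims_by_level):
    """Noise level whose simulations give the observed responses the highest likelihood."""
    return max(sims_by_level,
               key=lambda level: log_likelihood(observed, sims_by_level[level]))
```

Here `observed` holds one response code per trial and `sims_by_level` maps each candidate noise level to a list of per-trial simulated responses. A finer grid of levels, or a larger number of simulations per trial, trades computation time for a smoother likelihood surface.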

*t*(12) = −3.5, *p* < 0.01. Predicted 90° error rates were highly inflated for all but one observer and were on average 0.14 greater than achieved 90° error rates, a significant effect, *t*(12) = 5.7, *p* < 0.001. Predicted 180° error rates, on the other hand, were underestimated by the maximum likelihood fits by 0.054 on average, a significant effect, *t*(12) = −5.1, *p* < 0.001.

*f* noise is spatially correlated. Second, the model averages derivative responses across the image separately within each dimension, so the final estimate of static light direction is also robust to noise in the individual derivative responses. Finally, the motion direction estimate is computed as a difference in the model's estimate between two frames.

*h*_{i} (see Equation 1). To simulate depth noise at level *σ* cm, we sampled 30 height noises, *n*_{i}, from a normal distribution:

$$n_{i} \sim \mathcal{N}(0, \sigma^{2}), \quad i = 1, \ldots, 30.$$

*h*_{i}, have been perturbed by *n*_{i}:

$$z_{\sigma}(v) = \sum_{i=1}^{30} (h_{i} + n_{i}) \exp\!\left(-\frac{\lVert v - \mu_{i} \rVert^{2}}{2\,(0.5\ \text{DVA})^{2}}\right),$$

where *v* = (*x, y*) denotes any point in the display area. For each scene *s*, this process was repeated *N* times to generate *N* smoothly deformed versions of scene *s* resulting from Gaussian height noise normally distributed with standard deviation *σ* cm.
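The height perturbation can be sketched in a few lines. The sketch assumes zero-mean noise and heights already expressed in the same units as the noise level; it produces only the perturbed height lists, which would then be summed into surfaces in the same way as the original scene. The function name `perturb_heights` and the `seed` parameter are hypothetical.

```python
import random

def perturb_heights(heights, sigma_cm, n_versions, seed=0):
    """Return n_versions copies of heights, each with independent Gaussian noise."""
    rng = random.Random(seed)
    # One zero-mean normal sample per bump height, per deformed version.
    return [[h + rng.gauss(0.0, sigma_cm) for h in heights]
            for _ in range(n_versions)]
```

Because only the 30 heights change while the bump positions and widths stay fixed, every deformed version remains a smooth surface.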

*t*, we processed the first and last frames of each simulated trial and categorized the change in lighting direction. The resulting *N* simulated directions were stored in **r**_{n}, where the *n*th entry indicates the direction *d* ∈ [1: “up,” 2: “down,” 3: “right,” 4: “left”] of the *n*th simulated trial. The proportion of responses equal to *d* was stored in **P**_{σ,t}(*d*), the Monte Carlo probability of classifying trial *t* as direction *d* under noise level *σ*:

$$P_{\sigma,t}(d) = \frac{1}{N}\sum_{n=1}^{N} (r_{n} \equiv d),$$

where (*r*_{n} ≡ *d*) has value 1 if left- and right-hand terms match, otherwise 0. The likelihood matrix **L** was computed simply by taking the natural log of **P**_{σ,t}:

$$L_{\sigma,t}(d) = \ln P_{\sigma,t}(d).$$

**c**_{t}, where the *t*th entry is the direction *d* ∈ [1: “up,” 2: “down,” 3: “right,” 4: “left”], which the observer selected on trial *t*. The likelihood of the data under noise level *σ*, ℓ_{σ}(**c**), is the sum of each classification's likelihood under level *σ*:

$$\ell_{\sigma}(\mathbf{c}) = \sum_{t} L_{\sigma,t}(c_{t}).$$

**P**_{σ}, where the noise level was the observer's MLE. On average, the maximum likelihood fits predicted unbiased hit rates (−0.01) and 90° error rates (−0.01), both not significant. Predicted 180° error rates were slightly biased by +0.02, an effect that was significant, *t*(12) = 2.18, *p* < 0.05, yet of a very small and therefore negligible magnitude.

*Contingent Ideal Observer* model of human performance. The concavity correction it performs is to process the depth map for local shape estimates indicating concavity and convexity. By flipping the gradient in concave regions before applying Pentland's algorithm, the model can then recover the true lighting direction.

*R*^{2} were significantly different from 0, but the amount of variance accounted for is evidently modest. Further work is needed to explore the relationship between local shape and anisotropies in light field estimation.

^{2} We provide a detailed description in the Methods section for reproducing the stimuli. Interested readers may also contact the corresponding author for MATLAB code to reproduce the stereo stimuli on their own calibrated displays.

^{3} We denote intervals using both square brackets and parentheses. The parenthesis is used when the endpoint of the interval is not in the interval, i.e., (*a, b*) does not include *a* or *b* but includes all the numbers in between. The square bracket is used when the endpoint is included, i.e., [*a, b*] includes *a*, *b*, and all the intervening numbers, while (*a, b*] includes *b* but not *a*.

^{4} MATLAB code for generating the filter is available from the authors at http://www.cs.dartmouth.edu/farid/research/derivative.html.