**In three-dimensional (3-D) cluttered scenes such as foliage, deeper surfaces often are more shadowed and hence darker, and so depth and luminance often have negative covariance. We examined whether the sign of depth-luminance covariance plays a role in depth perception in 3-D clutter. We compared scenes rendered with negative and positive depth-luminance covariance where positive covariance means that deeper surfaces are brighter and negative covariance means deeper surfaces are darker. For each scene, the sign of the depth-luminance covariance was given by occlusion cues. We tested whether subjects could use this sign information to judge the depth order of two target surfaces embedded in 3-D clutter. The clutter consisted of distractor surfaces that were randomly distributed in a 3-D volume. We tested three independent variables: the sign of the depth-luminance covariance, the colors of the targets and distractors, and the background luminance. An analysis of variance showed two main effects: Subjects performed better when the deeper surfaces were darker and when the color of the target surfaces was the same as the color of the distractors. There was also a strong interaction: Subjects performed better under a negative depth-luminance covariance condition when targets and distractors had different colors than when they had the same color. Our results are consistent with a “dark means deep” rule, but the use of this rule depends on the similarity between the color of the targets and color of the 3-D clutter.**

*Z*, and were separated horizontally by Δ*X* = 10 cm. Their *XY* positions were perturbed in a random *XY* direction by up to 1 cm. The targets were oriented so that their normal was parallel to the *Z* axis. Each distractor was a 1 × 1-cm square, randomly oriented in 3-D and randomly positioned within a bounding box of size 20 × 20 × 20 cm.
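The scene layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the number of distractors `n`, the placement of the two targets symmetrically about the middle depth of the box, the random choice of which target is nearer, and the uniform distribution of the perturbation magnitude are all hypothetical choices, and depths are expressed relative to the front face of the clutter.

```python
import math
import random

BOX = 20.0       # edge length of the clutter bounding box (cm)
DELTA_X = 10.0   # horizontal separation between the two targets (cm)
JITTER = 1.0     # maximum XY perturbation of each target (cm)


def random_unit_vector(rng):
    """Uniformly random 3-D direction, obtained by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return tuple(c / norm for c in v)


def make_distractors(n, rng):
    """n squares (1 x 1 cm) with random centers in the box and random normals."""
    half = BOX / 2.0
    return [
        {
            "center": (rng.uniform(-half, half),
                       rng.uniform(-half, half),
                       rng.uniform(0.0, BOX)),   # depth measured from the front face
            "normal": random_unit_vector(rng),
            "size": 1.0,
        }
        for _ in range(n)
    ]


def make_targets(delta_z, rng):
    """Two targets facing the viewer, separated in depth by delta_z (assumed symmetric)."""
    mid = BOX / 2.0
    depths = [mid - delta_z / 2.0, mid + delta_z / 2.0]
    rng.shuffle(depths)  # assumption: the nearer target is on a random side
    targets = []
    for x, z in zip((-DELTA_X / 2.0, DELTA_X / 2.0), depths):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # random XY direction
        r = rng.uniform(0.0, JITTER)             # perturbation of up to 1 cm
        targets.append({
            "center": (x + r * math.cos(angle), r * math.sin(angle), z),
            "normal": (0.0, 0.0, 1.0),           # normal parallel to the Z axis
        })
    return targets
```

A call such as `make_targets(4.0, random.Random(1))` returns two targets whose depths differ by exactly 4 cm and whose horizontal separation stays within 10 ± 2 cm.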

*Z*_{0} = 60 cm from the front and center of the clutter bounding box. The projection plane (display screen) was located at a depth of 70 cm from the subject, which corresponded to the center depth of the clutter.

*Z* axis.

*Z* as follows. For negative depth-luminance covariance (DLC−), the luminance was chosen to be proportional to exp(–*λ*[*Z* – *Z*_{0}]), where *Z* was the depth (cm) of the center of the distractor or target and *Z*_{0} was the depth where the clutter begins. We chose a decay factor of *λ* = 0.125, based on the density and size of the distractors (Langer & Mannan, 2012). Thus, the intensities of the surfaces ranged from 1.0 down to exp(–0.125 × 20) ≈ 0.082. For positive depth-luminance covariance (DLC+), luminance was proportional to exp(–*λ*[*Z*_{0} + 20 – *Z*]). That is, it decayed exponentially with the *Z* distance to the back face of the volume (see Appendix B for more details on how the DLC function is chosen).
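The two luminance rules can be written as one small function. This is a minimal sketch using the values given in the text (*λ* = 0.125, *Z*_{0} = 60 cm, a 20-cm-deep volume); the function and constant names are illustrative, and the result is a relative luminance that would still have to be scaled by the maximum luminance of each surface color.

```python
import math

LAMBDA = 0.125   # decay factor (per cm)
Z0 = 60.0        # depth at which the clutter begins (cm)
DEPTH = 20.0     # depth extent of the clutter volume (cm)


def relative_luminance(z, dlc_negative=True):
    """Relative luminance in (0, 1] of a surface centered at depth z (cm)."""
    if dlc_negative:
        # DLC-: brightest at the front face, decaying with depth into the clutter
        return math.exp(-LAMBDA * (z - Z0))
    # DLC+: brightest at the back face, decaying with distance from the back face
    return math.exp(-LAMBDA * (Z0 + DEPTH - z))


for z in (60.0, 70.0, 80.0):
    print(z, round(relative_luminance(z, True), 3), round(relative_luminance(z, False), 3))
```

The two conditions are mirror images about the middle depth of the volume, so a surface at 70 cm has the same relative luminance under both.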

*Z* that exceeded 15 cm. The reason is that when the scenes are rendered with tunnels for high Δ*Z* values, these tunnels become very apparent. In pilot studies we found that subjects became confused in these conditions, which led to guessing behavior. By not using a tunnel when Δ*Z* > 15 cm, we gave these scenes a strong visibility cue, which allowed subjects to perform the task easily and ensured that the staircases were well behaved.

*Z* ≤ 15 cm made the *expected* visibilities the same for the two targets for each depth difference. However, typically there were still differences in the visibilities of the two targets in each trial since the distractors were randomly positioned and oriented. If subjects were to mistakenly use these random differences in visibility in each trial as a depth cue, then this would compromise their performance and raise depth-discrimination thresholds. To examine if subjects were indeed doing so, we carried out a second version of the experiment (Experiment 2) in which we manipulated the distractor distribution further to reduce the per-trial variability in the visibility difference. (See the highlighted area in Figure 1b that indicates the distractors that we manipulated and see Appendix C for the details of the manipulation.)

*Z* in the staircase were chosen to target a proportion correct of approximately 78% (Garcia-Pérez, 1998). When the subject answered correctly or incorrectly, the distance Δ*Z* between targets was reduced by a factor of 0.8 or was increased by a factor of 2.19, respectively. Staircase conditions were randomly interleaved. Each staircase began at level Δ*Z* = 5 cm and then terminated after 10 reversals. To compute the thresholds, we averaged the log of the Δ*Z* values of the last eight reversals.
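The link between the two step factors and the targeted proportion correct can be checked with a short calculation. The sketch below assumes a one-up/one-down staircase whose level settles where the expected step in log Δ*Z* is zero; it also converts the mean log level back to centimeters, which is an assumption about how the threshold is reported. The function names are illustrative.

```python
import math

DOWN = 0.8    # multiplier applied to the level after a correct response
UP = 2.19     # multiplier applied to the level after an incorrect response


def convergence_point(down=DOWN, up=UP):
    """Proportion correct at which the expected log step is zero."""
    step_down = -math.log(down)   # size of a downward step in log units
    step_up = math.log(up)        # size of an upward step in log units
    # p * step_down == (1 - p) * step_up  =>  p = step_up / (step_up + step_down)
    return step_up / (step_up + step_down)


def threshold_from_reversals(reversal_levels, n_last=8):
    """Mean of the logs of the last n reversal levels, mapped back to cm."""
    last = reversal_levels[-n_last:]
    mean_log = sum(math.log(v) for v in last) / len(last)
    return math.exp(mean_log)


print(round(convergence_point(), 3))
```

With the factors 0.8 and 2.19, the balance point lands close to the 78% stated in the text.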

*p* values. A *p* value smaller than 0.05 was considered to be significant. These plots pooled the background (black vs. white) conditions since this variable did not have a statistically significant effect (see below).

*F*(1, 19) = 9.6, *p* = 0.006. A second main effect was that thresholds were lower when targets and distractors had the same surface color (diagonals) than when they had different colors (off-diagonals), *F*(1, 19) = 18.1, *p* = 0.0005, with means 4.5 and 5.5 cm, respectively. There was an interaction effect between DLC and color, *F*(1, 19) = 31.1, *p* = 0.00002. When targets and distractors had the same color, thresholds were just slightly lower for DLC− than for DLC+ with means 4.4 and 4.7 cm, respectively. When targets and distractors differed in color, thresholds were much lower for DLC− than DLC+ with means 4.2 and 6.8 cm, respectively. This suggests that subjects relied on a dark-means-deep prior more when the colors of targets differed from the colors of the distractors. This prior is correct for the DLC− stimuli but is incorrect for the DLC+ stimuli. We will return to this point in the Discussion section.

*F*(1, 19) = 0.01, *p* = 0.64. We therefore pooled the results for the plots in Figure 4 for consistent and inconsistent backgrounds, showing 18 conditions instead of 36.

*t* test on the signed differences in the mean thresholds for the 36 conditions of Experiments 1 versus 2 revealed a significant difference (*t* = 4.67, *p* < 0.0001).

*F*(1, 7) = 8.1, *p* = 0.02. Thresholds were lower for diagonals than for off-diagonals with means 3.9 and 5.0 cm, respectively, *F*(1, 7) = 2.76, *p* = 0.141. The interaction between DLC and diagonal/off-diagonals also was close to significant, *F*(1, 7) = 0.35, *p* = 0.09. Thresholds were slightly smaller for DLC− than DLC+ for the diagonals with means 3.2 and 4.6 cm, respectively. Thresholds were much lower for DLC− than DLC+ in the off-diagonal case, with means 3.3 and 6.6 cm, respectively. In general, the DLC− thresholds were similar for all conditions, but the DLC+ thresholds were greater for the off-diagonals. Finally, again there was no significant difference between consistent and inconsistent backgrounds, *F*(1, 7) = 0.6, *p* = 0.48. The means for a consistent background versus an inconsistent one were 4.3 versus 4.6 cm.

*t* test on the signed differences in the mean thresholds for the 36 conditions of Experiments 1 versus 3 revealed a significant difference (*t* = 15.26, *p* < 10^{–16}). This reassured us that judging relative luminance was not itself the limiting factor in our experiments. This also reassured us that participants were discriminating depths, rather than just luminances in the main experiments.

*F*(1, 7) = 0.58, *p* = 0.47. Diagonal versus off-diagonal had means 1.64 and 1.53 cm, respectively, *F*(1, 7) = 2.46, *p* = 0.16. Consistent and inconsistent background colors had means 1.58 and 1.59 cm, *F*(1, 7) = 0.02, *p* = 0.89.

*X* and *Y* limits of the clutter, but it is sometimes also visible within the clutter when one can see all the way through. In this case, it is difficult to perceive that the leaking background is indeed due to background as opposed to just another occluded surface. In the case that the background color is inconsistent with the depth-luminance covariance, a sliver of background that is visible within the clutter might just be perceived as an outlier. Therefore, we believe that it was the luminances of the clutter rather than the background that provided the main cue that subjects used. Perhaps in sparser scenes containing fewer occlusions, the background color would have more of an effect.

*Z*, where the visibility cue is present for our stimuli. Could it be that subjects achieved thresholds below 15 cm by using other cues? We believe the answer is no. We ran a model observer who guesses in conditions when the visibility cue has been removed (Δ*Z* ≤ 15 cm) and who answers correctly when the visibility cue is present (Δ*Z* > 15 cm). We ran 300 staircases for such an observer. The mean threshold was approximately 12 cm, which is similar to our three observers for the equiluminant targets. These results suggest that subjects relied entirely on the DLC cues and the prior for DLC− to perform the task in our two main experiments.
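A model observer of this kind can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not the procedure actually used: it reuses the staircase parameters given in the Methods (start at 5 cm, factors 0.8 and 2.19, 10 reversals, last eight averaged in log units), imposes no upper bound on Δ*Z*, converts each mean log level back to centimeters, and takes the arithmetic mean across staircases. The seed and all names are hypothetical, so the printed value need not match the reported figure of approximately 12 cm.

```python
import math
import random


def run_staircase(rng, start=5.0, down=0.8, up=2.19,
                  cue_limit=15.0, n_reversals=10, n_last=8):
    """One staircase for an observer who guesses unless the visibility cue is present."""
    level = start
    last_direction = 0          # -1 after a downward step, +1 after an upward step
    reversals = []
    while len(reversals) < n_reversals:
        if level > cue_limit:
            correct = True                  # visibility cue present: always correct
        else:
            correct = rng.random() < 0.5    # cue removed: pure guessing
        direction = -1 if correct else 1
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)         # the step direction changed at this level
        last_direction = direction
        level *= down if correct else up
    last = reversals[-n_last:]
    return math.exp(sum(math.log(v) for v in last) / len(last))


rng = random.Random(0)          # fixed seed so the run is repeatable
thresholds = [run_staircase(rng) for _ in range(300)]
mean_threshold = sum(thresholds) / len(thresholds)
print(round(mean_threshold, 1))
```

Because the observer is at chance below 15 cm and perfect above it, the level hovers around the 15-cm boundary, which is why the resulting thresholds sit far above those of subjects who could use the luminance cues.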

*Nature*, 402 (6764), 877.

*PLoS One*, 10 (8), e0132658.

*Journal of Neuroscience*, 34 (35), 11761–11768.

*Vision Research*, 26 (6), 973–990.

*Perception*, 11 (6), 671–676.

*Perception*, 6 (3), 287–293.

*Vision Research*, 38 (12), 1861–1881.

*Current Biology*, 26 (9), R350–R351.

*International Journal of Computer Vision*, 34 (2–3), 193–204.

*Color Research Application*, 26 (S1: Proceedings of the International Colour Vision Society), (pp. S218–S221). Hoboken, New Jersey: John Wiley & Sons, Inc.

*Perception*, 29 (6), 649–660.

*Journal of the Optical Society of America A*, 29 (9), 1794–1807.

*Journal of Vision*, 16 (11): 11, 1–18, https://doi.org/10.1167/16.11.11.

*Journal of the Optical Society of America A*, 11 (2), 467–478.

*Vision Research*, 34 (12), 1595–1604.

*Journal of the Optical Society of America A*, 20 (7), 1292–1303.

*Journal of Vision*, 8 (9): 3, 1–16, https://doi.org/10.1167/8.9.3.

*Proceedings of the National Academy of Sciences*, 109 (16), 6313–6318.

*Bulletin of the Psychonomic Society*, 21 (6), 456–458.

*Journal of Vision*, 15 (15): 2, 1–10, https://doi.org/10.1167/15.15.2.

*Vision Research*, 31 (11), 1923–1929.

*Journal of Imaging Science and Technology*, 42 (4), 319–325.

*Nature*, 410 (6829), 690.

**M** matrix below, and then from XYZ to LUV.

*Y*) as the saturated green. We first determined the maximum luminances of gray *Y*_{Gray}, unsaturated green *Y*_{UGreen}, and saturated green *Y*_{SGreen} using the transformation between RGB and CIE XYZ below. We rescaled the maximum RGB of gray (1, 1, 1) and unsaturated green (0.33, 0.87, 0.33) by multiplying by *Y*_{SGreen}/*Y*_{Gray} and *Y*_{SGreen}/*Y*_{UGreen}, respectively. This ensured we had the same maximum luminance for each color. Then, during the experiment, we scaled the RGB values of each surface color according to its depth depending on the condition.

*Y* values above can be obtained via the transformation between RGB and CIE XYZ space as follows,

**M** by using the spectroradiometer to measure the XYZ values for uniform RGB patches of red (1, 0, 0)^{T}, green (0, 1, 0)^{T}, and blue (0, 0, 1)^{T}. This gave the luminance (*Y*) of each of the channels in cd/m^{2}. To obtain the luminances *Y*_{Gray}, *Y*_{UGreen}, and *Y*_{SGreen}, we multiplied their corresponding (*R*, *G*, *B*)^{T} by the middle row of **M**.
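The luminance matching can be illustrated with a short calculation. The measured matrix is not available here, so the sketch substitutes the standard linear sRGB (D65) RGB-to-XYZ matrix as a stand-in; it also assumes that the saturated green is (0, 1, 0) and that RGB values are linear in luminance. All names are illustrative.

```python
# Stand-in RGB -> XYZ matrix (linear sRGB, D65). This is NOT the measured
# matrix from the calibration; it only makes the example runnable.
M = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],   # middle row: luminance Y contributed by R, G, B
    [0.0193, 0.1192, 0.9505],
]

GRAY = (1.0, 1.0, 1.0)
UGREEN = (0.33, 0.87, 0.33)     # unsaturated green, as given in the text
SGREEN = (0.0, 1.0, 0.0)        # assumed value for the saturated green


def luminance_y(rgb, m=M):
    """Luminance Y of an RGB triple: dot product with the middle row of the matrix."""
    return sum(w * c for w, c in zip(m[1], rgb))


def rescale_to(rgb, target_y):
    """Scale an RGB triple so that its luminance equals target_y."""
    k = target_y / luminance_y(rgb)
    return tuple(k * c for c in rgb)


y_sgreen = luminance_y(SGREEN)
gray_max = rescale_to(GRAY, y_sgreen)       # factor Y_SGreen / Y_Gray
ugreen_max = rescale_to(UGREEN, y_sgreen)   # factor Y_SGreen / Y_UGreen

print(gray_max, ugreen_max)
```

After rescaling, all three colors share the same maximum luminance, so the depth-dependent scaling applied later changes luminance identically for each surface color.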

*Z*-axis direction. The latter model assumes a 3-D cluttered scene that begins at depth *Z* = *Z*_{0} and has infinite extent in the *X* and *Y* directions.

**X**_{p} = (*X*_{p}, *Y*_{p}, *Z*_{p}) depends on the amount of hemispheric sky that is visible from it. The directions of the hemispheric sky are parametrized by polar angle *θ* and azimuth *ϕ*, and *V*(**X**_{p}, *θ*, *ϕ*) is the visibility function which equals 1 if the hemispheric sky is visible from point **X**_{p} in direction (*θ*, *ϕ*), and 0 otherwise. This yields the following expression for luminance at a surface point **X**_{p},

*θ* to obtain the surface area of the unit hemisphere. This area is weighted by Lambert's cosine law, which states that luminance from a Lambertian surface is proportional to the cosine of the angle *θ* between the direction of the incident light ray and the surface normal. The total surface area of the cosine-weighted hemisphere is *π*. We integrate only over the visible portions of the hemisphere and so dividing the result by *π* returns the luminance, a value between 0 and 1.

*V*(**X**_{p}, *θ*, *ϕ*) as a random variable. We compute the probability that the hemispheric sky is visible along a ray, *p*(*V*[**X**_{p}, *θ*, *ϕ*]), as done in Langer and Mannan (2012). To get a closed-form model of *p*(*V*[**X**_{p}, *θ*, *ϕ*]), we assume the elements of the clutter are disks of area *A*, and we assume that the spatial distribution of the disks is a Poisson process with density *η*, which is the average number of disk centers per unit volume. The parameters *A* and *η* can be lumped together as a single constant *λ* = *ηA*.

*p*(*V*[**X**_{p}, *θ*, *ϕ*]) can be written in terms of *Z*_{p} only as follows. The length of any ray from **X**_{p} = (*X*_{p}, *Y*_{p}, *Z*_{p}) to the edge of the clutter is (*Z*_{p} – *Z*_{0}) / cos *θ*, where the term 1 / (cos *θ*) accounts for the greater path length of a ray with greater polar angle *θ*. To simplify the integral, we assume the elements of the clutter have surface normal facing the *Z* direction and so these clutter elements are foreshortened in the ray direction (*θ*, *ϕ*). Then, the Poisson model yields:

*θ, ϕ*) is computed as,

*Z* axis, and this illumination hemisphere has uniform luminance over all directions. Second, the clutter has infinite *XY* extent, which is not the case for cluttered scenes in our stimuli. Third, the luminance variations consider cast shadows only, but ignore interreflections. Fourth, the elements of the clutter have normals parallel to the *Z* direction. Because the model is based on rather strong assumptions, we cannot and do not claim that this model is photorealistic for the given scenes. Rather the model is meant to capture a qualitative shadowing effect that does occur in real scenes, namely when clutter such as foliage is illuminated under approximately diffuse light.
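Under these assumptions, the chain of reasoning from the visibility function to the exponential depth-luminance function can be sketched as follows. This is a reconstruction from the definitions in the text (the visibility function *V*, disk area *A*, Poisson density *η*, and *λ* = *ηA*) and should be read as a derivation consistent with those definitions rather than as a copy of the original displayed equations.

The luminance at a surface point is the cosine-weighted fraction of the hemisphere that is visible:

$$
L(\mathbf{X}_p) = \frac{1}{\pi} \int_{0}^{2\pi} \int_{0}^{\pi/2} V(\mathbf{X}_p, \theta, \phi)\, \cos\theta\, \sin\theta\; d\theta\, d\phi
$$

Along a ray of length $(Z_p - Z_0)/\cos\theta$, each disk presents a foreshortened area $A\cos\theta$, so the two factors of $\cos\theta$ cancel in the Poisson probability of meeting no occluder:

$$
p\big(V[\mathbf{X}_p, \theta, \phi] = 1\big) = \exp\!\left(-\eta\,(A\cos\theta)\,\frac{Z_p - Z_0}{\cos\theta}\right) = e^{-\lambda (Z_p - Z_0)}
$$

Because this probability does not depend on direction, it factors out of the integral, and the remaining cosine-weighted integral equals $\pi$:

$$
E\big[L(\mathbf{X}_p)\big] = \frac{e^{-\lambda (Z_p - Z_0)}}{\pi} \int_{0}^{2\pi} \int_{0}^{\pi/2} \cos\theta\, \sin\theta\; d\theta\, d\phi = e^{-\lambda (Z_p - Z_0)}
$$

The result has the same form as the DLC− luminance function used for the stimuli, exp(–*λ*[*Z* – *Z*_{0}]).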

*X*, *Y*, and *Z* positions of the distractors relative to the center of each target, and we matched the slant angles, namely the angles between the *Z* axis and the occluder's surface normal. Note that we only matched the distractors that occluded the targets. The remaining visibility differences for the two targets were due to random interactions between the direction (tilt) of each distractor's orientation, the variations in aspect ratios and angles of the targets, and parallax between the targets and distractors, namely that the viewing directions from the eye to the left and right targets are different.

*Z* = {0, 2, 4, …, 20} cm and target visibilities were defined as the fraction of the target that was visible in the image (which we measured by counting pixels). In the plots, depths up to 8 cm show the visibilities for the near targets and depths beyond 12 cm show the visibilities for far targets. At depth 10 cm, there is no distinction between near and far targets. We used 1,000 scenes for each data point in the plot.

*Z* = 10 cm (Langer et al., 2016) and then a rise in visibility beyond depth 10 cm. This rise is due to the tunnel in front of the far target, namely the size of the tunnel increases as the far target depth increases (recall Figure 1). Recall also that the tunnel is only used when the depth difference is less than 15 cm, and so the near and far targets are in the depth range 10 ± 7.5 cm. This is why the visibility plot for the far target is so much lower for *Z* = 18 and 20 cm.

*Z*. Recall that the depths in Figure 5a were chosen from *Z* = {0, 2, 4, …, 20} cm. Similarly, the depth differences in Figure 5b were chosen from Δ*Z* = {0, 4, 8, …, 20} cm. For Δ*Z* ≤ 15 cm, the mean absolute difference in visibility for Experiment 2 was approximately 30% less than in Experiment 1. As expected, this reduction in visibility differences led to an improvement in performance from Experiment 1 to Experiment 2.