Open Access
Article  |   March 2018
Signs of depth-luminance covariance in 3-D cluttered scenes
Milena Scaccia, Michael S. Langer
School of Computer Science, McGill University, Montreal, Quebec, Canada
Journal of Vision March 2018, Vol. 18(3):5. https://doi.org/10.1167/18.3.5
Abstract

In three-dimensional (3-D) cluttered scenes such as foliage, deeper surfaces often are more shadowed and hence darker, and so depth and luminance often have negative covariance. We examined whether the sign of depth-luminance covariance plays a role in depth perception in 3-D clutter. We compared scenes rendered with negative and positive depth-luminance covariance where positive covariance means that deeper surfaces are brighter and negative covariance means deeper surfaces are darker. For each scene, the sign of the depth-luminance covariance was given by occlusion cues. We tested whether subjects could use this sign information to judge the depth order of two target surfaces embedded in 3-D clutter. The clutter consisted of distractor surfaces that were randomly distributed in a 3-D volume. We tested three independent variables: the sign of the depth-luminance covariance, the colors of the targets and distractors, and the background luminance. An analysis of variance showed two main effects: Subjects performed better when the deeper surfaces were darker and when the color of the target surfaces was the same as the color of the distractors. There was also a strong interaction: Subjects performed better under a negative depth-luminance covariance condition when targets and distractors had different colors than when they had the same color. Our results are consistent with a “dark means deep” rule, but the use of this rule depends on the similarity between the color of the targets and color of the 3-D clutter.

Introduction
Three-dimensional (3-D) cluttered scenes consist of many small surfaces that are distributed over a volume. Examples include the foliage of bushes, shrubs, and trees, as well as tall grass. Depth perception in 3-D cluttered scenes is challenging since there are many occlusions present and the surfaces that make up the clutter often are only partly visible. Despite the challenge of occlusions, the visual system often is able to judge depth in 3-D clutter. For example, motion parallax and binocular disparity cues can be used to judge the depth order of two surfaces within the clutter (Langer, Zheng, & Rezvankhah, 2016), as well as the depth range of the clutter (van Ee & Anderson, 2001). Langer et al. (2016) also showed that the visual system relies on probabilistic occlusion cues for judging depth order in 3-D clutter. The idea is that the fraction of a surface that is visible within 3-D clutter tends to decrease as the depth of the surface increases, and so the relative visibility of two surfaces within the clutter is a cue to their relative depth. 
However, the experiments of Langer et al. (2016) considered only geometric cues, namely binocular disparity, motion parallax, and occlusions (i.e., visibility). The surfaces in the clutter had random gray levels, and there was no relationship between the luminance of the surfaces and their depth within the clutter. The experiments we report here take a different approach: we address scenarios in which luminance depends directly on depth, either increasing or decreasing with depth. This manipulation bears on a long-standing question in depth perception: when do brighter surfaces appear closer, and when do they appear farther? This question has been addressed using several different scene configurations in the past. The experiments that we report here are the first to address it for 3-D cluttered scenes. 
Here we review previous studies that have examined depth-luminance covariance. We begin with Schwartz and Sperling (1983) who were the first to consider how depth-luminance covariance is combined with another depth cue, namely perspective. They used rotating Necker cubes consisting of lines rendered on a dark background. The luminance obeyed either a positive or negative depth-luminance covariance. When subjects judged the direction of rotation of the cube, they used a bright-means-near rule and ignored the perspective cues. In particular, when the brighter lines were deeper, subjects incorrectly perceived a deforming cube with the brighter lines in front rather than a rigid cube with brighter lines in the back. Dosher, Sperling, and Wurst (1986) went further by combining the depth-luminance covariance cue with binocular disparity, again using rotating Necker cubes under perspective. They modeled how subjects combined the cues both for static and dynamic displays. They found that darker lines tended to be perceived as deeper but that this depth-luminance covariance cue was weighted lower than the binocular disparity cue. 
What might be the “ecological optics” basis for a dark-means-deep rule for the Necker cube stimuli? Dosher et al. (1986) argued that if 3-D white scene lines were wires (cylinders) that projected to 2-D image lines in a black background, then the width of the image lines would decrease with distance and so the pixel intensity of the lines should decrease (from white to gray). The sign of this depth-luminance covariance would reverse for the case of black lines on a white background—namely, the black lines would be brighter in the image as their scene depth increased, since a more distant line would project to smaller pixel subareas and the white background would fill the remaining pixel subareas. Schwartz and Sperling (1983) also noted informally that for Necker cubes rendered as black lines on white background, subjects indeed used a bright-means-deep rule rather than dark-means-deep rule, and suggested that the sign of the depth-luminance covariance is determined more generally by the contrast of the line with the background rather than by luminance of the line per se. 
Similar observations about contrast and perceived depth have been made for square patches viewed against a black or white background. When a light gray and a dark gray square are both viewed against a black background, the light gray square appears nearer. However, when the same squares are viewed against a white background, the dark gray square appears nearer (Egusa, 1982; Farnè, 1977). It has been argued that these contrast effects are consistent with atmospheric scattering (i.e., aerial perspective)—namely, a distant object will have lower contrast with the background since the luminance of both the object and background contain a common atmospheric component that increases with depth (O'Shea, Blackburn, & Ono, 1994). 
There are other ecological optics theories of why a dark-means-deep or a bright-means-deep rule might apply in a given situation. Under diffuse lighting such as on a cloudy day, surface concavities tend to be darker because the fraction of the diffuse source that is visible tends to be lower in concavities (Langer & Zucker, 1994). This dark-means-deep effect is modulated, however, by factors such as local variations in the surface normal, interreflections, glossiness, and nonuniformity of the diffuse source. Langer and Bülthoff (2000) showed that, for smooth surfaces rendered under uniform diffuse illumination, the visual system takes account of surface normal variations to some extent when comparing depths of neighboring surface points and thus does not simply discriminate depths based on luminance. The question of when the visual system uses a dark-means-deep rule for interpreting shape from shading remains controversial (Chen & Tyler, 2015; Todd, Egan, & Kallie, 2015; Tyler, 1998). 
Kim, Wilcox, and Murray (2016) investigated another aspect of the depth-luminance covariance by examining situations in which surfaces are perceived to emit or transmit light. They presented pairs of smooth terrain surfaces, one of which was rendered under diffuse lighting (“dark valley”) and the other was depth inverted and assigned the same luminance at each corresponding surface position (“bright valley”). The only difference between the two cases was the depth information given by either stereo or rotational motion cues. Subjects were asked to judge which surface appeared to glow as if it had a light source inside or behind it. Subjects consistently chose the depth-reversed surfaces (bright valley) as glowing. Thus, subjects used the geometric cues from binocular disparity and dynamic occlusion to distinguish the hills and valleys, and subjects interpreted the luminance variations consistently with the shape defined by the geometric cues. The idea is similar to an observation by Langer (1999) that, in a photographic negative of a diffusely illuminated scene, concave regions tend to be brighter. So if the geometric cues that are given by occlusions constrain the 3-D geometry, then the visual system perceives concave regions as glowing as if there were a light source and interreflections present. The key insight of Kim et al. (2016) was that this glow effect occurs even without occlusion cues being present, in particular, in their static binocular condition. 
A negative depth-luminance covariance (dark-means-deep) has also been demonstrated using images of outdoor natural scenes, using joint luminance and depth statistics (Potetz & Lee, 2003; Samonds, Potetz, & Lee, 2012). These natural image statistics were consistent with neurophysiological findings that the disparity sensitivity of many neurons in monkey primary visual cortex covaries with the neuron's local luminance contrast sensitivity. That is, cells tuned to near or far disparities tend to be sensitive to local luminance maxima and minima, respectively (Samonds et al., 2012). These effects were reanalyzed and confirmed by Cooper and Norcia (2014), who also reported a psychophysical study. Using natural images along with registered depth maps, they showed that the perception of 3-D depth could be enhanced or diminished by biasing the image intensities toward a negative or positive depth-luminance covariance, respectively. For example, they found that for scenes that had a positive depth-luminance covariance, reducing the luminance of the far surfaces enhanced the perception of depth more than reducing the luminance of the near surfaces did. This effect cannot be attributed simply to contrast enhancement: darkening the background of a scene that has a positive depth-luminance covariance decreases the overall image contrast, yet such a manipulation was found to increase the perception of depth. 
Finally, we consider some related work that has addressed relationships between perceived depth, luminance, and color. Troscianko, Montagnon, Le Clerc, Malbert, and Chanteau (1991) showed that the perceived slant of a tiled plane can increase when distant tiles are less saturated and both hue and luminance are held constant. They also showed that slant was not enhanced when luminance and saturation were held constant and only hue varied. The ecological basis for the saturation gradient effect is similar to what we discussed above, namely the atmospheric effect of aerial perspective: colors from distant surfaces are less saturated because the color is mixed with that of the atmosphere, which, in their case, had a neutral hue. 
A different interaction between depth and color saturation arises from interreflections between surfaces. For example, in the case of diffuse lighting, surfaces that are deeper in the clutter tend to be more shadowed, and so a greater percentage of the illumination for deeper surfaces comes from light reflected off other surfaces. Such interreflection effects have been observed in more general scenes as well and can lead to hue and/or saturation gradients as a function of depth (Langer, 1999, 2001; Ruppertsberg, Bloj, & Hurlbert, 2008). Moreover, there is some evidence that when depth information is given by stereo cues, the visual system can disentangle the interreflection effects and perceive the surface color (Bloj, Kersten, & Hurlbert, 1999). This finding is similar to the finding of Kim et al. (2016) about stereo and glow discussed above. 
The experiments that we report in this paper explored the effect of a depth-luminance covariance in 3-D cluttered scenes. The geometry of the scenes was similar to that of Langer et al. (2016) and the experimental procedure was also similar. In each trial, subjects were shown two target surfaces that were embedded in a cluttered 3-D volume and the task was to decide which of two target surfaces was closer. Our experiments varied the sign of the depth-luminance covariance, the background color of the scene (white or black), and the color saturations of the targets and the clutter surfaces. The sign of the depth-luminance covariance was given from the ordinal occlusion relationships. Our main goal was to examine when subjects used the depth-luminance covariance to perform the depth-discrimination task. 
The task of judging the depth order of the two targets was challenging for a few reasons. First, the targets did not overlap in the image, so there was no direct occlusion cue to determine their depth order. Second, since the targets were partly occluded by the random clutter, the target visibility was random. Moreover, we manipulated the clutter distribution to remove any systematic relation between the relative visibility of the targets and their depth order. The only information available for doing the task was the luminance difference between the targets. There is a limit to how accurately subjects can discriminate target luminances, however, especially in the presence of clutter, since the clutter can produce local simultaneous contrast effects. 
The third reason that the task is potentially difficult is that the correct answer on any trial depends on the sign of the depth-luminance covariance in the scene. This sign is readily available from occluders in the clutter. In particular, it is available from the luminance difference between the targets and the distractors that occlude them. However, it is unclear when subjects would take this depth-luminance covariance sign into account. In particular, in cases of a positive depth-luminance covariance, the positive sign conflicts with a default dark-means-deep prior, which has been shown in other studies to play a role in depth perception. A key goal of our experiments is to better understand when subjects use the sign of depth-luminance covariance. 
Method
Apparatus
Images were rendered using OpenGL (Khronos Group, Beaverton, OR) and were displayed using a Dell Precision M6700 laptop (Dell, Round Rock, TX) with an NVIDIA Quadro K4000 graphics card (NVidia, Santa Clara, CA). The laptop's 17-in. LCD monitor was set to maximum brightness. A PR650 spectroradiometer (Photo Research, Syracuse, NY) was used to compute a luminance-to-RGB lookup table for each of the RGB channels. (For more details on color calibration and choice of colors, see Appendix A.) 
Stimuli
Each scene consisted of two targets and 1,000 distractors. The two targets were 4 × 4-cm squares that were separated in depth by a variable amount ΔZ and separated horizontally by ΔX = 10 cm. Their XY positions were perturbed in a random XY direction by up to 1 cm. The targets were oriented so that their normals were parallel to the Z axis. Each distractor was a 1 × 1-cm square, randomly oriented in 3-D and randomly positioned within a bounding box of size 20 × 20 × 20 cm. 
The scene was rendered using perspective projection. The front face of the clutter bounding box was at depth Z0 = 60 cm from the virtual viewpoint and centered on the line of sight. The projection plane (display screen) was located at a depth of 70 cm from the subject, which corresponded to the center depth of the clutter. 
Because the scenes were rendered under perspective projection, there was a size cue for depth. We removed this size cue for the targets only, similarly to Langer et al. (2016), by rescaling the height and width of each target to be proportional to its inverse depth. This ensured that the visual solid angle of a target at any depth (and in the absence of occlusions) would be equal to the visual angle of a target at the middle depth in the clutter. To help hide the fact that the two targets had the same visual angle in each trial, we jittered each target's aspect ratio so that the target was slightly rectangular, and we rotated each target by a random amount about the Z axis. 
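To make the scene geometry concrete, here is a minimal Python sketch (our own illustration using NumPy, not the authors' OpenGL code; all names and the jitter range are hypothetical). Scaling a target's physical size in proportion to its depth keeps its visual angle equal to that of a target at the middle depth, which is the stated goal of the rescaling:

```python
import numpy as np

rng = np.random.default_rng(1)

Z0 = 60.0              # depth of the front face of the clutter (cm)
BOX = 20.0             # 20 x 20 x 20 cm bounding box
Z_MID = Z0 + BOX / 2   # middle depth of the clutter (70 cm)

def make_distractors(n=1000):
    """1 x 1 cm distractors, uniformly positioned in the bounding box,
    each with a random 3-D orientation (represented by a unit normal)."""
    centers = rng.uniform([-BOX / 2, -BOX / 2, Z0],
                          [BOX / 2, BOX / 2, Z0 + BOX], size=(n, 3))
    normals = rng.normal(size=(n, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return centers, normals

def target_dims(z, base=4.0):
    """Scale a 4 x 4 cm target so its visual angle matches that of a target
    at the middle depth, then jitter the aspect ratio slightly so the equal
    angular size is not obvious (the jitter range is our assumption)."""
    side = base * z / Z_MID
    aspect = rng.uniform(0.9, 1.1)
    return side * aspect, side / aspect
```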
Subjects were seated so that their eye position was 70 cm directly in front of the screen, corresponding to the virtual viewing position used in the rendering. We did not use a chin rest, so some variability in subject position was allowed. The task was performed monocularly, with the nondominant eye covered with an eye patch. The height of each subject's seat was adjusted so that the subject's eye would be roughly the same height as the center of the screen. 
The luminance of the targets and distractors varied according to their depth Z as follows. For negative depth-luminance covariance (DLC−), the luminance was chosen to be proportional to exp(−λ[Z − Z0]), where Z was the depth (cm) of the center of the distractor or target and Z0 was the depth where the clutter begins. We chose a decay factor of λ = 0.125, based on the density and size of the distractors (Langer & Mannan, 2012). Thus, the intensities of the surfaces ranged from 1.0 down to exp(−0.125 × 20) ≈ 0.082. For positive depth-luminance covariance (DLC+), luminance was proportional to exp(−λ[Z0 + 20 − Z]). That is, it decayed exponentially with the Z distance to the back face of the volume (see Appendix B for more details on how the DLC function is chosen). 
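The two covariance rules can be summarized in a few lines. This is a sketch under the stated parameters (λ = 0.125 per cm, a 20-cm-deep volume beginning at Z0 = 60 cm), with function and variable names of our choosing:

```python
import numpy as np

LAM = 0.125        # decay constant per cm (Langer & Mannan, 2012)
Z0, BOX = 60.0, 20.0

def relative_luminance(z, dlc_sign):
    """Relative luminance of a surface at depth z (cm).
    DLC-: decays with depth into the clutter (deeper is darker).
    DLC+: decays with distance to the back face (deeper is brighter)."""
    if dlc_sign < 0:
        return np.exp(-LAM * (z - Z0))
    return np.exp(-LAM * (Z0 + BOX - z))

# range check: the deepest DLC- surface has luminance exp(-0.125 * 20) ~ 0.082
assert abs(relative_luminance(Z0 + BOX, -1) - 0.082) < 1e-3
```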
In natural 3-D cluttered scenes such as foliage, it is common for targets embedded in the clutter to have a different color than the clutter itself. An extreme example is a red apple among green leaves. For this reason, we examined depth-luminance covariance not just for gray level surfaces but also for a few simple combinations of surface color. Three different colors were used for the targets and distractors: gray, unsaturated green, and high-saturated green. The RGB values for the surfaces were scaled versions of the following: (1, 1, 1) for gray, (0.33, 0.87, 0.33) for unsaturated green, and (0, 1, 0) for saturated green. For any depth and for any DLC− or DLC+ condition, we scaled the RGB values so that the luminance (CIE Y) would depend on the depth, and be the same for any of the three colors (see Appendix A for more details). 
The visibility of a target is a strong cue for target depth (Langer et al., 2016). Since the goal of our experiment was mainly to investigate luminance effects, we removed the depth information from the target visibilities. We did so by modifying the distribution of the distractors in the clutter—namely, we removed distractors whose center point occluded the far target, so the tunnel's cross-sectional area was the same as the far target's area. We removed only distractors whose depths were between those of the near and far targets (see Figure 1). 
Figure 1
 
Illustration of the tunnel for the DLC− condition. The viewing position is from below. (a) The tunnel removes distractors that are between the depths of the near and far target and that could occlude the far target. We used the tunnel only in front of the far target because that was sufficient to equate the expected visibilities of the two targets for any depth difference. This illustration does not show the random variations in the distractors' positions and orientations. (b) In Experiment 2, we matched the distractors that fell in front of the targets (see highlighted black square; see Appendix C for details).
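A sketch of the tunnel construction, under the simplifying assumptions that the eye sits at the XY origin and the far target is an axis-aligned square; the projection test and all names are our reconstruction, not the authors' implementation:

```python
import numpy as np

def carve_tunnel(centers, far_xy, far_half, z_near, z_far, eye_z=0.0):
    """Drop distractors whose depth lies between the two targets and whose
    center, projected from the eye, lands on the far target (Figure 1a)."""
    s = (z_far - eye_z) / (centers[:, 2] - eye_z)   # central projection onto
    proj = centers[:, :2] * s[:, None]              # the far target's depth plane
    hits_far = np.all(np.abs(proj - far_xy) < far_half, axis=1)
    between = (centers[:, 2] > z_near) & (centers[:, 2] < z_far)
    return centers[~(hits_far & between)]
```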
We did not use a tunnel for depth differences ΔZ that exceeded 15 cm. The reason is that when the scenes are rendered with tunnels for high ΔZ values, these tunnels become very apparent. In pilot studies we found that subjects became confused in these conditions, which led to guessing behavior. By not using a tunnel when ΔZ > 15 cm, we gave these scenes a strong visibility cue, which allowed subjects to perform the task easily and ensured that the staircases were well behaved. 
Using a tunnel when ΔZ ≤ 15 cm made the expected visibilities the same for the two targets for each depth difference. However, typically there were still differences in the visibilities of the two targets in each trial since the distractors were randomly positioned and oriented. If subjects were to mistakenly use these random differences in visibility in each trial as a depth cue, then this would compromise their performance and raise depth-discrimination thresholds. To examine if subjects were indeed doing so, we carried out a second version of the experiment (Experiment 2) in which we manipulated the distractor distribution further to reduce the per-trial variability in the visibility difference. (See the highlighted area in Figure 1b that indicates the distractors that we manipulated and see Appendix C for the details of the manipulation.) 
We also ran a third version of the experiment in which subjects were asked to discriminate target luminance rather than depth. That is, we asked them which of the targets was brighter. We did this for two reasons. The first reason was to reassure us that judging relative luminance was not itself the limiting factor in our experiments. The second reason was to reassure us that participants were in fact discriminating depths, rather than just luminances in the main experiments. 
Design
For each experiment, we used 36 different conditions, which were defined by the following combinations (36 = 2 × 2 × 3 × 3): 
  • sign of depth-luminance covariance (DLC+, DLC−)
  • background color (black, white)
  • target color (gray, low-, or high-saturation green)
  • distractor color (gray, low-, or high-saturation green)
Figure 2 shows examples of DLC− conditions. The rows and columns show the different distractor and target colors, respectively. Figure 3 shows examples of DLC+. For both figures, we chose backgrounds that are consistent with the depth-luminance covariance, namely black background for DLC− and white background for DLC+. 
Figure 2
 
Stimuli for dark-means-deep (DLC−) conditions with a black background. Rows represent distractor color and columns represent target color. On the diagonal, targets and distractors have the same color.
Figure 3
 
Stimuli for dark-means-near (DLC+) with a white background. On the diagonal, targets and distractors have the same color.
Procedure
In each trial, subjects were shown a scene consisting of distractors and two target rectangles. The task was to indicate which target was closer (to the subject) or which target was brighter, depending on the experiment. Subjects responded by pressing either the left or right arrow key on the keyboard. 
For each experimental condition and for each subject, we estimated a depth-discrimination threshold using a one-up one-down staircase. The values of ΔZ in the staircase were chosen to target a proportion correct of approximately 78% (Garcia-Pérez, 1998). When the subject answered correctly or incorrectly, the distance ΔZ between targets was reduced by a factor of 0.8 or was increased by a factor of 2.19, respectively. Staircase conditions were randomly interleaved. Each staircase began at level ΔZ = 5 cm and then terminated after 10 reversals. To compute the thresholds, we averaged the log of the ΔZ values of the last eight reversals. 
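The staircase rule can be written as a short simulation harness. This is a minimal sketch, where respond stands in for the subject (or a model observer) and returns True on a correct response; the names and the safety cap are ours:

```python
import numpy as np

def run_staircase(respond, dz=5.0, n_reversals=10, down=0.8, up=2.19):
    """One-up one-down staircase with multiplicative steps: shrink dz by
    `down` after a correct response, grow it by `up` after an error.
    Stops after n_reversals reversals; the threshold is the geometric mean
    of dz over the last eight reversals."""
    reversals, last_dir = [], 0
    for _ in range(10_000):                 # safety cap for the sketch
        if len(reversals) >= n_reversals:
            break
        direction = -1 if respond(dz) else 1
        if last_dir != 0 and direction != last_dir:
            reversals.append(dz)            # a reversal: direction flipped
        last_dir = direction
        dz *= down if direction < 0 else up
    return float(np.exp(np.mean(np.log(reversals[-8:]))))
```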
Response time was limited to 4 s. If the subject did not respond on a trial, then a random choice was made and a red X was shown on the screen. Additionally, there was a rest period after every 100 trials, for as long as the subject wanted. The experiment typically lasted around 45 min. 
Participants
For Experiment 1, 20 subjects participated, with ages ranging from 19 to 44. For Experiment 2, eight new subjects participated, with ages ranging from 19 to 60. For Experiment 3 (luminance discrimination), eight subjects participated, with ages ranging from 28 to 60. Each subject was paid $10. Subjects had little or no experience with psychophysics experiments. Each had normal or corrected-to-normal vision. We required subjects to pass Ishihara's test for color deficiency. Subjects were unaware of the purpose of the experiments. Informed consent was obtained using the guidelines of the McGill Research Ethics Board, which are consistent with the Declaration of Helsinki. 
Results
Plots of the means and standard errors of the thresholds for all three experiments are shown in Figure 4. For each, we ran three-way repeated measures analyses of variance (ANOVAs) to test the effects of depth-luminance covariance (DLC−, DLC+), background color (black, white), and equality of target and distractor hue (diagonals vs. off-diagonals in the figures). We report exact p values. A p value smaller than 0.05 was considered to be significant. These plots pooled the background (black vs. white) conditions since this variable did not have a statistically significant effect (see below). 
Figure 4
 
Mean and standard errors for Experiments 1 and 2 (depth) and Experiment 3 (luminance). The stimuli of Experiment 3 were the same as Experiment 1. The 3 × 3 layout of the plots corresponds to the conditions shown in Figures 2 and 3.
Experiment 1
A dark-means-deep cue would predict thresholds to be lower for DLC− than DLC+, and this is indeed what we found, with means 4.3 and 5.8 cm, respectively. This difference was significant, F(1, 19) = 9.6, p = 0.006. A second main effect was that thresholds were lower when targets and distractors had the same surface color (diagonals) than when they had different colors (off-diagonals), F(1, 19) = 18.1, p = 0.0005, with means 4.5 and 5.5 cm, respectively. There was an interaction effect between DLC and color, F(1, 19) = 31.1, p = 0.00002. When targets and distractors had the same color, thresholds were just slightly lower for DLC− than for DLC+, with means 4.4 and 4.7 cm, respectively. When targets and distractors differed in color, thresholds were much lower for DLC− than DLC+, with means 4.2 and 6.8 cm, respectively. This suggests that subjects relied on a dark-means-deep prior more when the colors of the targets differed from the colors of the distractors. This prior is correct for the DLC− stimuli but incorrect for the DLC+ stimuli. We will return to this point in the Discussion section. 
We had expected performance to be better when the background was consistent with the depth-luminance covariance, namely a black background in a dark-means-deep condition and a white background in a dark-means-near condition, rather than vice versa. However, the mean thresholds for consistent and inconsistent backgrounds were nearly the same, namely 5.0 and 5.1 cm, F(1, 19) = 0.01, p = 0.64. We therefore pooled the results for consistent and inconsistent backgrounds in the plots of Figure 4, showing 18 conditions instead of 36. 
Experiment 2
The purpose of Experiment 2 was to reduce the per-trial visibility difference between the two targets, in case subjects were mistakenly relying on the visibility difference to perform the task (see Appendix C). Thresholds indeed were lower overall in Experiment 2 than in Experiment 1, with means 4.56 and 5.25 cm, respectively. A one-tailed t test on the signed differences in the mean thresholds for the 36 conditions of Experiments 1 versus 2 revealed a significant difference (t = 4.67, p < 0.0001). 
Otherwise, the general trends were similar to Experiment 1. Some results did not reach significance, but this is unsurprising since we used only eight subjects for Experiment 2 compared to 20 for Experiment 1. Thresholds were lower for DLC− than DLC+ with means 3.3 and 5.6 cm, respectively, F(1, 7) = 8.1, p = 0.02. Thresholds were lower for diagonals than for off-diagonals with means 3.9 and 5.0 cm, respectively, F(1, 7) = 2.76, p = 0.141. The interaction between DLC and diagonals/off-diagonals also was close to significant, F(1, 7) = 0.35, p = 0.09. Thresholds were slightly smaller for DLC− than DLC+ for the diagonals, with means 3.2 and 4.6 cm, respectively. Thresholds were much lower for DLC− than DLC+ in the off-diagonal case, with means 3.3 and 6.6 cm, respectively. In general, the DLC− thresholds were similar for all conditions, but the DLC+ thresholds were greater for the off-diagonals. Finally, again there were no significant differences between consistent and inconsistent backgrounds, F(1, 7) = 0.6, p = 0.48. The means for a consistent background versus an inconsistent one were 4.3 versus 4.6 cm. 
Experiment 3
We ran a control experiment in which subjects were asked to discriminate target luminance instead of depth. Thresholds were lower in Experiment 3 than in Experiment 1, with means 1.59 and 5.25 cm, respectively. A one-tailed t test on the signed differences in the mean thresholds for the 36 conditions of Experiments 1 versus 3 revealed a significant difference (t = 15.26, p < 10–16). This reassured us that judging relative luminance was not itself the limiting factor in our experiments. This also reassured us that participants were discriminating depths, rather than just luminances in the main experiments. 
We did not run the luminance discrimination task again on the Experiment 2 stimuli, but we would expect performance there to be as good or better than what we found using the Experiment 1 stimuli, for the same reason that performance was better in Experiment 2 than Experiment 1. 
No significant effects were found between conditions. DLC− and DLC+ had means 1.61 and 1.56 cm, respectively, F(1, 7) = 0.58, p = 0.47. Diagonal versus off-diagonal had means 1.64 and 1.53 cm, respectively, F(1, 7) = 2.46, p = 0.16. Consistent and inconsistent background colors had means 1.58 and 1.59 cm, F(1, 7) = 0.02, p = 0.89. 
Discussion
A key finding in our experiments is that subjects behaved as if they used the sign of the depth-luminance covariance to perform the task. One natural strategy for using the sign information is as follows. Use the luminance and depth ordering of the distractors that is given by occlusions to determine the sign of the depth-luminance covariance. With this sign information, compare the two target luminances and choose the closer target according to the sign of the depth-luminance covariance of the distractors. When the target and distractor colors differ, rely less on the DLC sign information and instead rely more on a default dark-means-deep rule or “prior.” 
While the above strategy seems plausible, there is a second strategy that leads to essentially the same results. This second strategy is based on the contrast between targets and the distractors that occlude them. Recall that Schwartz and Sperling (1983) and others suggested that the perceived depth-luminance covariance is determined by the contrast of an object with the background rather than by luminance per se. For our scenes, the direct relationship between depth and luminance also provides a contrast cue. However, in our case, the cue is not the contrast of the targets' luminance with respect to the background. Rather, it is the contrast between the targets and their occluders—namely, the near target has a lower contrast with respect to its occluders than the far target has with respect to its occluders. To put it more simply, the near target's luminance is more similar to that of its occluders than the far target's is to its occluders. This contrast cue is valid in both the DLC+ and DLC− conditions. If observers were to base their judgments of target depth on this contrast cue, then they would behave in the task much as they would if they used the first strategy described above. Our experiment does not allow us to decide which of these two strategies better explains subjects' behavior. Further experiments would be needed to tease them apart. 
Another key finding is that when the colors of targets and distractors differ, subjects behaved as if they relied more on the dark-means-deep prior. This was somewhat surprising since the targets informally seem easier to segment from the distractors when the colors differ, and so one might expect that subjects could more easily compare them. However, Experiment 3 did not provide any evidence that subjects could discriminate the luminances of the targets better when the colors of distractors and targets differed. What seemed to happen in Experiments 1 and 2 when the target and distractor colors differed is that the targets were seen as different types of objects from the distractors, and so the target luminances were perceived as less related to the distractor DLC sign. Since there were no depth cues for the targets other than the DLC cue from the distractors, subjects may have just relied more on their dark-means-deep prior. 
Another finding worth discussing is that the background seemed to play little role. The background color is visible outside the X and Y limits of the clutter, but it is sometimes also visible within the clutter when one can see all the way through. In this case, it is difficult to perceive that the leaking background is indeed due to background as opposed to just another occluded surface. In the case that the background color is inconsistent with the depth-luminance covariance, a sliver of background that is visible within the clutter might just be perceived as an outlier. Therefore, we believe that it was the luminances of the clutter rather than the background that provided the main cue that subjects used. Perhaps in sparser scenes containing fewer occlusions, the background color would have more of an effect. 
One might ask if there were any other cues present in our stimuli that allowed subjects to perform the task, for example, perspective cues (size cues) from the distractors. To test if subjects were using information other than the sign information from DLC, we ran Experiment 1 again on three subjects (one author and two new naive subjects) with equiluminant targets. The mean threshold over all conditions was 11.5 cm, which is well above the thresholds for the actual Experiment 1 but below the 15-cm limit of ΔZ, beyond which the visibility cue is present in our stimuli. Could it be that subjects achieved thresholds below 15 cm by using other cues? We believe the answer is no. We ran a model observer that guesses when the visibility cue has been removed (ΔZ ≤ 15 cm) and answers correctly when the visibility cue is present (ΔZ > 15 cm). We ran 300 staircases for such an observer. The mean threshold was approximately 12 cm, which is similar to that of our three observers with the equiluminant targets. These results suggest that subjects relied entirely on the DLC cues and the DLC− prior to perform the task in our two main experiments. 
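This model observer is easy to simulate. The sketch below reuses the hypothetical run_staircase harness from the Procedure section and implements the behavior described above (guess when ΔZ ≤ 15 cm, always correct otherwise); it is our reconstruction, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)

def model_observer(dz):
    """Guesses when the visibility cue is absent (dz <= 15 cm) and is
    always correct when it is present (dz > 15 cm)."""
    return dz > 15.0 or rng.random() < 0.5

# 300 staircases, as in the paper, which reports a mean threshold near 12 cm
thresholds = [run_staircase(model_observer) for _ in range(300)]
print(np.mean(thresholds))
```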
Conclusion
Our experiments contribute new ideas for investigating and understanding depth perception from luminance variations in complex scenes. In particular, we have shown that occlusions determine the sign of depth-luminance covariance in 3-D cluttered scenes and that subjects can make use of this sign information to discriminate the depths of targets embedded in the clutter. We showed that this sign information was combined with a prior for “dark means deep” (negative depth-luminance covariance). Interestingly, subjects performed better when targets and distractors had the same color saturation—namely, they relied less on a dark-means-deep cue in that case. Further studies are needed to determine if there are other interactions of color, luminance, and perceived depth in 3-D cluttered scenes—for example, if there are interactions with other depth cues such as binocular disparity or motion parallax. 
Acknowledgments
This research was supported by a doctoral grant awarded to Milena Scaccia from the Fonds de Recherche du Quebec - Nature et Technologies (FRQNT), and an FQRNT team research grant and Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant to Michael Langer. 
Commercial relationships: none. 
Corresponding author: Milena Scaccia. 
Address: School of Computer Science, McGill University, Montreal, Quebec, Canada. 
References
Bloj, M. J., Kersten, D., & Hurlbert, A. C. (1999). Perception of three-dimensional shape influences colour perception through mutual illumination. Nature, 402(6764), 877.
Chen, C.-C., & Tyler, C. W. (2015). Shading beats binocular disparity in depth from luminance gradients: Evidence against a maximum likelihood principle for cue combination. PLoS One, 10(8), e0132658.
Cooper, E. A., & Norcia, A. M. (2014). Perceived depth in natural images reflects encoding of low-level luminance statistics. Journal of Neuroscience, 34(35), 11761–11768.
Dosher, B. A., Sperling, G., & Wurst, S. A. (1986). Tradeoffs between stereopsis and proximity luminance covariance as determinants of perceived 3D structure. Vision Research, 26(6), 973–990.
Egusa, H. (1982). Effect of brightness on perceived distance as a figure-ground phenomenon. Perception, 11(6), 671–676.
Farnè, M. (1977). Brightness as an indicator to distance: Relative brightness per se or contrast with the background? Perception, 6(3), 287–293.
Garcia-Pérez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38(12), 1861–1881.
Kim, M., Wilcox, L. M., & Murray, R. F. (2016). Perceived three-dimensional shape toggles perceived glow. Current Biology, 26(9), R350–R351.
Langer, M. S. (1999). When shadows become interreflections. International Journal of Computer Vision, 34(2–3), 193–204.
Langer, M. S. (2001). A model of how interreflections can affect color appearance. Color Research & Application, 26(S1), S218–S221.
Langer, M. S., & Bülthoff, H. H. (2000). Depth discrimination from shading under diffuse lighting. Perception, 29(6), 649–660.
Langer, M. S., & Mannan, F. (2012). Visibility in three-dimensional cluttered scenes. Journal of the Optical Society of America A, 29(9), 1794–1807.
Langer, M. S., Zheng, H., & Rezvankhah, S. (2016). Depth discrimination from occlusions in 3D clutter. Journal of Vision, 16(11):11, 1–18, https://doi.org/10.1167/16.11.11.
Langer, M. S., & Zucker, S. W. (1994). Shape-from-shading on a cloudy day. Journal of the Optical Society of America A, 11(2), 467–478.
O'Shea, R. P., Blackburn, S. G., & Ono, H. (1994). Contrast as a depth cue. Vision Research, 34(12), 1595–1604.
Potetz, B., & Lee, T. S. (2003). Statistical correlations between two-dimensional images and three-dimensional structures in natural scenes. Journal of the Optical Society of America A, 20(7), 1292–1303.
Ruppertsberg, A. I., Bloj, M., & Hurlbert, A. (2008). Sensitivity to luminance and chromaticity gradients in a complex scene. Journal of Vision, 8(9):3, 1–16, https://doi.org/10.1167/8.9.3.
Samonds, J. M., Potetz, B. R., & Lee, T. S. (2012). Relative luminance and binocular disparity preferences are correlated in macaque primary visual cortex, matching natural scene statistics. Proceedings of the National Academy of Sciences, 109(16), 6313–6318.
Schwartz, B. J., & Sperling, G. (1983). Luminance controls the perceived 3-D structure of dynamic 2-D displays. Bulletin of the Psychonomic Society, 21(6), 456–458.
Todd, J. T., Egan, E. J. L., & Kallie, C. S. (2015). The darker-is-deeper heuristic for the perception of 3D shape from shading: Is it perceptually or ecologically valid? Journal of Vision, 15(15):2, 1–10, https://doi.org/10.1167/15.15.2.
Troscianko, T., Montagnon, R., Le Clerc, J., Malbert, E., & Chanteau, P.-L. (1991). The role of colour as a monocular depth cue. Vision Research, 31(11), 1923–1929.
Tyler, C. W. (1998). Diffuse illumination as a default assumption for shape-from-shading in the absence of shadows. Journal of Imaging Science and Technology, 42(4), 319–325.
van Ee, R., & Anderson, B. L. (2001). Motion direction, speed and orientation in binocular matching. Nature, 410(6829), 690.
Appendix A
Color calibration
When selecting the colors for the surfaces, we wanted to span as large a range of luminances as possible. Since the green channel has the largest luminance, we used gray (1, 1, 1) and saturated green (0, 1, 0) to define two of our surface colors. As a third color, we used an unsaturated green, which we defined by transforming the saturated green (0, 1, 0) from RGB to CIELUV, reducing the saturation correlate value by 50%, and then applying the inverse transformation back to RGB. This gave the RGB value (0.33, 0.87, 0.33). To transform from RGB to LUV, we first transformed from RGB to XYZ using the matrix M below, and then from XYZ to LUV. 
To ensure that luminance varied with depth in the same manner for each of the three colors, we needed to know which RGB values of gray and unsaturated green had the same luminance value (Y) as the saturated green. We first determined the maximum luminances of gray YGray, unsaturated green YUGreen and saturated green YSGreen using the transformation between RGB and CIE XYZ below. We rescaled the maximum RGB of gray (1, 1, 1) and unsaturated green (0.33, 0.87, 0.33) by multiplying by YSGreen/YGray and YSGreen/YUGreen, respectively. This ensured we had the same maximum luminance for each color. Then, during the experiment, we scaled the RGB values of each surface color according to its depth depending on the condition. 
The Y values above can be obtained via the transformation between RGB and CIE XYZ space as follows,  
\begin{equation}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \mathbf{M}\begin{bmatrix} R \\ G \\ B \end{bmatrix}.\end{equation}
We obtained the columns of M by using the spectroradiometer to measure the XYZ values for uniform RGB patches of red (1, 0, 0)T, green (0, 1, 0)T, and blue (0, 0, 1)T. This gave  
\begin{equation}\mathbf{M} = \begin{bmatrix} 44.5 & 43.9 & 20.1 \\ 23.5 & 86.2 & 7.16 \\ 1.08 & 8.14 & 107 \end{bmatrix},\end{equation}
where the middle row is the luminance (Y) of each of the channels in cd/m2. To obtain the luminances YGray, YUGreen, and YSGreen, we multiplied their corresponding (R, G, B)T by the middle row of M.  
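For completeness, here is a short Python sketch of this luminance-matching computation using the measured matrix M (variable names are ours):

```python
import numpy as np

# measured RGB -> XYZ matrix; the middle row maps RGB to luminance Y (cd/m^2)
M = np.array([[44.5, 43.9,  20.1],
              [23.5, 86.2,   7.16],
              [ 1.08, 8.14, 107.0]])

def Y(rgb):
    """Luminance of an RGB triplet via the middle row of M."""
    return float(M[1] @ np.asarray(rgb, dtype=float))

gray, ugreen, sgreen = (1.0, 1.0, 1.0), (0.33, 0.87, 0.33), (0.0, 1.0, 0.0)

# rescale gray and unsaturated green to the maximum luminance of saturated
# green, so that all three colors share the same maximum luminance
gray_eq = np.multiply(gray, Y(sgreen) / Y(gray))        # ~ (0.74, 0.74, 0.74)
ugreen_eq = np.multiply(ugreen, Y(sgreen) / Y(ugreen))  # ~ (0.33, 0.88, 0.33)
```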
Appendix B
Model of depth-luminance covariance
Our DLC− model combines the ideas of Langer and Zucker (1994)'s cloudy day rendering model and Langer and Mannan (2012)'s probabilistic model of surface visibilities in 3-D clutter. The former model assumes the scene is illuminated by a uniform hemispheric sky, centered in the Z-axis direction. The latter model assumes a 3-D cluttered scene that begins at depth Z = Z0 and has infinite extent in the X and Y directions. 
Under the cloudy day rendering model, illumination is assumed to be diffuse and nondirectional. The luminance variations consider cast shadows only, but ignore interreflections. Thus, the luminance of a surface point Xp = (Xp, Yp, Zp) depends on the amount of hemispheric sky that is visible from it. The directions of the hemispheric sky are parametrized by polar angle θ and azimuth ϕ, and V(Xp, θ, ϕ) is the visibility function which equals 1 if the hemispheric sky is visible from point Xp in direction (θ, ϕ), and 0 otherwise. This yields the following expression for luminance at a surface point Xp,  
\begin{equation}L({{\bf{X}}_{\bf{p}}}) = {1 \over \pi }\int_0^{2\pi } {\int_0^{\pi /2} {V({{\bf{X}}_{{\bf{p}}}},\theta ,\phi )\cos \theta \sin \theta \,d\theta \,d\phi .} } \end{equation}
The double integral above integrates over concentric circles of radius sin θ to obtain the surface area of the unit hemisphere. This area is weighted by Lambert's cosine law, which states that luminance from a Lambertian surface is proportional to the cosine of the angle θ between the direction of the incident light ray and the surface normal. The total surface area of the cosine-weighted hemisphere is π. We integrate only over the visible portions of the hemisphere and so dividing the result by π returns the luminance, a value between 0 and 1.  
For a 3-D cluttered scene, we consider visibility V(Xp, θ, ϕ) as a random variable. We compute the probability that the hemispheric sky is visible along a ray, p(V[Xp, θ, ϕ]), as done in Langer and Mannan (2012). To get a closed-form model of p(V[Xp, θ, ϕ]), we assume the elements of the clutter are disks of area A, and we assume that the spatial distribution of the disks is a Poisson process with density η, which is the average number of disk centers per unit volume. The parameters A and η can be lumped together as a single constant λ = ηA. 
The function p(V[Xp, θ, ϕ]) can be written in terms of Zp only as follows. The length of any ray from Xp = (Xp, Yp, Zp) to the edge of the clutter is (Zp − Z0)/cos θ, where the term 1/cos θ accounts for the greater path length of a ray with greater polar angle θ. To simplify the integral, we assume the elements of the clutter have surface normals facing the Z direction, and so these clutter elements are foreshortened in the ray direction (θ, ϕ). Then, the Poisson model yields:  
\begin{equation}p(V({Z_p},\theta ,\phi )) = \exp\{ - \lambda \cos \theta ({Z_p} - {Z_0})/\cos \theta \} .\end{equation}
The cos θ terms cancel out, and so p(V(Zp)) depends only on depth,  
\begin{equation}p(V({Z_p})) = \exp\{ - \lambda ({Z_p} - {Z_0})\} .\end{equation}
Then, the expected visibility of the sky in direction (θ, ϕ) is computed as,  
\begin{equation}E[V({Z_p})] = 1 \cdot p(V({Z_p})) + 0 \cdot (1 - p(V({Z_p}))) = \exp\{ - \lambda ({Z_p} - {Z_0})\} .\end{equation}
 
Using the definition of the luminance of a surface point L(Xp), we use the above to compute the expected luminance E[L(Xp)]. The linearity property of expectation allows us to integrate over the expected visibilities of rays along individual directions,  
\begin{equation}\eqalign{ E[L({{\bf{X}}_{\bf{p}}})]&= E\left[{1 \over \pi }\int_0^{2\pi } {\int_0^{\pi /2} V } ({{\bf{X}}_{\bf{p}}},\theta ,\phi )\cos \theta \sin \theta \,d\theta \,d\phi \right], \cr&= {1 \over \pi }\int_0^{2\pi } {\int_0^{\pi /2} E } [V({{\bf{X}}_{\bf{p}}},\theta ,\phi )]\cos \theta \sin \theta\, d\theta\, d\phi , \cr&= {1 \over \pi }\int_0^{2\pi } {\int_0^{\pi /2} E } [V({Z_p})]\cos \theta \sin \theta\, d\theta \,d\phi . \cr} \end{equation}
The resulting expected luminance of point Xp, written in terms of Zp only, is given by,  
\begin{equation}E[L({Z_p})] = {1 \over \pi }\int_0^{2\pi } {\int_0^{\pi /2}} \exp\{ - \lambda ({Z_p} - {Z_0})\} \cos \theta \sin \theta \,d\theta\, d\phi .\end{equation}
Integrating the last equation yields,  
\begin{equation}E[L({Z_p})] = \exp\{ - \lambda ({Z_p} - {Z_0})\} .\end{equation}
This is the rendering model that we use for DLC−, namely luminance is chosen to be proportional to E[L(Zp)]. The rendering model is based on several assumptions. First, the illumination arrives from a hemisphere centered on the Z axis, and this illumination hemisphere has uniform luminance over all directions. Second, the clutter has infinite XY extent, which is not the case for the cluttered scenes in our stimuli. Third, the luminance variations consider cast shadows only and ignore interreflections. Fourth, the elements of the clutter have normals parallel to the Z direction. Because the model is based on rather strong assumptions, we cannot and do not claim that this model is photorealistic for the given scenes. Rather, the model is meant to capture a qualitative shadowing effect that does occur in real scenes, namely when clutter such as foliage is illuminated under approximately diffuse light.  
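As a numerical sanity check of the closed form (a sketch assuming NumPy and SciPy are available), quadrature over the cosine-weighted hemisphere reproduces exp(−λ[Zp − Z0]):

```python
import numpy as np
from scipy.integrate import dblquad

LAM, Z0 = 0.125, 60.0

def expected_luminance(zp):
    """E[L(Z_p)] by numerical quadrature: the expected visibility term is
    independent of direction, so integrating cos(theta) sin(theta) over the
    hemisphere (total weight pi) and dividing by pi leaves the exponential."""
    integrand = lambda theta, phi: (np.exp(-LAM * (zp - Z0))
                                    * np.cos(theta) * np.sin(theta) / np.pi)
    val, _ = dblquad(integrand, 0.0, 2 * np.pi, 0.0, np.pi / 2)
    return val

zp = 70.0
print(expected_luminance(zp), np.exp(-LAM * (zp - Z0)))  # both ~ 0.2865
```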
Appendix C
Manipulating the visibility cue in Experiment 2
In Experiment 1, the expected visibility was the same for the two targets, but there was variation in the visibilities due to the random distribution of the clutter. This random variation often led to one target being more visible than the other. Visibility is a cue for depth discrimination in 3-D cluttered scenes (Langer et al., 2016), and so it is possible that subjects were using the difference in visibility between targets to perform the task, even though in our stimuli this visibility difference contained no information for doing the task, since the two targets had the same expected visibility for each depth condition. 
In Experiment 2, we therefore attempted to reduce the subjects' use of the visibility cue by reducing the random visibility differences between targets. We did so in each trial by matching the number of distractors falling in front of the two targets. Specifically, we matched the number of distractors falling in front of the far target to the number falling in front of the near target. We also matched the X, Y, and Z positions of the distractors relative to the center of each target, and we matched the slant angles, namely the angles between the Z axis and each occluder's surface normal. Note that we matched only the distractors that occluded the targets. The remaining visibility differences between the two targets were due to random interactions between the direction (tilt) of each distractor's orientation, the variations in aspect ratios and angles of the targets, and parallax between the targets and distractors, namely that the viewing directions from the eye to the left and right targets are different. 
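One way to picture this matching is the deliberately simplified sketch below: it copies the near target's occluders outright and shifts them, whereas the actual stimuli matched counts, relative positions, and slants while leaving tilt random; the function and its name are our own illustration.

```python
import numpy as np

def match_far_occluders(near_occ_centers, near_center, far_center):
    """Place copies of the near target's occluders so that their positions
    relative to the far target's center match those relative to the near
    target's center (counts and relative X/Y/Z positions are preserved)."""
    return near_occ_centers - near_center + far_center
```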
In Figure 5a, we compare the mean target visibilities as a function of depth for Experiments 1 and 2. Target depths were chosen from Z = {0, 2, 4, …, 20} cm and target visibilities were defined as the fraction of the target that was visible in the image (which we measured by counting pixels). In the plots, depths up to 8 cm show the visibilities for the near targets and depths beyond 12 cm show the visibilities for far targets. At depth 10 cm, there is no distinction between near and far targets. We used 1,000 scenes for each data point in the plot. 
Figure 5
 
(a) The visibility plot compares the mean target visibilities as a function of depth for Experiments 1 and 2. One thousand scenes were used for each data point. Target depths were chosen from {0, 2, 4, …, 20} and target visibilities were defined as the fraction of the target that was visible in the image (which we measured by counting pixels). The error bars show standard deviations (not standard errors). (b) The plot compares the mean absolute difference of target visibilities for Experiments 1 and 2, using 1,000 scenes for each data point. Target depth differences Zfar − Znear = ΔZ were chosen from {0, 4, 8, …, 20}. The standard deviations are almost identical for Experiments 1 and 2, so to avoid overlap they are shown only for Experiment 1 (with error bars).
The result in Figure 5a shows that indeed the two targets had the same expected visibility for each depth condition. The plots show an exponential fall in visibility up to depth Z = 10 cm (Langer et al., 2016) and then a rise in visibility beyond depth 10 cm. This rise is due to the tunnel in front of the far target: the size of the tunnel increases as the far target depth increases (recall Figure 1). Recall also that the tunnel is used only when the depth difference is less than 15 cm, so the near and far targets are in the depth range 10 ± 7.5 cm. This is why the visibility plot for the far target is so much lower for Z = 18 and 20 cm. 
Figure 5b shows the mean absolute difference in visibility between the near and far targets as a function of their depth difference ΔZ. Recall that the depths in Figure 5a were chosen from Z = {0, 2, 4, …, 20} cm. Similarly, the depth differences in Figure 5b were chosen from ΔZ = {0, 4, 8, …, 20} cm. For ΔZ ≤ 15 cm, the mean absolute difference in visibility for Experiment 2 was approximately 30% less than Experiment 1. As expected, this reduction in visibility differences led to an improvement in performance from Experiment 1 to Experiment 2. 