**Objects such as trees, shrubs, and tall grass consist of thousands of small surfaces that are distributed over a three-dimensional (3D) volume. To perceive the depth of surfaces within 3D clutter, a visual system can use binocular stereo and motion parallax. However, such parallax cues are less reliable in 3D clutter because surfaces tend to be partly occluded. Occlusions provide depth information, but it is unknown whether visual systems use occlusion cues to aid depth perception in 3D clutter, as previous studies have addressed occlusions for simple scene geometries only. Here, we present a set of depth discrimination experiments that examine depth from occlusion cues in 3D clutter, and how these cues interact with stereo and motion parallax. We identify two probabilistic occlusion cues. The first is based on the fraction of an object that is visible. The second is based on the depth range of the occluders. We show that human observers use both of these occlusion cues. We also define ideal observers that are based on these occlusion cues. Human observer performance is close to ideal using the visibility cue but far from ideal using the range cue. A key reason for the latter is that the range cue depends on depth estimation of the clutter itself, which is unreliable. Our results provide new fundamental constraints on the depth information that is available from occlusions in 3D clutter, and on how the occlusion cues are combined with binocular stereo and motion parallax cues.**

We define the *visibility* of a target to be the fraction of the target that is visible, that is, not occluded. Assuming the elements of the 3D clutter are uniformly distributed over the cube volume, the probability that a point is visible decreases exponentially with depth in the clutter (Langer & Mannan, 2012). It follows that the expected visibility of a target decreases exponentially with depth as well. Figure 2b shows the average visibility of short bar targets (Figure 1a) as a function of depth within the clutter, over many randomly generated scenes. The error bars in Figure 2b show the standard deviation of visibility. Visibility is a cue for depth discrimination because, when two targets are presented, the one that is less visible is likely to be deeper. Because of the variability in the relationship between depth and visibility, however, this cue does not always produce the correct response when comparing the depths of two targets, and the visual system would benefit from using other cues as well.
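The logic of the visibility cue can be sketched with a small simulation. The model below is illustrative only, not the one used for the figures: each of a fixed number of sample points on a target is assumed visible with probability exp(−λ*z*), and the observer calls the less visible target the deeper one. The attenuation constant `LAM`, the point count `N_POINTS`, and the midpoint depth `Z_MID` are made-up values.

```python
import math, random

LAM = 0.08        # assumed attenuation constant (per cm); illustrative only
N_POINTS = 50     # sample points per target used to measure visibility
Z_MID = 10.0      # depth of the clutter midpoint (cm)

def observed_visibility(z):
    # each sample point on the target is visible with probability exp(-LAM*z);
    # the observed visibility is the fraction of visible points
    p = math.exp(-LAM * z)
    return sum(random.random() < p for _ in range(N_POINTS)) / N_POINTS

def visibility_decision(dz, trials=2000):
    # proportion of trials in which "less visible => deeper" is correct
    correct = 0
    for _ in range(trials):
        v_near = observed_visibility(Z_MID - dz / 2)
        v_far = observed_visibility(Z_MID + dz / 2)
        if v_far < v_near:
            correct += 1
        elif v_far == v_near:
            correct += random.random() < 0.5  # guess on ties
    return correct / trials
```

With a small depth difference the rule performs near chance, and with a large one it is nearly always correct, mirroring the variability argument above.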

*d*_{1} cm and the right target's occluder was at depth *d*_{2} cm, where *d*_{1} < *d*_{2}. Given this information about the occluders, the observer should infer that the right target is more likely to be deeper. The reason is that the right target must lie at a depth beyond *d*_{2}, whereas the left target could lie at a depth between *d*_{1} and *d*_{2} or beyond *d*_{2}. More generally, in a cluttered scene, if an observer can perceive the depth ranges of multiple occluders of each of two targets, then the observer should infer that the deeper target is the one with the deepest occluder.
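This inference can be checked with a toy Monte Carlo. Assuming, purely for illustration, that each target is uniformly distributed between its deepest occluder and the back of the volume (`Z_MAX` is a made-up value), the target behind the deeper occluder is indeed the deeper one in most trials:

```python
import random

Z_MAX = 20.0   # back of the clutter volume (cm); illustrative value

def sample_target(d_occluder):
    # the target must lie somewhere beyond its deepest occluder; assume
    # (for illustration only) it is uniform on [d_occluder, Z_MAX]
    return random.uniform(d_occluder, Z_MAX)

def p_right_deeper(d1, d2, trials=10000):
    # fraction of trials in which the target behind the deeper occluder (d2)
    # is in fact the deeper target
    wins = sum(sample_target(d2) > sample_target(d1) for _ in range(trials))
    return wins / trials
```

For occluders at 5 and 10 cm with this uniform model, the analytic answer is 2/3, and the simulation agrees.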

*Z* in a cluttered scene (see Figure 1a). There is a strong correlation between the maximum occluder depth and the target depth. Note that to use this range cue, the observer must be able to estimate the depth of the occluders accurately, for example, from binocular stereo or motion parallax cues.

*Z*, namely they were positioned at depths *Z* − Δ*Z*/2 and *Z* + Δ*Z*/2. The depth difference Δ*Z* was chosen using a staircase procedure that is described below. The short bar targets were then separated horizontally (X) by 10 cm, and the XY position of each was randomly perturbed by up to 1 cm in the X direction and up to 1.5 cm in the Y direction. The long bar targets were always separated vertically by 6.7 cm. The ends of the long bar targets were hidden behind two large flanking occluders (Figure 1b).

^{3}) distractors. Each distractor was a square of width 12 mm, and was assigned a random gray level and random 3D orientation. The position of each distractor within the XYZ bounding volume was chosen according to one of the four probability distributions, which are illustrated in Figure 5.

- “visibility + range”: The distractor distribution is uniform over the 3D bounding volume. The occlusions provide a depth cue because the expected visibility of the target decreases with depth (Figure 2). Occlusions also provide a range cue because the expected maximum occluder depth of a target increases with target depth (Figure 3).
- “visibility” (and no range): The distractor distribution is spatially nonuniform. The probability that a distractor appears in the left versus right half volume is 0.5 each, as in the uniform case. For the half volume that contains the near target, the probability in the depth interval [*Z*_{near}, *Z*_{far}] from (a) is moved to the interval [*Z*_{far}, *Z*_{max}]. For the half volume containing the far target, the distractor probability in the depth interval [*Z*_{near}, *Z*_{far}] is moved to the interval [*Z*_{min}, *Z*_{near}]. This new distribution provides a visibility cue as in (a), but the range cue is removed, since the depth range of foreground occluders is approximately the same for the near and far targets. We will discuss this distribution again later in the paper when we examine ideal observers.
- “range” (and no visibility): Here we make the probability density nonuniform, but in a different way from (b). For each left/right half volume, the probability that a distractor has depth less than the target is 0.5 and the probability that it has depth greater than the target is 0.5. There is no visibility cue, since the expected visibilities of the near and far targets are the same. There is a range cue, since the distractors that can occlude the near target must all have depth less than *Z*_{near}, whereas the distractors that can occlude the far target can have depths up to *Z*_{far}.
- “neither” (neither visibility nor range): Here, each distractor has depth less than *Z*_{near} with probability 0.5 and depth greater than *Z*_{far} with probability 0.5, regardless of which half volume it lies in. There is neither a visibility nor a range cue, since the distractor distributions are the same in the two half volumes.
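A sampler for the four distractor-depth distributions might look as follows. This is a sketch, not our experiment code: the depth values, the uniform within-interval sampling in the “range” condition, and the resampling used to implement the “moved” probability mass are all assumptions for illustration.

```python
import random

Z_MIN, Z_MAX = 0.0, 20.0        # clutter depth range (cm); illustrative
Z_NEAR, Z_FAR = 8.0, 12.0       # target depths for one example trial

def distractor_depth(condition, half):
    # half: 'near' = the half volume containing the near target,
    #       'far'  = the half volume containing the far target
    z = random.uniform(Z_MIN, Z_MAX)
    if condition == 'visibility+range':
        return z                             # uniform over the volume
    if condition == 'visibility':
        if Z_NEAR <= z <= Z_FAR:             # move mass out of [Z_NEAR, Z_FAR]
            if half == 'near':
                return random.uniform(Z_FAR, Z_MAX)
            return random.uniform(Z_MIN, Z_NEAR)
        return z
    if condition == 'range':
        target = Z_NEAR if half == 'near' else Z_FAR
        if random.random() < 0.5:            # half the distractors in front...
            return random.uniform(Z_MIN, target)
        return random.uniform(target, Z_MAX)  # ...and half behind the target
    if condition == 'neither':
        if random.random() < 0.5:
            return random.uniform(Z_MIN, Z_NEAR)
        return random.uniform(Z_FAR, Z_MAX)
    raise ValueError(condition)
```

For example, in the “neither” condition no distractor ever falls between the two target depths, in either half volume.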

*Z* between targets was reduced by a factor of 0.8. When the observer answered incorrectly, Δ*Z* was increased by a factor of 2.19. This ratio was chosen to target approximately 78% correct. If Δ*Z* increased beyond 20 cm, which normally would put the targets outside the bounding box of the clutter, the near target was presented just in front of the front face at *Z*_{min} and the far target just beyond the back face at *Z*_{max}. This configuration made the task trivial, since the near target was unoccluded and the far target was highly occluded. If the observer still answered incorrectly in this case, the same rule as above was used for choosing the next staircase level, but in the next trial the targets were again displayed at the same depths, that is, just in front of *Z*_{min} and just beyond *Z*_{max}. Each staircase began at level Δ*Z* = 12 cm and terminated after 12 reversals. To compute the threshold for a given staircase, we averaged the log of the Δ*Z* values of the last 10 reversals.
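The 78% figure follows directly from the step-size ratio: a multiplicative staircase converges to the accuracy at which the expected log step size is zero. A quick check:

```python
import math

DOWN, UP = 0.8, 2.19   # multiplicative steps after correct / incorrect responses

# At convergence the expected log step is zero:
#   p * log(DOWN) + (1 - p) * log(UP) = 0
# Solving for p gives the accuracy the staircase tracks.
p = math.log(UP) / (math.log(UP) - math.log(DOWN))
print(round(p, 3))   # 0.778, i.e. approximately 78% correct
```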

*Z* was 12 cm and a staircase was used to determine the next level. Since the purpose of the practice session was merely to familiarize the subjects with the requirements of the task, we kept the session short: Each condition terminated with the first incorrect answer.

*Z*) thresholds for all 3D clutter conditions, namely for the two types of display (upper vs. lower rows) and for the short and long bar conditions (left vs. right columns). For conditions that we did not test, we plotted a threshold of 20 cm. These were conditions in which the task was considered impossible for Δ*Z* values less than 20 cm and trivial when Δ*Z* was greater than 20 cm, since in that case the near target was entirely visible. We verified by simulation that an observer who guesses when Δ*Z* < 20 and who is always correct when Δ*Z* ≥ 20 achieves a threshold of over 20. Specifically, we ran 15 simulated observers for each of these conditions, and the mean thresholds for each condition were between 20 and 22 (not shown).
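The simulation just described can be reproduced in outline as follows. The staircase logic (×0.8 after a correct response, ×2.19 after an error, threshold from the log-mean of the last 10 of 12 reversals) is taken from the Methods; the reversal bookkeeping is our own sketch.

```python
import math, random

def run_staircase(p_correct, start=12.0, n_reversals=12):
    # p_correct(dz) -> probability of a correct response at level dz
    dz, direction, reversals = start, None, []
    while len(reversals) < n_reversals:
        correct = random.random() < p_correct(dz)
        step = 'down' if correct else 'up'
        if direction and step != direction:
            reversals.append(dz)        # direction changed: record a reversal
        direction = step
        dz = dz * 0.8 if correct else dz * 2.19
    # threshold: geometric mean (log average) of the last 10 reversals
    logs = [math.log(r) for r in reversals[-10:]]
    return math.exp(sum(logs) / len(logs))

# simulated observer from the text: guesses when dz < 20, always correct otherwise
guesser = lambda dz: 1.0 if dz >= 20 else 0.5
random.seed(0)
thresholds = [run_staircase(guesser) for _ in range(15)]
mean_thr = sum(thresholds) / len(thresholds)
```

With this observer the staircase hovers around the 20 cm boundary, so the mean threshold lands slightly above 20, as reported.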

*Z* values in cm. To convert these thresholds to stereo disparities, we use the standard small-angle approximation, disparity ≈ *I* Δ*Z* / *Z*^{2}, where *I* is the interocular distance and *Z* is the viewing distance to the targets. The thresholds lie between Δ*Z* = 2 and 20, and these correspond to 9 and 90 arcmin of disparity, respectively. The conversion assumes the observer is at 60 cm from the front face of the clutter.
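For completeness, the conversion can be written out. Here the interocular distance (6.3 cm) and the use of the mid-clutter distance (60 cm to the front face plus roughly 10 cm to the targets) are our assumptions; with them, the computed values come out near the 9 and 90 arcmin quoted above.

```python
import math

IPD = 6.3          # assumed interocular distance (cm); not stated in this excerpt
Z_TARGETS = 70.0   # observer at 60 cm from the front face, targets near mid-clutter

def disparity_arcmin(dz_cm, z=Z_TARGETS, ipd=IPD):
    # small-angle approximation: disparity (rad) ~ ipd * dz / z^2
    return ipd * dz_cm / z ** 2 * (180 / math.pi) * 60

print(disparity_arcmin(2))    # ~ 8.8 arcmin
print(disparity_arcmin(20))   # ~ 88 arcmin
```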

*t* tests (Microsoft Excel). We tested the null hypothesis that the means of two conditions across 15 observers are the same. Rather than choosing a particular *p* value threshold for rejecting the null hypothesis, we simply state the *p* values.

*Z* = 1.6 cm, which corresponds to a binocular disparity of about 7.3 arcmin. This disparity is slightly greater than the nominal interpixel distance (5.5 arcmin) of the Rift, although the Rift's resolution may be slightly worse than this nominal value because of chromatic aberration, which can occur if the eye position is misaligned with the center of the Rift lens. For the fishtank VR display, the mean baseline threshold was 0.2 cm, which corresponds to a binocular disparity of 0.9 arcmin. This disparity is slightly lower than the interpixel distance of the Acer monitor (1.55 arcmin at a screen viewing distance of 60 cm).

*t* = 1.94; *p* = 0.07).

*p* < 0.05 level, except for one case, namely the range cue with the fishtank VR display and with stereo + motion. In this case the *p* value was 0.09, which is close to significant. We did not expect interactions between the visibility and range cues, and indeed the *p* values for interactions were greater than 0.05 in all but one of the six ANOVAs.

*t* = 4.35; *p* = 0.0007; Fishtank VR: *t* = 4.63; *p* = 0.0004). This is not a trivial result, since the targets have no texture on them and only the left and right edges of the short targets could provide accurate disparity or motion parallax information, and these left and right target edges were often occluded. As an aside, we note that adding texture to the targets could lower thresholds somewhat, but limited target visibility would still be a problem. For example, Figure 2c (gray curve) shows that when Δ*Z* is small, namely when both targets are near the center of the clutter at *Z* = 10, only about one fifth of each target's area is binocularly visible.

*p* values for the *t* tests being less than 0.04 in all four cases. The reason for this result may be that stereo is less reliable than motion parallax when depth differences are large, because of stereo fusion limits (Blakemore, 1970; Wilcox & Allison, 2009). That is, stereo suffers beyond Panum's fusional area, but motion parallax does not.

*Z* value of these occlusion rays. This maximum *Z* value is a lower bound on the target depth. The range-based observer chose the target with the smaller maximum *Z* value as the closer target.
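A sketch of this decision rule, assuming (unrealistically) that the observer recovers occluder depths exactly. The number of occlusion rays and the uniform occluder-depth model are illustrative assumptions; with them, the rule alone already discriminates the example depths well above chance:

```python
import random

def max_occluder_depth(target_z, n_rays=5, z_min=0.0):
    # depths of the deepest occluders met by rays toward the target; for this
    # sketch, assume each ray's deepest occluder is uniform on [z_min, target_z]
    return max(random.uniform(z_min, target_z) for _ in range(n_rays))

def range_observer(z_near, z_far):
    # the target whose occlusion rays have the smaller maximum depth
    # is judged to be the closer target
    if max_occluder_depth(z_near) < max_occluder_depth(z_far):
        return 'near'
    return 'far'

random.seed(3)
trials = [range_observer(8.0, 12.0) for _ in range(2000)]
acc = trials.count('near') / len(trials)
```

Human observers fall well short of this because, unlike the sketch, they must estimate the occluder depths themselves, and those estimates are unreliable.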

*Z* and 5,000 example scenes for each level.

*Z* value for the pooled occlusion rays for each target.

*Z*. For small Δ*Z*, the expected visibilities of the two targets are nearly the same, and so the monocular visibility-based observer performs at near chance. The binocular observer has little advantage here, since averaging the visibilities of the two eyes reduces the variability of the visibilities only slightly, and the variability remains much greater than Δ*Z* (recall the error bars in Figure 2b). For example, if the visibilities of the two eyes were independent, then averaging the visibilities of the two eyes would reduce the standard deviation of visibility only by a factor of 1/√2. As Δ*Z* increases, the performance of both the monocular and binocular visibility-based ideal observers increases. However, the binocular observers gain less from the second view than they did when Δ*Z* was small, since, as Δ*Z* increases, the disparity differences between the near target and its occluders become smaller, because the near target on average is closer to its occluders. It follows that the visibilities of the left and right views of the near target become more similar, and hence the second view of the near target becomes more correlated with the first. A different argument holds for the far target, but the result is the same. As Δ*Z* increases, the visibility of the far target approaches zero in both the left and right views, so again the binocular visibility-based ideal observer benefits little from the second view.
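The 1/√2 argument, and why correlation erodes it, can be illustrated numerically. The Gaussian noise model below is purely illustrative (target visibilities are not Gaussian); it shows that averaging two views reduces the standard deviation by 1/√2 ≈ 0.71 when the views are independent, but only by √((1 + ρ)/2) when they are correlated with coefficient ρ.

```python
import math, random

def std(xs):
    # sample standard deviation
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def correlated_pair(rho):
    # two unit-variance Gaussian "visibility errors" with correlation rho
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    return a, rho * a + math.sqrt(1 - rho ** 2) * b

random.seed(4)
std_indep = std([(x + y) / 2 for x, y in (correlated_pair(0.0) for _ in range(20000))])
std_corr = std([(x + y) / 2 for x, y in (correlated_pair(0.8) for _ in range(20000))])
# independent views: ~1/sqrt(2) ~ 0.71; correlated (rho = 0.8): ~sqrt(0.9) ~ 0.95
```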

*Z* is large, even in monocular viewing, since the edge of the foreground region is fuller in the denser half. If one looks for this cue, one can perform the task above chance in the “range” condition by choosing the close target to be the one behind the denser occluders. Similarly, in the “visibility” condition, there is also a difference in density in front of the near targets. However, in this case, to perform above chance one would need to choose the target behind the less dense occluders. We believe it is unlikely that our observers used these perspective cues, since the observers were naïve, blocks were short (12 reversals only), the density differences just described were apparent only for large Δ*Z*, and the relationship between density and the correct answer in each condition is not obvious. Nonetheless, such density differences from perspective could have played some role.

*Z*, but below the 78% threshold. Thus, we believe that this cue provided no benefit for human observers.

*Z* is large, even under monocular viewing. In this case, observers may have ignored the visibility cue or range cue (whichever was present) and instead selected the closer target as the one behind the denser (or less dense) foreground clutter. Although we cannot rule out such a strategy, we think it is unlikely, since it would have led to systematically incorrect responses when there was only a visibility cue (Figure 5b) and to systematically correct responses when there was only a range cue (Figure 5c), and we did not observe such behavior. Nonetheless, the possibility exists and should be considered when designing future experiments.

*Perception & Psychophysics*, 5, 421–432.

*Journal of Experimental Psychology: Human Perception and Performance*, 2, 363–371.

*ACM Transactions on Information Systems*, 3, 239–265.

*Vision Research*, 33 (12), 1723–1737.

*Journal of Physiology*, 211, 599–622.

*Vision Research*, 36 (21), 3457–3468.

*The Journal of Neuroscience*, 30, 7269–7280.

*Journal of Theoretical Biology*, 254, 756–767.

*Vision Research*, 38 (12), 1861–1881.

*Vision Research*, 49, 2666–2685.

*Journal of Vision*, 14 (12): 11, 1–16. doi:10.1167/14.12.11.

*Vision Research*, 34, 2259–2275.

*Vision Research*, 38, 1655–1682.

*Vision Research*, 35, 389–412.

*Journal of the Optical Society of America A*, 9, 1794–1807.

*IEEE International Conference on Robotics and Automation (ICRA)*, 187–194.

*Vision Research*, 30 (11), 1811–1825.

*Human Factors: The Journal of the Human Factors and Ergonomics Society*, 3, 483–499.

*Proceedings of the AFIPS Fall Joint Computer Conference* (pp. 757–764).

*Journal of Vision*, 8 (5): 5, 1–10. doi:10.1167/8.5.5.

*Nature*, 410, 690–694.

*Vision Research*, 49, 2653–2665.