October 2019
Volume 19, Issue 12
Open Access
Density discrimination with occlusions in 3D clutter
Journal of Vision October 2019, Vol.19, 10. doi:https://doi.org/10.1167/19.12.10

      Milena Scaccia, Michael S. Langer; Density discrimination with occlusions in 3D clutter. Journal of Vision 2019;19(12):10. doi: https://doi.org/10.1167/19.12.10.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

We examined how well human observers can discriminate the density of surfaces in two halves of a rotating three-dimensional cluttered sphere. The observer's task was to compare the density of the front versus back half or the left versus right half. We measured how the bias and sensitivity in judging the denser half depended on the level of occlusion and on the area and density of the surfaces in the clutter. When occlusion level was low, observers in the front-back task were biased to judge the back as denser, and when occlusion level was high they were biased to judge the front as denser. Weber fractions decreased as density increased for both the front-back and left-right tasks, consistent with previous findings for two-dimensional density discrimination. Weber fractions did not vary significantly with area for the front-back task, but increased with area for the left-right task, and we attribute this difference to occlusions that have different effects in the two tasks. We also ran model observers that compared the image occupancies of the two halves against a known expected difference. As the occlusion level increased, this expected difference followed a similar trend as the biases of the human observers, with a roughly constant offset between them. Weber fractions for human and model observers followed some similar trends, but there were discrepancies as well that can be partly explained by the information available to human versus model observers in carrying out their respective tasks.

Introduction
Three-dimensional (3D) clutter consists of many small surfaces that are distributed randomly in a volume. Examples include foliage and branches of a tree, tall grass, etc. Little is known about how well the human visual system can judge the spatial distribution of 3D clutter. Most studies of 3D clutter have investigated clutter that is composed of many points or thin lines. Such studies have typically concentrated on depth cues such as motion and binocular disparity, and have ignored other cues such as occlusions and luminance. Occlusions are an important cue for perceiving the spatial distribution of 3D clutter since points that are deeper in the volume are less likely to be visible, and so occlusions provide probabilistic information about depth. For example, Langer, Zheng, and Rezvankhah (2016) examined 3D cluttered scenes consisting of squares that were randomly distributed in a volume, and showed that observers could use occlusion cues to discriminate the depth of target surfaces embedded in the clutter. Scaccia and Langer (2018) used similar scenes and showed there was an interaction between occlusion cues and color + luminance cues. Both of these studies investigated how well observers can discriminate the depths of two target surfaces within the clutter. In this paper, we address a different question, namely, how well can observers discriminate the density in the two halves of a 3D cluttered scene? We use the term “density” rather than “number” for our experiments, mainly because this is the term we used to explain the task to the subjects. Otherwise, there is no meaningful difference between the terms for our stimuli, since the two quantities are directly related in our stimuli. 
Early studies measured how well human observers can directly estimate the number of dots presented. Taves (1941) and Kaufman, Lord, Reese, and Volkmann (1949) found that when the dot number was below seven, observers could count the dots, but for greater dot numbers the accuracy in estimating the number of dots decreased. A similar effect was found in studies that measured observers' ability to detect target patterns of dots against a noisy random-dot background (French, 1954; Barlow, 1978). As the number of background dots increased relative to the number of target dots, the observer's ability to detect the target dots decreased. 
Subsequent studies considered the relationship between perceived two-dimensional (2D) density and numerosity. There is some controversy about whether the mechanisms for perceiving the two are the same or not. Burr and Ross (2008) found that numerosity is a primal visual property because it is sensitive to adaptation and that observers estimate numerosity independently of density. This finding was challenged by Durgin (2008) who had shown that adaptation is determined by texture density and not numerosity (Durgin, 1995). Other studies claimed that numerosity and density are not independent. In a task comparing two separate 2D patches, Dakin, Tibber, Greenwood, Kingdom, and Morgan (2011) and Tibber, Greenwood, and Dakin (2012) found similar just noticeable differences (JNDs) in density and numerosity discrimination. They also showed that perceived density and number were biased by increases in area, such that larger areas were perceived as denser and more numerous. Bell, Manson, Edwards, and Meso (2015) also found that perceived density was biased by increases in area, but did not find that perceived numerosity was biased by increases in area. They also showed that neither perceived density nor numerosity was biased by increases in volume. Several groups measured Weber fractions for numerosity discrimination. Burr and Ross (2008) and Ross and Burr (2010) showed that numerosity obeys Weber's law for low densities. For high densities, Anobile, Cicchini, and Burr (2014), Anobile, Turi, Cicchini, and Burr (2015) and Cicchini, Anobile, and Burr (2016) found that the thresholds increased only with the square root of density, corresponding to a decreasing Weber fraction. From this, they inferred that there exist different mechanisms for perceiving numerosity and density at low and high densities. 
Other studies have examined density and number perception in depth layers. The results of these studies are pertinent to our current work as they explore density perception of the front versus back layers, which is related to our task of judging the front versus back volume. Tsirlin, Allison, and Wilcox (2012) showed that for subjects to segregate two stereo-transparent planes, a greater interplane disparity is needed if the front plane is sparser than the back plane rather than vice versa. They also found that for observers to perceive the front and back planes to be equally dense, the front plane needed to be denser. They proposed that this bias was due to a figure-ground effect where the area between dots is assigned to the back plane. This back plane bias was also found in a moving dots study by Schütz (2012). Aida, Kusano, Shimono, and Tam (2015) also showed that multiple depth layers viewed in stereo were perceived as having more dots than only one depth layer, even if the total number of dots was the same in the two cases. Again, this was thought to be due to the overestimation of the back surface. 
Our current study considers volumes with many occluding surfaces. Occlusions have been neglected in previous studies of density estimation in volumes; these studies assumed that the 3D clutter consisted of lines and dots with little or no occlusion. Harris (2014) examined the perceived depth of cluttered scenes consisting of line elements. Both disparity gradients and number of elements were varied. Subjects judged a pair of stereo-transparent planes to have a greater range of depth than a cluttered volume, even if in both cases the volume had the same depth and the same number of line elements. There was no significant effect from using a small versus large number of line elements. Goutcher and Wilcox (2016) also examined disparity volumes and tested how subjects discriminated the spread in depth and location in depth of the volume. They found that subjects used only the extreme disparity values to make their judgments. Sun, Baker, and Kingdom (2018) showed that binocular disparity affects perceived simultaneous density contrast where center and surround dots of a texture are presented at different disparities. They showed that simultaneous density contrast was reduced with larger plane separation or larger volumes. 
Another variable that has been studied in density estimation is luminance. In non-overlapping stimuli, Ross and Burr (2010) showed that perceived numerosity varied inversely with luminance, whereas perceived 2D texture density did not. Tibber et al. (2012) found that varying the luminance contrast had no effect on numerosity and density discrimination in 2D. For overlapping surfaces presented with motion and with or without disparities, Schütz (2012) showed that the bias to see the back plane as more numerous was reduced when the front and back surfaces were assigned opposite contrast, i.e., black or white dots on a gray background. This was presumably because it was easier to segment the front and back when they had different luminance than when they had the same luminance. However, they found no evidence that a luminance difference facilitated density discrimination, as the JNDs were not affected. 
The studies discussed so far involved density or number discrimination of scenes consisting of dots or lines with little or no occlusion. Our study is fundamentally different in that we examine density discrimination in 3D clutter in which occlusion effects are significant. We varied the amount of occlusion of surfaces in the 3D clutter by manipulating both the size and density of the surfaces. The scenes were presented monocularly. However, a static monocular view of 3D clutter yields only a weak percept of 3D volume. To provide richer depth cues, we rotated each scene back and forth by a small angle. This gave a strong kinetic depth effect, and the dynamic occlusion cues typically provided enough information to resolve the twofold rotation ambiguity. 
Our goal was to examine the effects of occlusion when comparing the density in the two halves of a cluttered volume. A brief overview of the experiments is as follows. Experiments 1 and 2 measure performance in discriminating density in the front and back halves of a cluttered volume. Experiment 1 uses black and white surfaces only, and Experiment 2 uses a luminance gradient, namely a depth-luminance covariance. The black-white representation in Experiment 1 is similar to many 2D texture studies and to layer studies such as Schütz (2012), except that they used two planes rather than two volume halves. The luminance gradient in Experiment 2 is similar to what we used in our previous work where we examined depth discrimination (Scaccia & Langer, 2018). Experiment 2 allows us to compare performance in two types of luminance variation. 
Experiments 3 and 4 examine how well observers can discriminate density in the left versus right halves of a volume. Both Experiments 3 and 4 use black and white surfaces. Experiments 3 and 4 were done to establish a baseline since the left-right task should be easier than the front-back task, as we will argue later. They also allow us to probe whether observers use similar strategies in the front-back and left-right tasks, by comparing how the performance varies over scene parameters in the two different tasks. 
We also present results for a model observer that counts pixels that correspond to visible surfaces in the two halves, and that compares the number of pixels or “image occupancy” of the two halves. Our motivation for studying this model observer was twofold. First, we wanted to understand the information that is available for doing that version of the task—namely a pixel-level comparison of the two halves, rather than a surface density comparison. In particular, we wanted to understand how (if at all) this information varied between density and area conditions of our experiment. Second, we wanted to compare results of the human and model observers to see if human observers' performance followed that of the model observers, which could indicate that human observers might be influenced by this image occupancy information. 
Method
Subjects
Ten subjects participated in Experiments 1 to 3, with ages ranging from 19 to 71. The order of Experiments 1 to 3 was randomized for each subject. Experiment 4 was run with six new subjects aged 19 to 61. Each subject was paid $10. Subjects had little or no experience with psychophysics experiments and were unaware of the purpose of the experiments. Each had normal or corrected-to-normal vision. Informed consent was obtained using the guidelines of the McGill Research Ethics Board, which is consistent with the Declaration of Helsinki. 
Apparatus
Images were rendered using OpenGL (Khronos Group, Beaverton, OR) and were displayed using a Dell Precision T7610 workstation (Dell, Round Rock, TX) with an NVIDIA Quadro K4000 graphics card (NVIDIA, Santa Clara, CA). A 27-in. Apple monitor was used (Apple, Cupertino, CA). The display was gamma-corrected so that the displayed luminance values were proportional to the rendered gray level values. 
Stimuli
The clutter in each scene consisted of a set of squares, which were positioned and oriented randomly within two halves of a sphere of fixed diameter 24 cm. Each scene was defined by a mean density η, namely the number of surfaces per cm³, and the area A of each surface. The total number of surfaces in each scene was N = ηV, where V is the fixed volume of the sphere. The value N was rounded to the nearest integer. See Table 1 for the values of η, A, and N that were used in the different conditions of the experiments. 
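As a rough illustration of the relation N = ηV, the following sketch computes the sphere volume from the stated 24 cm diameter and the resulting surface count. The density value used in the example is a hypothetical placeholder, not an entry from Table 1, and the function names are ours.

```python
import math

SPHERE_DIAMETER_CM = 24.0  # fixed diameter stated in the text


def sphere_volume(diameter_cm: float) -> float:
    """Volume of a sphere in cm^3, given its diameter in cm."""
    radius = diameter_cm / 2.0
    return 4.0 / 3.0 * math.pi * radius ** 3


def surface_count(eta: float, diameter_cm: float = SPHERE_DIAMETER_CM) -> int:
    """N = eta * V, rounded to the nearest integer."""
    return round(eta * sphere_volume(diameter_cm))


V = sphere_volume(SPHERE_DIAMETER_CM)  # 2304 * pi, roughly 7238 cm^3
# eta = 0.05 surfaces per cm^3 is a hypothetical value, NOT taken from Table 1.
N = surface_count(0.05)
```

Because V is fixed, N and η differ only by a constant factor, which is why density and number are interchangeable descriptions of these stimuli.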
Table 1
 
The 3 × 3 table shows values of mean density η, area A, and number N of surfaces for stimuli in our experiments. The occlusion factor λ = ηA is constant in each column and increases from left to right. The area A is constant on the main diagonal and increases on the cross diagonal. The mean density η is constant on the cross diagonal and increases on the main diagonal.
The values of η and A in Table 1 were chosen such that their product  
\begin{equation}\tag{1}\lambda \equiv \eta A\end{equation}
is constant within each column and increases from left to right. The parameter λ is called the occlusion factor (Langer & Mannan, 2012). It is the expected total area of the surfaces in the clutter per cm³. The greater the value of λ, the more occlusions tend to occur. Note that the area A decreases and mean density η increases as one goes down each column of Table 1, and these variations exactly cancel to keep λ constant in each column. Also, the mean density η increases and area A is constant on the main diagonal (top left to bottom right), and area increases and mean density is constant on the cross-diagonal (bottom left to top right). These variations in λ, η, and A will be indicated by three arrows in subsequent figures.¹  
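The layout constraints on the 3 × 3 design can be checked with a small sketch. It assumes the three occlusion factors λ = 0.02, 0.04, 0.08 quoted for the supplementary files, a hypothetical center-cell density (not a Table 1 value), and a geometric spacing between cells. The spacing of the four cells that lie off both diagonals is our assumption, since the text constrains only the columns and the two diagonals.

```python
LAMBDAS = (0.02, 0.04, 0.08)  # low, medium, high occlusion factors quoted in the text
ETA_MID = 0.1                 # hypothetical density of the center cell, NOT from Table 1


def build_grid(lambdas=LAMBDAS, eta_mid=ETA_MID):
    """Return a 3x3 list of (eta, area) pairs.

    Rows run top to bottom and columns left to right.
    """
    grid = []
    for row in range(3):
        cells = []
        for col in range(3):
            # eta depends only on row + col, so it is constant on the cross diagonal
            # and doubles at each step down the main diagonal.
            eta = eta_mid * 2.0 ** ((row + col - 2) / 2.0)
            # The area follows from the definition lambda = eta * A.
            area = lambdas[col] / eta
            cells.append((eta, area))
        grid.append(cells)
    return grid


grid = build_grid()
```

With λ doubling from column to column, this construction keeps A constant along the main diagonal and η constant along the cross diagonal, matching the description of Table 1.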
Our experiments measure how well observers can discriminate the density of surfaces in the front versus back halves and the left versus right halves. The levels of Δη in the two halves were chosen separately for each mean density value η, namely we defined nine density difference levels:  
\begin{equation}\tag{2}\Delta \eta = \left\{ 0, \pm {\eta \over 4}, \pm {\eta \over 2}, \pm {{3\eta } \over 4}, \pm \eta \right\} .\end{equation}
 
Thus the densities of the two halves of each scene were \(\eta \pm {{\Delta \eta } \over 2}\). Examples will be shown later in Figure 2.
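The nine levels of Equation 2 and the resulting half densities can be enumerated as follows. This is a minimal sketch: the mean density in the example is a hypothetical value, and treating the first returned density as the front (or right) half follows the sign convention used later for the PSE, which we assume applies here.

```python
from fractions import Fraction


def density_levels(eta):
    """Nine density-difference levels: 0, ±eta/4, ±eta/2, ±3*eta/4, ±eta."""
    return [Fraction(k, 4) * eta for k in range(-4, 5)]


def half_densities(eta, delta_eta):
    """Densities of the two halves, (eta + delta/2, eta - delta/2).

    Assumption: the first value belongs to the front (or right) half.
    """
    return eta + delta_eta / 2, eta - delta_eta / 2


eta = Fraction(1, 10)  # hypothetical mean density, NOT a Table 1 value
levels = density_levels(eta)
extreme = half_densities(eta, levels[-1])  # largest level, delta_eta = eta
```

At the extreme levels Δη = ±η, one half has density 3η/2 and the other η/2, so the two halves differ by a factor of three while the mean density of the scene stays at η.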
We next consider the image formation model. Each cluttered scene sphere was rendered using perspective projection. The minimum and maximum depths of the clutter were defined to be at \(Z_0 = 58\) cm and \(Z_{max} = 82\) cm from the virtual subject's position, that is, the center of projection. This depth range corresponds to the diameter of the sphere. The projection plane or display screen was defined at \(Z_{mid} = 70\) cm, which was the center depth of the sphere. Perspective effects were present, but were relatively weak since many surfaces were partly occluded and so there was large variation in image sizes of visible portions of the surfaces. 
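To give a sense of the size of the perspective effects, the sketch below projects points onto the screen plane with a standard pinhole model, using the three depths given above. The function names and the coordinate convention (viewer at the origin, Z along the line of sight) are our assumptions.

```python
Z_NEAR, Z_MID, Z_FAR = 58.0, 70.0, 82.0  # cm, depths stated in the text


def image_scale(z_cm, z_screen=Z_MID):
    """Magnification of a frontoparallel length at depth z_cm on the screen plane."""
    return z_screen / z_cm


def project(x, y, z, z_screen=Z_MID):
    """Perspective projection of a point (cm, viewer at the origin) onto Z = z_screen."""
    s = image_scale(z, z_screen)
    return x * s, y * s


# Ratio of image sizes for identical squares at the nearest and farthest depths.
near_to_far_ratio = image_scale(Z_NEAR) / image_scale(Z_FAR)  # equals 82 / 58
```

Under these assumptions, an unoccluded frontoparallel square at the nearest depth would appear about 1.4 times wider than the same square at the farthest depth.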
Scenes were rotated back and forth about the X-axis with an amplitude of 10° and a rotational velocity of ±10°/s. Each stimulus was presented for 4 seconds, namely two periods of motion. The motion gave a strong kinetic depth effect. Moreover, dynamic occlusions between surfaces typically specified their ordinal depth, and so there was less of a tendency for depth reversals than what one typically has in 3D cluttered scenes, namely if one uses small random dots. The motion may also help segment the front and back halves because the rotation yields opposite motion directions in the front versus back halves. 
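The timing of the rotation can be made explicit with a small sketch. It assumes a triangle-wave motion at constant speed, and it reads the 10° amplitude as the full angular range of one sweep, since that is the reading under which a 4-second stimulus contains exactly two periods at 10°/s. Centering the sweep on 0° and starting at one extreme are also assumptions.

```python
SWEEP_DEG = 10.0        # angular range of one sweep (the "amplitude" in the text)
SPEED_DEG_PER_S = 10.0  # rotational speed stated in the text
STIMULUS_S = 4.0        # stimulus duration stated in the text

HALF_PERIOD_S = SWEEP_DEG / SPEED_DEG_PER_S  # 1 s for a single sweep
PERIOD_S = 2.0 * HALF_PERIOD_S               # 2 s for a back-and-forth cycle


def rotation_angle(t):
    """Rotation angle in degrees at time t (s) for a triangle wave centered on 0."""
    phase = t % PERIOD_S
    if phase < HALF_PERIOD_S:
        # Forward sweep from -5 to +5 degrees.
        return -SWEEP_DEG / 2.0 + SPEED_DEG_PER_S * phase
    # Return sweep from +5 back to -5 degrees.
    return SWEEP_DEG / 2.0 - SPEED_DEG_PER_S * (phase - HALF_PERIOD_S)


n_periods = STIMULUS_S / PERIOD_S
```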
In Experiments 1 and 2, the observer's task was to indicate whether the front or back half was denser. In Experiment 1, the squares in the front half had a different color than the squares in the back half (black versus white). WB means white for the front half and black for the back half, and BW means black for the front and white for the back. We did not expect a difference between the WB and BW conditions, based on Schütz (2012) and Tibber et al. (2012), but we included both conditions to verify this. Figure 1 shows examples of WB conditions; see also the supplementary materials (Supplementary File S1 [high occlusion λ = 0.08], Supplementary File S2 [medium occlusion λ = 0.04], and Supplementary File S3 [low occlusion λ = 0.02]). The different density and area combinations in the figure correspond to the entries in Table 1. Figure 2a shows examples of the Experiment 1 stimuli for different densities in the two halves. Specifically, the four rows illustrate density differences \(\Delta \eta = \pm {\eta \over 2}, \pm \eta \) for each of the four experiments. 
Figure 1
 
Example stimuli for Experiments 1 and 3 (front-back and left-right, WB) at level Δη = 0. The 3 × 3 layout corresponds to Table 1. The arrows indicate the direction in which the variables A, λ, η increase.
Figure 2
 
Examples of stimuli for Experiments 1-4 at levels \(\Delta \eta = \pm {\eta \over 2}, \pm \eta \). In each case, the values of density η and area A correspond to the middle condition in Table 1.
In Experiment 2, we chose the luminance of the squares to vary continuously with depth Z. For negative depth-luminance covariance (DLC−), the luminance was chosen to be proportional to \({({{{Z_{max}} - Z} \over {{Z_{max}} - {Z_0}}})^3}\), where Z was the depth (cm) of the center of the square, and \(Z_0\) and \(Z_{max}\) were the near and far limits of the clutter, as defined earlier. We chose this power law based on a simple \({Y^{{1 / 3}}}\) approximation to the CIELUV's luminance factor, so that equal steps in normalized depth \({{{Z_{max}} - Z} \over {{Z_{max}} - {Z_0}}}\) in [0,1] would have roughly equal steps in brightness (perceived luminance). Similarly, for positive depth-luminance covariance (DLC+), we chose luminance to be proportional to \({({{Z - {Z_0}} \over {{Z_{max}} - {Z_0}}})^3}\). We used a green background for Experiment 2 so that subjects could more easily distinguish between the surfaces and the background. Figure 2b shows examples of the Experiment 2 stimuli. 
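The two luminance mappings can be written out as a short sketch. It returns relative luminance normalized to a maximum of 1; the normalization, the function names, and the use of a sign string to select the condition are our assumptions.

```python
Z_NEAR, Z_FAR = 58.0, 82.0  # cm, near and far limits of the clutter


def normalized_depth(z, z_near=Z_NEAR, z_far=Z_FAR):
    """Map depth z in [z_near, z_far] to [0, 1], with 0 at the near limit."""
    return (z - z_near) / (z_far - z_near)


def dlc_luminance(z, sign):
    """Relative luminance of a square whose center is at depth z.

    sign "-" gives DLC- (near surfaces bright, far surfaces dark);
    sign "+" gives DLC+ (far surfaces bright, near surfaces dark).
    """
    d = normalized_depth(z)
    base = (1.0 - d) if sign == "-" else d
    return base ** 3


mid_minus = dlc_luminance(70.0, "-")  # 0.125 at the middle depth
mid_plus = dlc_luminance(70.0, "+")   # also 0.125 at the middle depth
```

The cube root of the returned value is linear in depth, which is the sense in which equal depth steps give roughly equal brightness steps. Under this mapping, squares at the middle depth receive the same luminance in both conditions.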
Note that in Experiment 2, surfaces that are near the middle depth all have roughly the same luminance, regardless of whether they fall in the front half or back half. Thus, it is inherently difficult from the luminance information to decide if such surfaces belong to one half versus the other. Although the rotational motion cue provides some information for segmenting the front and back halves, this motion information is also weakest for surfaces near the middle depth of the volume, as these surfaces hardly move. For both of these reasons, for Experiment 2, observers must base their front-back judgment more on surfaces that are well short of or well beyond the middle depth. Since there is less information available in Experiment 2, we expect the Experiment 2 task to be more difficult and performance to be worse than in the Experiment 1 task where surfaces in the two halves are either white or black. As we will see later, performance was indeed worse for Experiment 2. 
In Experiments 3 and 4, the observer's task was to indicate whether the left or right half of the clutter was denser. The performance in this task gave us a baseline against which to compare the front-back performance, and also allowed us to compare human performance to that of a model observer who used the simple strategy of counting pixels that correspond to visible surfaces in the two halves. In Experiment 3, the surfaces in the front and back were WB and BW as in Experiment 1, but now the density was varied between the left and right halves. In Experiment 4, surfaces on the left and right were white and black (WB), respectively, or vice versa (BW). Experiment 4 was added to test whether having black and white on the left and right halves would give different performance from having black and white on the front and back halves as in Experiment 3. See Figure 2c and 2d for examples of Experiments 3 and 4 stimuli. 
The WB and BW scenes in Experiments 1, 3, and 4 were presented on a gray (0.5, 0.5, 0.5) background. The luminance gradient scenes in Experiment 2 were presented on a green background, which had the same Y value as the gray background. See the Appendix in Scaccia and Langer (2018) for more details on display calibration and linearization. 
Design
Our independent variables within each experiment were area A and density η and their product λ, and the sign of luminance variation across the volume (WB/BW or DLC−/DLC+). 
For each of the four experiments and for each combination of independent variables, the stimulus levels Δη from trial to trial were chosen using a staircase procedure. We used a 1-up/1-down staircase with the nine stimulus levels Δη, which were described above in Equation 2. The staircases targeted the level at which each of the two halves was chosen on 50% of the trials, i.e., the point of subjective equality (PSE). For each of the four experiments, the staircases for the different conditions were randomly interleaved. Each staircase began at a randomly chosen Δη level from the set of nine levels and then terminated after 14 reversals. 
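A 1-up/1-down staircase of this kind can be simulated as follows. This is only a sketch with a simulated observer: the observer's logistic parameters, the rule that choosing the first half (front or right) lowers Δη by one level, the clamping at the extreme levels, and the definition of a reversal as a change in step direction are all assumptions, not details taken from the experiment code.

```python
import math
import random


def logistic(x, alpha, beta):
    """Psychometric function of the simulated observer."""
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))


def run_staircase(levels, p_first, rng, n_reversals=14):
    """Run one 1-up/1-down staircase over a fixed, ordered list of levels.

    p_first(x) is the probability that the observer chooses the first half
    (front or right) at level x. Returns a list of (level, chose_first) trials.
    """
    idx = rng.randrange(len(levels))  # random starting level
    trials = []
    reversals = 0
    last_step = 0
    while reversals < n_reversals:
        x = levels[idx]
        chose_first = rng.random() < p_first(x)
        trials.append((x, chose_first))
        # Move one level toward the point where both responses are equally likely.
        step = -1 if chose_first else 1
        if last_step != 0 and step != last_step:
            reversals += 1
        last_step = step
        idx = min(max(idx + step, 0), len(levels) - 1)  # clamp to the nine levels
    return trials


levels = [k / 4 for k in range(-4, 5)]  # delta_eta in units of eta
rng = random.Random(0)
trials = run_staircase(levels, lambda x: logistic(x, 0.25, 6.0), rng)
```

Because each response moves the level by one step in the opposite direction, the visited levels concentrate around the 50% point of the psychometric function, which is the PSE.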
Our dependent variables were bias α, slope β and derived quantities JND and Weber fraction, which were defined as follows. Each staircase yielded the fraction of trials in which the subject chose the front half in Experiments 1 and 2 or right half in Experiments 3 and 4 as being more dense. We used the Palamedes toolbox (Prins & Kingdom, 2018) to fit a logistic function to these fractions  
\begin{equation}p(x;\alpha ,\beta ) = {1 \over {1 + \exp ( - \beta (x - \alpha ))}}\end{equation}
where x is one of the nine density difference levels \(\Delta \eta \), α is the bias, and β is the slope.  
We defined the JND of density in the two halves to be the value of \(x - \alpha \) such that x is the 75% threshold for choosing front or right, so \({\rm{JND}} \equiv {{\ln(3)} \over \beta }.\) We then measured the density discrimination performance in terms of the following quantity:  
\begin{equation}\tag{3}{{{\rm{JND}}} \over {\eta + {\alpha \over 2}}}\end{equation}
which we will refer to as a Weber fraction. For the front-back task, the denominator \(\eta + {\alpha \over 2}\) is the density of the front half at the PSE since, when \(\Delta \eta = \alpha \), the front and back half densities are \(\eta + {\alpha \over 2}\) and \(\eta - {\alpha \over 2}\), respectively. We used this density of the front half at the PSE as an estimate of the observer's perceived mean density, reasoning that the perceived density should be more reliable for the front half than the back half since the surfaces in the front are more visible. The Weber fraction is then the JND between the densities of the two halves, relative to the perceived mean density. Note that for the left-right task there is no bias and so the perceived mean is identical to the true mean in that case, and so the Weber fraction would simply be \({{{\rm{JND}}} \over \eta }\).  
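The quantities defined above can be computed directly from a fitted bias and slope. The sketch below uses only the formulas in the text; the parameter values are hypothetical and the fitting step itself (done with the Palamedes toolbox in the study) is not reproduced.

```python
import math


def logistic(x, alpha, beta):
    """Probability of choosing front (or right) at density difference x."""
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))


def jnd(beta):
    """Distance from the PSE to the 75% point of the logistic: ln(3) / beta."""
    return math.log(3.0) / beta


def weber_fraction(eta, alpha, beta):
    """Equation 3: JND divided by the front-half density at the PSE."""
    return jnd(beta) / (eta + alpha / 2.0)


# Hypothetical fitted parameters, NOT values from the experiments.
eta, alpha, beta = 0.1, 0.02, 40.0
p_at_threshold = logistic(alpha + jnd(beta), alpha, beta)  # 0.75 up to rounding
w = weber_fraction(eta, alpha, beta)
```

Setting p = 3/4 in the logistic gives exp(−β(x − α)) = 1/3, so x − α = ln(3)/β, which is the JND expression used above.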
Procedure
Subjects were seated so that their eye position was 70 cm directly in front of the screen, which corresponded to the virtual viewing position used in the rendering. The height of each subject's seat was adjusted so that the subject's eye would be roughly the same height as the center of the screen. We did not use a chin rest, so some slight variability in subject position was allowed. Subjects viewed the stimuli monocularly through the dominant eye. The non-dominant eye was covered with an eye patch. 
In Experiments 1 and 2, subjects responded by pressing the up or down arrow key to choose back or front half as denser, respectively. In Experiments 3 and 4, subjects responded by pressing either the left or right arrow keys to choose the left or right half as denser. Response time was limited to the 4-second duration of the stimulus. If the subject did not respond in some trial, then a random choice was made and a red X would show on the screen. There was a rest period after every 100 trials for as long as the subject wanted. Prior to the experiments, subjects were given a practice session where they viewed 10 trials from each experiment. This was done to ensure they were comfortable with the task and answering within four seconds. Each of the four experiments typically lasted around 15 minutes. For the subjects that ran Experiments 1 to 3, these experiments were run in random order. 
Results
For Experiments 1 and 2, we ran two-way repeated measures analyses of variance (ANOVAs) for bias and Weber fractions to test the effects of the occlusion factor λ, density η, area A, and the sign and type of luminance variation. For Experiments 3 and 4, we ran one-way ANOVAs for each of them rather than two-way ANOVAs that combined them, because these experiments had a different number and different set of subjects. 
For all experiments, we found that the sign of the luminance variation had no effect. Therefore, we do not report this factor. Instead we pool these conditions within each experiment. The lack of effect from sign of depth-luminance covariance is not entirely surprising, since this information specifies the ordinal depth of the two halves from the occlusions (Scaccia & Langer, 2018). Although this information helps to segment the two halves (Schütz, 2012), this segmentation information is roughly the same for the two signs. 
For all the statistical tests, we report exact p values except if p values are very small. A p value smaller than 0.05 is considered to be significant. 
Experiments 1 and 2 (front-back)
Figure 3 shows the mean of the biases α in Experiments 1 and 2 (front-back). There was a strong main effect for the occlusion factor λ, F(2, 174) = 59.11, p < 10⁻¹⁹, namely a back bias (α > 0) for small λ, a near-zero bias for the middle λ, and a front bias (α < 0) for large λ. There was also a main effect for the type of luminance variation, F(1, 174) = 14.64, p < 10⁻³, with a more negative mean bias for Experiment 2 than Experiment 1, i.e., a greater bias to see the front as denser for Experiment 2 than for Experiment 1. There was also an interaction between the factors of λ and the type of luminance variation, F(2, 174) = 4.28, p = 0.015. 
Figure 3
 
Biases α for human observers in Experiments 1 and 2. The nine plots correspond to the conditions in Table 1. Error bars indicate standard error of the mean. The gray lines indicate the Δη levels in Equation 2, which were the levels used in the experiment. The black lines indicate the Δη levels for which the mean number of pixels of front and back halves are equal. As we discuss later, these levels correspond to thresholds τ of a model observer that compares the image occupancies of the two halves.
To gain some insight into how the biases depend on the occlusion factor λ, we consider occupancy fractions, that is, the fraction of the pixels that correspond to surfaces in each half. Figure 4a shows mean image occupancy fractions for the nine conditions of Table 1, specifically for the case of uniform density, i.e., Δη = 0. For each experiment and within each condition, there is little variation in these values from scene to scene. The error bars show the standard deviations multiplied by 10 to better illustrate their relative magnitudes. 
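The occupancy fraction described above can be sketched in a few lines. This is a minimal illustration, not the authors' rendering code: it assumes each rendered frame is available as a per-pixel label image, and the label codes (`BACKGROUND`, `FRONT`, `BACK`), the function name, and the `sphere_mask` argument are all hypothetical. Following the Figure 4 caption, the counts are normalized by the number of pixels in the image projection of the spherical bounding volume.

```python
import numpy as np

# Hypothetical label codes for a rendered frame (assumed, not from the paper).
BACKGROUND, FRONT, BACK = 0, 1, 2


def occupancy_fractions(labels, sphere_mask):
    """Return (front, back) occupancy fractions for one rendered frame.

    labels      : 2D integer array, one label code per pixel.
    sphere_mask : 2D boolean array, True inside the projected bounding sphere.
    """
    labels = np.asarray(labels)
    sphere_mask = np.asarray(sphere_mask, dtype=bool)
    n_sphere = int(sphere_mask.sum())
    if n_sphere == 0:
        raise ValueError("sphere_mask contains no pixels")
    # Count visible pixels of each half, restricted to the sphere's projection.
    front = np.count_nonzero((labels == FRONT) & sphere_mask) / n_sphere
    back = np.count_nonzero((labels == BACK) & sphere_mask) / n_sphere
    return front, back
```

Averaging these fractions over many scenes at Δη = 0 would give per-condition means and standard deviations of the kind summarized in Figure 4.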
Figure 4
 
Mean image occupancy of (a) front and back halves for Experiments 1 and 2, and (b) left and right halves for Experiments 3 and 4, for each of the nine conditions of Table 1 and for equal density in the two halves (Δη = 0). The means are over 10,000 scenes. Occupancy is defined by the number of pixels of the corresponding front, back, left, or right half sphere, divided by the number of pixels occupied by the image projection of the spherical bounding volume. Error bars show the standard deviation (not standard error) multiplied by a factor of 10 to better illustrate their relative magnitudes.
Within each row of Figure 4a, the means of front and back image occupancy fractions each increase with λ, and the difference between the means increases as well. Another way to view this trend in the image occupancy of the two halves is to let Δη vary instead of just taking Δη = 0, and to ask the question: what would Δη need to be for the two halves to have the same image occupancy? The answer depends on the occlusion factor: for a larger occlusion factor λ, the density difference Δη would need to be more negative. This effect is illustrated in Figure 3 by the black horizontal lines that show the level of density difference Δη that would yield equal image occupancy for the front and back halves for each of the conditions. Note how these Δη values decrease as the occlusion factor λ increases. 
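One way to locate the equal-occupancy density difference is to measure the mean front-minus-back occupancy at several Δη levels and interpolate to the zero crossing. The sketch below assumes that this difference increases monotonically with Δη over the sampled range; the function name is hypothetical, and the numbers in the usage note are made up for illustration and are not data from the experiments.

```python
import numpy as np


def equal_occupancy_delta(delta_levels, occupancy_diffs):
    """Interpolate the density difference at which front and back occupancy match.

    delta_levels    : sampled density differences (front minus back).
    occupancy_diffs : mean front-minus-back occupancy at each level,
                      assumed strictly increasing with the level.
    """
    x = np.asarray(delta_levels, dtype=float)
    d = np.asarray(occupancy_diffs, dtype=float)
    if np.any(np.diff(d) <= 0):
        raise ValueError("occupancy_diffs must be strictly increasing")
    if not (d[0] <= 0.0 <= d[-1]):
        raise ValueError("zero crossing lies outside the sampled range")
    # np.interp needs increasing x-coordinates, so interpolate level as a
    # function of the occupancy difference and evaluate it at zero.
    return float(np.interp(0.0, d, x))
```

For example, with illustrative levels `[-0.4, -0.2, 0.0, 0.2, 0.4]` (in units of η) and illustrative differences `[-0.01, 0.02, 0.05, 0.08, 0.11]`, the front already occupies more of the image at Δη = 0, so the returned value is negative, in line with the direction of the trend described above.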
Interestingly, the observer biases α follow a similar trend. However, there is a crucial difference between these two trends, namely the observer biases are shifted relative to the density differences that give equal occupancy. In particular, the observer biases are closer to zero. This implies that observers are not merely judging the relative image occupancy of the two halves or the relative density of visible surfaces in the two halves, but rather they are judging the relative density of surfaces in the scene and taking account of the occlusion effects, albeit with a bias that depends on the amount of occlusion. 
As the ANOVAs showed already, luminance type (Experiment 1 vs. 2) affected bias as well. There was a main effect for this factor and there was also an interaction between luminance type and occlusion factor λ. Such effects are not surprising since the task in Experiment 2 is inherently more difficult than in Experiment 1, as discussed earlier in the Stimuli section, and the task is even more difficult when λ is larger. Observers may change their bias when the task is more difficult, and possibly rely more on perceived image occupancy than on perceived density in that case. 
Figure 5 shows the means of the Weber fractions (recall Equation 3) for all four experiments. For Experiments 1 and 2, the effect of the occlusion factor λ was nearly significant, F(1, 174) = 2.78, p = 0.057, with Weber fractions decreasing as λ increased. There was a strong main effect for the type of luminance variation, F(2, 174) = 17.71, p < 10⁻⁴, with greater Weber fractions for Experiment 2 (DLC) than Experiment 1 (WB). The greater Weber fractions in Experiment 2 are not surprising because the task in Experiment 2 is inherently more difficult. 
Figure 5
 
Mean Weber fractions for human observers for all four experiments. The nine plots correspond to the conditions in Table 1. Error bars show standard error of the mean.
To explore the effect of the occlusion factor λ, we examined specific combinations of density η and area A. There was a main effect for changes in density along the main diagonal conditions, F(2, 54) = 6.34, p = 0.003, and again a main effect for the type of luminance (Experiment 1 vs. Experiment 2), F(1, 54) = 7.18, p = 0.009. On the cross diagonal, there was no main effect from changes in area, F(2, 54) = 0.87, p = 0.42, but there was again a main effect for the type of luminance, F(2, 54) = 8.78, p = 0.004. There was also a main effect within columns, F(2, 174) = 8.1, p < 10⁻³, with lower Weber fractions as we move down each column where density increases and area decreases, as well as an effect of luminance type, F(1, 174) = 18.8, p < 10⁻⁴. This result is consistent with the result on the main diagonal where density had an effect. We conclude that the near-significant Weber fraction effect of the occlusion factor λ in both Experiments 1 and 2 was primarily due to a strong density effect, not to occlusion per se. We will discuss these results further in the Discussion section. 
Experiments 3 and 4 (left-right)
In Experiments 3 and 4, the task was to discriminate density for the left and right halves (see Figure 2c, d). The left-right biases were all close to zero as expected, and so we do not show them in Figure 3.
For the Weber fractions, we ran one-way ANOVAs for Experiments 3 and 4, as they had a different number and different set of subjects. There was no significant effect for the occlusion factor λ, either for Experiment 3, F(2, 87) = 2.55, p = 0.08, or for Experiment 4, F(2, 51) = 1.8, p = 0.18. However, there was a strong main effect for density (on the main diagonal where area was fixed and density varied) for both Experiment 3, F(2, 27) = 24.3, p < 10⁻⁶, and Experiment 4, F(2, 15) = 10.9, p = 0.001. There was a weaker but still significant effect of area (on the cross diagonal where density was fixed and area varied) for both Experiment 3, F(2, 27) = 5.24, p = 0.01, and Experiment 4, F(2, 15) = 7.6, p = 0.005. The density and area effects were in opposite directions, with Weber fractions decreasing as density increased, and Weber fractions increasing as area increased. These opposing effects might be the reason why there was no significant effect from the occlusion factor λ, which is the product of density and area. There was also a main effect within columns, both for Experiment 3, F(2, 87) = 14.62, p < 10⁻⁵, and Experiment 4, F(2, 51) = 35.5, p < 10⁻⁹, namely Weber fractions decrease moving down each column of Table 1 as density increases and area decreases. This is consistent with the density and area results found on the main and cross diagonals, respectively, namely that Weber fractions decreased as density increased and as area decreased. 
We compared Weber fractions for Experiment 3 versus Experiment 4 using a t-test. Recall that Experiment 3 had black and white separated in the front and back and Experiment 4 had black and white separated in the left and right halves. We suspected that Experiment 4 would be easier since subjects would not need to disentangle white versus black in each half. However, a one-tailed t-test on the signed differences of the Weber fractions showed no significant difference, t = 0.81, p = 0.22. 
Finally, we compared Weber fractions for Experiments 1 versus 3. This is an interesting and important comparison because the stimuli for the Δη = 0 level in the two experiments were the same. The crucial difference between the two is how observers deal with the more challenging occlusion effects in the front-back task versus the left-right task. In particular, we expected performance to be worse in Experiment 1 since observers need to account for the different image occupancies of the front versus back, which are due to occlusion, whereas in Experiment 3 (and Experiment 4) observers could perform the left versus right task in principle by just comparing the overall image occupancy in the two halves. A one-tailed t-test indeed showed that Weber fractions were much higher for Experiment 1 than Experiment 3 (t = 6.95, p < 10⁻⁶). 
Model observers
To gain more insight into the effect of occlusions on the difficulty of the task, we present model observers that only compare the image occupancy of the two halves. We assume that the model observer in each condition is unbiased (α = 0), namely it knows the expected difference in pixel counts in the two halves. We refer to this expected difference in each condition as τ. We define the front-back model observers to respond “front” when the number of front pixels exceeds the number of back pixels by this threshold τ. We computed τ for each condition in advance, namely it was the mean difference in the number of pixels from visible front versus back surfaces over 10,000 scenes for that condition and for the Δη = 0 level. These τ values were computed using the data in Figure 4a, and the density differences that correspond to the τ values are plotted as black lines in Figure 3. Similarly, the left-right model observers compare the number of pixels corresponding to surfaces in the left and right halves. In this case the expected difference for the two halves is zero (τ = 0). 
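The decision rule of these model observers can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the pixel counts are assumed to be supplied by the renderer, and the handling of exact ties (resolved here as "back" and "right") is not specified in the text.

```python
import numpy as np


def estimate_tau(front_counts, back_counts):
    """Expected front-minus-back pixel difference for one condition.

    Both arguments hold per-scene pixel counts from scenes rendered with
    equal density in the two halves (the Δη = 0 level).
    """
    front = np.asarray(front_counts, dtype=float)
    back = np.asarray(back_counts, dtype=float)
    return float(np.mean(front - back))


def front_back_response(front_pixels, back_pixels, tau):
    """Respond "front" when the front-minus-back count exceeds tau."""
    # Assumption: an exact tie with tau is resolved as "back".
    return "front" if (front_pixels - back_pixels) > tau else "back"


def left_right_response(left_pixels, right_pixels):
    """Left-right observer: same rule with an expected difference of zero."""
    # Assumption: an exact tie is resolved as "right".
    return "left" if (left_pixels - right_pixels) > 0 else "right"
```

Because τ is estimated per condition from equal-density scenes, this observer is unbiased by construction: it answers "front" only when the front half occupies more of the image than occlusion alone would predict.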
Before we discuss the performance of the model observers, we examine the data in Figure 4a. Within each of the three columns of Figure 4a, the mean image occupancy of the front half and back half is roughly constant. This is because the expected value of the image occupancy of the front and back depends only on the occlusion factor λ, which is constant within each column. However, the standard deviations of image occupancy are not constant within each column but rather they decrease from the top to bottom, typically by over 30%. The reason is that having a larger number of smaller surfaces drives the pixel counts of front and back surfaces to be closer to their expected values. (A similar effect was described in Figure 3 of Langer & Mannan, 2012.) It follows that the mean difference in the number of pixels of the front and back also will be constant within each column, and the standard deviations of the front-back difference will decrease from top to bottom within each column as well. Similarly, for Figure 4b, the means for left versus right are the same within each column, but again the standard deviation decreases moving down each column, and so the standard deviations of the difference will decrease moving down each column. Based on these observations about the standard deviations, we predicted that the model observer should be more sensitive to density differences moving down each column, both for the front-back task and the left-right task. As we show next, this prediction holds. 
Another factor that affects the variation in the number of visible pixels in each half is the motion in the stimuli. Recall that in each trial, the stimulus is not just a static image, but rather it is a sequence of images, i.e., the clutter rotates. Having a sequence of images increases the chances that any surface point will be visible at some frame during the sequence. To explore this factor, we compared performance of model observers that used a single frame to model observers that used multiple frames. The idea is that, by averaging the counts of the two halves over multiple frames, the model observer can reduce the variance (or standard deviation) in the pixel counts in the two halves and improve its performance. This is similar to how human observers can perform the task more easily with rotation present than with only a static image, as the rotation creates dynamic occlusions that help the human observer resolve the depth reversal ambiguity. 
We compared the Weber fractions of a model observer that used just one frame corresponding to 0° rotation with a model observer that used three frames, namely, 0° and ±5° rotation about the X-axis. Using a similar method as in Figure 4, we found that the standard deviations for the image occupancies for each half of the sphere and for each condition were reduced by approximately 40% in each condition for three-frame observers relative to the one-frame observers. We predicted that these reduced standard deviations in the image occupancy relative to the fixed differences in the mean image occupancies would decrease the model observer's Weber fraction. 
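The variance-reduction idea can be illustrated with a small simulation. This sketch rests on an assumption the text does not make: it treats the per-frame pixel counts as independent samples with equal variance, in which case averaging n frames reduces the standard deviation by 1 − 1/√n, or about 42% for n = 3. Frames at 0° and ±5° of the same scene are presumably correlated, so this is only a reference point against which the reported reduction of approximately 40% can be read. The function names and the Gaussian count model are hypothetical.

```python
import numpy as np


def averaged_count(counts_per_frame):
    """Average one half's pixel count over the frames the observer uses."""
    return float(np.mean(counts_per_frame))


def simulated_sd_reduction(n_frames=3, n_scenes=100_000, seed=0):
    """Fractional drop in SD when counts are averaged over n_frames.

    Assumption: per-frame counts are independent Gaussian samples with
    equal variance; the mean and SD used here are arbitrary.
    """
    rng = np.random.default_rng(seed)
    counts = rng.normal(loc=1000.0, scale=50.0, size=(n_scenes, n_frames))
    sd_one_frame = counts[:, 0].std()
    sd_averaged = counts.mean(axis=1).std()
    return float(1.0 - sd_averaged / sd_one_frame)
```

Calling `simulated_sd_reduction()` returns a value close to 1 − 1/√3 under these assumptions.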
Although the model observers only count pixels in the two halves in each frame, these counts are precise and they turn out to be sufficient for performing the task quite well. To measure exactly how well, we used a much more refined Δη range for the model observer experiments. We defined density difference levels for the model observer, by dividing each of the human observer's stimulus levels by 10, so the model observer's levels were  
\begin{equation}\Delta \eta = {\eta _f} - {\eta _b} = \left\{ {0, \pm {\eta \over {40}}, \pm {{2\eta } \over {40}}, \pm {{3\eta } \over {40}}, \pm {{4\eta } \over {40}}} \right\}.\end{equation}
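As a concrete reading of the equation above, the nine model observer levels for a given density η can be generated as follows. The function name and its parameters are hypothetical; the only facts taken from the text are the step η/40, the range of ±4 steps, and the statement that the model levels are the human levels divided by 10.

```python
def model_delta_eta_levels(eta, n_steps=4, divisor=40):
    """Density differences (front minus back) used for the model observer."""
    return [k * eta / divisor for k in range(-n_steps, n_steps + 1)]


# Per the text, the human observers' levels are ten times larger.
def human_delta_eta_levels(eta):
    return [10 * level for level in model_delta_eta_levels(eta)]
```

For example, `model_delta_eta_levels(40.0)` yields the integers from −4 to 4 as floats, which makes the step size of η/40 easy to see.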
 
For each of the conditions of Experiments 1 and 4, we ran model observers on 100 staircases and fit psychometric functions to each staircase. 
Figure 6 shows the mean of the model observer Weber fractions for the front-back task (white bars) and the left-right task (black bars). Two observations can be made, and both follow the predictions above. First, for each column of Figure 6, the Weber fractions indeed decrease as we move down the column. The reason is that when density η is larger and area A is smaller, there is less variability in the number of front pixels and back pixels for each condition and so the observer can detect more reliably whether the difference in front versus back pixel counts is greater than the expected value τ of that difference. Second, the three-frame model observer had lower Weber fractions than the one-frame model observer. Thus, although the rotational motion itself (i.e., the image velocities of the surfaces) was not used by the model observer, the motion information did provide a significant benefit to the model observer, namely by providing more samples for comparing pixel counts in the two halves. 
Figure 6
 
Mean Weber fractions for model observers using one frame and three frames for front-back and left-right tasks. The nine plots correspond to the conditions in Table 1. Error bars show standard error of the mean.
Finally, the model observer also shows a density effect on the main diagonal and an area effect on the cross diagonal. Because the model observer has no notion of the individual surfaces in the stimuli, these density and area effects must be due to the occlusion effects just discussed. 
Discussion
Comparison between human and model observers
We have compared the human observer biases to the model observer parameters in the Results section, and so we concentrate our discussion here only on the Weber fractions. For the front-back task, Weber fractions for the human and model observers decreased moving down each column of the 3 × 3 plots. For model observers, this trend was explained by a reduced variability in the pixel counts of front and back. For human observers, the decrease in the Weber fractions was mainly due to density. This density effect has been shown previously by Anobile and colleagues, namely Weber fractions for density decrease as density increases. One might have expected an area effect as well, since increasing the area and holding density fixed creates more occlusions that should make the task more difficult because the deeper surfaces would be less visible. However, we did not find an area effect. The reason may be that varying area has a second occlusion effect that works in the opposite direction. Although occlusions typically resolve the depth reversal ambiguity, they do not always do so: in particular, depth reversals are more likely to occur when the area of elements is small than when their area is large, that is, when there are fewer occlusions. When a depth reversal does occur, it tends to produce an incorrect response and so it follows that the depth reversals tend to produce more incorrect responses when the area of the elements is smaller. As the two occlusion effects work in opposite directions, we would expect these effects to partly cancel out—at least for the front-back task where depth reversals lead to incorrect responses. 
For the left-right task, Weber fractions decreased moving down each column of the 3 × 3 plots, both for model and human observers. For model observers, this trend was explained by a reduced variability in the pixel counts of each half. For human observers, pixel count variability is unlikely to have played a role since it requires a precision in estimating image occupancy of the left and right halves that is presumably far greater than what humans are capable of. Rather, human observers seemed to show effects for both density and area, with a stronger main effect for density than for area. Weber fractions decreased with density, which has been shown before by Anobile and colleagues. Weber fractions increased with area, with density held fixed, presumably because occlusions interfered with the visibility of deeper surfaces, as discussed already. Note that in the left-right task, the depth reversals can also occur when the area of the elements is smaller but these would not lead systematically to incorrect responses, unlike for the front-back task. 
A key difference between the human and model observers is that the model observers have roughly the same Weber fraction for the front-back task as for the left-right task within each of the nine conditions, whereas the human observer Weber fractions are larger for the front-back task than for the left-right task. One reason is presumably that the model observer knows the expected difference in image occupancy for the two halves in each condition, whereas the human observer does not. (Recall that the staircases for all the different conditions were randomly interleaved.) In the front-back tasks, in particular, human observers not only have to assign the bright and dark luminances to the front and the back halves, but they also need to compare their densities to some internal standard that takes account of occlusions. Since there must be some uncertainty in what this internal standard should be on each trial in the front-back task, human observers naturally perform worse in this task than in the left-right task. 
Finally, we saw that the single-frame model observers had higher Weber fractions than the multiple-frame model observers. We did not run our experiment on single-frame stimuli for the human observers because the one-frame task is too difficult. As one can see from the single-frame examples in Figure 1, even the front-back ordering can sometimes be difficult to discern, especially at the lowest values of the occlusion factor λ.
Comparison with previous work
Previous studies used a variety of parameters, including intrinsic parameters such as the density and area of the elements, and extrinsic parameters such as the overall area of the stimulus, the number of elements, and eccentricity. 
With regard to the front versus back biases, our back bias finding with small occlusion factor λ was consistent with the back bias found in studies with depth layers (Schütz, 2012; Tsirlin et al., 2012; Aida et al., 2015). These studies used a range of parameters and typically did not allow occlusions. For example, Schütz (2012) used a 2D density range of 0.25 to 2 dots/deg², an N range of about 20 to 150 dots, and a fixed 10° diameter stimulus. Tsirlin et al. (2012) used a fixed density of about 20 dots/deg², 3,000 dots, and a fixed stimulus size of about 13° × 13°. For our stimuli, the density range was approximately from 1.5 to 6 squares/deg². Our smallest number of elements was 434 and our largest was 1,737, and the circle bounding our volume had a diameter of 20°. These values are thus in a similar range as the studies already mentioned. This suggests that the back bias is robust to differences in densities, number of elements, and size of stimulus, as well as differences in the arrangement of the 3D stimulus (layers or volume), at least when the amount of occlusion is low. 
Regarding Weber fractions, we found them to decrease as density increased for both front-back and left-right human observers. We compare our results to Anobile et al. (2014) who were the first to show how Weber fractions varied with density. Using patches that were centered 13° left and right of fixation and densities ranging from 0.02 to 4 dots/deg², they found constant Weber fractions for lower densities and decreasing Weber fractions for higher densities, where the switch occurred at about 0.2 dots/deg², depending on patch area. Anobile et al. (2015) subsequently showed that the switching point from constant to decreasing Weber fractions depends on eccentricity. For example, using centrally-presented patches of diameter 8° and presented in sequence, they found that the switch from constant to decreasing Weber fractions occurred at much higher densities, namely about 2 dots/deg², and that the switching point decreased when the patches were presented more peripherally. Our stimuli had a diameter of 20° and we did not control eye movements, and so they are a mix of central and peripheral presentation. Moreover, while our mean 2D densities varied from 1.5 to 6 elements/deg², our 2D densities varied within each stimulus, namely greatest at the center of the projected sphere and diminishing to zero at the circular edge. Overall though, our 2D densities and eccentricities were in the range that was similar to where Anobile and colleagues found decreasing Weber fractions as density increased, so our results on decreasing Weber fractions are consistent with their findings. 
As for comparing to previous work on luminance, our stimuli used a combination of bright and dark elements on a gray background, whereas Ross and Burr (2010) used equiluminant dots on a black background. They found that perceived numerosity increases with decreasing luminance, but perceived density does not. There was no obvious comparison to make because of the differences in stimuli between ours and theirs. Tibber et al. (2012) varied the luminance so that one patch had twice the contrast of the other, and they used a gray background. They did not find that such luminance variations affected density sensitivity. Our results were consistent with this finding. 
Conclusion
Our results provide new insights into the perception of density of 3D clutter. Most previous studies ignored occlusion effects, whereas our study addressed occlusion effects directly. The biases that we found depended on the level of occlusion. The bias to see the back as more dense in our low occlusion scenes was consistent with previous findings, which shows that the back bias previously found for overlaid planes with no occlusions extends to volumes with a low occlusion factor λ. The bias crossed over to the front when occlusion was higher. This dependence of bias on the level of occlusion has not been previously reported. We also found that these front and back biases did not depend on the density η and the area A of the elements per se, but rather they depended on the product of these two variables, namely the occlusion factor. Finally, we found that the biases roughly followed the difference in image occupancy: as the occlusion factor increased, there was a greater bias to see the front as more dense, and this bias roughly followed the difference in occupancy of the front and back surfaces. 
We used several cues to help observers perceive the relative depth of surfaces in the clutter in the presence of occlusions. We used rotating 3D clutter rather than static clutter so that the rotation would provide a strong kinetic depth effect, and the occlusions generally resolved the twofold ambiguity in the rotation direction, especially with larger occlusion levels. The motion also provided a cue for density discrimination, namely it provided multiple views of the volume, which revealed more points that were deeper in the volume. The rotation also may have helped observers segment the surfaces into front and back halves since the rotation yields opposite motion directions in front versus back halves. We also used differences in luminance in the two halves to help observers to perceptually segment the front and back halves of the 3D clutter. For the front-back task in particular, we compared a white-black luminance condition versus a depth-luminance covariance condition. Using black and white in the two halves reduced the front bias and also increased the sensitivity to density differences; in particular, it decreased the Weber fractions overall. 
We found differences in Weber fractions for the front-back versus left-right tasks. For the front-back task, Weber fractions decreased when density increased, which is consistent with previous studies. However, the area of the elements of the clutter did not seem to have an effect. We believe this is because increasing the area leads to two competing effects. On the one hand, it creates more occlusions, which makes the task more difficult because the back half is less visible, but on the other hand it also reduces the likelihood of depth reversals and this reduced likelihood of errors makes the task easier. Future work could address the trade-off of these two competing effects. Note that, for the left-right task, a depth reversal would itself not lead to an error, and so the only area effect for the left-right task should be that an increase in area leads to more occlusions, which should make the task more difficult. This would explain why human Weber fractions for the left-right task increased when area increased. 
In sum, we have seen that human observers use a combination of image cues to discriminate density in 3D clutter. The effects we found for bias generalize previous results, and the effects we found for sensitivity (Weber fractions) seem to be consistent with what has been found for 2D stimuli in previous studies, where occlusions were not studied. In 3D clutter the combination of cues is more complicated because the density cues interact with other cues that are present, in particular, the area and the occlusion factor, which is the product of density and area. One interesting topic for future work is how and why the Weber fraction varies with combinations of variables. One could also address these effects with different distributions of clutter and with different viewing conditions as well, for example, with motion parallax and stereo. There may be a subtle interplay between these variables, and it would be interesting to explore observers' strategies to deal with these subtleties in different tasks and over different ranges of scene and viewing parameters. 
Acknowledgments
This research was supported by a doctoral grant awarded to Milena Scaccia from the Fonds de Recherche du Quebec—Nature et Technologies (FRQNT), and an FRQNT team research grant and Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant to Michael Langer. 
Commercial relationships: none. 
Corresponding author: Milena Scaccia. 
Address: School of Computer Science, McGill University, Montreal, Quebec, Canada. 
References
Aida, S., Kusano, T., Shimono, K., & Tam, W. J. (2015). Overestimation of the number of elements in a three-dimensional stimulus. Journal of Vision, 15 (9): 23, 1–16, https://doi.org/10.1167/15.9.23. [PubMed] [Article]
Anobile, G., Cicchini, G. M., & Burr, D. C. (2014). Separate mechanisms for perception of numerosity and density. Psychological Science, 25 (1), 265–270.
Anobile, G., Turi, M., Cicchini, G. M., & Burr, D. C. (2015). Mechanisms for perception of numerosity or texture-density are governed by crowding-like effects. Journal of Vision, 15 (5): 4, 1–12, https://doi.org/10.1167/15.5.4. [PubMed] [Article]
Barlow, H. B. (1978). The efficiency of detecting changes of density in random dot patterns. Vision Research, 18 (6), 637–650.
Bell, J., Manson, A., Edwards, M., & Meso, A. I. (2015). Numerosity and density judgments: Biases for area but not for volume. Journal of Vision, 15 (2): 18, 1–14, https://doi.org/10.1167/15.2.18. [PubMed] [Article]
Burr, D. & Ross, J. (2008). A visual sense of number. Current Biology, 18 (6), 425–428.
Cicchini, G. M., Anobile, G., & Burr, D. C. (2016). Spontaneous perception of numerosity in humans. Nature Communications, 7, 12536.
Dakin, S. C., Tibber, M. S., Greenwood, J. A., Kingdom, F. A. A., & Morgan, M. J. (2011). A common visual metric for approximate number and density. Proceedings of the National Academy of Sciences, USA, 108 (49), 19552–19557.
Durgin, F. H. (1995). Texture density adaptation and the perceived numerosity and distribution of texture. Journal of Experimental Psychology: Human Perception and Performance, 21 (1), 149.
Durgin, F. H. (2008). Texture density adaptation and visual number revisited. Current Biology, 18 (18), R855–R856.
French, R. S. (1954). Pattern recognition in the presence of visual noise. Journal of Experimental Psychology, 47 (1), 27.
Goutcher, R., & Wilcox, L. M. (2016). Representation and measurement of stereoscopic volumes. Journal of Vision, 16 (11): 16, 1–17, https://doi.org/10.1167/16.11.16.
Harris, J. M. (2014). Volume perception: Disparity extraction and depth representation in complex three-dimensional environments. Journal of Vision, 14 (12): 11, 1–16, https://doi.org/10.1167/14.12.11.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. The American Journal of Psychology, 62 (4), 498–525.
Langer, M. S., & Mannan, F. (2012). Visibility in three-dimensional cluttered scenes. Journal of the Optical Society of America A, 29 (9), 1794–1807.
Langer, M. S., Zheng, H., & Rezvankhah, S. (2016). Depth discrimination from occlusions in 3D clutter. Journal of Vision, 16 (11): 11, 1–18, https://doi.org/10.1167/16.11.11.
Prins, N., & Kingdom, F. A. (2018). Applying the model-comparison approach to test specific research hypotheses in psychophysical research using the Palamedes toolbox. Frontiers in Psychology, 9, 1250.
Ross, J., & Burr, D. C. (2010). Vision senses number directly. Journal of Vision, 10 (2): 10, 1–8, https://doi.org/10.1167/10.2.10.
Scaccia, M., & Langer, M. S. (2018). Signs of depth-luminance covariance in 3-D cluttered scenes. Journal of Vision, 18 (3): 5, 1–13, https://doi.org/10.1167/18.3.5.
Schütz, A. C. (2012). There's more behind it: Perceived depth order biases perceived numerosity/density. Journal of Vision, 12 (12): 9, 1–16, https://doi.org/10.1167/12.12.9.
Sun, H.-C., Baker, C. L., Jr., & Kingdom, F. A. A. (2018). Simultaneous density contrast and binocular integration. Journal of Vision, 18 (6): 3, 1–12, https://doi.org/10.1167/18.6.3.
Taves, E. H. (1941). Two mechanisms for the perception of visual numerousness. Archives of Psychology (Columbia University), 265, 47.
Tibber, M. S., Greenwood, J. A., & Dakin, S. C. (2012). Number and density discrimination rely on a common metric: Similar psychophysical effects of size, contrast, and divided attention. Journal of Vision, 12 (6): 8, 1–19, https://doi.org/10.1167/12.6.8.
Tsirlin, I., Allison, R. S., & Wilcox, L. M. (2012). Perceptual asymmetry reveals neural substrates underlying stereoscopic transparency. Vision Research, 54, 1–11.
Footnotes
1  Note that the range of density and area values is larger in the central column than in the other columns: a factor of 4 rather than a factor of 2. A slightly cleaner design would have used (η = 0.08, A = 0.5, N = 579) and (η = 0.18, A = 0.25, N = 1,158) as the top and bottom elements of the middle column. This would also have yielded a gradual change in all the parameters within the top and bottom rows.
Figure 1
 
Example stimuli for Experiments 1 and 3 (front-back and left-right, WB) at level Δη = 0. The 3 × 3 layout corresponds to Table 1. The arrows indicate the direction in which the variables A, λ, η increase.
Figure 2
 
Examples of stimuli for Experiments 1–4 at levels \(\Delta\eta = \pm\frac{\eta}{2}, \pm\eta\). In each case, the values of density η and area A correspond to the middle condition in Table 1.
Figure 3
 
Biases α for human observers in Experiments 1 and 2. The nine plots correspond to the conditions in Table 1. Error bars indicate standard error of the mean. The gray lines indicate the Δη levels in Equation 2, which were the levels used in the experiment. The black lines indicate the Δη levels at which the mean numbers of pixels of the front and back halves are equal. As we discuss later, these levels correspond to the thresholds τ of a model observer that compares the image occupancies of the two halves.
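The equal-pixel levels marked by the black lines can be located numerically once mean occupancies have been tabulated at several Δη levels. The sketch below is a minimal illustration, not the authors' procedure. It assumes the means are available as plain lists, that the front-minus-back difference changes sign within the sampled range, and that linear interpolation between neighboring levels is adequate. The function name and the sample numbers are hypothetical.

```python
def equal_occupancy_level(delta_etas, front_occ, back_occ):
    """Return the delta-eta level at which front and back mean occupancy match.

    Scans neighboring sampled levels for a sign change in (front - back)
    and interpolates linearly. Returns None if no crossing is found.
    """
    diffs = [f - b for f, b in zip(front_occ, back_occ)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return delta_etas[i]
        if d0 * d1 < 0:
            # Fraction of the way from level i to level i + 1 where the
            # interpolated difference reaches zero.
            t = d0 / (d0 - d1)
            return delta_etas[i] + t * (delta_etas[i + 1] - delta_etas[i])
    if diffs[-1] == 0:
        return delta_etas[-1]
    return None


# Made-up mean occupancies (NOT values from the paper), for illustration only.
levels = [-0.2, -0.1, 0.0, 0.1, 0.2]
front = [0.10, 0.12, 0.14, 0.16, 0.18]
back = [0.13, 0.13, 0.13, 0.13, 0.13]
tau = equal_occupancy_level(levels, front, back)
print(tau)  # about -0.05 for these made-up numbers
```

The sign convention for Δη (which half is taken as the reference) is also an assumption here and should follow Equation 2 in practice.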
Figure 4
 
Mean image occupancy of (a) front and back halves for Experiments 1 and 2, and (b) left and right halves for Experiments 3 and 4, for each of the nine conditions of Table 1 and for equal density in the two halves (Δη = 0). The means are over 10,000 scenes. Occupancy is defined by the number of pixels of the corresponding front, back, left, or right half sphere, divided by the number of pixels occupied by the image projection of the spherical bounding volume. Error bars show the standard deviation (not standard error) multiplied by a factor of 10 to better illustrate their relative magnitudes.
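The occupancy definition in this caption can be written as a short pixel count. The following sketch assumes a rendered frame is available as a grid of integer labels (which half, if any, is visible at each pixel) together with a boolean mask of the sphere's projected disk. The label coding, the function name, and the toy 4 × 4 image are hypothetical and serve only to illustrate the ratio.

```python
def image_occupancy(labels, half_id, sphere_mask):
    """Pixels showing the given half, divided by pixels in the sphere's projection."""
    n_half = 0
    n_disk = 0
    for label_row, mask_row in zip(labels, sphere_mask):
        for label, inside in zip(label_row, mask_row):
            if inside:
                n_disk += 1
                if label == half_id:
                    n_half += 1
    if n_disk == 0:
        raise ValueError("sphere_mask contains no pixels")
    return n_half / n_disk


# Hypothetical label coding: 0 = background, 1 = front half, 2 = back half.
FRONT, BACK = 1, 2
labels = [
    [0, 1, 1, 0],
    [2, 1, 1, 2],
    [2, 2, 1, 0],
    [0, 0, 0, 0],
]
# Crude 12-pixel "disk" standing in for the projected bounding sphere.
sphere_mask = [
    [False, True, True, False],
    [True, True, True, True],
    [True, True, True, True],
    [False, True, True, False],
]
front_occ = image_occupancy(labels, FRONT, sphere_mask)  # 5 of 12 pixels
back_occ = image_occupancy(labels, BACK, sphere_mask)    # 4 of 12 pixels
print(front_occ, back_occ)
```

Averaging such values over many randomly generated scenes would give means of the kind plotted in the figure.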
Figure 5
 
Mean Weber fractions for human observers for all four experiments. The nine plots correspond to the conditions in Table 1. Error bars show standard error of the mean.
Figure 6
 
Mean Weber fractions for model observers using one frame and three frames for the front-back and left-right tasks. The nine plots correspond to the conditions in Table 1. Error bars show standard error of the mean.
Table 1
 
The 3 × 3 table shows values of mean density η, area A, and number N of surfaces for stimuli in our experiments. The occlusion factor λ = ηA is constant in each column and increases from left to right. The area A is constant on the main diagonal and increases on the cross diagonal. The mean density η is constant on the cross diagonal and increases on the main diagonal.
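The structural constraints described in this caption can be checked mechanically. The sketch below builds a purely hypothetical 3 × 3 grid (the numbers are NOT the values in Table 1) and verifies each stated property using λ = ηA. It assumes rows run top to bottom, columns left to right, the main diagonal runs from top-left to bottom-right, and the cross diagonal from bottom-left to top-right. These orientation choices and all names are assumptions made for illustration.

```python
import math


def check_table_structure(eta, area):
    """Check the caption's constraints on 3 x 3 grids of density and area."""
    n = 3
    lam = [[eta[i][j] * area[i][j] for j in range(n)] for i in range(n)]
    close = math.isclose
    return {
        # lambda = eta * A is the same within each column ...
        "lambda_constant_in_columns": all(
            close(lam[i][j], lam[0][j]) for i in range(n) for j in range(n)
        ),
        # ... and grows from the left column to the right column.
        "lambda_increases_left_to_right": lam[0][0] < lam[0][1] < lam[0][2],
        "area_constant_on_main_diagonal": all(
            close(area[k][k], area[0][0]) for k in range(n)
        ),
        "area_increases_on_cross_diagonal": area[2][0] < area[1][1] < area[0][2],
        "eta_constant_on_cross_diagonal": all(
            close(eta[k][n - 1 - k], eta[0][n - 1]) for k in range(n)
        ),
        "eta_increases_on_main_diagonal": eta[0][0] < eta[1][1] < eta[2][2],
    }


# Hypothetical grid built from powers of two around a made-up center value.
ETA_CENTER, AREA_CENTER = 0.125, 0.5
eta = [[ETA_CENTER * 2 ** (i + j - 2) for j in range(3)] for i in range(3)]
area = [[AREA_CENTER * 2 ** (j - i) for j in range(3)] for i in range(3)]

checks = check_table_structure(eta, area)
for name, ok in checks.items():
    print(name, ok)
```

The same function could be applied to the actual η and A entries of Table 1 to confirm that they satisfy the description.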
Supplement 1
Supplement 2
Supplement 3