Open Access
Article  |   October 2024
Implied occlusion and subset underestimation contribute to the weak-outnumber-strong numerosity illusion
Journal of Vision October 2024, Vol.24, 14. doi:https://doi.org/10.1167/jov.24.11.14
      Eliana G. Dellinger, Katelyn M. Becker, Frank H. Durgin; Implied occlusion and subset underestimation contribute to the weak-outnumber-strong numerosity illusion. Journal of Vision 2024;24(11):14. https://doi.org/10.1167/jov.24.11.14.

      Download citation file:


      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Four experimental studies are reported using a total of 712 participants to investigate the basis of a recently reported numerosity illusion called “weak-outnumber-strong” (WOS). In the weak-outnumber-strong illusion, when equal numbers of white and gray dots (e.g., 50 of each) are intermixed against a darker gray background, the gray dots seem much more numerous than the white. Two principles seem to be supported by these new results: 1) Subsets of mixtures are generally underestimated; thus, in mixtures of red and green dots, both sets are underestimated (using a matching task) just as the white dots are in the weak-outnumber-strong illusion, but 2) the gray dots seem to be filled in as if partially occluded by the brighter white dots. This second principle is supported by manipulations of depth perception both by pictorial cues (partial occlusion) and by binocular cues (stereopsis), such that the illusion is abolished when the gray dots are depicted as closer than the white dots, but remains strong when they are depicted as lying behind the white dots. Finally, an online investigation of a prior false-floor hypothesis concerning the effect suggests that manipulations of relative contrast may affect the segmentation process, which produces the visual bias known as subset underestimation.

Introduction
The perception of relative numerosity has long been of interest to vision science (Allik & Tuulmets, 1991; Burgess & Barlow, 1983; Durgin, 1995; Ross & Burr, 2010), with much current interest in the levels of perceptual representation involved in encoding numerosity (Franconeri, Bemis, & Alvarez, 2009), as well as in the question of how numerosity interacts with the perception of magnitudes generally (Aulet & Lourenco, 2021; Dakin, Tibber, Greenwood, & Morgan, 2011; Gebuis & Reynvoet, 2012; Morgan, Raphael, Tibber, & Dakin, 2014), and whether it involves estimation based on sampling (Solomon & Morgan, 2018). Recently, Lei and Reeves (2018) introduced a new and interesting illusion of numerosity to the literature that seems to appeal both to low- and high-level processes: When 50 white and 50 gray dots are scattered against a dark gray background without overlapping, the white dots seem, subjectively, to be far fewer than the gray dots (Lei & Reeves, 2018, 2022). This illusion can be observed in Figure 1. Lei and Reeves (2018) reported that the effect works for black and gray dots against a light gray background as well, but does not occur when the background gray falls anywhere between the dot brightnesses (i.e., it requires dots of the same polarity, but different contrast). This observation seems quite surprising, and Lei and Reeves (2022) have dubbed it the “weak-outnumber-strong” (WOS) numerosity illusion. 
Figure 1.
 
The four images on the left depict one trial display from each of the four between-subject conditions in experiment 1. Note that the illusion that there are more gray than white dots is quite strong in the intermixed displays in the top two images. All intermixed images contain 50 dots of each color. These images are from trials where the number of dots in the unmixed comparison patch was closest to the mean points of subjective equality (PSEs) (gray = 50, and white, red, and green = 40). The graph on the right shows the mean matches (with 95% confidence intervals) to subsets of 50 in mixed fields with variable single-color fields based on fitting psychometric functions to the data of individual participants.
Using a series of ingenious experiments, Lei and Reeves (2018) have argued that the effect is due to a low-level glitch in the computation of numerosity from contrast energy. Specifically, they argue that the glitch is caused by using one value of contrast (white relative to the gray of the gray dots) to compute the total energy of the white dots, but a different value (contrast relative to the background gray) as the denominator of the division process to extract numerosity, resulting in an underestimation of the white dots. We will call this the false floor hypothesis, and it is founded in the notion that the representation of number is based in a fairly low-level computation of contrast energy that is nonetheless high-level enough to be disrupted by information that segregates the white and gray dots, such as motion or binocular disparity. 
Lei and Reeves (2018) used a method of comparing an unmixed set of dots to a mixed set of 50 white and 50 gray to see whether the white dots were seen as fewer, or the gray dots as more. They found that the gray dots in the mixture were matched correctly (or slightly overestimated), but that the white dots were substantially underestimated. Lei and Reeves (2022) showed that verbal estimates of separate fields of gray or white dots were about the same. Both showed substantial underestimation, as is typical (e.g., Krueger, 1972), but explicit numeric estimates of subsets of white dots among gray dots were lower than those of gray dots among white. One of the more impressive results Lei and Reeves (2018) observed was that it seemed that the WOS illusion was generally stronger when the gray dots were made a little higher in contrast (up to a point), consistent with their theory of the processing glitch. The strength of the WOS illusion makes providing an explanation for the illusion seem quite important for constraining models of human number perception, and the explanation provided by Lei and Reeves (2018) involves both low-level effects of luminance contrast as well as mid-level effects of segmentation. 
Although Lei and Reeves (2018, 2022) explicitly seemed to have ruled out several alternative accounts of the effect, in the present work we have reconsidered some of those rejected accounts, and we will propose a tentative two-part explanation of the WOS effect based on these alternatives. On the one hand, we will seek to demonstrate, in our experiment 1, that the reduced estimate of the white dots is a more general consequence of what others have dubbed a subset underestimation effect (Cordes, Goldstein, & Heller, 2014; see also Halberda, Sires, & Feigenson, 2006). Subset underestimation may reflect a partial failure of segmentation, but we will only investigate that here indirectly. On the other hand, we will also seek to show that the gray dots are spared this subset underestimation effect because they are, for many people at least, perceived as partly occluded, and thus filled in (Men, Altin, & Schütz, 2023). Kanizsa (1979) as well as Michotte, Thines, and Crabbe (1964) have established that filling in of unseen structure is a fundamental perceptual process. The apparently asymmetrical relationship between gray and white dots might be motivated by the probabilistic use of contrast as a relative distance cue in combination with a possible compensation for an anticipated higher likelihood of underestimation of the dimmer (harder to see) set of dots when there is the possibility of partial occlusion as a result of the brighter (and possibly nearer) dots. 
Lei and Reeves (2018) reported that any cue that segregated the gray and white dots, including stereoscopic depth separation, eliminated the WOS. However, our investigations show a different pattern. Specifically, in experiments 2 and 3, we will provide strong evidence suggesting that the apparent difference in numerosity between gray and white dots (i.e., the WOS) is eliminated only when depth cues (from partial occlusions and, separately, from binocular stereopsis) suggest that the gray dots are in front, but remains quite strong when those same depth cues suggest that the gray dots are behind the white dots. Previous reports have implicated depth-order as affecting perceived number (Schütz, 2012); here we show this effect can depend on dimmer dots merely appearing to be in the background. Finally, in experiment 4, we sought to replicate critical tests of the role of relative luminance contrast in the WOS. Our data regarding luminance contrast suggest a complex relationship that is compatible with our alternative explanation of the effect. The main implication is that perceived number seems to take potential occlusion into account in a manner consistent with generating an estimate from samples. This result may be due to sensitivity to lowered contrast as a depth cue and to the prevalence of partial occlusion in our typical visual experience of collections, as well as asymmetries in segregation. 
General methods
The seven experiments presented here are organized into four studies that address distinct empirical questions concerning the WOS. Study 1 addresses subset underestimation. Study 2 addresses pictorial depth cues. Study 3 addresses binocular depth information. Study 4 reexamines how variations in luminance contrast affect the WOS. 
Open science: Preregistration and data availability
Five online experiments (1, 2, 4A, 4B, and 4C) and two laboratory experiments (3A and 3B) are reported here. All, except experiment 4A, were preregistered on aspredicted.org, and all preregistrations are available at https://researchbox.org/2182. The complete data of all the experiments, along with analysis files, are available on the Open Science Framework at: https://osf.io/5c4h8/?view_only=93db86190559472989f63333f6f7e00b
Participant demographics
All participants were tested in the United States. Combined across the 668 online participants (19–79 years old, mean 42) and 44 in-person participants (18–22 years old, mean 19) whose data are included in our analyses, 55% identified as men, 44% as women, and 1% identified as nonbinary or genderqueer or preferred not to answer. Overall (allowing for selection of multiple identities), 82% identified as White, 8.5% as Black or of African descent, 7.2% as Asian or of Asian descent, 7.0% as Latino or Hispanic, 1.2% as Native American, and 0.2% as Hawaiian or Pacific Islander; 1.6% selected none of these identities. 
Number stimuli generation and presentation
For the online experiments, all number stimuli were pregenerated using PsychToolbox (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007) and MATLAB, and presented online using PsyToolkit software (Stoet, 2010, 2017), with subject recruitment on Mechanical Turk using Cloud Research (formerly TurkPrime; Litman, Robinson, & Abberbock, 2017). Unless specified otherwise (see experiment 2), in these online experiments the dot fields were 300 × 300 pixels, with dots that were 16 pixels in diameter, scattered randomly except for the constraint of a minimum center-to-center distance of 20 pixels to avoid overlap between dots. Stimuli were limited to a 1-second duration or until response. 
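The placement constraint described above can be illustrated with a simple rejection-sampling sketch. This is not the authors' MATLAB code: the function name `place_dots`, the rule that whole dots stay inside the field, and the retry limit are assumptions made for illustration only.

```python
import math
import random


def place_dots(n_dots, field_px=300, dot_diameter=16, min_dist=20,
               seed=None, max_tries=100_000):
    """Return n_dots (x, y) centers whose pairwise distance is >= min_dist."""
    rng = random.Random(seed)
    radius = dot_diameter / 2
    centers = []
    tries = 0
    while len(centers) < n_dots:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all dots; relax the constraints")
        # Assumption: whole dots stay inside the field (the source does not say).
        x = rng.uniform(radius, field_px - radius)
        y = rng.uniform(radius, field_px - radius)
        # Accept the candidate only if it is far enough from every placed dot.
        if all(math.hypot(x - cx, y - cy) >= min_dist for cx, cy in centers):
            centers.append((x, y))
    return centers


# One intermixed patch: 100 positions, 50 assigned to each color at random.
centers = place_dots(100, seed=1)
colors = ["white"] * 50 + ["gray"] * 50
random.Random(2).shuffle(colors)
```

Because the minimum center-to-center distance (20 pixels) exceeds the dot diameter (16 pixels), any accepted configuration is free of overlap by construction.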
For the in-person experiments (3A and 3B), the dot fields were generated on the fly and presented using PsychToolbox and MATLAB. Details of the stimuli are given elsewhere in this article. 
Measurement methods and analysis of number data
Two different types of measurement methods are used to assess perceived number in this article. In both methods, instructions to participants placed an emphasis on the fact that this was a perception task, and that judgments should be based on their perceptual experience. 
  • (1) Subset estimation. In this method, the mixed-color images, composed of two different colors of dots, always contained 50 of each color. On each trial, a differently randomized mixed-color image was presented side by side with a single test image that contained a variable number of dots in the same color as the subset with which it was to be compared. Participants simply indicated with a left button or right button whether the single-color test patch or the same-color subset in the mixed-color image contained more dots. The numbers of dots in the test patches were predetermined (method of constant stimuli) and ranged from 25 to 70 in steps of 5. This method is used in experiments 1, 2, and 4C. It allows for the separate estimation of the perceived number of each of the tested subsets.
  • (2) Subset comparison. In this method, a single experimental stimulus was presented on each trial with a mixture of dots of two different colors. From trial to trial, one of the two colors always had 50 dots, while the other color had a number from 25 to 70 (in steps of 5). On each trial, the participant indicated which of the colors was more numerous. This method only allows for estimation of the relative perceived number of the two subsets defined by color when intermixed. It is the method used in experiments 3A, 3B, 4A, and 4B.
Fitting psychometric data
In both measurement methods, psychometric functions were fit to the choice data. For the subset estimation experiments (experiments 1, 2, and 4C), cumulative normal curves were fit in linear space, with the point of subjective equality (PSE) defined as the 50% point on the curve and the just noticeable differences as the numeric distance from the 50% point to the 75% point. For subset comparison experiments (experiments 3A, 3B, 4A, and 4B), the psychometric functions were fit in log number space, and the PSEs are converted to deviations from accuracy in log space (with the just noticeable differences in log space being interpreted as Weber fractions) to represent the strength of the WOS effect. For these analyses and plots, positive values represent a WOS-consistent bias. Note that the Weber fraction, which is a normalized representation of how variable the judgments are (defined as the just noticeable difference/PSE), was used as a preregistered criterion for data exclusion in all of the experiments, because it measures quality of performance independent of bias. In experiment 4B, where measuring differences in Weber fractions was itself of interest, a relative exclusion criterion (retain the 75% of the data with the lowest Weber fraction in each condition) was preregistered to allow for the comparison of mean Weber fractions across conditions. 
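As an illustration of the linear-space fit, the sketch below fits a cumulative normal by grid-search maximum likelihood and derives the PSE, the just noticeable difference (the distance from the 50% point to the 75% point), and the Weber fraction. The response counts are hypothetical, and the grid-search approach, function names, and parameter grids are assumptions; the source does not state which fitting routine was used.

```python
import math

Z75 = 0.6744897501960817  # z-score of the 75% point of a standard normal


def cum_normal(x, mu, sigma):
    """Cumulative normal psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_cum_normal(levels, n_more, n_trials, mus, sigmas):
    """Grid-search maximum-likelihood estimate of (mu, sigma)."""
    best = None
    for mu in mus:
        for sigma in sigmas:
            ll = 0.0
            for x, k, n in zip(levels, n_more, n_trials):
                # Clamp to avoid log(0) at extreme predicted probabilities.
                p = min(max(cum_normal(x, mu, sigma), 1e-6), 1 - 1e-6)
                ll += k * math.log(p) + (n - k) * math.log(1 - p)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


levels = list(range(25, 75, 5))                # test-patch numbers 25..70
n_more = [0, 0, 1, 3, 6, 8, 9, 10, 10, 10]     # HYPOTHETICAL "test patch has more" counts
n_trials = [10] * len(levels)                  # 10 trials per comparison value

mus = [30 + 0.25 * i for i in range(161)]      # candidate PSEs, 30..70
sigmas = [1 + 0.25 * i for i in range(77)]     # candidate sigmas, 1..20

pse, sigma = fit_cum_normal(levels, n_more, n_trials, mus, sigmas)
jnd = Z75 * sigma      # 50% -> 75% distance in dots
weber = jnd / pse      # normalized variability, as used for exclusion
```

For the subset comparison experiments, the same kind of fit would be applied to log-transformed dot numbers, so that the fitted spread is read as a Weber fraction and the PSE as a deviation from accuracy in log space.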
Estimating luminance contrast for online experiments
The luminance contrast of dots can be defined by the deviation in luminance from the background, divided by the background luminance. This is normally called Weber contrast. However, because we will also talk about task variability in terms of Weber fractions, which are an entirely different concept, we have chosen to refer simply to luminance contrast in our exposition, although our method of computing luminance contrast is as described. 
Three of the four studies were conducted online. The gray values used (i.e., RGB specification on a scale of 0 to 255) have different luminance values on different displays, so it is not possible to know the luminance contrasts of the white and gray dots in our displays for each of our online participants. However, because luminance contrast values were of theoretical importance for experiments 4A, 4B, and 4C, we developed a method for estimating the average physical luminance contrast (the ratio of luminance relative to a fixed background gray) for our various gray values across a sample of the population being tested in our experiments. That method was implemented in conjunction with the administration of experiment 4B and is described in the section presenting experiment 4B. 
Study 1: Most subsets are underestimated
Lei and Reeves (2018) focused their theory on explaining why the white dots appear to be fewer than they are in the WOS effect. But maybe their theory is aimed at the wrong question. Perhaps it is really the apparently preserved numerosity of the gray dots that requires explanation. There is past evidence that any subset of dots within a larger collection is underestimated (Cordes et al., 2014). However, that study was limited to numbers lower than 50 and used verbal estimation concerning masked displays, so we sought to measure subset estimation with red and green subsets of dots (in the absence of gray dots) to compare any observed perceptual subset underestimation effect for intermixed red and green dots with that for intermixed gray and white dots when there are 50 of each. Our results help to motivate what we think is the right question: Why are the low-contrast gray dots not underestimated as well? 
Specific methods for experiment 1
Participants
A total of 197 online participants completed experiment 1, of whom 154 participants were retained, based on preregistered criteria of a) no more than 5% skipped trials and b) Weber fractions of less than 0.25. Some online participants provided very poor data during piloting, so we had set this criterion based on pilot data. Payment was $0.75 with a $1.50 bonus for a good performance (i.e., meeting the preregistered inclusion criteria). 
Experimental design
Four between-subject conditions tested the perceived number of dots in a subset. Subset estimation trials representing the four conditions are shown in Figure 1. In two of these conditions, the dots were gray and white, and participants judged either the gray or the white. In the other two conditions, the dots were red and green, and participants judged either the red or the green. On each of 100 randomly ordered trials (10 for each comparison value), participants judged whether the intermixed patch or the comparison patch contained more dots of the target color. The intermixed patch always consisted of 100 total dots (50 of the target color and 50 of the irrelevant color). 
Procedure
Online participants completed 5 practice trials followed by 100 experimental trials. They responded to the side that had more of the target color by pressing the F key for the left image and the J key for the right image. During the practice trials, the patch displays were presented for 1.5 seconds followed by a 5-second period in which participants could enter their key response. During the experimental trials, the comparison displays were presented for only 1 second to prevent counting. Demographic information was collected after the experiment. 
Stimulus specifications
The background gray had an RGB value of 85; the gray dots, 160; and the white dots, 255. Red and green dots had RGB values of [255, 0, 0] and [0, 255, 0], respectively. As shown in Figure 1, the background gray extended beyond both patches. The same 100 unique green/red intermixed patches were used in both the red and green conditions. Likewise, the same 100 unique white/gray intermixed patches were used in both the white and gray conditions. The 100 single-color comparison stimuli for each color condition were generated independently. 
Results of experiment 1
The mean matches for each condition are shown on the right side of Figure 1. Our first preregistered analysis was to ensure that our method replicated the WOS illusion. Indeed, when intermixed, the perceived number of gray dots (mean of 49.9) was higher than the perceived number of white dots (mean of 41.2), Welch's t(66.7) = 6.95, p < 0.0001, Cohen's D = 1.61. These results show that we have replicated the observation of Lei and Reeves (2018) that matches to gray dots were fairly accurate, whereas matches to white dots substantially underestimated the white subset. 
Our second preregistered analysis asked whether the estimates for white dot subsets differed from those for red or green. Because we were predicting a null effect, we used a Bayesian analysis of variance (ANOVA; BayesFactor library; Morey, Rouder, Jamil, & Morey, 2015) with the default priors to test for evidence of the null. Consistent with the subset-reduction hypothesis, the null hypothesis was favored by odds of 1.70 to 1.00 (±0.04%) over the hypothesis that white, green, or red matches differed from each other. As shown in Figure 1, for all three of these colors, approximately 40 dots presented unmixed were perceived as matching 50 dots in a mixed field. 
Conversely, our third preregistered analysis ran the same Bayesian ANOVA comparing gray matches with red and green matches, and found that a real underlying difference among these was favored 370,288 to 1 over the null hypothesis (±0.01%). 
Discussion of study 1
Although our main focus is on the WOS, it is worth noting that we obtained evidence of subset underestimation using unmasked displays and perceptual matching, rather than verbal estimation. In other words, the subset underestimation observed by Cordes et al. (2014) using verbal reports from memory can be shown even with perceptual matching methods, with no memory or explicit verbal coding involved. 
The results of experiment 1 support the hypothesis that the decrease in the perceived number of white dots in the WOS represents the more general case of subset underestimation that applies to both red and green dots as well when they are intermixed. It was the gray dots that seemed to be uniquely unaffected by their subset status. This observation suggests that an explanation of the WOS might need to focus more on why the gray dots seem to be less subject to underestimation rather than why the white dots are underestimated. 
Study 2: Pictorial depth order can toggle the WOS illusion
One possible explanation for why the gray dots show less underestimation than white dots is that dimness is associated with greater viewing distance, which might lead to the implicit filling-in of gray-dot texture “behind” the white dots. To test this hypothesis, we created displays in which the dots of one color were allowed to occlude the dots of the other color partially. In a preregistered between-subject design, we measured separately matches to the white subset and the gray subset for each of the two implied depth orders. We hypothesized that higher estimates would occur for gray dots when some of them were occluded partly by the white dots (consistent with them being farther away) than when they occluded the white dots (inconsistent with them being farther away). 
Specific methods for experiment 2
Participants
A total of 200 new participants were tested. Using the preregistered criteria for data inclusion, 186 participants were retained. The payment system and inclusion criteria replicated that of experiment 1. 
Experimental design
Four between-subject conditions estimated the perceived number of each subset of dots, as in experiment 1. Subset estimation trials from the four conditions are shown in Figure 2 (left). The number of trials, number of dots in each patch, randomization of the side each patch was displayed on, and the nature of the judgments were identical to experiment 1. 
Figure 2.
 
The four images on the left depict trial displays from each of the four conditions of experiment 2. All mixed displays contained 50 dots of each color; single-color images depict the approximate observed points of subjective equality (PSEs) by condition. The mean PSEs and 95% confidence intervals are plotted on the right.
Procedure
Experiment 2 followed the same procedure as experiment 1. 
Stimulus specifications
The dot displays for experiment 2 were similar to the gray and white stimuli in experiment 1, including their gray values. The dots were made slightly larger (20 pixels in diameter) to make partial occlusions clearly visible. To allow overlap only between different colors, dots within each color set had a minimum interdot distance of 24 pixels, which prevented overlap of same-color dots. Between dot colors, however, the minimum interdot distance was only 12 pixels, which meant that an overlap of nearly one-half of the diameter of each dot was possible between colors. 
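A minimal sketch of the two-distance constraint follows, again using rejection sampling. The 300-pixel field, the placement of all gray dots before white dots, and the names `place_two_colors` and `draw_order` are assumptions for illustration and are not taken from the authors' code. The depth order is expressed only through drawing order: the color drawn last appears in front.

```python
import math
import random


def place_two_colors(n_per_color=50, field_px=300, dot_diameter=20,
                     min_same=24, min_diff=12, seed=None, max_tries=200_000):
    """Place dots so same-color pairs never overlap but different colors may."""
    rng = random.Random(seed)
    radius = dot_diameter / 2
    dots = []  # list of (x, y, color)
    tries = 0
    for color in ("gray", "white"):
        placed = 0
        while placed < n_per_color:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("could not place all dots")
            x = rng.uniform(radius, field_px - radius)
            y = rng.uniform(radius, field_px - radius)
            # Larger minimum distance for same-color neighbors, smaller otherwise.
            ok = all(
                math.hypot(x - cx, y - cy) >= (min_same if c == color else min_diff)
                for cx, cy, c in dots
            )
            if ok:
                dots.append((x, y, color))
                placed += 1
    return dots


def draw_order(dots, front_color):
    """Sort dots so the front color is drawn last and occludes the other color."""
    return sorted(dots, key=lambda d: d[2] == front_color)


dots = place_two_colors(seed=3)
gray_behind = draw_order(dots, front_color="white")   # white occludes gray
gray_in_front = draw_order(dots, front_color="gray")  # same coordinates, swapped order
```

With a 20-pixel diameter and a 12-pixel minimum distance between colors, two dots of different colors can overlap by up to 8 pixels along the line joining their centers.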
The coordinates used for the 100 unique intermixed patches used for each condition were the same across all four conditions, with the order of color drawing swapped to render either the gray in front of the white or the white in front of the gray. The 100 single color comparison stimuli for each color condition were generated independently. 
Results of experiment 2
The mean matches for each condition are shown in Figure 2. Our first preregistered analysis was done to test the hypothesis that the same gray dots would seem to be fewer when gray dots sometimes occlude the white dots compared with when partial occlusion indicates they are behind the white dots (consistent with the implied occlusion hypothesis for why gray dots appear as more). Indeed, the mean estimation of the gray dots when they were “behind” the white dots (mean of 51.8) was significantly larger than when the gray dots were “in front” of the white dots (mean of 45.6), Welch's t(90.8) = 4.53, p < 0.0001, Cohen's D = 0.95. Our second preregistered analysis asked whether the white dots would appear more numerous when they were partly occluded by gray dots (mean of 45.5) than when the white dots were in front of the gray ones (mean of 42.3), and this too was found, in support of the implied-occlusion hypothesis, Welch's t(90.0) = 3.35, p = 0.0012, Cohen's D = 0.70. 
The final preregistered analysis was simply a confirmation that the WOS was present when the gray dots were in back, so this compared matches to white dots in front to matches to gray dots in back (left two bars in Figure 2). This analysis confirmed that there was a difference between these matches, Welch's t(84.7) = 8.6, p < 0.0001, Cohen's D = 1.22, showing that the WOS is replicated in the conditions where the gray dots are explicitly rendered behind the white dots as signaled by partial occlusions. 
In contrast, an exploratory Bayesian analysis showed that when gray dots were in front of the white dots, the odds favoring the null hypothesis of no difference between the matches to the white dots and the gray dots were 4.6 to 1.0 (±0.02%). This null effect is illustrated by the right two bars in Figure 2 (right). Note that, consistent with the subset underestimation hypothesis, both gray and white dots were underestimated in these conditions. 
Discussion of study 2
In this experiment, a simple pictorial depth-order cue of partial occlusion was used to test whether higher estimates of gray than white dots might be due to implied occlusion. After all, it might be easy to overlook partly occluded dots that are dimmer than their occluders, whereas partly occluded objects of high contrast might less often go unnoticed. In support of the implied occlusion interpretation, there was clear evidence of a WOS only when the gray dots were partially occluded by the white dots, with an underestimation of white that is typical of subsets, as was shown in experiment 1. Our results do not support the hypothesis put forward by Lei and Reeves (2018) (and originally by Lei, 2015) that any cue that segregates the two sets of dots abolishes the WOS effect. Instead, the segregation effect from partial occlusion only diminished the WOS illusion when it was inconsistent with implied occlusion of the gray dots. It might be argued that partial occlusion was simply an insufficient source of segregation. However, when the gray dots partially occluded the white dots, the perceived numbers of gray and white were the same (i.e., there seems not to be a WOS effect). Because the illusion does not reverse with the implied depth reversal, it seems best to think of the pictorial information specifying gray in front as blocking the illusion (caused by a contrast-based depth cue), rather than to think of the pictorial information specifying gray in back as producing the illusion. 
The present study, like experiment 1, relied exclusively on subset estimation (matching an unmixed set to a subset), so it might be objected that we have not measured directly the effect of partial occlusion on the original WOS illusion. In study 3, which concerns depth-order specified by binocular disparity, we used the method of subset comparison, in which the two subsets within a single display were compared directly with one another. Our results in that case are convergent with those of the present study, inasmuch as the same asymmetry of depth orders was found, which supports our interpretation that depth segregation, per se, is insufficient to eliminate the WOS. 
Study 3: Binocular depth order can toggle the WOS illusion
If implied occlusion is a general source of the WOS, it might also be supported or contradicted by stereoscopic depth cues. Lei (2015) reported that any stereoscopic depth between the white and gray dots eliminated the WOS. However, he tested a relatively small number of participants (n = 10) and did not confirm that they were aware explicitly of the depth order on each trial. Because of the challenges of stereoscopic testing online, two experiments were undertaken in the lab. In experiment 3A, we manipulated the stereoscopic depth between subjects, so that the depth order of gray and white dots was known and consistent throughout. In experiment 3B we manipulated depth order within-subjects, but we also required that a depth order judgment precede each relative number judgment, so that we could be sure that participants were actively perceiving the depth relationship. Because our stereoscopic field of view was relatively small (<20°) we used the subset comparison method where the numbers of gray and/or white dots were varied within the mixed display and only the relative number of gray and white was judged. 
Specific methods of experiments 3A and 3B
Open science and transparency
Although both experiments 3A and 3B were preregistered, the number of participants recruited and retained for 3A exceeded our preregistered plan. The preregistration for experiment 3B was adhered to. 
Participants
A total of 62 undergraduate students were recruited (for pay or for credit in introductory psychology). Eighteen of these participants were excluded based on preregistered criteria: nine participants had to be excluded because they failed to reliably discriminate relative depth from the stereoscopic displays during a pretest. Nine other participants were excluded because their Weber fractions were too high. In the end, 24 datasets were included in experiment 3A and 20 in experiment 3B. 
Apparatus
A haploscope, made with four first-surface mirrors, was used to present stereoscopic images of dot fields (with left eye and right eye images presented on a single screen, with a 6.35-cm physical separation). All stimuli were randomly intermixed white (31.9 cd/m²) and gray (10.8 cd/m²) dots against a dark gray (2.51 cd/m²) background (measurements made through the mirrors), resulting in (Weber) luminance contrasts of 11.7 (white dots) and 3.3 (gray dots). The effective viewing distance through the stereoscope was 48 cm. 
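The reported contrasts follow from the definition of luminance contrast as the deviation from background luminance divided by background luminance. A short check using the measured luminances above (the function name is illustrative):

```python
def weber_contrast(dot_luminance, background_luminance):
    """Luminance (Weber) contrast: (L_dot - L_background) / L_background."""
    return (dot_luminance - background_luminance) / background_luminance


BACKGROUND = 2.51  # cd/m^2, measured through the mirrors
white_contrast = weber_contrast(31.9, BACKGROUND)
gray_contrast = weber_contrast(10.8, BACKGROUND)

print(round(white_contrast, 1), round(gray_contrast, 1))  # 11.7 3.3
```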
Stimulus specifications
Stimuli were created on the fly in MATLAB using an algorithm like that in experiment 1, except that the dot patches were 362 × 362 pixels (10° × 10° in visual angle), and the dots had a diameter of 18 pixels (0.5°) with a minimum interdot distance of 24 pixels in the left eye image. The initial random selection of dot locations was applied equally to white and gray dots, but all dots in the color that was to be in the foreground of the cyclopean percept were moved 2 pixels (3.3 arcmin) to the left in the corresponding right eye image. This small shift meant that there were never any partial occlusions of dots. The color of the remainder of the screen was the same as the background gray. The dot displays were presented for up to 2.5 seconds or until response. 
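The stimulus construction described above can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the experiments: it assumes that the minimum interdot distance is measured center to center, that dot centers are kept at least one radius inside the patch, and that placement uses simple rejection sampling. All names are hypothetical.

```python
import math
import random

PATCH_PX = 362          # patch width and height in pixels
PATCH_DEG = 10.0        # patch width and height in degrees of visual angle
DOT_DIAMETER_PX = 18    # dot diameter in pixels
MIN_DISTANCE_PX = 24    # assumed center-to-center minimum distance
SHIFT_PX = 2            # leftward shift of foreground dots in the right eye image

def place_dots(n_dots, rng, max_attempts=100_000):
    """Rejection-sample dot centers that respect the minimum interdot distance."""
    radius = DOT_DIAMETER_PX / 2
    centers = []
    for _ in range(max_attempts):
        if len(centers) == n_dots:
            break
        x = rng.uniform(radius, PATCH_PX - radius)
        y = rng.uniform(radius, PATCH_PX - radius)
        if all(math.hypot(x - cx, y - cy) >= MIN_DISTANCE_PX for cx, cy in centers):
            centers.append((x, y))
    if len(centers) < n_dots:
        raise RuntimeError("could not place all dots")
    return centers

def make_stereo_pair(n_white, n_gray, foreground, rng):
    """Return (left_eye, right_eye) lists of ((x, y), color) entries."""
    centers = place_dots(n_white + n_gray, rng)
    colors = ["white"] * n_white + ["gray"] * n_gray
    rng.shuffle(colors)  # locations are assigned to colors at random
    left = list(zip(centers, colors))
    # Foreground dots get crossed disparity: shifted left in the right eye image.
    right = [((x - SHIFT_PX, y) if c == foreground else (x, y), c)
             for (x, y), c in left]
    return left, right

rng = random.Random(0)
left_eye, right_eye = make_stereo_pair(50, 50, "white", rng)

# Convert the pixel shift to arcmin using the patch geometry.
shift_arcmin = SHIFT_PX * PATCH_DEG / PATCH_PX * 60
print(round(shift_arcmin, 1))  # 3.3
```

Under these assumptions, the gap between neighboring dot edges is at least 24 − 18 = 6 pixels, which exceeds the 2-pixel shift, so shifted dots cannot overlap their neighbors.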
Stereoscopic pretest
In both experiments, an initial block of 10 trials was used in which participants had merely to judge the depth order of the colors, with a goal of 8 correct. If the participant got fewer than 8 of the 10 trials correct, 10 additional trials were administered. If there were again fewer than eight correct in this second set, the participant was not run in the main experiment. For experiment 3B, feedback was given (a beep) on each trial during the pretest where an incorrect answer was given. 
Design and procedure for experiment 3A
The stereoscopic depth order of gray and white was manipulated between subjects in the number trials. That is, during the number comparison task, half the participants saw only displays with white dots in the foreground and gray dots in the background, and half saw the opposite. The design included 190 subset comparison trials. In 10 of these trials, there were 50 gray dots and 50 white dots; in 90 trials there were 50 gray dots, but either more or fewer white dots (25, 30, 35, 40, 45, 55, 60, 65, or 70); in 90 trials, there were 50 white dots and the gray dots varied (25, 30, 35, 40, 45, 55, 60, 65, or 70). All 190 of these trials were interleaved randomly. There were five randomly selected practice trials at the outset to accustom participants to the procedure. The trials took approximately 15 minutes to complete. 
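The trial structure above can be sketched as a simple list of (white, gray) dot counts. This sketch assumes that each of the nine varied numbers was presented 10 times per condition (90 trials / 9 values); the names are illustrative.

```python
import random

STANDARD = 50
VARIED = [25, 30, 35, 40, 45, 55, 60, 65, 70]
REPEATS = 10  # assumed: 90 trials spread evenly over 9 varied values

def build_trials(rng):
    """Return a shuffled list of (n_white, n_gray) pairs for one session."""
    trials = [(STANDARD, STANDARD)] * REPEATS          # equal-number trials
    for n in VARIED:
        trials += [(n, STANDARD)] * REPEATS            # 50 gray, white varies
        trials += [(STANDARD, n)] * REPEATS            # 50 white, gray varies
    rng.shuffle(trials)                                # interleave all trial types
    return trials

trials = build_trials(random.Random(1))
print(len(trials))  # 190
```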
Design and procedure for experiment 3B
The relative depth was varied within subjects during the number trials, and participants additionally made a depth order judgment on each trial prior to making a number comparison judgment, to ensure that depth order was registered. They made the depth judgment with the left hand using the F and R keys (front and rear), and then the relative number judgment (more white or gray) with the right hand using the left and right arrow keys. Making the depth response reset the stimulus timer, so that, following the depth response, the dot stimulus remained visible for up to 2.5 seconds or until the relative-number response was made. 
For even-numbered participants, depth judgments referred to the depth of the gray dots and it was the number of gray dots that varied from trial to trial (25 to 70 in steps of 5); there were always 50 white dots in the dot display. For odd-numbered participants, depth judgments referred to the depth of the white dots and it was the number of white dots that varied from trial to trial. The right arrow key always referred to there being more of the variable dot color, and the left arrow to the color that was, in fact, constantly 50. Participants were not informed of these details of the design. There were 200 trials with 100 for each of the depth orders, randomly interleaved. If the depth order judgment was incorrect on a given trial (which occurred less than 1% of the time on average), the trial was reshuffled with the remaining trials and repeated later. There were five randomly selected practice trials at the outset to accustom participants to the procedure. The trials took approximately 20 minutes to complete. 
Results of experiments 3A and 3B
Experiment 3A
Although the preregistration had called for 20 participants, intersubject variance was greater than anticipated, and all 24 participants who had actually been run successfully were included in the analysis. Two of these participants had been run just before completing the preregistration, and two participants were accidentally scheduled beyond the required number. For each participant, we separately computed a PSE for the trials where there were 50 white dots (and gray varied) and the case where there were 50 gray dots (and white varied). The preregistered ANOVA required that we code each PSE as the log ratio of white to gray dots, which ought to be positive if the WOS effect were present. 
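The log-ratio coding can be illustrated with a short sketch. It takes the PSE as given (the fitting procedure is not shown here), and the example PSE values are hypothetical, chosen only to show the sign convention.

```python
import math

STANDARD = 50  # number of dots in the fixed-number color

def wos_when_gray_varied(gray_pse):
    """50 white dots fixed; gray_pse gray dots appear equally numerous."""
    return math.log(STANDARD / gray_pse)

def wos_when_white_varied(white_pse):
    """50 gray dots fixed; white_pse white dots appear equally numerous."""
    return math.log(white_pse / STANDARD)

# Hypothetical PSEs: fewer gray dots, or more white dots, are needed for a match.
print(wos_when_gray_varied(43) > 0, wos_when_white_varied(58) > 0)  # True True
```

In both codings, a positive score means that more white than gray dots were present at subjective equality, which is the direction of the standard WOS effect; a PSE of 50 yields a score of 0.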
As predicted, there was a significant between-subject effect of depth order, such that the WOS effect was significantly stronger for participants for whom the white dots were stereoscopically presented in front of the gray dots (mean 0.15, 95% confidence interval 0.09 to 0.20) than for those for whom the gray dots were in front (mean 0.05, 95% confidence interval −0.02 to 0.12), F(1, 22) = 5.54, p = 0.028, ges = 0.19. The mean log ratios (WOS) are plotted in Figure 3 (bottom left). Note that a magnitude of 0 corresponds with no illusion and positive values correspond with a standard WOS effect. 
Figure 3.
 
(Top) Stereoscopic stimuli. Fused uncrossed, the left image pair shows the white dots in the foreground, while the right pair shows the gray dots in the foreground (cross-fusing produces the opposite depth orders). All images have 50 white and 50 gray dots. (Bottom) The results of experiment 3A (left), where depth order was varied between participants, and of experiment 3B (right), where depth order was varied within participants and depth-order judgments preceded number comparisons on each trial. The weak-outnumber-strong (WOS) effect is measured as the natural log of the ratio of white to gray dots at the point of subjective equality (PSE) in an intermixed display. Error bars represent 95% confidence intervals.
From the results of experiment 3A, it seemed possible that either 1) there remained a weak, although unreliable effect even when the gray dots were clearly in front, or 2) depth order was not salient to all participants equally, resulting in high variability across participants. Because including the additional participants failed to adhere to the preregistration, the statistical analyses as reported should be regarded as exploratory. 
Experiment 3B
When the manipulation of depth order was within subjects, and participants had to report the depth order on each trial, the effects of depth order were clearer and stronger. In this case, the preregistered ANOVA confirmed that there was a highly significant effect of depth order, F(1, 18) = 25.1, p < 0.0001, ges = 0.53. As shown in Figure 3 (bottom right), there was a strong WOS effect when the gray dots were in the background, but there was no longer any evidence of a WOS effect when the gray dots were in the foreground. 
An exploratory Bayesian t test was conducted on the WOS scores when gray was specified by stereopsis as being in front. Across the 20 participants, the null hypothesis that there was no WOS was 4.03 (±0.02%) times as likely, given the data, as the hypothesis that there was a WOS in this situation, where the possibility of implicit occlusion was contradicted by binocular depth information. 
Discussion of study 3
Clear evidence that the WOS is modulated by the depth order of the gray and white dots was observed in two experiments using binocular depth separation, and no actual occlusion. The modulation is consistent with the implied occlusion account of the apparently greater number of gray dots. That is, the WOS was decreased or eliminated when the gray dots were presented stereoscopically in a nearer depth plane, but was strong when they were in a farther depth plane than the white dots. Thus, stereoscopic depth segregation (like partial occlusion) was sufficient to abolish the effect only when it contradicted the depth order implied by the difference in the brightness of the dots. As in experiment 2, the illusion was not reversed when the white dots were clearly behind the gray dots, so it seems that there remains an asymmetry between white and gray, such that only for dimmer dots does evidence for implied occlusion increase or preserve the apparent number. Compensation for potentially unseen low contrast dots in a perceptual background seems like a reasonable prior in this case. 
Study 4: Varying contrast varies ease of segregation
Experimental studies 1 through 3 suggest that implicit occlusion is what increases the perceived number of gray dots in the WOS illusion. However, it still seems possible that the false floor hypothesis of Lei and Reeves (2018, 2022) could be viable, but is itself triggered by the perception that the dimmer dots are in the background. It therefore seemed important to replicate the evidence most indicative of the false floor hypothesis of Lei and Reeves, given that some of our results seem to differ from those described by Lei (2015).
In a first experiment (4A), individual online observers were tested using the subset comparison method with three different gray levels of dots (i.e., in a within-subject comparison) in an attempt to replicate the basic phenomenon. Lei and Reeves (2018) reported that lower contrast gray dots showed weaker WOS effects. Because participants were tested online, we did not know the actual contrast values of the dots on their displays, but we could assume safely that higher gray values represented higher contrasts for each participant. As will be described elsewhere in this article, we did not observe the pattern Lei and Reeves (2018) had observed, but their study had used a blocked design in which the brightness of the gray dots was fixed throughout the block, whereas in our version, trials of various brightnesses (higher and lower luminance contrasts) were interleaved. 
In a second experiment (4B), individual observers were tested on one gray level each, but six different gray levels were tested between participants. Moreover, a method for estimating the luminance contrasts online was also implemented for these 240 participants. This experiment, which used a blocked design, better replicated some aspects of the general patterns found by Lei and Reeves. 
Finally, in experiment 4C, a subset estimation task was used to measure the perceived numbers of white and gray dots for two representative gray levels to confirm that this effect primarily affected the white dots. This finding was confirmed, showing that we can replicate the observations that Lei and Reeves (2018) used to argue for the false floor effect. However, because of the patterns observed in studies 1 through 3, an alternative theory to the false floor effect seems to be required. We speculate that the patterns in the present study might be explained by easier segregation of the white dots (and thus less subset underestimation) when the gray dots were quite dim. 
Specific methods of experiments 4A, 4B, and 4C
Open science and transparency
Experiment 4A was exploratory and was not preregistered. Experiments 4B and 4C were preregistered, although the number of participants recruited for 4C accidentally exceeded our preregistered plan. 
Participants
A total of 455 online participants completed experiment 4 (50 completed experiment 4A, 240 completed experiment 4B, and 168 completed experiment 4C). Based on the criteria for inclusion similar to those of experiment 1, 31 of the 50 participants who completed experiment 4A were retained. Preregistered exclusion criteria meant that 180 of the 240 participants who completed experiment 4B were retained, and that 113 of the 168 participants in experiment 4C were retained. 
Design and procedure of experiment 4A
The subset comparison method was used in this experiment, with a within-subject manipulation of relative luminance contrast. This method is similar to that used by Lei and Reeves (2018), except they used a blocked design, whereas here the contrast of the gray dots varied from trial to trial. There were always 50 white dots in the field, and the number of gray dots varied from 25 to 70 in increments of 5. Each participant completed 60 trials (6 for each of the 10 numbers of gray dots) at each of the 3 gray values. The 180 trials were randomized fully. Participants judged whether there were more gray or white dots on each trial using the F and J keys. One-half of the participants were instructed to press F for white and J for gray, and the other one-half were instructed to do the opposite. 
Design and procedure of experiment 4B
A subset comparison task was used, but each participant saw only a single gray level for the gray dots, whereas six different gray levels were tested across participants. Each participant completed 100 trials with the number of white dots fixed at 50 and the number of gray dots varying from 25 to 70 (10 trials for each number). The prerandomized coordinates for the 100 experimental displays were identical across the 6 conditions, so that we could rule out that as a source of random variation. Note that because our interest was on the relationship between 1) the contrast of the gray dots, 2) the resulting PSEs, and 3) their accompanying Weber fractions, our criterion for inclusion in this experiment was the top 75% of participants based on skipping fewer than six trials, and having the lowest Weber fraction (30 of 40 recruited for inclusion). 
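The inclusion rule for one gray-level condition can be sketched as below. This is a hypothetical illustration: the record fields (`skipped`, `weber`) and the synthetic data are assumptions, not the preregistered analysis code.

```python
def select_participants(records, keep=30, max_skipped=5):
    """Keep the `keep` lowest Weber fractions among those who skipped few trials."""
    eligible = [r for r in records if r["skipped"] <= max_skipped]
    eligible.sort(key=lambda r: r["weber"])
    return eligible[:keep]

# Synthetic example: 40 recruited participants in one gray-level condition.
recruited = [{"id": i, "skipped": i % 8, "weber": 0.10 + 0.01 * i} for i in range(40)]
retained = select_participants(recruited)
print(len(retained))  # 30
```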
Following the number trials in experiment 4B, a procedure sought to estimate the luminance contrasts of the various gray dots used in experiment 4B on each online participant's display (as discussed elsewhere in this article). Although the use of the method was exploratory, it provided evidence concerning the range of actual contrasts tested. 
Design and procedure of experiment 4C
Based on the results of experiment 4B, two gray dot values were selected for use in an experiment using subset estimation. Experiment 4C implemented a four-condition, between-subject design to estimate the gray and white subsets separately for each of the two selected gray values using a design essentially identical to that used for experiment 1. 
Stimuli
To vary luminance contrast with a goal of including a very low contrast (∼0.5 Weber contrast) and keeping all the contrasts low relative to white, we established a set of gray values based on measured similarities across several displays in our lab. In all three experiments, white was 255, and the background gray was 95; the gray dot colors used are described elsewhere in this article. 
In experiment 4A, three gray values (115, 145, and 180) that seemed likely to produce contrasts in the desired range were tested (within subjects); there were 60 trials for each gray level. Although we could not know the actual contrasts on our participants' monitors, we could be sure that increasing the gray level should normally increase the contrast. A single pregenerated set of 180 unique white/gray intermixed images was used for all participants. Figure 4 shows images from experiment 4B, similar to those used in experiment 4A. 
Figure 4.
 
Illustration of some stimuli from experiment 4B. The background color is 95. The same coordinates are shown for each image for ease of comparison. Similar images were used in experiments 4A and 4C.
In experiment 4B, we tested six different gray values including two of those tested in experiment 4A. Figure 4 shows images of these stimuli for the six gray levels in experiment 4B. Because we were testing these contrasts online and on different monitors, we also sought to measure contrast on our participants’ monitors using a behavioral test. 
Online estimation of luminance contrast
To estimate the physical luminance contrast of our various gray levels, participants did a brightness matching task. The image used for testing each gray level consisted of 12 dithered patches, which varied in the proportion of black and gray pixels. Each patch was 54 × 54 pixels, and was implicitly divided into eighty-one 6 × 6 squares in which a set number of black and gray pixels were randomly intermixed (e.g., the gray value to be measured made up 16 of the 36 pixels in each 6 × 6 square, with the other 20 pixels being black). The 12 patches, representing 12 different proportions of gray and black, were presented in 3 rows of 4. Each patch was separated from the background gray by black boundaries (the background gray value, 95, was that used as the background in the number task). Although we had intended to also test white (255), a gray value of 225 was mistakenly used in the test image, and so this value is shown in Figure 5, in addition to an estimated value for white. Linear extrapolation from the five highest grays measured (linear fit R² = 0.9998) suggests that the Weber contrast for white (255) would have been approximately 3.37. 
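The construction of one dithered patch can be sketched as follows. This is an illustrative sketch only: it represents the patch as a nested list of pixel values, assumes black is pixel value 0, and uses hypothetical names throughout.

```python
import random

PATCH_PX = 54   # patch width and height in pixels
CELL_PX = 6     # each patch is tiled by 9 x 9 = 81 cells of 6 x 6 pixels
BLACK = 0       # assumed pixel value for black

def dithered_patch(gray_value, n_gray, rng):
    """Build a 54 x 54 patch with n_gray gray pixels in every 6 x 6 cell."""
    patch = [[BLACK] * PATCH_PX for _ in range(PATCH_PX)]
    cell_positions = [(r, c) for r in range(CELL_PX) for c in range(CELL_PX)]
    for top in range(0, PATCH_PX, CELL_PX):
        for left in range(0, PATCH_PX, CELL_PX):
            # Choose which pixels of this cell are gray; the rest stay black.
            for r, c in rng.sample(cell_positions, n_gray):
                patch[top + r][left + c] = gray_value
    return patch

patch = dithered_patch(gray_value=145, n_gray=16, rng=random.Random(0))
gray_total = sum(row.count(145) for row in patch)
print(gray_total)  # 1296, i.e., 81 cells x 16 gray pixels
```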
Figure 5.
 
Results of experiments 4A (top left), 4B (top right), 4C (bottom left) and the luminance-matching task to estimate online (Weber) luminance contrast (bottom right). (Top left) The weak-outnumber-strong (WOS) effect (natural log of the ratio of white to gray dots at the point of subjective equality [PSE]) as a function of gray level of gray dots (experiment 4A), as well as mean Weber fractions (light gray bars). (Top right) WOS effects and Weber fractions in experiment 4B as a function of gray value. (Bottom left) PSEs and Weber fractions in experiment 4C as a function of gray value and dot color in matching a single-color display to (estimating) 50 dots in an intermixed display (as in experiments 1 and 2). (Bottom right) Average estimated (Weber) luminance contrasts with 95% confidence intervals are shown for the six gray values used in experiment 4B, as well as for a lighter gray (225), and an extrapolated value for white (255), based on brightness matching data collected in conjunction with experiment 4B.
Participants were asked to squint their eyes (to blur the image) to try to see which patch, when blurred, best matched the background gray. Our purpose here was to measure the ratio of gray and black pixels that match the luminance of the background gray, because this would allow us to compute the luminance contrast of the gray dots on that participant's screen. Each participant in experiment 4B was shown seven such images (one for each of the six tested gray levels, as well as one for an even lighter gray). They picked the matching square by letter. This was evidently a difficult task, but we found that if we limited consideration to participants whose Weber fraction on the number task was 0.30 or less (as a proxy for conscientiousness), we retained 124 sets of interpretable data. Excluding judgments that took less than 3 seconds to complete eliminated a few additional data points. The remaining data were well behaved. The average of the number (N) of gray pixels (per 36 pixels) in the selected matches was used to compute an estimate of the mean physical luminance contrast between each gray level and the background gray as (36/N) − 1. The estimated luminance contrasts from the matching data are shown in Figure 5 (bottom right). 
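The contrast formula follows from a simple luminance balance, sketched here under the assumption that black pixels contribute negligible luminance: if N of every 36 pixels are gray and the blurred patch matches the background, then (N/36) × L_gray ≈ L_background, so the Weber contrast L_gray/L_background − 1 is (36/N) − 1. The function name is illustrative.

```python
def weber_contrast_from_match(mean_gray_pixels, pixels_per_cell=36):
    """Estimate the Weber contrast of a gray level from its matching dither ratio."""
    return pixels_per_cell / mean_gray_pixels - 1

# A match at 16 gray pixels per 36 implies a Weber contrast of 1.25.
print(weber_contrast_from_match(16))  # 1.25
```

A match that required all 36 pixels to be gray would imply a contrast of 0 (the gray equals the background), and smaller values of N imply brighter grays.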
Because each online participant's viewing display had different characteristics, the present estimates are merely meant to validate the idea that our manipulation of gray levels worked, on average, to produce a set of luminance contrasts that were generally clustered around our intended range. 
Number results of experiments 4A, 4B, and 4C
Experiment 4A
WOS effect scores (log ratios of white to gray dots at PSE) for each of the three gray values of 115, 145, and 180 were computed for each participant. Because we expected that variance would increase with higher gray values, we adopted a more lenient 0.30 cutoff, meaning that participants had to achieve a Weber fraction of 0.30 or less in each of the three types of trial. We used repeated measures ANOVA on the WOS scores to explore whether they differed by gray level, and found evidence that they did, F(1.28, 38.3) = 9.81, p = 0.002 (with Greenhouse–Geisser correction for sphericity). The mean PSEs for each gray level are shown in Figure 5 (top left) with 95% confidence intervals. Exploratory analyses confirmed that the WOS decreased as the gray level increased, F(1, 30) = 9.45, p = 0.004, and that, more specifically, the WOS was smaller for the highest contrast used (180) than for the other two conditions (115: t(30) = 3.07, p = 0.004; 145: t(30) = 4.40, p < 0.001), but that those two did not differ significantly from each other, t(30) = 0.67, p = 0.51. 
Did the variance of judgments (Weber fractions) increase with luminance contrast, as Lei and Reeves (2018) had observed? An exploratory ANOVA showed that it did, F(1, 30) = 4.42, p = 0.04. The mean Weber fractions are plotted in Figure 5 (top left) alongside the WOS scores. Exploratory comparisons suggested that the Weber fractions were higher for the highest contrast used (180) than for the low contrast (115), t(30) = 2.10, p = 0.04, and for the medium contrast (145), t(30) = 3.31, p = 0.002. No other comparisons were significant. Thus, we have replicated that performance at the task became more variable when the contrast of the gray dots was higher, suggesting that our data are well behaved; however, we did not observe an increased WOS as contrast increased. Instead, the WOS got weaker as contrast increased. 
Experiment 4B
For each online participant, a single gray value for the gray dots was used, and thus a single WOS value was computed. To test for evidence of increasing variability with luminance contrast, our preregistered criteria called for retaining 30 (out of 40) participants at each of the six gray levels, based on having the lowest Weber fractions in the set among those who had skipped no more than 5% of trials overall. The Weber fraction data showed clear evidence of a linear increase of variability with luminance contrast, F(1, 178) = 7.88, p = 0.006, as shown by the Weber fraction data (narrow columns) in Figure 5 (top right), consistent with greater difficulty with segregation. 
The mean WOS for each gray level (wide columns) is shown in Figure 5 (top right), with 95% confidence intervals. The WOS showed no linear trend as a function of luminance contrast, F(1, 178) = 1.75, p = 0.19. An ANOVA with gray level treated as a categorical factor indicated that the WOS did differ as a function of gray level, F(5, 174) = 3.44, p = 0.005. It seems to follow an inverted U function, which might reflect a kind of ideal condition for promoting a depth-order effect. This pattern is similar to that reported by Lei and Reeves (2018), who showed that the WOS increased over three contrast values (consistent with their false floor theory) and plummeted for their greatest contrast value. 
Based on these results, the two most promising gray levels for further attempts to replicate the observations of Lei and Reeves (2018) seemed to be the two that were also used in experiment 4A: 115 and 145. The Weber fractions for judgments with these two gray levels were the lowest among the six levels tested, and the confidence intervals for their means were also the narrowest. Moreover, an exploratory t test confirmed that the mean WOS for the lowest contrast dots (0.11; gray level 115) was significantly lower than the mean WOS for the 145 gray level (0.22), t(57.4) = 3.44, p = 0.001. 
Experiment 4C
Having established a candidate pair of gray values for further online testing, based on the results of experiment 4B, we next sought to test, by separately measuring the perception of white and gray dots in mixtures, whether it was the white dots or the gray dots that were affected by the change in luminance contrast. In fact, as is evident in the bottom left panel of Figure 5, it was the perceived number of white dots that was affected by the contrast of the gray dots, consistent with the observations of Lei and Reeves (2018). Using gray levels where we saw a reliable difference in experiment 4B (115 vs. 145), the apparent number of white dots was indeed lower when the contrast was higher (mean 38.9) than when the contrast was lower (mean 44.4), t(47.2) = 3.56, p = 0.001. Conversely, our second preregistered analysis showed that the matched number of gray dots of low contrast (mean 54.3) did not differ significantly from the matched number of higher contrast gray dots (mean 53.2), t(58.7) = 0.75, p = 0.45. Indeed, a Bayesian t test showed that the odds were 3.0 to 1.0 (±0.01%) in favor of the null hypothesis of no difference between the apparent numbers of gray dots. An exploratory ANOVA on the PSEs confirmed that there was a significant interaction between gray level and tested color, F(1, 109) = 4.38, p = 0.039. An exploratory t test also confirmed that the mean matches to the gray dots across both gray values (mean 53.7) significantly exceeded 50, consistent with the occlusion hypothesis, t(62) = 5.14, p < 0.0001. 
An exploratory ANOVA on the Weber fractions showed main effects of both color (white was more variable), F(1, 109) = 7.89, p = 0.006, and contrast (higher contrast was more variable), F(1, 109) = 7.24, p = 0.008, but no interaction. 
Discussion of study 4
In support of their false floor idea, Lei and Reeves (2018) reported data from eight naïve participants on four different contrast levels and seemed to show a linear trend of increased illusion across the lower three levels (although the highest level showed no WOS); however, they also observed increasing variance, suggesting increasing difficulty of segregation, which is an alternative explanation of the increased underestimation of the white dots. A somewhat different pattern of results was observed online when a large sample was tested in a within-subject, interleaved design. Although not tested directly, it seems possible that interleaving different kinds of gray dots from trial to trial encouraged a different perceptual approach. 
Moreover, two preregistered studies using blocked designs showed partial replications of their observations. For two gray levels, at least, with mean luminance contrasts of approximately 0.77 and approximately 1.26, clear evidence of a difference in WOS as a function of contrast was found both in experiment 4B and in experiment 4C. Experiment 4C additionally showed that this effect was indeed due to a change in the perceived number of white dots, as Lei and Reeves supposed. Thus, for these two contrasts, our data are consistent with (i.e., replicate) the observations of Lei and Reeves. 
Because the effect was not observed when these same gray values were tested in experiment 4A, in which the different gray level trials were interleaved randomly, it seems unlikely to be due to a low-level mechanism. It seems more likely that different perceptual strategies emerged in the blocked design, when the relative contrasts were predictable, which was not the case in the interleaved design of 4A. This could mean that the blocked design allows for strategic approaches not available in the interleaved design. Lei and Reeves (2018) used a within-subject blocked design. Specifically, based on differences in the variability of performance (evidence of difficulty with segregating the two sets of dots), we think the effects of relative contrast on the perceived number of white dots may be due to differences in difficulty of segmentation, though these speculations were not tested directly by our methods. 
In particular, the data from experiment 4B suggest that there is not a particularly linear relationship between the luminance contrast of the gray dots and the strength of the illusion, and there may be a plateau over the range of luminance contrasts estimated as 1.3 to 2.0 (although this would require further study). We note that the apparent decrease in the illusion strength when the gray dots have very low luminance contrasts might just as easily be attributed to the ease of segregating the white dots from very dim gray dots, thus decreasing the subset effect. Conversely, as the contrast of the gray dots gets closer to the white, it may be that the gray dots become difficult to segregate, thus restoring the balance of gray and white as measured by subset comparison. In other words, as gray gets closer to white, there may be subset underestimation for both sets, rendering the WOS smaller. This interpretation is only speculative, because we have not yet tested this hypothesis directly. Nonetheless, the fact that red and green dots show subset underestimation (subsequent investigation has shown that black and white dots do as well) seems, again, to make the false floor hypothesis a less promising explanation of the underestimation of white dots than one based on segmentation itself, which was also a part of Lei and Reeves’ (2018) account. 
General discussion
Lei and Reeves (2018, 2022) discovered a new numerosity illusion that they dubbed the weak outnumber the strong illusion. In the WOS illusion, gray dots seem to be more numerous than white dots when both sets of dots are intermixed and presented against a darker background. Lei and Reeves suggested that the WOS illusion was due to a fairly low-level process of contrast energy comparisons, where the gray dots served as a false floor for the evaluation of the white dots. This false floor theory postulates that it is the white dots that are affected by the gray dots, which Lei and Reeves seemed to have confirmed with a startling array of evidence. 
In this article, we have reconsidered that evidence and provided strong evidence for two of the ideas that Lei and Reeves had sought to rule out. First, we noted that the effect of gray dots on white dots was consistent with the subset underestimation effects suggested by the findings of Cordes et al. (2014). Specifically, we find that perceptual matches to subsets of mixtures of green and red dots both show clear evidence of underestimation, similar to those of white dots among grays. Although the reason for this underestimation was not a focus of our investigation, it could be due to simultaneous density contrast (Durgin, 2001; Sun & Baker, 2016), or, more likely, it could be due to some sort of inefficiency in segregation. The fact that varying the luminance contrast of the gray dots does alter the magnitude of the illusion is consistent with the idea that the illusion is maximized when the gray dots are both bright enough to make the white dots reasonably difficult to segregate and also dim enough for the gray dots to be perceived as partially occluded (a kind of a Goldilocks effect: neither too dim nor too bright). Although it remains possible that the false floor effect proposed by Lei and Reeves, if extended to cover the red and green case, could also provide part of the explanation, it is difficult to see how that would apply to mixtures of black and white dots, for example, so that too is a case that needs examination. Moreover, some of the results of the experiments in study 4 cast additional doubt on this false floor interpretation, and it currently seems that segregation and occlusion effects account for the data better. Further research into this effect is certainly warranted. 
The second idea we have tested concerned why the gray dots seemed to be immune to this subset effect. Our hypothesis was one that Lei and Reeves had rejected based on their studies of segregation cues, specifically that the gray dots are potentially perceived as being subject to occlusion by the white dots, resulting in a kind of filling-in of unseen gray dots. In support of this interpretation, we used two different kinds of depth order information to support or oppose the possibility of occlusion in experiments 2 and 3. In both cases, the differential effect was present when the depth-order information was consistent with possible occlusion of some of the dimmer dots, but not when the depth order contradicted the possibility of such occlusion. The reason for the discrepancy between our results and some of those mentioned by Lei and Reeves is not clear, but our studies were preregistered and had a much larger number of participants overall, and the pattern they show generalized across both binocular depth cues (tested in the lab) and pictorial depth cues (tested online).
Thus, the evidence reported here suggests that explaining the WOS illusion requires two different principles. The first is that there is generally a subset underestimation effect when different colors of dots are intermixed. This replicates an important observation that we speculate may be related to the sharp compression of perceived number in explicit number estimation tasks. This compression could tend to force subsets to appear to be less than half of the superset number, given that 100 dots does not appear to be twice as many as 50 (Portley & Durgin, 2019; Durgin & Portley, 2023). The second principle is that gray dots tend to be saved from the underestimation (or overestimated relative to their apparent number) because they are interpreted as partly occluded and, happily, their true value is restored (or sometimes overestimated). Note that the subset estimates for gray were not perfect in experiment 4C, but were actually greater than the actual number, a trend that was also evident in experiment 2. An explanation that depends on a combination of two hypotheses seems less elegant than the false floor interpretation of Lei and Reeves (2018, 2022). However, our data from pictorial and stereoscopic manipulations of apparent depth order strongly support the depth order interpretation as a fundamental component of the illusion. This finding may reflect statistical priors that the visual system has developed to account for occlusion being highly likely in situations involving mixtures of elements that differ in contrast, but not chroma, and when there is no evidence ruling out possible occlusion. Those priors may not apply for easily visible white dots where any occlusion short of full occlusion is likely to be easily detected because of the higher contrast of the white dots, so full occlusions might be contraindicated in that case.
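The compression argument above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that perceived number follows a power law with an exponent below 1, and that a subset making up half of a mixture is perceived as half of the perceived superset. The function names, the power-law form, and the exponents are hypothetical choices for this sketch and are not fitted to the data reported here.

```python
def perceived(n, exponent):
    """Hypothetical compressive mapping from physical to perceived number."""
    return n ** exponent


def matching_number(superset, fraction, exponent):
    """Physical number of dots in an unmixed patch whose perceived number
    equals `fraction` of the perceived number of the whole mixture."""
    target = fraction * perceived(superset, exponent)
    # Invert the power law to get back to a physical dot count.
    return target ** (1.0 / exponent)


# Assumed exponents: 1.0 is no compression, 0.5 is square-root compression.
for a in (1.0, 0.9, 0.75, 0.5):
    print(a, round(matching_number(100, 0.5, a), 1))
```

Under these assumptions, any exponent below 1 yields a match below 50 for a subset of 50 in a mixture of 100, and square-root compression yields a match of 25. This toy model ignores segregation and occlusion entirely, so it should be read only as an illustration of why compression could push subset matches below the true number.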
Although we have not explored it systematically, there is reason to believe there are individual differences in the priors people use when looking at these displays. One of the three authors of this article shows little evidence of experiencing the gray dots as partially occluded in the various tests we have run, whereas the other two authors show it strongly and consistently. Testing large numbers of different participants, both online and in the lab, suggests that the bias is fairly strong at the population level. 
The suggestion that perceived number can involve filling-in behind possible occlusion is consistent with evidence of interactions between area and number (Aulet & Lourenco, 2021; Lourenco & Aulet, 2023), because partial occlusion can lead to the overestimation of the area of a partly occluded object (Kanizsa, 1979). In this sense, the WOS might be regarded as another example of an illusion of area or size that affects perceived number. For example, the Müller-Lyer size illusion with variably sized dots replacing the line between the inward and outward arrows affects both number comparison and verbal number estimation (Dormal, Larigaldie, Lefèvre, Pesenti, & Andres, 2018), as does the Ebbinghaus area illusion (Picon, Dramkin, & Odic, 2019). Size aftereffects have also been shown to affect number comparison (Zimmermann & Fink, 2016).
We used displays modeled on those of Lei and Reeves (2018), without greatly varying the density or number of items, but Dormal et al. (2018), for example, tested size effects on perceived number for numbers between 11 and 20 after calibrating their participants’ estimates. In this range, verbal estimation of 1D number (irregularly spaced items in a line) has been shown to be nonlinear (tending toward underestimation) without calibration (Durgin et al., 2022), just as, without explicit calibration, 2D number (irregularly spaced items in a 2D area) is nonlinear (tending toward underestimation) beyond about 20 items (Portley & Durgin, 2019). Lei and Reeves (2022) have shown that the WOS applies to numeric estimation of white subsets with numerosity ranging from 20 to 70 intermixed with 50 gray dots, and also noted a highly compressed estimation function proportional to the square root of N, which is consistent with the data of Portley and Durgin. Halberda et al. (2006) concluded that observers could simultaneously evaluate up to two color-coded subsets and their superset, using up to 35 items in total. The observations made here suggest that subset selection may often be incomplete.
Some evidence suggests that number may be processed after the segregation of objects or ensembles (Franconeri et al., 2009), as shown, for example, by small reductions in perceived number when connectors are added between some of the objects in the test array (He, Zhang, Zhou, & Chen, 2009). Solomon and Morgan (2018) have shown that number estimation is likely based on sampling from a subset rather than the entire display, and it is possible that the WOS reflects something about how the process of generalizing from samples back to the full population occurs. In any event, the present evidence suggests both that a) segmentation of a salient subset of dots (white, red, or green, for example, in experiment 1) may result in significant perceptual underestimation, but also that b) the perceived number may be estimated from representations that also take the likelihood of possible occlusion into account. 
Conclusions
Using open science methods including preregistration, large N studies, and data sharing, we have provided new evidence concerning the source of a dramatic illusion of number. The evidence presented here supports both 1) the subset underestimation hypothesis for why white (and red and green) dots in mixtures appear fewer than when unmixed, and 2) the implied occlusion hypothesis for why intermixed gray dots appear more numerous than white dots. Both hypotheses in combination appear to be required to explain the patterns of illusion measured in our studies. 
Acknowledgments
The authors thank Adam Reeves for sharing unpublished results with us and acknowledge the voluminous work of Quan Lei in bringing this exciting new illusion to the attention of the scientific community. F. H. Durgin was supported in part by a faculty research grant from Swarthmore College. K. Becker and E. Dellinger were supported by grants from Swarthmore College. 
Ethical statement and author contributions: All of the research conducted here was conducted with consent procedures, and was approved by the local IRB of Swarthmore College. All authors contributed equally to the design, administration and analysis of the experiments described here as well as the production and revision of the manuscript. All approve this version of the article and agree to be accountable concerning the accuracy of the representations herein. We have no competing interests. 
Data accessibility: All data, summary files, and accompanying analysis files are available at the following OSF link: https://osf.io/5c4h8/?view_only=93db86190559472989f63333f6f7e00b
Commercial relationships: none. 
Corresponding author: Frank H. Durgin. 
Email: fdurgin1@swarthmore.edu. 
Address: Department of Psychology, Swarthmore College, 500 College Avenue, Swarthmore, PA 19081, USA. 
References
Allik, J., & Tuulmets, T. (1991). Occupancy model of perceived numerosity. Perception & Psychophysics, 49, 303–314, https://doi.org/10.3758/BF03205986.
Aulet, L. S., & Lourenco, S. F. (2021). Numerosity and cumulative surface area are perceived holistically as integral dimensions. Journal of Experimental Psychology: General, 150(1), 145–156, https://doi.org/10.1037/xge0000874.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436, https://doi.org/10.1163/156856897X00357.
Burgess, A. E., & Barlow, H. B. (1983). The efficiency of numerosity discrimination in random dot images. Vision Research, 23(8), 811–820, https://doi.org/10.1016/0042-6989(83)90204-3.
Cordes, S., Goldstein, A., & Heller, E. (2014). Sets within sets: The influence of set membership on numerical estimates. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 94–105, https://doi.org/10.1037/a0034131.
Dakin, S. C., Tibber, M. S., Greenwood, J. A., & Morgan, M. J. (2011). A common visual metric for approximate number and density. Proceedings of the National Academy of Sciences of the United States of America, 108, 19552–19557, https://doi.org/10.1073/pnas.1113195108.
Dormal, V., Larigaldie, N., Lefèvre, N., Pesenti, M., & Andres, M. (2018). Effect of perceived length on numerosity estimation: Evidence from the Müller-Lyer illusion. Quarterly Journal of Experimental Psychology, 71, 2142–2151, https://doi.org/10.1177/1747021817738720.
Durgin, F. H. (1995). Texture density adaptation and the perceived numerosity and distribution of texture. Journal of Experimental Psychology: Human Perception and Performance, 21, 149–169, https://doi.org/10.1037/0096-1523.21.1.149.
Durgin, F. H. (2001). Texture contrast aftereffects are monocular; texture density aftereffects are binocular. Vision Research, 41, 2619–2630, https://doi.org/10.1016/S0042-6989(01)00121-3.
Durgin, F. H., Aubry, E., Balisanyuka-Smith, J. J., & Yavuz, Ç. (2022). Spatial number estimation has a higher linear range than temporal number estimation: Differential affordances for subdivision might help to explain why. Journal of Vision, 22(11), 15, https://doi.org/10.1167/jov.22.11.15.
Durgin, F. H., & Portley, M. (2023). Is the approximate number system capacity limited? Extended display duration does not increase the limits of linear number estimation. Journal of Experimental Psychology: Human Perception and Performance, 49(4), 483–495, https://doi.org/10.1037/xhp0001106.
Franconeri, S. L., Bemis, D. K., & Alvarez, G. A. (2009). Number estimation relies on a set of segmented objects. Cognition, 113(1), 1–13, https://doi.org/10.1016/j.cognition.2009.07.002.
Gebuis, T., & Reynvoet, B. (2012a). The interplay between nonsymbolic number and its continuous visual properties. Journal of Experimental Psychology: General, 141, 642–648, https://doi.org/10.1037/a0026218.
Halberda, J., Sires, S. F., & Feigenson, L. (2006). Multiple spatially overlapping sets can be enumerated in parallel. Psychological Science, 17(7), 572–576, https://doi.org/10.1111/j.1467-9280.2006.01746.x.
He, L., Zhang, J., Zhou, T., & Chen, L. (2009). Connectedness affects dot numerosity judgment: Implications for configural processing. Psychonomic Bulletin & Review, 16(3), 509–517, https://doi.org/10.3758/PBR.16.3.509.
Kanizsa, G. (1979). Organization in vision. New York: Praeger.
Kleiner, M., Brainard, D., & Pelli, D. (2007). What's new in Psychtoolbox-3? Perception, 36 ECVP Abstract Supplement.
Krueger, L. E. (1972). Perceived numerosity. Perception & Psychophysics, 11, 5–9, https://doi.org/10.3758/BF03212674.
Lei, Q. (2015). When the weaker conquer: A contrast-based illusion of visual numerosity and its dependence on segregation. Boston: Northeastern University.
Lei, Q., & Reeves, A. (2018). When the weaker conquer: A contrast-dependent illusion of visual numerosity. Journal of Vision, 18(7), 8, https://doi.org/10.1167/18.7.8.
Lei, Q., & Reeves, A. (2022). Untypical contrast normalization explains the “weak outnumber strong” numerosity illusion. Frontiers in Human Neuroscience, 16, 923072, https://doi.org/10.3389/fnhum.2022.923072.
Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49, 433–442, https://doi.org/10.3758/s13428-016-0727-z.
Lourenco, S. F., & Aulet, L. S. (2023). A theory of perceptual number encoding. Psychological Review, 130(1), 155, https://doi.org/10.1037/rev0000380.
Men, H., Altin, A., & Schütz, A. C. (2023). Underestimation of the number of hidden objects. Journal of Vision, 23(2), 1, https://doi.org/10.1167/jov.23.2.1.
Michotte, A., Thines, G., & Crabbe, G. (1964). Les compléments amodaux des structures perceptives. Studia Psychologica. Louvain, Belgium: Publications Universitaires.
Morgan, M. J., Raphael, S., Tibber, M. S., & Dakin, S. C. (2014). A texture-processing model of the ‘visual sense of number’. Proceedings of the Royal Society B: Biological Sciences, 281(1790), 20141137, https://doi.org/10.1098/rspb.2014.1137.
Morey, R. D., Rouder, J. N., Jamil, T., & Morey, M. R. D. (2015). Package ‘BayesFactor’.
Picon, E., Dramkin, D., & Odic, D. (2019). Visual illusions help reveal the primitives of number perception. Journal of Experimental Psychology: General, 148(10), 1675–1687, https://doi.org/10.1037/xge0000553.
Portley, M., & Durgin, F. H. (2019). The second number-estimation elbow: Are visual numbers greater than 20 evaluated differently? Attention, Perception, & Psychophysics, 81, 1512–1521, https://doi.org/10.3758/s13414-019-01804-6.
Ross, J., & Burr, D. C. (2010). Vision senses number directly. Journal of Vision, 10(2), 10, https://doi.org/10.1167/10.2.10.
Schütz, A. C. (2012). There's more behind it: Perceived depth order biases perceived numerosity/density. Journal of Vision, 12(12), 9, https://doi.org/10.1167/12.12.9.
Solomon, J. A., & Morgan, M. J. (2018). Calculation efficiencies for mean numerosity. Psychological Science, 29(11), 1824–1831, https://doi.org/10.1177/0956797618790545.
Stoet, G. (2010). PsyToolkit - A software package for programming psychological experiments using Linux. Behavior Research Methods, 42(4), 1096–1104, https://doi.org/10.3758/BRM.42.4.1096.
Stoet, G. (2017). PsyToolkit: A novel web-based method for running online questionnaires and reaction-time experiments. Teaching of Psychology, 44(1), 24–31, https://doi.org/10.1177/0098628316677643.
Sun, H. C., & Baker, C. L. (2016). Simultaneous density contrast is bidirectional. Journal of Vision, 16(14), 4, https://doi.org/10.1167/16.14.4.
Zimmermann, E., & Fink, G. R. (2016). Numerosity perception after size adaptation. Scientific Reports, 6(1), 1–7, https://doi.org/10.1038/srep32810.
Figure 1.
 
The four images on the left depict one trial display from each of the four between-subject conditions in experiment 1. Note that the illusion that there are more gray than white dots is quite strong in the intermixed displays in the top two images. All intermixed images contain 50 dots of each color. These images are from trials where the number of dots in the unmixed comparison patch was closest to the mean points of subjective equality (PSEs) (gray = 50, and white, red, and green = 40). The graph on the right shows the mean matches (with 95% confidence intervals) to subsets of 50 in mixed fields with variable single-color fields based on fitting psychometric functions to the data of individual participants.
Figure 2.
 
The four images on the left depict trial displays from each of the four conditions of experiment 2. All mixed displays contained 50 dots of each color; single-color images depict the approximate observed points of subjective equality (PSEs) by condition. The mean PSEs and 95% confidence intervals are plotted on the right.
Figure 3.
 
(Top) Stereoscopic stimuli. Fused uncrossed, the left image pair shows the white dots in the foreground, while the right pair shows the gray dots in the foreground (cross-fusing produces the opposite depth orders). All images have 50 white and 50 gray dots. (Bottom) The results of experiment 3A (left), where depth order was varied between participants, and of experiment 3B (right), where depth order was varied within participants and depth-order judgments preceded number comparisons on each trial. The weak-outnumber-strong (WOS) effect is measured as the natural log of the ratio of white to gray dots at the point of subjective equality (PSE) in an intermixed display. Error bars represent 95% confidence intervals.
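As a minimal sketch of the WOS measure described in this caption, the effect can be computed as the natural log of the white-to-gray ratio at the PSE. The function name and the example dot counts below are hypothetical and are not taken from the data.

```python
import math


def wos_effect(white_at_pse, gray_at_pse):
    """Natural log of the ratio of white to gray dots at the PSE.

    Zero means no illusion. A positive value means more white than gray
    dots were needed for the two sets to appear equal, that is, the gray
    dots appeared relatively more numerous.
    """
    if white_at_pse <= 0 or gray_at_pse <= 0:
        raise ValueError("dot counts must be positive")
    return math.log(white_at_pse / gray_at_pse)


print(wos_effect(50, 50))  # equal counts at the PSE: no illusion
print(wos_effect(60, 40))  # hypothetical PSE with more white than gray dots
```

One convenience of a log ratio is symmetry: swapping the two counts flips the sign of the measure without changing its magnitude.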
Figure 4.
 
Illustration of some stimuli from experiment 4B. The background color is 95. The same coordinates are shown for each image for ease of comparison. Similar images were used in experiments 4A and 4C.
Figure 5.
 
Results of experiments 4A (top left), 4B (top right), 4C (bottom left) and the luminance-matching task to estimate online (Weber) luminance contrast (bottom right). (Top left) The weak-outnumber-strong (WOS) effect (natural log of the ratio of white to gray dots at the point of subjective equality [PSE]) as a function of gray level of gray dots (experiment 4A), as well as mean Weber fractions (light gray bars). (Top right) WOS effects and Weber fractions in experiment 4B as a function of gray value. (Bottom left) PSEs and Weber fractions in experiment 4C as a function of gray value and dot color in matching a single-color display to (estimating) 50 dots in an intermixed display (as in experiments 1 and 2). (Bottom right) Average estimated (Weber) luminance contrasts with 95% confidence intervals are shown for the six gray values used in experiment 4B, as well as for a lighter gray (225), and an extrapolated value for white (255), based on brightness matching data collected in conjunction with experiment 4B.
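For readers unfamiliar with Weber luminance contrast, the following sketch shows the standard definition, (L_dot - L_background) / L_background. The contrasts plotted in this figure were estimated from brightness-matching data; the sketch instead assumes a simple power-law display with a gamma of 2.2, which is only a stand-in and will not reproduce the plotted values. The function names are hypothetical; the gray values 225 and 255 and the background value 95 are those mentioned in the captions of Figures 4 and 5.

```python
def gray_to_luminance(gray_value, gamma=2.2):
    """Relative luminance of an 8-bit gray value.

    ASSUMPTION: the display follows a pure power law with the given gamma.
    Uncalibrated online displays may deviate substantially from this.
    """
    return (gray_value / 255.0) ** gamma


def weber_contrast(dot_gray, background_gray, gamma=2.2):
    """Weber contrast of a dot against its background."""
    l_dot = gray_to_luminance(dot_gray, gamma)
    l_bg = gray_to_luminance(background_gray, gamma)
    return (l_dot - l_bg) / l_bg


# Lighter gray (225) and white (255) against a background of 95.
for gray in (225, 255):
    print(gray, round(weber_contrast(gray, 95), 2))
```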