Open Access
Article  |   October 2023
Perceiving depth and motion in depth from successive occlusion
Author Affiliations
  • Abigail R. I. Lee
    Centre for Vision Research, York University, Toronto, Ontario, Canada
    abigailrilee@googlemail.com
  • Laurie M. Wilcox
    Centre for Vision Research, York University, Toronto, Ontario, Canada
    lwilcox@yorku.ca
  • Robert S. Allison
    Centre for Vision Research, York University, Toronto, Ontario, Canada
    allison@cse.yorku.ca
Journal of Vision October 2023, Vol.23, 2. doi:https://doi.org/10.1167/jov.23.12.2

      Abigail R. I. Lee, Laurie M. Wilcox, Robert S. Allison; Perceiving depth and motion in depth from successive occlusion. Journal of Vision 2023;23(12):2. https://doi.org/10.1167/jov.23.12.2.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Occlusion, or interposition, is one of the strongest and best-known pictorial cues to depth. Furthermore, the successive occlusion of previous objects by newly presented objects produces an impression of increasing depth. Although the perceived motion associated with this illusion has been studied, the depth percept has not. To investigate, participants were presented with two piles of disks with one always static and the other either a static pile or a stacking pile where a new disk was added every 200 ms. We found that static piles with equal numbers of disks appeared equal in height. In contrast, the successive presentation of disks in the stacking condition appeared to enhance the perceived height of the stack: fewer disks were needed to match the static pile. Surprisingly, participants were also more precise when comparing stacking versus static piles of disks. Reversing the stacking by removing rather than adding disks reversed the bias and degraded precision. In follow-up experiments, we used nonoverlapping static and dynamic configurations to show that the effects are not due to simple differences in perceived numerosity. In sum, our results show that successive occlusions generate a greater sense of height than occlusion alone, and we posit that dynamic occlusion may be an underappreciated source of depth information.

Introduction
When we think about how we perceive depth in the physical and virtual environments around us, the first thing that may come to mind is stereopsis, the well-studied phenomenon of depth perception based on the disparity between our two horizontally-offset eyes (Wheatstone, 1838). However, there are also many ways to perceive depth that do not need both eyes. For example, monocular or pictorial depth cues convey a sense of depth through features such as size, shading and shadows, and occlusions. As with binocular disparity, changing some of these pictorial depth cues over time signals motion in depth, for example changing size (looming) (Schiff, Caviness, & Gibson, 1962), or changing shadow position (Kersten, Mamassian, & Knill, 1997). 
Of particular interest in this article are occlusion cues: when a surface “A” overlaps another “B,” the typical percept is that A is closer to us than B (von Helmholtz, 1962). Unlike shadow and size cues, until recently occlusion was not thought to signal motion in depth when changed over time. However, research by Engel, Remus, & Sainath (2006) suggests that occlusions induce a motion in depth illusion when they are sequenced so that successive occluding objects are presented one after another on top of each other, like stacking pancakes on a plate. Engel et al. (2006) demonstrated that this stacking disk illusion could induce a motion aftereffect in a set of sequentially presented overlapping rings. Furthermore, they had participants compare the apparent speed of motion in depth in the illusion to a motion-in-depth stimulus defined by looming and a moving cast shadow. The frequency of reports that the stacking pile was moving faster increased as the rate of disk-stacking increased and decreased as the speed of the simulated motion in depth of the looming disk increased. They conclude from this that occlusion alone can induce a strong percept of motion in depth. 
However, occlusions are primarily considered a source of ordinal depth information; that is, they convey the order of surfaces in depth. Particularly in stimulus arrangements where depth information from other sources is limited, or in conflict, there is evidence that observers experience depth order, but not depth magnitude (Andersen & Braunstein, 1983; Kaplan, 1969). Thus, in Engel et al.'s (2006) experiments the motion in depth percept may be due to the successive interposition of elements, but we do not know whether observers experienced an increase in the depth or height of the pile. Here, we take a closer look at the depth perceived in the stacking disk phenomenon, specifically the interaction between occlusion and enumeration of those occlusions on perceived height. In Experiment 1 we did this with a group of experienced psychophysical observers who were unaware of the experimental design, before replicating this study with naïve observers with no prior experience in psychophysical experiments in Experiment 2. In Experiment 3 we evaluated the impact of potential attentional biases toward the stacking (i.e., changing) pile by reversing the stimulus and successively removing disks. In Experiment 4, we removed occlusion cues from our stimuli to investigate whether successively presenting nonoverlapping disks one after the other would influence the perceived number of disks on the screen, and if our other findings could be explained by this phenomenon. Overall, our results show that the stacking disk illusion is not limited to the perception of motion but also impacts the perceived depth/height of the stacked array. 
Experiment 1
In Experiment 1, we investigated whether successively presenting occluding disks in a way that has previously been shown to induce a motion illusion (Engel et al., 2006) also affected the perceived height of the “pile” of disks. We did this by asking experienced psychophysical observers to compare the height of “Stacking” piles of disks, formed by successive presentation of occluding disks, with a “Static” pile of occluding disks where all the disks appeared simultaneously. 
Methods
Subjects
Eleven experienced psychophysical observers with a known history of conducting and participating in other psychophysical experiments, including one author (the others were naïve with respect to the experimental question), were recruited to participate in Experiment 1. Data from two participants was excluded due to poor performance, with an apparent lapse rate of approximately 30% in the two easiest levels in one or more of the conditions, leading to poor psychometric function fits. This left a total of nine participants (seven female, two male; aged 21–34). In this and the subsequent experiments, all participants gave informed consent before starting the experiments, and all procedures were approved by the York University Office of Research Ethics (in accordance with the Declaration of Helsinki, 2003). 
Materials
All experiments were developed using the PsychoPy Builder (version 2020.1.2) with JavaScript code components and were hosted on Pavlovia (Peirce et al., 2019). The experiments were all completed on the observer's own computer; participants were not permitted to use a phone or tablet. Before the beginning of each experiment, all participants completed a size calibration procedure with an image of a credit card that they had to adjust to the size of a physical credit card. This allowed us to determine the approximate size of their screen and to ensure a computer display was being used. 
The stimuli used consisted of two sets of overlapping disks that appeared to form piles to the left and right of the screen. Examples of the stimuli can be seen in Supplementary Movies 1 and 2. The disks that comprised these piles were 0.25 height units in diameter, meaning they were a quarter of the height, but not the width, of the screen they were presented on. The disks were gray (RGB values of [0,0,0] on a scale of −1 to 1) circles with a five-pixel black (RGB values of [−1, −1, −1]) border presented on a black background with a gray fixation cross that was 0.04 × 0.04 height units, with a line width of two pixels. The position of the disks in each pile was randomized so that each pile was unique, but restricted to ensure a stack of disks was formed: the x and y coordinates for the edge of a given circle in a pile were at most 0.25 height units above or below, or to the left or right, of the center of the half screen on which the stack was presented. This meant that at the extremes of the positioning, two individual circles may just touch and not overlap, but every circle in the pile touched or overlapped a hypothetical central circle. 
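The placement constraint described above can be sketched in a few lines of Python. This is only an illustration under one reading of the text, not the authors' PsychoPy/JavaScript code: we assume that keeping a disk's edge within 0.25 height units of the pile center is equivalent to drawing each disk center uniformly within ±0.125 height units of that center in x and y. The names `pile_positions`, `pile_center_x`, and `touches_central_disk` are hypothetical.

```python
import math
import random

DISK_DIAMETER = 0.25        # height units, as stated in the text
MAX_CENTER_OFFSET = 0.125   # assumption: far edge of a disk lies at most 0.25 from the pile center


def pile_positions(n_disks, pile_center_x, rng=None):
    """Draw random disk centers (x, y) for one pile, in height units.

    pile_center_x is a hypothetical parameter: the horizontal center of the
    half screen holding the pile. The vertical center is taken to be 0.
    """
    rng = rng or random.Random()
    return [
        (pile_center_x + rng.uniform(-MAX_CENTER_OFFSET, MAX_CENTER_OFFSET),
         rng.uniform(-MAX_CENTER_OFFSET, MAX_CENTER_OFFSET))
        for _ in range(n_disks)
    ]


def touches_central_disk(position, pile_center_x):
    """True if a disk touches or overlaps a same-sized disk at the pile center."""
    x, y = position
    return math.hypot(x - pile_center_x, y) <= DISK_DIAMETER
```

Under this reading, two disks at opposite extremes have centers exactly one diameter apart, so they just touch, while every disk center is at most about 0.177 height units from the pile center and therefore overlaps the hypothetical central disk.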
Procedure
We assessed two conditions: a Static, and a Static versus Stacking condition. On each trial in the Static condition, two completed static piles of disks were displayed on the screen for 3.4 seconds. After this the screen went blank, and participants were able to respond (see Figure 1). In the Static versus Stacking condition, participants were shown one complete, static pile of disks, and a second pile where disks successively appeared and stacked on top of each other over the course of the trial. A new disk appeared in the Stacking pile every 200 ms until the pile was complete, at which point the stimuli would disappear and participants could make their response (see Figure 1). In both conditions, participants were asked to indicate which pile of disks appeared taller at the end of the presentation. The left/right location of a particular stack was randomized for both conditions. Observers received no feedback following their responses. 
Figure 1.
 
The time course of a trial in the Static versus Stacking (top) and Static (bottom) conditions. In the Static versus Stacking condition, one pile of disks was pre-formed and remained static on the screen for the duration of the trial. The other (stacking) pile of disks built up over the duration of the trial, with a new disk appearing every 200 ms until all disks were present (the location where a new disk would be added is indicated by a dashed outline for explanatory purposes, all disks were identical in the actual displays). At this point, the screen went blank, and participants were able to make their response. In the Static condition, participants viewed two static piles of disks for a total of 3.4 seconds, before the screen went blank and participants could respond.
In both conditions, the number of disks in each pile was varied trial-by-trial according to the method of constant stimuli. The relative number of disks in the two piles ranged from −9 to +9 disks with a step size of 3. One pile (chosen at random) always contained 20 disks while the other would have 11, 14, 17 or 20 disks. In the Static versus Stacking condition, the duration of the trial depended on the number of disks present in the stacking pile. As a result, the minimum and maximum trial durations were 2.2 and 4 seconds, respectively, for the Static versus Stacking condition. All trials in the Static condition had a duration of 3.4 seconds. The two conditions were tested in two separate blocks of trials, with the blocks presented in a random order for each observer. Each block contained 20 trials per level, for a total of 140 trials per block. 
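The level structure and trial timing follow from simple arithmetic, sketched below. The function names are hypothetical, and the sign convention (level = test pile minus standard pile) is an assumption made for illustration; the text specifies only the range, the step size, the 20-disk pile, and the 200 ms interval.

```python
DISK_INTERVAL_MS = 200   # a new disk every 200 ms (from the text)
FULL_PILE = 20           # one pile always contains 20 disks (from the text)
TRIALS_PER_LEVEL = 20


def relative_levels(max_diff, step=3):
    """Relative disk counts used for the method of constant stimuli."""
    return list(range(-max_diff, max_diff + 1, step))


def pile_counts(level):
    """Return (standard_disks, test_disks), assuming level = test - standard."""
    if level <= 0:
        return FULL_PILE, FULL_PILE + level
    return FULL_PILE - level, FULL_PILE


def stacking_duration_s(n_stacking_disks):
    """Trial duration in seconds when the stacking pile ends with n disks."""
    return n_stacking_disks * DISK_INTERVAL_MS / 1000


levels = relative_levels(9)                       # seven levels from -9 to +9
trials_per_block = len(levels) * TRIALS_PER_LEVEL
```

With a maximum difference of 9 this gives seven levels and 140 trials per block, and stacking piles of 11 and 20 disks give durations of 2.2 and 4 seconds, matching the values reported above.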
In all experiments, cumulative normal psychometric functions were fit to the data using MATLAB (R2019a; The MathWorks Inc., 2019) and the Palamedes toolbox (Prins & Kingdom, 2009). All psychometric function fits were based on the number of disks in one pile relative to the other. These functions were used to calculate relative points of subjective equality (PSEs) and just-noticeable differences (JNDs), corresponding to the 50% and 75% points on the psychometric function, respectively. In the Stacking conditions, the Static pile was treated as the standard, which meant that the PSE measured the number of additional (or fewer) disks required for the Stacking pile to look equivalent to the Static pile. JASP (Version 0.14.1; JASP Team, 2020) was used to conduct statistical analyses for all experiments. 
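To make the PSE and JND definitions concrete, the following is a minimal sketch of a cumulative normal fit using only the Python standard library. It is a stand-in for illustration, not the Palamedes procedure used in the study: the coarse grid search, the parameter ranges, and the function names are all assumptions. Here the PSE is taken as the mean of the fitted function (its 50% point) and the JND as the distance from the 50% to the 75% point.

```python
import math
from statistics import NormalDist


def cumulative_normal(x, mu, sigma):
    """Cumulative normal psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def neg_log_likelihood(mu, sigma, levels, n_chosen, n_trials):
    """Binomial negative log-likelihood of the response counts."""
    nll = 0.0
    for x, k, n in zip(levels, n_chosen, n_trials):
        p = min(max(cumulative_normal(x, mu, sigma), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_psychometric(levels, n_chosen, n_trials):
    """Grid-search maximum-likelihood fit; returns (pse, jnd)."""
    mus = [-12.0 + 0.25 * i for i in range(97)]    # assumed search range for the mean
    sigmas = [0.5 + 0.25 * i for i in range(79)]   # assumed search range for the spread
    _, mu, sigma = min(
        (neg_log_likelihood(m, s, levels, n_chosen, n_trials), m, s)
        for m in mus for s in sigmas
    )
    pse = mu                                       # 50% point
    jnd = sigma * NormalDist().inv_cdf(0.75)       # 75% point minus 50% point
    return pse, jnd
```

For example, `fit_psychometric([-9, -6, -3, 0, 3, 6, 9], counts, [20] * 7)` takes the number of trials on which one response was given at each level and returns the two summary statistics in units of disks. A dedicated toolbox would additionally handle lapse rates and goodness of fit, which this sketch ignores.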
Results and discussion
In Experiment 1, we compared relative points of subjective equality and just-noticeable differences for the (i) Static and (ii) Static versus Stacking conditions. Figure 2 shows data and fitted psychometric functions for one observer (a) and the distribution of fits for all observers (b). In all cases, the Static versus Stacking curves were shifted to the right relative to the Static versus Static curves, indicating a bias toward reporting that the Stacking pile was taller. There was a significant difference in mean relative PSE between the two conditions (paired t(8) = −11.41, padjusted < 0.001; p values were adjusted to control false discovery rate across t-tests [Benjamini & Hochberg, 1995]) as illustrated in Figure 3. The Static versus Stacking relative PSE was also significantly different from zero (t(8) = 6.517, padjusted < 0.001). Our experiment was constructed in such a way that in the Static versus Stacking condition, a positive relative PSE indicates a stacking pile looks taller than an equivalent static pile. Thus, participants were strongly biased toward saying that the stacking pile of disks was taller than the static pile. In other words, the positive PSE indicates that (i) a static pile needed to have more disks than a stacking pile to appear to be the same height and (ii) a stacking pile looked higher than a static pile with the same number of disks. 
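The false discovery rate adjustment cited above (Benjamini & Hochberg, 1995) can be sketched as follows. This is a generic implementation of the step-up procedure, not the JASP routine used for the reported values, and the text does not state which set of tests was pooled for the adjustment, so any input list is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values in the same order as the raw ones.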
Figure 2.
 
(a) Sample psychometric function for a representative observer. The proportion of trials where the Static standard was chosen is plotted against the number of disks in the standard pile versus the test pile, either for Stacking (black) or Static (red) test stimuli. Cumulative normal fits are shown in black solid and red dashed lines, respectively. (b) Fitted psychometric functions for each observer (by color). Solid and dashed lines show Stacking and Static test stimuli, respectively.
Figure 3.
 
The mean relative points of subjective equality for experienced observers in the Static and Static versus Stacking conditions of Experiment 1. Error bars are 95% confidence intervals.
We also found that the relative PSE for the Static condition was slightly, but significantly, different from zero (t(8) = −2.974, padjusted = 0.026). Given that the static conditions were essentially indistinguishable, we expected that performance would be at chance (zero). In this case, a relative PSE that is different from zero indicates that participants chose either the pile designated pile 1 or the pile designated pile 2 more than the other, despite both piles being static and the randomized location of the two piles. It is not clear why this occurred; we suspect it may be a spurious effect due to the relatively low number of observers. Additionally, as shown in Figure 4, we found a significant difference in JNDs between the Static and Static versus Stacking conditions (t(8) = 4.613, padjusted = 0.004). These observers were more precise in the Static versus Stacking condition than in the Static condition, although it is not immediately obvious why this would be the case. Given that this was a small group of psychophysically experienced observers, we sought to replicate this finding, and our relative PSE results, in a large group of naïve observers in Experiment 2.
Figure 4.
 
The mean just-noticeable differences for experienced observers in the Static and Static versus Stacking conditions of Experiment 1. Error bars are 95% confidence intervals.
Experiment 2
In Experiment 1, we found that experienced psychophysical observers appeared to be biased toward saying a Stacking pile of disks was taller than a Static pile of disks. In Experiment 2, we repeated Experiment 1 with a large group of naïve observers to determine if these results were stable and generalizable. 
Methods
Subjects
Sixty-three naïve observers were recruited to participate in Experiment 2 using the online experiment recruitment tool Prolific. Seven observers were excluded for failing a control condition (see Procedure) that ensured participants were not pressing random buttons, 13 participants were excluded for not completing the experiment, and three observers were excluded for poor performance and consequently, poor psychometric function fits. This left a total of 40 observers, with 20 observers taking part in the Static condition (eight female, 12 male; aged 18–65), and 20 in the Static versus Stacking condition (10 female, 10 male; aged 18–56). 
Procedure
The methods used for Experiment 2 were very similar to Experiment 1, with a few notable changes to make the task clearer and easier for naïve observers. To minimize the length of the experiment and the potential for demand characteristics, we used a between-subject design in which participants only completed either the Static or Static versus Stacking condition. 
Before the beginning of the main experiment, participants in Experiment 2 again began with a size-calibration routine to extract the height of their screen. Participants then completed a series of 10 practice trials to give them a feel for the task. These trials consisted of two piles of disks, as in the main experimental conditions, with one pile containing eight disks and the other containing 20 disks. In the Static versus Stacking condition, there were five practice trials with eight disks in the Stacking pile and five practice trials with 20 disks in the Stacking pile. Observers received no feedback on their responses. 
In Experiment 2 we used the same method as in Experiment 1 but the relative number of disks presented was ±12, thus in both conditions, the number of disks in each pile ranged between eight and 20. As in the previous study there was always at least one (randomly selected) pile with 20 disks and the other would have eight, 11, 14, 17, or 20 disks. Again, the duration of the trial depended on the number of disks present in the stacking pile in the Static versus Stacking condition, so the minimum and maximum duration were 1.6 and four seconds, respectively. The duration of the trials in the Static condition was always 3.4 seconds. 
Because of the online nature of the data collection, in addition to the 180 experimental trials, a set of control trials was included in Experiment 2. These were designed to determine if observers were pressing buttons at random and consisted of a single disk in one “pile,” with 20 disks in the other pile. On these trials participants should always choose the 20-disk pile. In the Static versus Stacking control condition, the Stacking pile was always the 20-disk pile. All other features of Experiment 2 were identical to those in Experiment 1.
Results and discussion
In Experiment 2, we compared relative points of subjective equality and just-noticeable differences as we did with experienced observers in Experiment 1. However, in this study, we contrasted the results from the group of participants who completed the Static condition to the group that completed the Static versus Stacking condition. 
As in Experiment 1, there was a significant difference between relative PSEs in the Static and Static versus Stacking condition (t(38) =−3.591, padjusted = 0.002; see Figure 5). As in Experiment 1, we found that the relative PSE for the Static versus Stacking condition was significantly different from zero (t(19) = 5.151, padjusted < 0.001). However, unlike in Experiment 1, we found that the Static relative PSE was not significantly different from zero (t(19) = 0.670, padjusted = 0.511). It appears that the significant difference between the Static relative PSE and zero in Experiment 1 was a spurious effect. Replicating Experiment 1, we again found a significant difference in JND between the Static and Static versus Stacking condition in Experiment 2 (t(38) = 2.315, padjusted = 0.034; see Figure 6) with a smaller JND in the latter case. 
Figure 5.
 
The mean relative points of subjective equality for naïve observers in the Static and Static versus Stacking conditions of Experiment 2. Error bars are 95% confidence intervals.
Figure 6.
 
The mean just-noticeable differences for naïve observers in the Static and Static versus Stacking conditions of Experiment 2. Error bars are 95% confidence intervals.
In summary, in Experiment 2, we replicated the key findings of Experiment 1 with naïve observers: participants were biased toward saying that the Stacking pile of disks was taller, and participants were more precise when comparing Stacking and Static piles of disks than when they were asked to compare two Static piles of disks. However, it remains unclear why precision was enhanced in the condition that included stacking disks. 
An alternative explanation for the bias toward seeing the Stacking pile as taller is that the stacking disks more effectively captured attention than a static pile. If this were the case, then participants may have selected the stacking pile more often simply because they were attending to it and not because the successive presentation of the occluding disks enhanced the perceived height of the pile. We evaluate this possibility in Experiment 3.
Experiment 3
In our first two experiments, we found a clear bias toward choosing the stacking pile of disks over a static pile of disks when judging their height. Another potential contributor to this pattern of results is attention. That is, the same bias might be expected if the stacking disks captured more attention and this in turn caused the stacking pile to appear more salient and taller. To evaluate this possibility, in our third experiment we removed disks from a pre-existing stacked “pile” of disks. If the bias toward seeing the stacking pile as taller simply reflects an increase in attention to the changing stimulus, then in this study observers should select the “un-stacking” pile as taller. However, if the previous results reflect a change in perceived height due to stacking, then the bias should be toward saying the Static pile of disks is taller. 
Methods
Experiment 3 was challenging for naïve observers in pilot testing, so we tested experienced observers through the online portal (N = 9). Data from one participant was excluded because they failed to complete the study, leaving eight observers (five female, three male; aged 19–35). Participants all completed the same size calibration routine to extract the height of the screen as in Experiments 1 and 2 before beginning the experiment. 
In Experiment 3, the circle stimuli that overlapped to form piles were identical to those used in Experiments 1 and 2, but the task differed. Here, participants completed two Static versus Stacking conditions. The first, referred to here as the Forward condition, included the number and range of practice trials and the control trials described in Experiment 2. In the second Static versus Stacking condition, participants were presented with one complete, static pile of disks, and a second pile of 40 disks from which disks successively disappeared every 200 ms over the course of the trial. We refer to this as the Reverse condition. In both cases, participants were again asked to indicate which of the two piles of disks appeared taller at the end of the presentation. 
Trials in the Reverse condition were defined by a seven-level method of constant stimuli. The Static pile of disks always had a fixed 20 disks, while the Stacking pile decreased by between 8 and 32 disks, for −12, −8, −4, 0, 4, 8, 12 disks compared to the Static standard. The duration of each trial varied with the number of disks disappearing from the Stacking pile, with the shortest levels lasting 1.8 seconds (start + 8 increments = 9 × 200 ms) and the longest lasting 6.6 seconds (start + 32 increments = 33 × 200 ms). In the Reverse condition, participants also completed two experimental levels that did not have a fixed Static standard, but instead had a fixed duration of 4.2 seconds as a control. In these two levels, there were always 20 disks in the Stacking pile at the end of the trial, and either 14 or 26 disks in the Static pile. Participants also completed control trials in the Reverse condition, where one disk was presented in the Static pile and eight disks disappeared from the Stacking pile, leaving 32 behind. As described in Experiment 2, in this instance participants should always have chosen the Stacking pile, and if they did not, they were not attending to or understanding the task. The Forward and Reverse conditions were presented in separate blocks in a random order. Participants completed 20 trials per level, giving a total of 200 trials for each of the Forward and Reverse blocks when including the 20 control trials. 
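The Reverse-condition levels and durations can be checked with a short calculation. The sketch below uses only the numbers given in the text (a 40-disk starting pile, a 20-disk Static standard, one 200 ms interval for the starting display plus one per removed disk); the function name `reverse_trial` is hypothetical.

```python
DISK_INTERVAL_MS = 200   # one interval per display state (from the text)
START_DISKS = 40         # the un-stacking pile starts with 40 disks
STANDARD_DISKS = 20      # fixed Static standard


def reverse_trial(n_removed):
    """Return (remaining disks, level relative to the standard, duration in s)."""
    remaining = START_DISKS - n_removed
    relative = remaining - STANDARD_DISKS
    duration_s = (n_removed + 1) * DISK_INTERVAL_MS / 1000   # start + n_removed increments
    return remaining, relative, duration_s


removed_counts = [8, 12, 16, 20, 24, 28, 32]                  # seven levels
relative_levels = [reverse_trial(n)[1] for n in removed_counts]

# Seven standard levels plus two fixed-duration levels, 20 trials each,
# plus 20 control trials.
trials_per_block = (7 + 2) * 20 + 20
```

Removing 8 disks leaves 32 (a level of +12) after 1.8 seconds, removing 32 leaves 8 (a level of −12) after 6.6 seconds, and removing 20 disks gives the 4.2 second duration used for the two fixed-duration levels.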
As in Experiment 2, before the main experiment, participants completed a set of practice trials. For the Forward condition, these practice trials were identical to Experiment 2. For the Reverse condition (see Supplementary Movie 3), there were a total of 10 practice trials, which all contained 20 disks in the Static pile. Five of the Stacking piles in the practice trials decreased by eight disks, leaving 32 disks at the end of the trial, and the other five decreased by 32 disks, leaving eight disks at the end of the trial. The duration of the practice trials again varied according to the number of disks being removed, and was 1.8 and 6.6 seconds, respectively. Participants received no feedback in either the practice trials or the experimental trials. 
Results and discussion
In Experiment 3, we compared relative points of subjective equality and just-noticeable differences for Forward and Reverse conditions. We hypothesized that if we observed a bias toward indicating the Static pile of disks was taller in the Reverse condition of Experiment 3, expressed here as a negative shift in relative PSE, this would suggest that the biases observed in Experiments 1 and 2 are due to perceived height, and that the effect on perceived height is reversible. However, if we observed a bias toward choosing the Stacking pile of disks in the Reverse condition of Experiment 3, expressed here as a positive shift in relative PSE, it may suggest that the results in Experiments 1 and 2 were not due to the impact of stacking disks on the perceived height of the pile, but instead to factors such as attention that are driven by the stimulus changes. 
As shown in Figure 7, we observed a significant difference between the Forward and Reverse conditions in Experiment 3 (t(7) = 5.510, padjusted = 0.002). For the Forward condition, the mean relative PSE was significantly different from zero (t(7) = 4.331, padjusted = 0.006), with a positive sign. This means that participants were biased toward choosing the stacking pile of disks in the Forward condition, replicating our findings for the Static versus Stacking conditions in Experiments 1 and 2. In the Reverse condition, the mean relative PSE was again significantly different from zero (t(7) = −3.343, padjusted = 0.019), but now with a negative sign; thus in the Reverse condition, participants were biased toward saying the Static pile of disks was taller. The correlation between participants’ forward and reverse PSE values was weak and nonsignificant (r = 0.015). Given that the average relative PSEs for the Forward and Reverse conditions were in opposite directions, we conducted a t-test to assess whether there was a difference in the magnitude of the relative PSE values between the conditions (that is, testing the null hypothesis that the effects were equal but opposite) and found no significant difference (t(7) = 1.470, padjusted = 0.200). This provides further evidence in support of the notion that the effect of stacking disks on perceived height was reversible. 
Figure 7.
 
The mean relative points of subjective equality for experienced observers in the Forward and Reverse conditions of Experiment 3. Error bars are 95% confidence intervals.
Comparison of the just-noticeable differences between the Forward and Reverse conditions revealed a marginally significant difference between the two (t(7) = −2.408, padjusted = 0.055; see Figure 8). It appears that participants were more precise when completing the Forward condition than the Reverse condition. Informally, some participants reported that occasionally when a disk was removed from the pile it appeared as if a disk were being added, perhaps because a previously occluded disk was made visible. This effect was also observed by the authors, but further investigation would be needed to confirm if this was responsible for the reduced precision in the Reverse condition. 
Figure 8.
 
The mean just-noticeable differences for experienced observers in the Forward and Reverse conditions of Experiment 3. Error bars are 95% confidence intervals.
In Experiment 3, the PSEs indicated that observers perceived the Stacking pile of disks as taller than the equivalent Static piles when disks were added to, but shorter when disks were taken away from, the Stacking pile. This suggests that the effect of successive presentation of occlusions on perceived height is reversible, and that our findings in Experiments 1 and 2 are not a result of a Stacking pile of disks appearing to be more salient than a Static pile. However, it remained unclear from these experiments if these effects were the result of an interaction between the occlusion cues and the successive presentation of the disks, or if the successive presentation of the disks alone could explain our findings. We investigated this possibility in Experiment 4.
Experiment 4
In Experiment 4, we investigated whether the effects observed for experienced observers in Experiments 1 and 3 and naïve observers in Experiment 2 could simply be explained by the successive presentation, or enumeration, of the disks, which would make it easier for observers to keep track of the number of disks in a pile. To evaluate this possibility, we removed all occlusion information from our stimuli and, instead of asking participants to make judgements about which pile of disks was taller, we asked them to indicate which side of the screen had a greater number of disks at the end of the presentation. If participants in this study were biased toward choosing the successively presented enumerated disks, the results of the previous experiments might be explained by the successive presentation alone. However, if participants were not biased toward choosing the successively presented disks, then enumeration cannot explain the results of Experiments 1 to 3.
Methods
A total of 49 naïve observers recruited through Prolific participated in Experiment 4, with data from eight participants excluded for having incomplete datasets, and from one participant for failing the control condition designed to ensure participants were not pressing random buttons. This left a total of 40 observers, 20 in the Static condition (14 female, six male; aged 18–32), and 20 in the Static versus Successive condition (seven female, 13 male; aged 18–59). As in the previous experiments, participants were required to access the task on a computer. 
The stimuli consisted of two sets of nonoverlapping disks, one on the left-hand side of the screen, the other on the right (see Supplementary Movie 4). The disks were 0.075 height units in diameter, meaning they filled 7.5% of the height, but not the width, of the screen they were presented on. The disks were grey circles with a black border that was 1 pixel wide and were presented on a black background. The screen was divided vertically into two halves by a white center line that was 3 pixels wide. To ensure that the disks did not overlap, each half-screen was divided into a grid with 25 sections. Only one disk could be present in each section but could appear at any randomly determined location within that section. The section a disk would appear in was also randomly determined, to minimize the appearance of stimulus organization while still ensuring the disks did not overlap. These subsections were not visible to the observers. 
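The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' stimulus code: the 5 × 5 grid layout, the half-screen width of 0.8 height units, the function name `place_disks`, and the choice to keep each whole disk inside its section are all assumptions made for the example. Only the 25 sections, the 0.075 diameter, and the random choice of section and position come from the text.

```python
import random


def place_disks(n_disks, rows=5, cols=5, width=0.8, height=1.0,
                diameter=0.075, rng=None):
    """Return (x, y) centres for n_disks nonoverlapping disks on one half-screen.

    Coordinates are in height units with the origin at the bottom-left corner
    of the half-screen. The grid shape and half-screen width are assumed.
    """
    rng = rng or random.Random()
    if n_disks > rows * cols:
        raise ValueError("more disks than grid sections")
    cell_w, cell_h = width / cols, height / rows
    radius = diameter / 2
    if cell_w < diameter or cell_h < diameter:
        raise ValueError("grid sections are too small to hold a disk")
    # Pick the occupied sections at random, at most one disk per section.
    cells = rng.sample(range(rows * cols), n_disks)
    centres = []
    for cell in cells:
        row, col = divmod(cell, cols)
        # Jitter the centre inside the section, keeping the whole disk
        # within the section so neighbouring disks cannot overlap.
        x = rng.uniform(col * cell_w + radius, (col + 1) * cell_w - radius)
        y = rng.uniform(row * cell_h + radius, (row + 1) * cell_h - radius)
        centres.append((x, y))
    return centres
```

Under these assumptions, a call such as `place_disks(14, rng=random.Random(1))` returns 14 centres with no regular spacing, and any two centres are at least one disk diameter apart.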
Two conditions were tested with separate groups of observers: (i) Static and (ii) Static versus Successive conditions. In the Static condition, participants viewed two complete static sets of disks, one on the left and one on the right half of the screen, for either 1.6 or 2.8 seconds. After this time the screen went blank, and participants were required to make their response. In the Static versus Successive condition, participants were presented with one static set of disks on one half of the screen. On the other “Successive” half of the screen, a new disk would appear every 200 ms until the end of the trial, at which point the stimuli would disappear and the participants could respond (see Figure 9). For both conditions, participants were asked to indicate which side of the screen contained the most disks at the end of the presentation. Whether a particular set of disks appeared on the left or the right side of the screen was randomized for both conditions, so in the Static versus Successive condition, the successively presented disks could appear unpredictably either on the left or right of the screen. 
Figure 9.
 
The time course of a trial in the Static versus Successive (top) and Static (bottom) conditions in Experiment 4. In the Static versus Successive condition, one side of the screen was static for the duration of the trial. The other (successive) side of the screen had new disks added over the duration of the trial (indicated by a dashed outline for explanatory purposes, all disks were identical in the actual displays), with a new disk appearing every 200 ms until all disks were present. At this point, the screen went blank, and participants were able to make their response. In the Static condition, participants viewed two static piles of disks for either 1.6 or 2.8 seconds before the screen went blank and participants could respond.
In both conditions, the number of disks on each half of the screen was determined by a seven-level method of constant stimuli. However, we used two different standards, to give a total of 14 levels. In the Static versus Successive condition, the Successive side of the screen always had either eight or 14 disks at the end of the presentation (eight-disk and 14-disk standards respectively). With the eight-disk standard, the Static half of the screen contained either two, four, six, eight, 10, 12, or 14 disks, so that the Static half of the screen could contain up to six more or six fewer disks than the Successive half of the screen. In the 14-disk standard, the Static half of the screen would contain either eight, 10, 12, 14, 16, 18, or 20 disks, so that the Static half of the screen again could have ± six disks relative to the Successive half of the screen. The same design was used for the Static condition, except in this case, the standard comparison was also Static. Trial durations were 1.6 and 2.8 seconds for the eight- and 14-disk standards, respectively. Trial durations were held constant (for each standard) in this experiment to avoid any influence of duration on numerosity judgements (Javadi & Aichelburg, 2012). 
In addition to the experimental trials, participants completed a set of control trials where one Static disk was presented on one half of the screen and 14 successively presented or static disks were presented on the other half of the screen, depending on the condition. The rationale was that participants should always say that the side with 14 disks has more disks than the side with one disk, and if they did not, then the participant was consistently not attending to the task and should be excluded. Each participant completed either the Static or the Static versus Successive condition but completed both the eight-disk and 14-disk standard trials for their condition. Each condition contained 20 trials per level, giving 300 trials per participant, including the control set, presented in pseudorandom order. 
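As a check on the trial arithmetic, the design can be enumerated in a short Python sketch. The names (`build_levels`, `build_trials`) and the dictionary layout are hypothetical, and the 2.8-second duration of the control trials is an assumption, since the text does not state it. The stated durations of 1.6 and 2.8 seconds equal 200 ms multiplied by the eight- and 14-disk standards, which the sketch uses.

```python
import random

DISK_INTERVAL_S = 0.2          # one new disk every 200 ms (from the text)
STANDARDS = (8, 14)            # number of disks on the standard side
OFFSETS = range(-6, 7, 2)      # comparison minus standard: -6, -4, ..., +6
TRIALS_PER_LEVEL = 20


def build_levels():
    """List the 14 experimental levels plus the single control level."""
    levels = []
    for standard in STANDARDS:
        duration = round(standard * DISK_INTERVAL_S, 1)  # 1.6 s or 2.8 s
        for offset in OFFSETS:
            levels.append({"standard": standard,
                           "comparison": standard + offset,
                           "duration_s": duration,
                           "control": False})
    # Control level: 1 static disk versus 14 disks. Its duration is not
    # given in the text; 2.8 s is assumed because 14 disks are shown.
    levels.append({"standard": 14, "comparison": 1,
                   "duration_s": 2.8, "control": True})
    return levels


def build_trials(seed=0):
    """Repeat every level 20 times and shuffle into a pseudorandom order."""
    trials = [dict(level) for level in build_levels()
              for _ in range(TRIALS_PER_LEVEL)]
    random.Random(seed).shuffle(trials)
    return trials
```

Seven levels for each of two standards plus one control level gives 15 levels, and 15 × 20 = 300 trials per participant, matching the total reported above.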
Before the start of the experiment, participants again completed the size calibration routine used in Experiments 1 to 3. Participants then completed a series of 12 practice trials, which consisted of three repeats of the four most extreme conditions for the two disk standards (two disks versus eight-disk standard, 14 disks versus eight-disk standard, eight disks versus 14-disk standard, and 20 disks versus 14-disk standard). Participants received no feedback on their responses during the practice trials or the main experiment. 
Results and discussion
In Experiment 4, as in previous experiments, we estimated both relative PSEs and JNDs for our data. Using a mixed repeated-measures analysis of variance, we found no significant difference in relative PSE between the (i) Static and (ii) Static versus Successive conditions for either the eight-disk or 14-disk standard, main effect of condition: F(1, 38) = 0.401, p = 0.530, ηp² = 0.010, and no significant difference in relative PSE between the eight-disk and 14-disk standards, F(1, 38) = 1.550, p = 0.221, ηp² = 0.039 (see Figure 10). There was also no significant interaction between condition and disk standard, F(1, 38) = 0.841, p = 0.365, ηp² = 0.022. Because our findings here were nonsignificant, we additionally conducted a Bayesian mixed analysis of variance to evaluate the strength of the evidence for the null hypothesis. We used the default prior settings for JASP Version 0.14.1: r scale fixed effects = 0.5, r scale random effects = 1, and r scale covariates = 0.354. The model that best described the data was the null model with a Bayes factor of 3.36, which is considered substantial evidence for the null hypothesis (Jeffreys, 1961; Robert, Chopin, & Rousseau, 2009). Neither condition, disk standard, nor the interaction between the two factors appeared to explain the data. These analyses suggest that successively presenting nonoverlapping disks had no effect on the perceived number of disks, and that this was true regardless of the number of disks used in the trial. 
Figure 10.
 
The mean relative points of subjective equality for naïve observers in Experiment 4. One set of participants completed both the eight-disk and 14-disk standard levels in the Static versus Successive condition, whereas a different set of participants completed both the eight-disk and 14-disk standard levels in the Static condition. Error bars are 95% confidence intervals.
However, we did observe a significant difference between the Static and Static versus Successive JNDs, F(1, 38) = 70.055, p < 0.001, η² = 0.575, and a small but significant difference in JND between the eight-disk and the 14-disk standard, F(1, 38) = 4.637, p = 0.038, η² = 0.012 (see Figure 11). In contrast, there was no significant interaction between condition and disk standard, F(1, 38) = 0.046, p = 0.831, η² = 1.21 × 10⁻⁴. This suggests that participants were significantly more precise in the Static condition than the Static versus Successive condition and were more precise in the trials with the eight-disk standard, which contained fewer disks. 
Figure 11.
 
The mean just-noticeable differences for naïve observers in Experiment 4. As for the relative PSEs, one set of participants completed both the eight-disk and 14-disk standard levels in the Static versus Successive condition, whereas a different set of participants completed both standard levels in the Static condition. Error bars are 95% confidence intervals.
The greater precision in the eight-disk compared to 14-disk standard condition is consistent with previous work on numerosity, where precision is degraded as the number of objects being assessed increases (Dehaene, 2003; Gallistel & Gelman, 2000; Testolin & McClelland, 2020). Additionally, there is evidence that for four to seven or fewer objects, observers are able to subitize, or very rapidly and accurately report the number of objects present (Anobile, Cicchini, & Burr, 2016; Kaufman, Lord, Reese, & Volkmann, 1949; Mandler & Shebo, 1982). In the eight-disk standard conditions, several of the test stimuli had six or fewer circles on one half of the screen, potentially allowing for subitizing to improve performance. However, it is important to keep in mind that the difference in JND between the eight-disk and 14-disk conditions was small (a fraction of a disk on average). 
In Experiments 1 and 2, we also observed a significant difference in JNDs between the Static and Static versus Stacking conditions. However, in those experiments, participants were less precise in the Static conditions, whereas here in Experiment 4, participants were more precise in the Static conditions. Thus successive presentation and enumeration of the disks cannot explain the difference in JNDs between conditions in Experiments 1 and 2. Instead, the combination of occlusion and successive presentation likely explains why precision is improved in the Static versus Stacking conditions of Experiments 1 and 2.
Similarly, we found no significant difference in PSEs between the Static and Static versus Successive conditions, which suggests that the successive presentation of disks alone does not explain the effects observed in Experiments 1 to 3. Instead, specifically the successive presentation of occluding objects appears to be responsible for the observed effects on perceived height in those experiments. 
General discussion
Occlusion and apparent stack height
In this article we considered the stacking disk illusion, where presenting a set of overlapping disks one after another generates a motion in depth percept. Engel and colleagues (2006) first investigated this illusion with a speed discrimination task and also demonstrated that the stacking disk illusion induces a motion in depth aftereffect. They did not consider whether enumerating a series of occluding disks would also affect perceived height, which we may expect given that occlusion is primarily an ordinal depth cue. In Experiments 1 and 2, we demonstrated that when presenting two piles of disks, one static and the other having new disks added over time, as in the stacking disk illusion, both experienced and naïve participants were biased toward saying the stacking pile of disks was taller. In Experiment 3, we assessed whether these results could be explained by assuming that the stacking disks received more attention. Our results demonstrated that the effect observed in Experiments 1 and 2 was reversible: when disks were successively removed from a pile of disks, participants were biased toward saying the static pile of disks was taller and the stacking pile of disks was shorter. This provides good evidence that the effects in Experiments 1 and 2 are not due to attentional factors. 
In Experiment 4, we investigated whether the effects observed in the previous experiments could be explained exclusively by enumeration of disks. Here we found that when the disks did not occlude each other, adding new disks no longer biased participants. This finding appears consistent with previous work reporting that changing the speed at which objects are added to an array does not affect perceived numerosity (Hollingsworth, Simmons, & Coates, 1991). It is possible that successive presentation or enumeration has no impact at all on the perception of numerosity. Importantly, the effect of successively presenting occluding objects in a pile observed in Experiments 1 to 3 cannot be explained by enumeration alone; the effect is driven by occlusion specifically. 
Precision
In Experiments 1 and 2 we observed that participants were less precise when comparing two Static piles of disks than when they were comparing a Static and Stacking pile of disks. One possible explanation for this is that in the Static pile of disks, one can never see all of the disks present in the pile, whereas in the Stacking pile of disks, one eventually sees all of the disks during the trial interval. In the Static condition, judgements may have then been affected more by the random positions the disks were rendered in, with disks at the top of the pile potentially occluding more disks in one pile than the other, giving the impression of varying height, even when the number of disks in the piles was the same, in turn reducing precision. This argument is also consistent with our finding that precision was lower in the reversed (starting from maximal occlusion) compared to forward (starting from no occlusion) stacking conditions in Experiment 3.
Although we prefer this visibility explanation based on its parsimony and consistency with phenomenological impressions, other explanations for improved precision in the dynamic case are possible. For example, the stack size was specified by occlusion; all other cues to three-dimensional layout such as binocular disparity and relative disc size always specified that there was no depth in the stack. Linear cue combination models (Kemp, Cesanek, & Domini, 2023; Landy, Maloney, Johnston, & Young, 1995) predict that these zero cues would be combined linearly with occlusion to produce the percept.1 In models using maximum likelihood estimation (Landy & Kojima, 2001) the overall precision reflects relative cue reliability, and precision in our experiment would only change if the precision of the occlusion-based estimate improved under dynamic conditions. In the Intrinsic Constraint model an improvement in precision would reflect the dynamic occlusion signal being “stronger” than the static signal. In both cases the explanation relies on the signal-to-noise ratio of the occlusion-based cue, so the two accounts cannot be distinguished in our data. Further, they depend solely on the occlusion cue, so considering cue conflict adds little to the explanation. A fuller consideration of this possibility would require manipulating the presence or reliability of the other cues; for example, one could remove stereopsis with monocular viewing of the stimulus. Although this could be explored in future work, these models are unlikely to be able to explain the results, given that linear cue combination with zero-magnitude cues always predicts a bias toward a smaller stack in the less precise (or weaker) static case, and this was not found for our “reverse” condition in Experiment 3.
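The direction of the bias predicted by linear cue combination can be made explicit with a short worked form. This is a sketch under the standard assumption of independent cues weighted by inverse variance, as in maximum likelihood estimation; the symbols $\hat{d}$, $d_{\mathrm{occ}}$, $w_i$, and $\sigma_i$ are introduced here for illustration and do not appear in the original analysis.

```latex
\hat{d} \;=\; w_{\mathrm{occ}}\, d_{\mathrm{occ}} + \sum_{i} w_i \cdot 0
        \;=\; w_{\mathrm{occ}}\, d_{\mathrm{occ}},
\qquad
w_{\mathrm{occ}} \;=\;
\frac{1/\sigma_{\mathrm{occ}}^{2}}
     {1/\sigma_{\mathrm{occ}}^{2} + \sum_{i} 1/\sigma_{i}^{2}}
\;\le\; 1 .
```

Because every other cue signals zero depth, the combined estimate is the occlusion-based depth scaled by its weight. A noisier occlusion cue (larger $\sigma_{\mathrm{occ}}$) receives a smaller weight and so yields a smaller estimated stack. Under this assumption the sign of the bias follows cue reliability rather than the direction of stacking.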
Effects of duration
In Experiments 1 to 3, the duration of the trials varied along with the number of disks being added, whereas in Experiment 4, the duration of the trials was fixed, due to concerns over participants using the length of the trial to make judgements about the numerosity, as has been found previously (Javadi & Aichelburg, 2012). However, participants were still able to perform the task when duration could not be used as a cue in Experiment 4, confirming that participants were likely judging the height or number of disks. This is consistent with other work suggesting that numerosity may affect duration perception, but duration does not appear to affect numerosity judgements (Dormal & Pesenti, 2013). 
Depth and motion in depth
It is somewhat surprising that the stacking stimuli give rise to seemingly quantitative depth or motion in depth. Theoretically, (successive) occlusion is strictly an ordinal depth cue indicating the depth relation between objects (Andersen & Braunstein, 1983; Gibson, 1982; Kaplan, 1969) but not the magnitude of the depth (or change in depth during stacking). Each stacking disk could add an unknown amount to the pile as their thickness is not specified in the stimulus. Engel et al.'s (2006) participants reliably matched the motion in depth to a looming stimulus (a quantitative, if possibly uncalibrated, depth cue), suggesting that the stacking induced a quantifiable motion-in-depth percept. Even though the depth created by each occlusion is not specified, a succession of such occlusions suggests an increase in depth and, if each occlusion is assumed to be a fixed size (quanta), the enumeration of the disks could form a basis for depth judgements on a ratio scale (Stevens, 1958). Assigning a value to the quanta (perceptually or cognitively) might underlie the ability of Engel et al.'s (2006) participants to match reliably to a looming stimulus. But this would not be necessary in our stimuli because the piles could always be compared on number (or an interval scale). So, given that the Stacking and Static piles were comparable, why did participants respond as if the Stacking pile appeared taller? There are several potential and not necessarily mutually exclusive answers to this question, including (i) attentional bias toward the changing stimulus, (ii) enhancement of the depth by motion in depth, (iii) predictive tracking of the stacking stimulus, and (iv) configural or enumeration effects. As discussed previously, the results of Experiment 3 argue against the role of an attentional bias. 
The second explanation is more tenable. It has been shown that the stacking disk illusion appears to induce a perception of motion in depth (Engel et al., 2006), and that occlusions can apparently affect motion interpretation and bias motion perception (Duncan, Albright, & Stoner, 2000; Graf, Adams, & Lages, 2004; McDermott & Adelson, 2004). Given this, it is possible that this interaction is reciprocal and that the motion percept generated by the stacking disk illusion in turn influences depth perception. Sakano, Allison, and Howard (2012) reported that the average depth of stereoscopically moving planes appeared biased in the direction of motion in depth. Earlier, Edwards and Badcock (2003) reported that looming optic flow can bias depth perception in the direction of simulated motion in depth. Thus it is possible that the motion in depth in the stacking stimulus biases the perceived depth in the direction of the motion, consistent with the pattern of results for both forward and reverse stacking found here. However, it should be noted that the depth effects in our studies are very large (bias of approximately 25% of the standard pile in Experiment 3), which may argue against these being secondary percepts to motion in depth. Furthermore, the converse may be true, and the impression of motion in depth may derive from a successive impression of increasing depths. Our data do not speak to this, but the strongest evidence against this suggestion is the aftereffect findings of Engel et al. (2006).
The third explanation suggests that opposite biases in the forward and backward stacking piles arise from extrapolation or prediction of the dynamic sequence. According to this account, the visual system anticipates the future state of the pile, and this future state is compared to the static pile. Because the future state will be larger (forward) or smaller (backward) than the current pile, this will predict the direction of biases found (see Figure 12). Such effects are commonplace in perception and particularly in the motion perception literature (Perrinet & Masson, 2012; Weiss, Simoncelli, & Adelson, 2002). For example, in the Flash-Grab effect (Cavanagh & Anstis, 2013; Takao, Sarodo, Anstis, Watanabe, & Cavanagh, 2022) the path of an oscillating object appears shorter than it actually is and the position of a flashed target near the peak of the oscillation is biased in the same direction (toward the perceived endpoint). Similarly in the Flash-Lag effect, a flashed object seems to trail behind the trajectory of a moving object. Although the mechanism for the effect remains debatable it is generally interpreted as an effectively shorter latency for the moving stimuli than the flashed object because of the presence of predictive motion in the former (for review, see Nishida, 2011). There is evidence that the basis for motion anticipation is at early stages of visual processing, for instance in the pooling of ganglion cell activity (Berry, Brivanlou, Jordan, & Meister, 1999). Although the Berry et al. (1999) experiments were performed using salamanders and rabbits, the mechanisms they describe (e.g., contrast gain control) are common across species. This is of interest here because, as noted by Gegenfurtner (1999) in his commentary, motion anticipation phenomena depend strongly on stimulus contrast (unlike other motion processing). 
In future experiments one way to test the role of predictive motion in the stacking disks percepts is to assess its dependence on contrast. Another expectation for a simple prediction-based model is that it will tend to exhibit overshoot errors when the pile stops stacking (or restarts) as shown in the inset of Figure 12. Interestingly such predictions have not always been borne out for the flash-lag effect for some variants of the stimulus and this has proven important for modeling the flash-lag phenomena (Eagleman & Sejnowski, 2007; Kanai, Sheth, & Shimojo, 2004; Nijhawan, 2002). It would be interesting to measure the time course of the effects of initiation, termination, and reversal of stacking on the perceived height of the stack in the stimuli used in the present study.2 
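The overshoot expected from a simple predictor can be illustrated with a toy extrapolation model. This sketch is deliberately much simpler than the Markov-chain Bayesian model of Figure 12 and is not the authors' model: it is noise-free, estimates the rate of change from the last step only, and borrows just the four-step prediction horizon from the figure caption. The function names are hypothetical.

```python
def extrapolate(heights, horizon=4):
    """Predict the pile height `horizon` steps ahead of each observation.

    The rate estimate is simply the most recent one-step change in height.
    """
    predictions = []
    for t, h in enumerate(heights):
        rate = h - heights[t - 1] if t > 0 else 0
        predictions.append(h + horizon * rate)
    return predictions


def prediction_errors(heights, horizon=4):
    """Predicted minus actual height wherever the future value is known."""
    predictions = extrapolate(heights, horizon)
    return [predictions[t] - heights[t + horizon]
            for t in range(len(heights) - horizon)]


# Ten steps of forward stacking followed by ten static steps.
sequence = list(range(11)) + [10] * 10
errors = prediction_errors(sequence)
```

In this toy sequence the error is zero during steady stacking and during the static phase. It is negative at the first step, before any change has been observed, and positive for predictions that span the moment stacking stops, peaking at four disks. During steady forward stacking the predicted height always exceeds the current height, which is the direction of the bias the prediction account is meant to explain.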
Figure 12.
 
Simulated psychometric functions approximating those found in our observers obtained from a Markov Chain based discrete Bayesian model. Simulations for static, forward, and backward motion with a prediction interval of four steps were compared to a model run on a static presentation. The perception of the number of discs was modeled with a Beta distribution, Beta(7,3,0,n), which accounted approximately for the number of completely hidden discs when n discs were presented. Perception of the change followed a Bernoulli process with high likelihood (p = 0.99) of detecting the new disc. The update process model assumed a high likelihood for the change state to continue on the next step (p = 0.9) and a triangular distribution for process noise peaking on the predicted number of discs on the next step. Priors were set as Uniform(0, 31) for the number of disks and as the uninformative (“Jeffreys”) prior for the change state, Beta(0.5, 0.5). The model was simplified based on the facts that only a finite discrete set of discs could be visible [0, 31] and change per step was discrete [0, 1]. These stimulus constraints allowed the Bayesian posterior distribution to be evaluated exhaustively on each step. The inset shows prediction error averaged over 500 runs for a four-sample prediction as a function of step number in a sequence that alternated between increasing (filled black symbols) and static phases (no symbol). Note the increase in error accompanying these changes in state.
Finally, there may be configural or contextual effects that impact the perceived height of either the sequence of stacking or the stacked disks themselves. For example, Boyce and Clifford (2023) have recently reported a “compacting” illusion where the length of individual line elements appears smaller when concatenated to form a longer line than when seen in isolation or in shorter lines. It is possible that a similar effect is at play here, “compacting” the depths between the stacked disks relative to the depth increment for each successive disk. One would need a mechanism for the isolation of the stacking disks to prevent their grouping and compaction, presumably related to the saliency of the sudden appearance of the disk. Other configuration effects can be imagined. In Experiment 4, we discounted an explanation based on differences in the ability to enumerate dynamically and statically presented objects. We found no bias when the disks were nonoverlapping, suggesting that the bias found in the stacking disks requires both dynamic presentation and occlusion. 
In conclusion, we find that when disks are stacked sequentially so that they occlude one another, observers experience an increase in the perceived height of the pile. Our results suggest that successive occlusions generate a greater sense of height than occlusion alone. Taken together with Engel et al.'s (2006) results showing that these stimuli create a percept of motion in depth, it appears that these stimuli tap into an underappreciated source of dynamic depth information. The nature of the depth percept remains unclear. That is, we do not yet know if observers have a metric representation of stack height based on a default inference of the relative depth between occluding disks or if the perceived height is simply based on the accumulation of multiple ordinal depth steps. 
Acknowledgments
R. Allison and L. Wilcox acknowledge the support of NSERC (Canada) and Qualcomm Inc. for conducting this research. A. Lee was supported by a post-doctoral fellowship from the Canada First Research Excellence Fund (CFREF) under the Vision: Science to Application (VISTA) program. 
Commercial relationships: none. 
Corresponding author: Robert S. Allison. 
Email: allison@cse.yorku.ca. 
Address: Department of Electrical Engineering and Computer Science, York University, 4700 Keele St., Toronto, Ont. M3J 1P3, Canada. 
Footnotes
1  Perhaps with a weight of zero in robust models if the discrepancy is too great.
2  Such behavior is typical of predictive mechanisms that incorporate sensory measurements and their time differences/derivatives (e.g., the Kalman-Bucy filter or Bayesian dynamic networks), and the example model in Figure 12 was chosen as a simple demonstration. A reviewer has suggested the intrinsic constraint cue-combination model (Kemp et al., 2023) could also explain the bias (and better precision see Section 6.2) in the dynamic relative to the static case. Here though we have a single cue, occlusion, and its change over time. A cue and its derivative are typically thought to provide information about different “kinds”—for example, position and motion, luminance and contrast, and depth and slant. As far as we know, none of the linear cue combination models presented in the literature combine different kinds of information (see Landy et al., 1995) and so would require a novel theoretical extension of these concepts.
References
Andersen, G. J., & Braunstein, M. L. (1983). Dynamic occlusion in the perception of rotation in depth. Perception & Psychophysics, 34(4), 356–362, https://doi.org/10.3758/BF03203048. [PubMed]
Anobile, G., Cicchini, G. M., & Burr, D. C. (2016). Number as a primary perceptual attribute: A review. Perception, 45(1–2), 5–31, https://doi.org/10.1177/0301006615602599. [PubMed]
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300, https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Berry, M. J., Brivanlou, I. H., Jordan, T. A., & Meister, M. (1999). Anticipation of moving stimuli by the retina. Nature, 398(6725), Article 6725, https://doi.org/10.1038/18678.
Boyce, W. P., & Clifford, C. W. G. (2023). The long and short of it? A novel geometric illusion. Perception, 52(3), 151–182, https://doi.org/10.1177/03010066221148437. [PubMed]
Cavanagh, P., & Anstis, S. (2013). The flash grab effect. Vision Research, 91, 8–20, https://doi.org/10.1016/j.visres.2013.07.007. [PubMed]
Dehaene, S. (2003). The neural basis of the Weber-Fechner law: A logarithmic mental number line. Trends in Cognitive Sciences, 7(4), 145–147, https://doi.org/10.1016/S1364-6613(03)00055-X. [PubMed]
Dormal, V., & Pesenti, M. (2013). Processing numerosity, length and duration in a three-dimensional Stroop-like task: Towards a gradient of processing automaticity? Psychological Research, 77(2), 116–127, https://doi.org/10.1007/s00426-012-0414-3.
Duncan, R. O., Albright, T. D., & Stoner, G. R. (2000). Occlusion and the interpretation of visual motion: Perceptual and neuronal effects of context. Journal of Neuroscience, 20(15), 5885–5897, https://doi.org/10.1523/jneurosci.20-15-05885.2000.
Eagleman, D. M., & Sejnowski, T. J. (2007). Motion signals bias localization judgments: A unified explanation for the flash-lag, flash-drag, flash-jump, and Frohlich illusions. Journal of Vision, 7(4), 3, https://doi.org/10.1167/7.4.3. [PubMed]
Edwards, M., & Badcock, D. R. (2003). Motion distorts perceived depth. Vision Research, 43(17), 1799–1804, https://doi.org/10.1016/S0042-6989(03)00307-9. [PubMed]
Engel, S. A., Remus, D. A., & Sainath, R. (2006). Motion from occlusion. Journal of Vision, 6(5), 649–652, https://doi.org/10.1167/6.5.9. [PubMed]
Gallistel, C. R., & Gelman, R. (2000). Non-verbal numerical cognition: From reals to integers. Trends in Cognitive Sciences, 4(2), 59–65, https://doi.org/10.1016/S1364-6613(99)01424-2. [PubMed]
Gegenfurtner, K. (1999). The eyes have it! Nature, 398(6725), Article 6725, https://doi.org/10.1038/18563.
Gibson, J. J. (1982). The change from visible to invisible: A study of optical transitions. In Reed, E. & Jones, R. (Eds.), Reasons for Realism (pp. 194–202). Oxfordshire, UK: Routledge.
Graf, E. W., Adams, W. J., & Lages, M. (2004). Prior depth information can bias motion perception. Journal of Vision, 4(6), 427–433, https://doi.org/10.1167/4.6.2. [PubMed]
Hollingsworth, W. H., Simmons, J. P., Coates, T. R., & Cross, H. A. (1991). Perceived numerosity as a function of array number, speed of array development, and density of array items. Bulletin of the Psychonomic Society, 29(5), 448–450, https://doi.org/10.3758/BF03333967.
Javadi, A. H., & Aichelburg, C. (2012). When time and numerosity interfere: The longer the more, and the more the longer. PLoS ONE, 7(7), 1–9, https://doi.org/10.1371/journal.pone.0041496.
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford, UK: Oxford University Press.
Kanai, R., Sheth, B. R., & Shimojo, S. (2004). Stopping the motion and sleuthing the flash-lag effect: Spatial uncertainty is the key to perceptual mislocalization. Vision Research, 44(22), 2605–2619, https://doi.org/10.1016/j.visres.2003.10.028. [PubMed]
Kaplan, G. A. (1969). Kinetic disruption of optical texture: The perception of depth at an edge. Perception & Psychophysics, 6(4), 193–198, https://doi.org/10.3758/BF03207015.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. The American Journal of Psychology, 62(4), 498–525. [PubMed]
Kemp, J. T., Cesanek, E., & Domini, F. (2023). Perceiving depth from texture and disparity cues: Evidence for a non-probabilistic account of cue integration. Journal of Vision, 23(7), 13, https://doi.org/10.1167/jov.23.7.13. [PubMed]
Kersten, D., Mamassian, P., & Knill, D. C. (1997). Moving cast shadows induce apparent motion in depth. Perception, 26(2), 171–192, https://doi.org/10.1068/p260171. [PubMed]
Landy, M. S., & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, 18(9), 2307–2320, https://doi.org/10.1364/JOSAA.18.002307.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35(3), 389–412, https://doi.org/10.1016/0042-6989(94)00176-M. [PubMed]
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111(1), 1–22, https://doi.org/10.1037/0096-3445.111.1.1. [PubMed]
McDermott, J., & Adelson, E. H. (2004). The geometry of the occluding contour and its effect on motion interpretation. Journal of Vision, 4(10), 944–954, https://doi.org/10.1167/4.10.9. [PubMed]
Nijhawan, R. (2002). Neural delays, visual motion and the flash-lag effect. Trends in Cognitive Sciences, 6(9), 387–393, https://doi.org/10.1016/S1364-6613(02)01963-0. [PubMed]
Nishida, S. (2011). Advancement of motion psychophysics: Review 2001–2010. Journal of Vision, 11(5), 11, https://doi.org/10.1167/11.5.11. [PubMed]
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203, https://doi.org/10.3758/s13428-018-01193-y. [PubMed]
Perrinet, L. U., & Masson, G. S. (2012). Motion-based prediction is sufficient to solve the aperture problem. Neural Computation, 24(10), 2726–2750, https://doi.org/10.1162/NECO_a_00332. [PubMed]
Prins, N., & Kingdom, F. A. A. (2009). Palamedes: Matlab routines for analyzing psychophysical data.
Robert, C. P., Chopin, N., & Rousseau, J. (2009). Harold Jeffreys's theory of probability revisited. Statistical Science, 24(2), 141–172, https://doi.org/10.1214/09-STS284.
Sakano, Y., Allison, R. S., & Howard, I. P. (2012). Motion aftereffect in depth based on binocular information. Journal of Vision, 12(1), 11, https://doi.org/10.1167/12.1.11. [PubMed]
Schiff, W., Caviness, J. A., & Gibson, J. J. (1962). Persistent fear responses in rhesus monkeys to the optical stimulus of “Looming.” Science, 136(3520), 982–983, https://doi.org/10.1126/science.136.3520.982. [PubMed]
Stevens, S. S. (1958). Problems and methods of psychophysics. Psychological Bulletin, 55, 177–196, https://doi.org/10.1037/h0044251. [PubMed]
Takao, S., Sarodo, A., Anstis, S., Watanabe, K., & Cavanagh, P. (2022). A motion-induced position shift that depends on motion both before and after the test probe. Journal of Vision, 22(12), 19, https://doi.org/10.1167/jov.22.12.19. [PubMed]
Testolin, A., & McClelland, J. L. (2020). Do estimates of numerosity really adhere to Weber's law? A reexamination of two case studies. Psychonomic Bulletin and Review, https://doi.org/10.3758/s13423-020-01801-z.
von Helmholtz, H. (1962). Helmholtz's treatise on physiological optics, Vol. 3, Trans. From the 3rd German ed. (1867) (Southall, J. P. C., Trans.). Mineola, NY: Dover Publications.
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6), Article 6, https://doi.org/10.1038/nn0602-858.
Wheatstone, C. (1838). Contributions to the physiology of vision. Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society of London, 128, 371–394, https://doi.org/10.1098/rstl.1838.0019.
Supplementary material
Supplementary Movie S1. A representation of the stimuli used in the experiment. On each trial one side of the display contains a static pile of disks while the other has a stacking pile built up by sequentially adding one disk at a time. 
Supplementary Movie S2. A representation of the static stimuli used in the experiment. On each trial each side contains a static pile of disks. 
Supplementary Movie S3. A representation of the Reverse stacking stimuli used in Experiment 3. On each trial, one side of the display contains a static pile of disks while the other shows a Reverse stacking pile where disks are sequentially removed one at a time. 
Supplementary Movie S4. A representation of the nonoverlapping Static versus Successive stimuli used in the experiment. On each trial, one side of the display contained a nonoverlapping static arrangement of disks while the other shows a nonoverlapping arrangement built up by sequentially adding one disk at a time. 
Figure 1.
 
The time course of a trial in the Static versus Stacking (top) and Static (bottom) conditions. In the Static versus Stacking condition, one pile of disks was pre-formed and remained static on the screen for the duration of the trial. The other (stacking) pile of disks built up over the duration of the trial, with a new disk appearing every 200 ms until all disks were present (the location where a new disk would be added is indicated by a dashed outline for explanatory purposes; all disks were identical in the actual displays). At this point, the screen went blank, and participants were able to make their response. In the Static condition, participants viewed two static piles of disks for a total of 3.4 seconds before the screen went blank and participants could respond.
Figure 2.
 
(a) Sample psychometric function for a representative observer. The proportion of trials where the Static standard was chosen is plotted against the number of disks in the standard pile versus the test pile, either for Stacking (black) or Static (red) test stimuli. Cumulative normal fits are shown in black solid and red dashed lines, respectively. (b) Fitted psychometric functions for each observer (by color). Solid and dashed lines show Stacking and Static test stimuli, respectively.
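The cumulative normal fit described in this caption can be illustrated with a minimal sketch. It is not the authors' analysis code (the reference list points to Palamedes for that); the response counts are hypothetical, the fit is a plain grid-search maximum likelihood, and the JND is assumed here to be the distance between the 50% and 75% points of the fitted curve, which may differ from the definition used in the paper.

```python
import math
from statistics import NormalDist

def fit_psychometric(levels, n_chosen, n_trials):
    """Grid-search maximum-likelihood fit of a cumulative normal.

    levels   : stimulus levels (here, a disk-count difference; an assumption)
    n_chosen : number of trials at each level on which the standard was chosen
    n_trials : total number of trials at each level
    Returns (mu, sigma) of the best-fitting cumulative normal.
    """
    best_mu, best_sigma, best_ll = None, None, -math.inf
    for mu_i in range(-60, 61):            # mu from -6.0 to 6.0 in steps of 0.1
        for sigma_i in range(5, 101):      # sigma from 0.5 to 10.0 in steps of 0.1
            dist = NormalDist(mu_i / 10, sigma_i / 10)
            log_lik = 0.0
            for x, k, n in zip(levels, n_chosen, n_trials):
                # Clamp to keep the logarithms finite at the extremes.
                p = min(max(dist.cdf(x), 1e-6), 1 - 1e-6)
                log_lik += k * math.log(p) + (n - k) * math.log(1 - p)
            if log_lik > best_ll:
                best_mu, best_sigma, best_ll = mu_i / 10, sigma_i / 10, log_lik
    return best_mu, best_sigma

# Hypothetical data for one observer: 40 trials at each of seven levels.
levels = [-6, -4, -2, 0, 2, 4, 6]
n_trials = [40] * 7
n_chosen = [0, 0, 3, 12, 28, 37, 40]

pse, sigma = fit_psychometric(levels, n_chosen, n_trials)
# Assumed JND definition: 75% point minus 50% point of the fitted curve.
jnd = sigma * NormalDist().inv_cdf(0.75)
print(f"PSE = {pse:.2f}, sigma = {sigma:.2f}, JND = {jnd:.2f}")
```

In this sketch the fitted mean plays the role of the point of subjective equality (a shift away from zero indicates a bias), and the fitted standard deviation sets the precision.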
Figure 3.
 
The mean relative points of subjective equality for experienced observers in the Static and Static versus Stacking conditions of Experiment 1. Error bars are 95% confidence intervals.
Figure 4.
 
The mean just-noticeable differences for experienced observers in the Static and Static versus Stacking conditions of Experiment 1. Error bars are 95% confidence intervals.
Figure 5.
 
The mean relative points of subjective equality for naïve observers in the Static and Static versus Stacking conditions of Experiment 2. Error bars are 95% confidence intervals.
Figure 6.
 
The mean just-noticeable differences for naïve observers in the Static and Static versus Stacking conditions of Experiment 2. Error bars are 95% confidence intervals.
Figure 7.
 
The mean relative points of subjective equality for experienced observers in the Forward and Reverse conditions of Experiment 3. Error bars are 95% confidence intervals.
Figure 8.
 
The mean just-noticeable differences for experienced observers in the Forward and Reverse conditions of Experiment 3. Error bars are 95% confidence intervals.
Figure 9.
 
The time course of a trial in the Static versus Successive (top) and Static (bottom) conditions in Experiment 4. In the Static versus Successive condition, one side of the screen was static for the duration of the trial. The other (successive) side of the screen had new disks added over the duration of the trial (indicated by a dashed outline for explanatory purposes; all disks were identical in the actual displays), with a new disk appearing every 200 ms until all disks were present. At this point, the screen went blank, and participants were able to make their response. In the Static condition, participants viewed two static piles of disks for either 1.6 or 2.8 seconds before the screen went blank and participants could respond.
Figure 10.
 
The mean relative points of subjective equality for naïve observers in Experiment 4. One set of participants completed both the eight-disk and 14-disk standard levels in the Static versus Successive condition, whereas a different set of participants completed both the eight-disk and 14-disk standard levels in the Static condition. Error bars are 95% confidence intervals.
Figure 11.
 
The mean just-noticeable differences for naïve observers in Experiment 4. As for the relative PSEs, one set of participants completed both the eight-disk and 14-disk standard levels in the Static versus Successive condition, whereas a different set of participants completed both standard levels in the Static condition. Error bars are 95% confidence intervals.
Figure 12.
 
Simulated psychometric functions, approximating those found in our observers, obtained from a Markov chain-based discrete Bayesian model. Simulations for static, forward, and backward motion with a prediction interval of four steps were compared to a model run on a static presentation. The perception of the number of disks was modeled with a Beta distribution, Beta(7,3,0,n), which accounted approximately for the number of completely hidden disks when n disks were presented. Perception of the change followed a Bernoulli process with high likelihood (p = 0.99) of detecting the new disk. The update process model assumed a high likelihood for the change state to continue on the next step (p = 0.9) and a triangular distribution for process noise peaking on the predicted number of disks on the next step. Priors were set as Uniform(0, 31) for the number of disks and as the uninformative (“Jeffreys”) prior for the change state, Beta(0.5, 0.5). The model was simplified based on the facts that only a finite discrete set of disks could be visible [0, 31] and that the change per step was discrete [0, 1]. These stimulus constraints allowed the Bayesian posterior distribution to be evaluated exhaustively on each step. The inset shows prediction error averaged over 500 runs for a four-sample prediction as a function of step number in a sequence that alternated between increasing (filled black symbols) and static phases (no symbol). Note the increase in error accompanying these changes in state.
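To make the structure of such a model concrete, the following is a toy grid-based Bayesian filter written under stated assumptions; it is not the authors' simulation. The values 0.99, 0.9, the Beta(7, 3) shape, the 0 to 31 range, and the four-step horizon come from the caption. The false-alarm rate, the three-point noise kernel, the discretization of the Beta density, the use of a 0.5 prior on the change state (the mean of Beta(0.5, 0.5)), and the noiseless observations are all simplifications introduced for illustration.

```python
MAX_DISKS = 31          # counts 0..31, as in the caption
P_DETECT = 0.99         # chance of detecting a newly added disk (caption)
P_FALSE_ALARM = 0.01    # assumed: chance of reporting a change that did not occur
P_PERSIST = 0.9         # chance the change state continues on the next step (caption)
NOISE = {-1: 0.25, 0: 0.5, 1: 0.25}   # assumed triangular process-noise kernel

def count_likelihood(m, n):
    """P(m visible disks | n disks), from a discretized Beta(7, 3) on [0, n]."""
    if m > n:
        return 0.0
    def density(k):
        x = (k + 0.5) / (n + 1)          # bin centre, avoids zeros at the ends
        return x ** 6 * (1 - x) ** 2     # unnormalized Beta(7, 3) density
    return density(m) / sum(density(k) for k in range(n + 1))

def predict(belief):
    """Apply one step of the process model to the joint state (count, change)."""
    new = {(n, c): 0.0 for n in range(MAX_DISKS + 1) for c in (0, 1)}
    for (n, c), p in belief.items():
        for c_next in (0, 1):
            p_c = P_PERSIST if c_next == c else 1 - P_PERSIST
            for dn, p_noise in NOISE.items():
                n_next = min(max(n + c_next + dn, 0), MAX_DISKS)
                new[(n_next, c_next)] += p * p_c * p_noise
    return new

def update(belief, m, detected):
    """Bayes update from an observed visible count and a change-detection flag."""
    post = {}
    for (n, c), p in belief.items():
        p_d = P_DETECT if c == 1 else P_FALSE_ALARM
        if not detected:
            p_d = 1 - p_d
        post[(n, c)] = p * count_likelihood(m, n) * p_d
    total = sum(post.values())
    return {state: p / total for state, p in post.items()}

def mean_count(belief):
    return sum(n * p for (n, _), p in belief.items())

def run(counts, changes, horizon=4):
    """Filter a sequence, then extrapolate `horizon` steps ahead."""
    n_states = 2 * (MAX_DISKS + 1)
    belief = {(n, c): 1 / n_states for n in range(MAX_DISKS + 1) for c in (0, 1)}
    for n_true, changed in zip(counts, changes):
        belief = predict(belief)
        m = (7 * n_true + 5) // 10       # assumed noiseless observation, ~70% visible
        belief = update(belief, m, changed)
    filtered = mean_count(belief)
    for _ in range(horizon):
        belief = predict(belief)
    return filtered, mean_count(belief)

# A pile built up to 10 disks versus a static pile of 10 disks.
stack_filtered, stack_predicted = run(list(range(1, 11)), [True] * 10)
static_filtered, static_predicted = run([10] * 10, [False] * 10)
stack_shift = stack_predicted - stack_filtered
static_shift = static_predicted - static_filtered
print(f"extrapolation shift: stacking {stack_shift:.2f}, static {static_shift:.2f}")
```

Because the change state is likely to persist, the four-step extrapolation adds more disks to the estimate after a stacking sequence than after a static one, which is the kind of predictive overshoot the caption's model is meant to demonstrate.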