Article  |   August 2013
Rotating columns: Relating structure-from-motion, accretion/deletion, and figure/ground
Journal of Vision August 2013, Vol.13, 6. doi:https://doi.org/10.1167/13.10.6

      Vicky Froyen, Jacob Feldman, Manish Singh; Rotating columns: Relating structure-from-motion, accretion/deletion, and figure/ground. Journal of Vision 2013;13(10):6. https://doi.org/10.1167/13.10.6.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

We present a novel phenomenon involving an interaction between accretion deletion, figure-ground interpretation, and structure-from-motion. Our displays contain alternating light and dark vertical regions in which random-dot textures moved horizontally at constant speed but in opposite directions in alternating regions. This motion is consistent with all the light regions in front, with the dark regions completing amodally into a single large surface moving in the background, or vice versa. Surprisingly, the regions that are perceived as figural are also perceived as 3-D volumes rotating in depth (like rotating columns), despite the fact that dot motion is not consistent with 3-D rotation. In a series of experiments, we found we could manipulate which set of regions is perceived as rotating volumes simply by varying known geometric cues to figure ground, including convexity, parallelism, symmetry, and relative area. Subjects indicated which colored regions they perceived as rotating. For our displays we found convexity to be a stronger cue than either symmetry or parallelism. We furthermore found a smooth monotonic decay of the proportion by which subjects perceive symmetric regions as figural, as a function of their relative area. Our results reveal an intriguing new interaction between accretion-deletion, figure-ground, and 3-D motion that is not captured by existing models. They also provide an effective tool for measuring figure-ground perception.

Introduction
The interpretation of images in three dimensions involves the determination of the relative depth ordering of surfaces, as well as the estimation of the 3-D shape of individual surfaces, both of which require integrating a wide array of separate cues. In this paper we describe a novel phenomenon in which image motion gives rise to a vivid 3-D interpretation that involves figure-ground interpretation and 3-D structure from motion but cannot completely be explained by existing models of these phenomena. 
Figure ground
Figure-ground (f/g) interpretation, in which foreground regions are segmented from backgrounds, has been extensively studied in the decades since it was first noted by Rubin (1921). In f/g interpretation, the common boundary between two regions is perceived as “owned” by one of the regions, which means that it is perceived as the bounding contour of the figural region, while the ground region is perceived as extending behind the figure at a farther depth. Numerous factors have been identified that tend to promote figural status, including symmetry (Bahnsen, 1928; Kanizsa & Gerbino, 1976; Machilsen, Pauwels, & Wagemans, 2009), convexity (Kanizsa & Gerbino, 1976; Metzger, 1953), parallelism (Metzger, 1953), axiality and part structure (Feldman & Singh, 2006; Froyen, Feldman, & Singh, 2010; Hoffman & Singh, 1997), articulating motion (Barenholtz & Feldman, 2006), and many others (see Wagemans et al., 2012, for a review). The determination of figure and ground is, thus, an essential step in the creation of a 3-D interpretation of the image. 
Accretion deletion
Most research on f/g has involved static images, but motion provides a wealth of potential cues to figural status and depth ordering. One salient motion cue to depth is the accretion and deletion of texture (Gibson, Kaplan, Reynolds, & Wheeler, 1969; Hegdé, Albright, & Stoner, 2004; Kaplan, 1969; Michotte, Thinès, & Crabbé, 1964; Mutch & Thompson, 1985; Thompson, Mutch, & Berzins, 1985). When a textured surface moves behind another object, the texture progressively disappears (deleting) behind the occluding surface and, likewise, reappears (accreting) as the texture emerges from behind it. Either of these cases can provide a vivid sense of relative depth, with the accreting/deleting surface interpreted as behind. The strength of the accretion-deletion cue can be seen in its ability to override another depth cue, motion parallax (Ono, Rogers, Ohmi, & Ono, 1988). Accretion-deletion has also been shown to resolve the direction of rotation ambiguity present in orthographically projected rotating spheres (Ono et al., 1988). However, recently, it has been found that the strength of the accretion-deletion cue, relative to stereo disparity, shows individual differences (Hildreth & Royden, 2011). While some subjects strongly favor the stereo cue when put into conflict with accretion deletion, others favor the accretion-deletion cue. 
In this paper, we describe a new phenomenon that arises from the interaction between accretion deletion and geometric (static) cues to figure ground. In contrast to the standard interpretation of accretion/deletion, in which the accreting or deleting texture is perceived as behind, in our displays, certain accreting or deleting textures are perceived as figural, i.e., in front, and rotating in depth, so that the accretion and deletion is attributed to self-occlusion of a rotating 3-D object. 
3-D interpretation in our displays
Figure 1A shows a schematic of our basic display. The display is similar to a classical f/g stimulus with alternating light and dark vertical regions but with each region containing texture that is moving horizontally at a constant speed. The textures in alternating regions move in opposite directions. Hence, in a typical display, alternating regions might have (say) a dark texture moving to the left, while the other regions would have a light texture moving to the right (see Figure 2). The accretion/deletion cues in such a display are ambiguous in that every boundary has texture accreting or deleting on both sides. For example, at a given boundary, the light texture could be seen as disappearing behind the dark texture, or, alternatively, the dark texture could be seen as disappearing behind the light texture. However, these two percepts are not seen at the same time. Rather, the interpretation tends to be determined by the figure/ground assignment of the two regions, as determined by static geometric cues (e.g., convexity or symmetry). The ground regions, which are invariably all of one color, are interpreted as completing amodally behind the figural regions to form one continuous translating sheet. The figural regions, on the other hand, are perceived as distinct objects and in front. 
Figure 1
 
Display setup and phenomenology: (A) The displays were created by adding motion in one direction to odd regions and in the other direction to even regions in classical figure-ground displays. (B) This could yield one of two percepts depending on which one was perceived as figural. The black ones were perceived as rotating in front of a white background which was seen as sliding behind them, or vice versa.
Figure 2
 
Movie of the stimuli for Experiment 1: (A) convexity display, (B) symmetry display, (C) parallelism display, (D) unbiased display (see more demonstrations at http://ruccs.rutgers.edu/~jacob/demos/motionfg).
Surprisingly, however, the figural regions are also seen as 3-D volumes rotating in depth about a vertical axis (Figure 1B). When subjects participating in this study were asked to freely describe these displays, almost all (18/19) subjects spontaneously reported seeing rotating columns over a flat translating textured sheet, like rolling pins rotating over a flat sheet. This aspect of the percept is novel and, in two ways, inconsistent with conventional models. First, the texture motion within each region has constant velocity, in contrast to the sinusoidal speed profile which would be consistent with the geometry of 3-D rotation (assuming a locally cylindrical shape rotating at constant angular velocity), and which is generally understood as a prerequisite for the percept of structure-from-motion (Andersen & Bradley, 1998; Braunstein, 1962). Second, the percept of 3-D rotation is also obtained with asymmetric columns (e.g., convex columns in Figure 2A), which would, were they physically undergoing a 3-D rotation, continuously change their silhouette profile (2-D projection of their bounding contour). In our displays, the boundaries between columns do not change at all over the course of the motion sequence, which is geometrically inconsistent with 3-D rotation. Nevertheless, the percept of 3-D rotation survives this inconsistency. 
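To make the first inconsistency concrete, the speed profile expected from a rotating cylinder can be written out. This is a short sketch under assumptions that go beyond the text: orthographic projection, a vertical cylinder of radius R spinning at constant angular velocity ω, and an angle θ measured from the line of sight. The symbols R, ω, θ, and v are introduced here for illustration only.

```latex
\begin{aligned}
x(t) &= R \sin\theta(t), \qquad \theta(t) = \omega t + \theta_0, \\
\dot{x}(t) &= R\,\omega \cos\theta(t) = \omega \sqrt{R^{2} - x(t)^{2}}
  \quad \text{(visible half, } |\theta| \le \pi/2\text{)}, \\
\dot{x}_{\text{display}}(t) &= v \quad \text{(constant, independent of } x\text{)}.
\end{aligned}
```

Under these assumptions, image speed would peak at the center of a column and fall to zero at its edges, whereas in the displays every dot moves at the same speed v across the whole region.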
A natural way to understand this percept of 3-D rotation, despite its inconsistency with the constant-velocity profile of the texture, is that it allows the visual system to “explain” the accreting and deleting textures present on both sides of each boundary. On one side of the boundary, accretion deletion is attributed to occlusion behind another surface (the “standard” interpretation of accretion deletion), whereas on the other side, the accreting or deleting texture is attributed to self-occlusion due to the 3-D rotation of a volumetric object. It is clear that, if the light-colored regions (say) are being perceived as disappearing behind adjacent surfaces, then the dark-colored regions cannot also be given the same interpretation. In other words, the light and dark columns cannot both be disappearing behind each other. An interpretation of 3-D rotation on one set of regions solves this problem, since the rotating regions are now interpreted as being in front, and the accretion/deletion of the texture is attributed to self-occlusion due to this 3-D rotation. The texture wraps around the 3-D object; it disappears when the object turns away from the observer and appears when it turns towards the observer. 
This still leaves a two-way ambiguity however; namely, which set of regions—dark or light—should be assigned the interpretation of rotating in depth. This is where the geometric cues to figure and ground (such as convexity) become relevant. Our observations with such displays suggest that the regions that are more likely to be figural based on such static (geometric) cues, are also the ones that tend to be perceived as rotating in 3-D. The experiments reported in this paper document this observation empirically and, moreover, systematically manipulate geometric f/g cues—convexity, parallelism, symmetry, and relative area—in order to examine their influence on the percept of 3-D rotation. 
Experiment 1
Experiment 1 tested how the bistable percept depicted in Figure 1B was disambiguated by three geometric cues to f/g interpretation: convexity, symmetry, and parallelism. In addition to these three cues, an “unbiased” condition was added, containing randomly generated displays that were neutral with respect to the three cues manipulated in the other conditions. Subjects were shown classical f/g stimuli containing one of the three cues, or the unbiased displays (Figure 2), to which motion was added as in Figure 1A. In order to determine which of the two possible percepts they had, they were asked to indicate which regions, black or white, they perceived as rotating. 
Method
Participants
Nine Rutgers University students, naive to the purpose of the experiment, participated for course credit. 
Stimuli
The stimuli for this experiment were created in two stages. In the first stage, basic (static) f/g displays were created, containing alternating black and white stripes with specific geometric properties. In the second stage, textural motion was added to these displays to generate the motion stimuli (Figure 1). More specifically, in the first stage, basic f/g stimuli 7.29 degrees of visual angle (DVA) high and 9.68 DVA wide were created, containing eight alternating dark and light regions. Every other region in these stimuli was manipulated to contain one of three f/g cues (convexity, symmetry, or parallelism), biasing subjects' percepts towards seeing those regions as figural, or the unbiased shapes (which were equated for those three f/g cues). The convexity displays were created as follows. A convex boundary consisted of a series of half circles of random radii, resulting in a series of convex parts along the boundary. This was repeated for every boundary in such a way that every other region consisted of convex parts. Furthermore, no two boundaries were the same, and boundaries were chosen so that every region had equal area. Using this procedure we were able to generate displays closely resembling classical convexity stimuli (e.g., Kanizsa & Gerbino, 1976) (see Figure 2A). 
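The half-circle construction of a convex boundary can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the radius range, and the sampling density are hypothetical, and the equal-area and no-repeated-boundary constraints described above are not enforced.

```python
import math
import random


def convex_boundary(height, min_r, max_r, rng, samples_per_arc=20):
    """Return (xs, ys) points of a boundary built from stacked half circles.

    Each half circle bulges toward +x, so the region on the -x side is
    bounded by a series of convex parts. All parameters are hypothetical.
    """
    xs, ys = [], []
    y = 0.0
    while height - y > 1e-9:
        # Random radius, clipped so that the last arc ends at the top edge.
        r = min(rng.uniform(min_r, max_r), (height - y) / 2)
        for k in range(samples_per_arc + 1):
            phi = -math.pi / 2 + math.pi * k / samples_per_arc
            xs.append(r * math.cos(phi))
            ys.append(y + r + r * math.sin(phi))
        y += 2 * r
    return xs, ys


# Illustrative call; units and values are placeholders, not from the paper.
rng = random.Random(0)
xs, ys = convex_boundary(height=100.0, min_r=5.0, max_r=15.0, rng=rng)
```

Repeating this with a fresh random sequence for each boundary, and mirroring alternate boundaries, would give every other region convex parts on both sides.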
The remaining f/g cues were implemented by representing the boundaries as B-splines. This gave us the advantage that we could easily control for overall curvature. Curvature is closely related to convexity (more positive curvature means more convex) and, if not controlled, could confound the effects of the other f/g cues implemented below. We operationalized this control by keeping the sum of signed curvature along each boundary equal to zero. The first cue that we manipulated using this method was parallelism. B-splines for boundaries in parallel displays were defined by 12 control points and a polynomial degree of three. The y coordinates of the control points were at equidistant points along the height of the display. The x coordinates of the two bottommost (x1, x2) and two topmost (x11, x12) control points were set so that the boundaries in the display were equidistant from one another at these positions. All other control points were assigned x coordinates that were randomly sampled from [x1 − 37.70, x1 + 37.70] arcmin. To create the parallel display, four different B-spline boundaries were created. Parallel regions in these displays were then defined as a translated set of each of those (see Figure 2C). Boundaries in symmetric displays were generated similarly, with the only difference being the number of control points, here 20. Symmetric regions in these displays were then defined as a mirrored set of each of the four different B-spline boundaries created (see Figure 2B). Lastly, we created a class of displays we refer to as “unbiased.” In these displays, there was no bias for odd or even regions to be preferred as figural, based on any of the three former cues, i.e., convexity, parallelism, and symmetry. These displays were created by generating boundaries in the same way as the boundaries for the symmetric displays. In this case, though, only one unique boundary was generated for a given display and a mirrored set of it was repeated every other region (see Figure 2D). Pilot data showed that some of these randomly generated unbiased displays still induced some figural bias. Therefore, we chose only those displays which showed no such figural bias. 
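The B-spline boundaries of the parallel displays can be sketched with a small pure-Python evaluator. Several details here are assumptions rather than reported facts: the clamped uniform knot vector, uniform sampling of the free control points, and the translation offset are all hypothetical, and the zero-net-curvature constraint is not enforced.

```python
import random


def clamped_knots(n_ctrl, degree):
    """Clamped uniform knots on [0, 1] (assumed; the text gives no knot vector)."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def de_boor(u, knots, ctrl, degree):
    """Evaluate a B-spline with scalar control values at u in [0, 1]."""
    n = len(ctrl)
    s = degree  # knot span index with knots[s] <= u < knots[s + 1]
    while s < n - 1 and u >= knots[s + 1]:
        s += 1
    d = [ctrl[j + s - degree] for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + s - degree
            a = (u - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = (1 - a) * d[j - 1] + a * d[j]
    return d[degree]


def parallel_control_x(x1, rng, n_ctrl=12, jitter=37.70):
    """Control-point x values (arcmin): two bottom and two top points pinned at x1."""
    xs = [x1 + rng.uniform(-jitter, jitter) for _ in range(n_ctrl)]
    for k in (0, 1, n_ctrl - 2, n_ctrl - 1):
        xs[k] = x1
    return xs


def boundary(ctrl_x, height, degree=3, n_samples=101):
    """Sample (x, y) points along a boundary; control-point y values are equidistant."""
    n = len(ctrl_x)
    ctrl_y = [height * k / (n - 1) for k in range(n)]
    knots = clamped_knots(n, degree)
    us = [k / (n_samples - 1) for k in range(n_samples)]
    return [(de_boor(u, knots, ctrl_x, degree),
             de_boor(u, knots, ctrl_y, degree)) for u in us]


rng = random.Random(1)
height_arcmin = 7.29 * 60      # display height from the text, in arcmin
offset_arcmin = 80.0           # hypothetical region width
ctrl_x = parallel_control_x(x1=0.0, rng=rng)
left = boundary(ctrl_x, height_arcmin)
right = [(x + offset_arcmin, y) for x, y in left]  # translated copy: a parallel region
```

For a symmetric region, the same sketch would use 20 control points and a mirrored rather than translated copy of the boundary.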
For each of these four conditions, three unique displays were generated, resulting in 12 displays with different geometries. Furthermore, on half the trials, the odd regions were dark and the even were light colored, while in the other half, it was the other way around (counterbalanced and crossed with other factors). On half the trials, the displays were shown reflected over their vertical axis (counterbalanced and crossed with other factors). Finally, on half of the trials, the displays were shown reflected over their horizontal middle axis (counterbalanced and crossed with other factors). 
In the second stage of stimulus generation, textural motion was added to the regions in the displays generated in the first stage, in such a way that all dark regions had identical motion in one direction, and all light regions had identical motion in the opposite direction. In both cases, constant-speed horizontal motion was imparted to the random dot textures. In the case of the dark regions, the random dot texture was sampled from a beta distribution with parameters [α = 6, β = 2], generating a dark texture with sparsely scattered light pixels. The texture for the light regions, on the other hand, was sampled from a beta distribution with parameters [α = 2, β = 6], generating a light texture with sparsely scattered dark pixels. Every pixel in these textures was 1.47 arcmin by 1.47 arcmin. Each of these textures could move either to the left or to the right. This was implemented as follows. For the rightward motion, in each frame t the texture columns [2, N] were taken from texture columns [1, N − 1] in frame t − 1, and the first column in frame t was resampled in the manner described above (similarly for leftward motion). This procedure was repeated at a rate of 40 frames/s, resulting in a motion with a speed of 0.98 DVA/s. These moving textures were then assigned to the dark and light columns in the previously generated f/g stimuli in such a way that, for example, all dark regions had motion in one direction while all white regions had motion in the opposite direction (see the sample stimuli online). We also counterbalanced for the direction of motion so that in half the trials the dark regions had leftward motion, while in the other half they had rightward motion (counterbalanced and crossed with other factors). 
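The column-shifting update and the resulting speed can be sketched as below. This is an illustration under assumptions, not the original stimulus code: the function names are hypothetical, the texture is stored as a plain list of columns, and the mapping from beta samples to screen luminance is not given in the text and is left out.

```python
import random

PIXEL_ARCMIN = 1.47   # pixel size reported in the text
FRAME_RATE = 40       # frames per second reported in the text


def sample_column(height, alpha, beta, rng):
    """One column of beta-distributed values in [0, 1]."""
    return [rng.betavariate(alpha, beta) for _ in range(height)]


def make_texture(width, height, alpha, beta, rng):
    """Texture stored as a list of columns (hypothetical layout)."""
    return [sample_column(height, alpha, beta, rng) for _ in range(width)]


def step_right(texture, alpha, beta, rng):
    """New columns 2..N copy old columns 1..N-1; column 1 is resampled."""
    return [sample_column(len(texture[0]), alpha, beta, rng)] + texture[:-1]


def step_left(texture, alpha, beta, rng):
    """Mirror image of step_right: the last column is resampled."""
    return texture[1:] + [sample_column(len(texture[0]), alpha, beta, rng)]


rng = random.Random(0)
dark = make_texture(width=8, height=4, alpha=6, beta=2, rng=rng)
dark_next = step_right(dark, 6, 2, rng)

# One pixel per frame: 1.47 arcmin * 40 frames/s, converted to degrees.
speed_dva_per_s = PIXEL_ARCMIN * FRAME_RATE / 60
```

The last line reproduces the reported speed: a shift of one 1.47 arcmin pixel per frame at 40 frames/s is 58.8 arcmin/s, or 0.98 DVA/s.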
Design and procedure
Subjects sat at 85 cm from a 21 in. CRT monitor (144 Hz, 1024 × 768 pixels) on which the displays were presented using Psychtoolbox (Brainard, 1997; Kleiner et al., 2007) on a Windows XP PC. Subjects ran a total of 192 trials split into two blocks, i.e., 2 (Color) × 2 (Horizontal Reflection) × 2 (Vertical Reflection) × 2 (Motion Direction) × 4 (Geometric f/g Cues) × 3 (Displays Per Cue). All conditions were counterbalanced for each subject, and trials were randomized for each subject separately. The experiment started with 16 practice trials to acquaint the subject with the displays and the task. Both in the practice and main experiment, each trial consisted of 800 ms of premask followed by 800 ms of the premask with a fixation cross added to it. The mask consisted of eight randomly generated semitransparent single frames of unbiased displays overlaid on top of each other. Following this, the actual experimental display was shown first for 2 s static on the first frame, then, 2 s as a dynamic display with moving texture planes, without a fixation cross. Lastly, the subject was presented with a post-mask screen, identical to the premask, for a minimum of 800 ms. Once this post-mask was presented, the subject was asked to indicate which colored region they perceived as rotating by means of a keyboard response. 
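The fully crossed design can be enumerated to check the trial count. The factor labels and level names below are hypothetical placeholders, and the equal split into two blocks is an assumption; only the number of levels per factor comes from the text.

```python
import itertools
import random

# Number of levels per factor follows the text; the labels are placeholders.
factors = {
    "color": ["odd_dark", "odd_light"],
    "horizontal_reflection": [False, True],
    "vertical_reflection": [False, True],
    "motion_direction": ["dark_left", "dark_right"],
    "cue": ["convexity", "symmetry", "parallelism", "unbiased"],
    "display": [1, 2, 3],
}

# One trial per combination of factor levels.
trials = [dict(zip(factors, combo))
          for combo in itertools.product(*factors.values())]

# Randomize separately for each subject (the seed stands in for a subject ID).
random.Random(0).shuffle(trials)

# Assumed equal split into two blocks.
half = len(trials) // 2
blocks = [trials[:half], trials[half:]]
```

Crossing the six factors gives 2 × 2 × 2 × 2 × 4 × 3 = 192 trials, matching the total reported above.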
Results and discussion
Figure 3 shows the results plotted as the proportion of trials on which subjects reported seeing the regions containing one of the three geometric f/g cues as rotating. For the unbiased condition, data were reported as the proportion of times the subjects reported seeing the odd regions (e.g., the light regions in Figure 2D) as rotating. Responses were analyzed for the geometric f/g cue factor only (i.e., unbiased, convexity, parallelism, symmetry). None of the counterbalancing factors (e.g., color) yielded any main effect or interaction. All conditions except the unbiased one yielded responses that were significantly different from chance: convexity, t(8) = 10.08, p < 0.001; parallelism, t(8) = 2.56, p < 0.05; symmetry, t(8) = 2.98, p < 0.05; unbiased, t(8) = 0.32, p = 0.76. A multilevel logistic regression showed a significant main effect of geometric f/g cues when compared to an unconditional means model (containing only an intercept) by means of a likelihood-ratio test (LR = 248, df = 12, p < 0.001). Tukey pairwise comparisons revealed the following effects. Convexity biased subjects' percepts towards seeing rotation in the manipulated regions more strongly than any of the other cues (convexity vs. unbiased, p < 0.001; convexity vs. symmetry, p < 0.001; convexity vs. parallelism, p < 0.001). For all other pairwise comparisons, the null hypothesis could not be rejected. The strong effect of convexity might be attributed to the number of regions present in our displays. Peterson and Salvagio (2008) showed that, when an f/g display consisted of eight regions (four convex and four concave), convex regions were seen as figural about 91% of the time, averaged over all subjects. This closely matches our data, in which the convex regions were seen as rotating 90% of the time, averaged over all subjects. Symmetry, on the other hand, showed a weaker effect than convexity, a result reported before (Kanizsa & Gerbino, 1976). Others found subjects to see symmetric regions as figural 75% of the time (Peterson & Gibson, 1994). In our experiment, with very different displays and geometrical configurations, we found symmetric regions to be perceived as rotating 61% of the time, averaged over all subjects. The weakest effect was found for parallelism, where parallel regions were seen as rotating only 58% of the time, averaged over all subjects. 
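The comparisons against chance reported above have the form of a one-sample t test on per-subject proportions. A minimal sketch of that statistic follows; the function name is hypothetical, no data from the study are included, and whether the authors computed their tests in exactly this way is an assumption.

```python
import math


def one_sample_t(values, mu0=0.5):
    """Return (t, df) for a one-sample t test of the mean of values against mu0.

    values: one proportion per subject (e.g., proportion of 'rotating' reports).
    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Unbiased sample variance (n - 1 in the denominator).
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)
    return (mean - mu0) / se, n - 1
```

With nine subjects, as in Experiment 1, the function returns 8 degrees of freedom, matching the t(8) values reported above.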
Figure 3
 
Results for Experiment 1 indicating the proportion of trials the subjects reported seeing regions containing one of the three geometric f/g cues as rotating. For the unbiased condition, data was reported as the proportion of times the subjects reported seeing the odd regions as rotating. Error bars represent ±1 SE as computed between subjects. Conditions significantly different from 0.5 are indicated by *. The red line corresponds to chance level (0.5).
Experiment 2
Experiment 2 further tested the influence of static f/g cues on subjects' percepts of the displays by introducing area size. Rather than studying the effect of area in isolation, that is, in displays containing straight boundaries, it was put into interaction with symmetry. By doing so, we not only tested the effect of area size but also showed how different cues might be combined. Subjects were shown the basic symmetric displays from Experiment 1 in which the width of the symmetric regions was varied through five discrete steps (Figure 4). In order to determine which of the two possible percepts they held (Figure 1B), subjects were asked to indicate which regions, black or white, they perceived as rotating. 
Figure 4
 
Stimuli for Experiment 2: Five manipulations of the relative area of the symmetric regions that were implemented. Values indicate the logarithm of the ratio of the width of the symmetric region (arcmin) over the width of the asymmetric region (arcmin).
Method
Participants
Ten Rutgers University students, naive to the purpose of the experiment, participated for course credit. 
Stimuli and procedure
The stimuli for this experiment were created using the same two-step procedure as in Experiment 1. Four displays, each with different symmetric regions analogous to those of Experiment 1, were created. To manipulate the relative area of the symmetric regions in these displays, we adjusted the pairwise distances between adjacent boundaries. In contrast to Experiment 1, boundaries were not placed in such a way that their x1 control points were equidistant from one another. More precisely, we manipulated five levels of area size, ranging from the symmetric regions being narrower than the asymmetric regions to the symmetric regions being wider than the asymmetric regions. This was implemented by adjusting the positions of the boundaries so that the x1s of the boundaries that make up a symmetric region were positioned such that their distance was [67, 75, 83, 92, 100] arcmin, while the distances for the boundaries making up the asymmetric regions were set to [100, 92, 83, 75, 67] arcmin, respectively, to make up the five different area ratio displays. This brings us to a total of 20 geometrically different displays, 4 (Symmetric Displays) × 5 (Area Ratios) (see Figure 4). As in Experiment 1, displays were counterbalanced for color, vertical reflection, and horizontal reflection. 
Subjects were tested in the same environment using the same protocol as in Experiment 1. Each subject ran a total of 320 trials, split into two blocks, i.e., 2 (Color) × 2 (Horizontal Reflection) × 2 (Vertical Reflection) × 2 (Motion Direction) × 5 (Area Ratios) × 4 (Symmetric Displays). All conditions were counterbalanced for each subject, and trials were randomized for each subject. 
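The five area-ratio levels and the trial count can be checked with a few lines. The widths come from the text; the use of the natural logarithm is an assumption, since the base of the logarithm is not stated, and the variable names are hypothetical.

```python
import math

# Widths in arcmin for the five levels, as listed in the text.
symmetric_widths = [67, 75, 83, 92, 100]
asymmetric_widths = [100, 92, 83, 75, 67]

# Log of the width ratio (symmetric over asymmetric); natural log is assumed.
log_ratios = [math.log(s / a)
              for s, a in zip(symmetric_widths, asymmetric_widths)]

# Fully crossed design with all five area ratios and four symmetric displays.
n_trials = 2 * 2 * 2 * 2 * len(log_ratios) * 4
```

The middle level has a log ratio of zero (equal widths), the outer levels are mirror images of each other in sign, and the crossed design with five area ratios yields the reported 320 trials.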
Results and discussion
Figure 5 shows the results plotted as the proportion of trials on which subjects reported perceiving the symmetric regions as rotating. Responses were analyzed in terms of the logarithm of all five area ratios (the other, counterbalancing factors, such as color, were found not to yield any main effect or interaction). A multilevel logistic regression showed a significant effect of log area ratio when compared to an unconditional means model (containing only an intercept) by means of a likelihood-ratio test (LR = 72, df = 3, p < 0.001). The narrower the symmetric regions (relative to the asymmetric regions), the more likely they were to be seen as rotating (as indicated by the estimated slope β1 = −1.1164, SE = 0.2351). This effect, averaged over subjects, ranged from 77% for the narrowest symmetric regions to 58% for the widest symmetric regions. (Note that the latter corresponds to a cue-conflict situation, where symmetry and relative area exert influences in opposite directions.) In the neutral case, where area size was equal between symmetric and asymmetric regions, subjects on average perceived the symmetric regions as rotating in 69% of trials, which was slightly higher than the 61% we found in Experiment 1. This difference could be partially due to the difference in manipulated conditions in the two experiments inducing a difference in mental set. Overall, the results extend earlier findings (Baylis & Driver, 1995; Koffka, 1935; Rubin, 1921) of the effects of area size on figure ground, and show how area size combines with symmetry in a monotonic fashion to strengthen or weaken subjects' bias towards seeing the symmetric regions as figural and, in our case, rotating in 3-D. 
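The shape of the fitted relationship can be illustrated with a plain logistic curve using the reported slope. This is only a sketch under loud assumptions: the intercept is not reported in the text, so it is set here from the 69% observed in the neutral case; the natural logarithm is assumed for the area ratio; and the random effects of the multilevel model are ignored. The function names are hypothetical.

```python
import math

SLOPE = -1.1164  # estimated slope reported in the text


def logit(p):
    """Log odds of a proportion p."""
    return math.log(p / (1 - p))


def predicted_proportion(log_ratio, intercept, slope=SLOPE):
    """Logistic curve: probability of seeing the symmetric regions as rotating."""
    return 1 / (1 + math.exp(-(intercept + slope * log_ratio)))


# Assumed intercept: chosen so the curve passes through 69% at equal widths.
intercept = logit(0.69)

narrow = predicted_proportion(math.log(67 / 100), intercept)
wide = predicted_proportion(math.log(100 / 67), intercept)
```

Under these assumptions the curve falls monotonically with log area ratio, and its values at the two extreme levels land close to the reported subject averages of 77% and 58%.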
Figure 5
 
Results for Experiment 2: Each data point (horizontal jitter was added to these for presentation purposes) depicts, for an individual subject, the proportion of trials on which they saw the symmetric regions as rotating. The blue line depicts the logistic regression model, with the gray shading its 95% confidence interval. The red line corresponds to chance level (0.5).
General discussion
Interpreting images in three dimensions involves estimating the relative depths between different surfaces, as well as the 3-D shape of each surface, by integrating a wide array of cues. In this paper, we presented a novel phenomenon that provides insights into how the interaction between accretion deletion (a depth-from-motion cue) and geometric f/g (static) cues gives rise to 3-D interpretations of rotating surfaces. In our displays, textural motion was added to classic f/g displays (see Figure 1). The accretion-deletion cue in these displays was fully ambiguous, since accretion deletion was occurring on both sides of each boundary. We found that this motion-based ambiguity was resolved by static cues to figure and ground. That is, the regions predicted to be figural by the static cues tended to be perceived as volumetric objects rotating in 3-D, whereas the regions predicted to be ground were perceived as moving behind the figural regions, and amodally completing into a single frontoparallel surface. Moreover, we found that the proportion of trials on which subjects perceived regions as rotating depended on the strength of the figural cues in those regions. On the methodological side, this phenomenon provides a novel way of measuring f/g perception, since the percept of 3-D rotation closely tracks the strength of the f/g cue. On the theoretical side, our results highlight the need for studying depth-from-motion cues and geometric f/g cues as an ensemble. The observed interactions between depth from accretion-deletion and static f/g cues cannot be explained by standard accounts of accretion deletion or 3-D structure from motion. 
Accretion deletion
Our results show that accreting and deleting textured surfaces can be interpreted in two very distinct ways, depending on the relative depth assigned to them by geometric f/g cues. One interpretation is the standard one presented by Kaplan (1969): The textured region that undergoes accretion or deletion is perceived as a surface that is moving behind another surface. A second interpretation, which we introduce here, arises when a region undergoing accretion deletion is perceived as being in front based on geometric f/g cues. In this case, the accretion and deletion of the textured surface is interpreted as arising from self-occlusion due to rotation in depth of a 3-D object. A similar secondary interpretation was informally reported by Kaplan (1969) for displays in which textures of approximately equal average luminance (unlike ours) were accreting and deleting on both sides of a margin. However, in that case, subjects reported seeing both sides as rotating. This is in strong contrast to our case, in which only one set of regions (for example, the odd regions) was perceived as rotating, while the other (even) regions were perceived as continuing behind. Major differences between our displays and Kaplan's are the introduction of luminance boundaries and the introduction of multiple regions (eight in our displays vs. two in his). Further research currently under way will show whether those are the deciding factors for the percept found in our displays. 
At a more basic level, the current phenomenon suggests that geometric cues to f/g and accretion-deletion cues are combined to form a final percept, and should be studied and modeled as an ensemble. Furthermore, it poses a challenge to current models of depth from motion (e.g., Barnes & Mingolla, 2013; Beck, Ognibeni, & Neumann, 2008; Berzhanskaya, Grossberg, & Mingolla, 2007; Raudies & Neumann, 2010; Yonas, Craton, & Thompson, 1987). Specifically, most current models do encompass a so-called FORM module (e.g., Barnes & Mingolla, 2013; Beck et al., 2008; Berzhanskaya et al., 2007; Raudies & Neumann, 2010) in which luminance-based borders are analyzed. However, in those models, these luminance-based borders are not processed in such a way that the shape of these borders (i.e., geometric cues to f/g) can influence the relative depth of surfaces as computed from local motion cues, such as accretion deletion. 
3-D rotation
The percept of 3-D rotation obtained in our displays is inconsistent with our stimulus displays in two ways. First, in our displays all texture elements merely undergo a constant-velocity translation in a specified direction, whereas a locally cylindrical structure would generate a sinusoidal speed profile in the image. Indeed, according to standard models of structure from motion, an interpretation of rigid 3-D rotation requires a sinusoidal speed profile (Andersen & Bradley, 1998; Braunstein, 1962). Nevertheless, observers perceive the “figural” regions as volumetric objects rotating in depth. Second, the 3-D rotation of an asymmetric object results in continuously changing occluding contours. In our displays, however, the contours always remained fixed (in all conditions, three out of four of which involved asymmetric regions). Yet, again, the “figural” regions were perceived as rotating in 3-D. In a general sense, our results favor a Bayesian viewpoint in which image data that conflict with a particular scene model generally only diminish its likelihood, rather than flatly ruling it out. Nevertheless, existing Bayesian models of structure-from-motion (e.g., Hogervorst & Eagle, 1998) cannot handle complex situations, such as our displays, that involve multiple surfaces at distinct depths, and certainly do not take figural status into account. Hence, while our data tend to favor a Bayesian view of motion and depth interpretation, a model that can explain our results does not yet exist (though our ongoing research aims to develop one). 
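To make the first inconsistency concrete, the following minimal Python sketch contrasts the two speed profiles. It assumes orthographic projection and a cylinder rotating about a vertical axis at constant angular velocity; the function names and the parameters `radius`, `omega`, and `speed` are illustrative and are not taken from the experimental code.

```python
import math


def cylinder_image_speed(x, radius, omega):
    """Horizontal image speed of a dot on a rotating cylinder.

    Assumes orthographic projection and rotation about a vertical axis.
    A dot at angular position theta projects to x = radius * sin(theta),
    so its image velocity is radius * omega * cos(theta), which equals
    omega * sqrt(radius**2 - x**2) when written as a function of x.
    """
    if abs(x) > radius:
        raise ValueError("x lies outside the projected cylinder")
    return abs(omega) * math.sqrt(radius ** 2 - x ** 2)


def translation_image_speed(x, speed):
    """Image speed of a dot on a translating surface: the same at every x."""
    return abs(speed)


if __name__ == "__main__":
    radius, omega = 1.0, 1.0  # hypothetical units, chosen for illustration
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(x,
              cylinder_image_speed(x, radius, omega),
              translation_image_speed(x, speed=1.0))
```

Under these assumptions the cylinder profile peaks at the center of the region and falls to zero at its edges, whereas a translating texture of the kind used in the displays has the same speed at every position.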
Why is the percept of 3-D rotation generated despite these inconsistencies with the image data? As noted in the Introduction, since accretion deletion is occurring on both sides of each boundary, a “standard” interpretation of accretion deletion (of disappearance behind an adjacent surface) is not possible on both sides of the boundary. An interpretation of 3-D rotation on one side allows the visual system to solve this problem, allowing it to “explain” the accreting and deleting textures on both sides of each boundary in a mutually consistent way. On one side accretion deletion is attributed to occlusion behind adjacent surfaces, whereas, on the other side, the accreting or deleting texture is attributed to self-occlusion due to the 3-D rotation of a volumetric object. 
Figure-ground geometric cues
In this paper, we showed how our inherently ambiguous motion displays were disambiguated by geometric cues to f/g. We found that different cues have different strengths in biasing subjects toward seeing particular regions as rotating in 3-D. For our displays, we found convexity to yield the strongest bias, in that convex regions were most likely to be seen as rotating volumes. The effects of parallelism and symmetry were somewhat less strong. We did not replicate the strong effect of symmetry found by others (e.g., Peterson & Gibson, 1994). This failure to replicate was due to the fact that our displays differed from theirs in many respects, especially in the geometry of the boundaries. In other work (Froyen, Tanrıkulu, Singh, & Feldman, in press) we found that the degree of symmetry could be modulated by manipulating the undulations along the boundary. Hence, “symmetry” is not a binary cue (either present or absent) with one particular strength. Rather, the strength of a cue depends on the geometry of the entire region and of its complement. Experiment 2 further emphasizes this by showing that figural status can be systematically influenced by manipulating relative area. We found a smooth monotonic decay of the proportion by which subjects perceive symmetric regions as figural, as a function of the relative area of those regions. This finding expands on earlier findings (Baylis & Driver, 1995; Koffka, 1935; Rubin, 1921) that narrower regions are more likely to be perceived as figural. Furthermore, we showed that if two cues are combined in the same display, it is not the case that one will simply dominate the other. For example, when the asymmetric regions were narrower than the symmetric regions, we did not find that subjects were biased toward seeing the asymmetric regions as figural. 
Rather, we found that a linear model captured the results well, expressing the log odds of subjects' responses (of seeing the symmetric regions as figural) as a linear combination of symmetry and area size. This linear combination suggests that the cues of symmetry and relative area combine in a manner consistent with “weak fusion” (Baek & Sajda, 2003; Clark & Yuille, 1990; Landy, Maloney, Johnston, & Young, 1995). It will be interesting in future work to test this hypothesis more directly. 
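As a rough illustration of such a model, the Python sketch below evaluates a logistic function whose log odds are linear in the log area ratio. The coefficient values are hypothetical placeholders rather than fitted estimates, and treating the intercept as the contribution of symmetry at equal area is an assumption made for the example.

```python
import math


def p_symmetric_figural(log_area_ratio, beta0, beta1):
    """Probability of reporting the symmetric regions as rotating.

    log odds = beta0 + beta1 * log_area_ratio
    beta0: assumed bias contributed by symmetry at equal area (hypothetical)
    beta1: assumed weight of relative area (hypothetical)
    """
    log_odds = beta0 + beta1 * log_area_ratio
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    beta0, beta1 = 0.5, -1.0  # placeholder values, not fitted estimates
    for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(r, round(p_symmetric_figural(r, beta0, beta1), 3))
```

With a negative `beta1`, the predicted proportion falls smoothly as the symmetric regions become relatively wider. Because the two terms simply add in log-odds space, neither cue vetoes the other, which is the sense in which an additive model of this kind is compatible with weak fusion.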
A methodological tool for measuring figure ground
Our results showed that geometric cues to f/g disambiguate the accretion-deletion cues in our displays (Figure 1), and biased the percept of 3-D rotation in proportion to their strength. These displays thus also provide a novel way of measuring f/g perception, since the percept of 3-D rotation tends to track the strength of the f/g geometric cue. Traditionally, f/g percepts have been measured simply by asking the subject which region they perceive as figural. Such direct explicit report tasks are, however, prone to individual biases, high-level interpretations, and extraperceptual factors (Struber & Stadler, 1999). Less prone to such factors was the visual short-term memory task proposed by Driver and Baylis (1996), based on the assumption that figural regions have a stronger memory trace than ground regions. Typically, a subject would be presented with an f/g display and, subsequently, with a test display containing one of the regions from the f/g display. The subject would then be faster at recognizing the region in the test display if it had been part of the figural rather than the ground regions. Even though this task was used by many researchers as the gold standard, it has been remarked that the test display itself might be influencing the short-term memory effect (Hulleman & Humphreys, 2004). Hulleman and Humphreys (2004) proposed yet another method based on visual search. In their displays, one region would be symmetric while the others were not. If this symmetric region was also figural, subjects would find it more easily than when it was ground. However effective, this one symmetric region might confound the displays when another cue is to be measured. From a totally different tradition comes the on-off method (Hoffman & Singh, 1997; Stevens & Brookes, 1988), which in recent years has been applied to traditional f/g stimuli with alternating black and white regions (Peterson & Salvagio, 2008). 
In this method, subjects are presented with a red dot on top of one of the regions and asked whether or not it is on the figural region. The phenomenon in the current paper, as a methodology, overcomes many of the issues present in the direct methods (explicit report, on-off method), because of the indirect nature of the question asked of subjects (“Which colored regions do you perceive as rotating?”). Such an indirect question was made possible by the perceptual coupling (Hochberg & Peterson, 1987) between the rotation percept and the f/g percept present in our stimuli. Furthermore, the bistable nature of our stimulus makes it easy for observers to report what they see. The usefulness of methods employing such stimuli has been pointed out before (e.g., Backus, 2009). Lastly, no second image needs to be shown as a target (visual short-term memory method), nor does the geometry of the displays need to be changed (visual search method) in ways that might influence f/g perception. 
Conclusion
We presented a novel phenomenon, involving the interaction between depth from accretion-deletion and geometric (static) cues to f/g. In our displays, textured regions on both sides of a boundary were accreting and deleting, with the same speed but in opposite directions. This yielded an inherently ambiguous percept, which was disambiguated by geometric f/g cues. The resulting percept was surprising in that one set of regions (those more likely to be assigned as figural based on the geometric cues) was perceived as 3-D columns rotating in depth, while the ground regions were seen as translating, and amodally completing, behind them. The results showed that the likelihood of being perceived as rotating in 3-D was determined by the strength of the geometric f/g cues. On the methodological side, this phenomenon provides a novel way of measuring f/g perception, since the percept of 3-D rotation closely tracks the strength of the f/g cue. On the theoretical side, our results highlight the need for studying geometric f/g cues and depth-from-motion cues in tandem. The observed interactions between accretion deletion and geometric f/g cues cannot be explained by standard accounts of accretion deletion or structure from motion. 
Acknowledgments
This research was supported by NIH EY021494 to J. F. and M. S., and NSF DGE 0549115 (Rutgers IGERT in Perceptual Science). We are grateful to John Wilder, Ö. Daglar Tanrıkulu, and two anonymous reviewers for their many helpful comments. We thank Lorilei Alley for her help at various stages of this study. 
Commercial relationships: none. 
Corresponding Author: Vicky Froyen. 
Email: vickyf@rutgers.edu. 
Address: Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, NJ, USA. 
References
Andersen R. Bradley D. (1998). Perception of three-dimensional structure from motion. Trends in Cognitive Sciences, 2 (6), 222–228.
Backus B. T. (2009). The mixture of Bernoulli experts: A theory to quantify reliance on cues in dichotomous perceptual decisions. Journal of Vision, 9 (1): 6, 1–19, http://www.journalofvision.org/content/9/1/6, doi:10.1167/9.1.6.
Baek K. Sajda P. (2003, October). A probabilistic network model for integrating visual cues and inferring intermediate-level representations. Paper presented at the Third International Workshop on Statistical and Computational Theories of Vision, Nice, France. (pp. 3–10).
Bahnsen P. (1928). Eine untersuchung über symmetrie und asymmetrie bei visuellen wahrnehmungen [A study about symmetry and asymmetry in visual percepts]. Zeitschrift für Psychologie, 108, 129–154.
Barenholtz E. Feldman J. (2006). Determination of visual figure and ground in dynamically deforming shapes. Cognition, 101 (3), 530–544.
Barnes T. Mingolla E. (2013). A neural model of visual figure-ground segregation from kinetic occlusion. Neural Networks, 37, 141–164.
Baylis G. Driver J. (1995). One-sided edge assignment in vision: 1. figure-ground segmentation and attention to objects. Current Directions in Psychological Science, 4 (5), 140–146.
Beck C. Ognibeni T. Neumann H. (2008). Object segmentation from motion discontinuities and temporal occlusions: A biologically inspired model. PLoS ONE, 3 (11), e3807.
Berzhanskaya J. Grossberg S. Mingolla E. (2007). Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spatial Vision, 20 (4), 337–395.
Brainard D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Braunstein M. (1962). Depth perception in rotating dot patterns: Effects of numerosity and perspective. Journal of Experimental Psychology, 64 (4), 415–420.
Clark J. Yuille A. (1990). Data fusion for sensory information processing systems (Vol. 105). Boston, MA: Kluwer Academic Publishers.
Driver J. Baylis G. (1996). Edge-assignment and figure-ground segmentation in short-term visual matching. Cognitive Psychology, 31 (3), 248–306.
Feldman J. Singh M. (2006). Bayesian estimation of the shape skeleton. Proceedings of the National Academy of Sciences, 103, 18014–18019.
Froyen V. Feldman J. Singh M. (2010). A Bayesian framework for figure-ground interpretation. In Lafferty J. Williams C. K. I. Shawe-Taylor J. Zemel R. Culotta A. (Eds.), Advances in neural information processing systems (Vol. 3, pp. 631–639). Vancouver, British Columbia, Canada: Curran Associates.
Froyen V. Tanrıkulu O. D. Singh M. Feldman J. (in press). Stereoslant: A novel method for measuring figure-ground assignment. Manuscript submitted for publication.
Gibson J. Kaplan G. Reynolds H. Wheeler K. (1969). The change from visible to invisible. Attention, Perception, & Psychophysics, 5 (2), 113–116.
Hegdé J. Albright T. Stoner G. (2004). Second-order motion conveys depth-order information. Journal of Vision, 4 (10): 1, 838–842, http://www.journalofvision.org/content/4/10/1, doi:10.1167/4.10.1.
Hildreth E. Royden C. (2011). Integrating multiple cues to depth order at object boundaries. Attention, Perception, & Psychophysics, 73 (7), 1–18.
Hochberg J. Peterson M. A. (1987). Piecemeal organization and cognitive components in object perception: Perceptually coupled responses to moving objects. Journal of Experimental Psychology: General, 116 (4), 370.
Hoffman D. D. Singh M. (1997). Salience of visual parts. Cognition, 63 (1), 29–78.
Hogervorst M. Eagle R. (1998). Biases in three-dimensional structure-from-motion arise from noise in the early visual system. Proceedings of the Royal Society of London. Series B: Biological Sciences, 265 (1406), 1587–1593.
Hulleman J. Humphreys G. W. (2004). A new cue to figure-ground coding: Top-bottom polarity. Vision Research, 44 (24), 2779–2791.
Kanizsa G. Gerbino W. (1976). Convexity and symmetry in figure-ground organization. In Henle M. (Ed.), Vision and artifact (pp. 25–32). New York, NY: Springer.
Kaplan G. (1969). Kinetic disruption of optical texture: The perception of depth at an edge. Attention, Perception, & Psychophysics, 6 (4), 193–198.
Kleiner M. Brainard D. Pelli D. Ingling A. Murray R. Broussard C. (2007). What is new in psychtoolbox 3. Perception, 36 (14 ECVP Abstract Suppl.).
Koffka K. (1935). Principles of gestalt psychology. London, UK: Lund Humphries.
Landy M. Maloney L. Johnston E. Young M. (1995). Measurement and modeling of depth cue combination: In defense of weak fusion. Vision Research, 35 (3), 389–412.
Machilsen B. Pauwels M. Wagemans J. (2009). The role of vertical mirror-symmetry in visual shape detection. Journal of Vision, 9 (12): 11, 1–11, http://www.journalofvision.org/content/9/12/11, doi:10.1167/9.12.11.
Metzger F. (1953). Gesetze des sehens [Laws of seeing]. Frankfurt-am-Main, Germany: Waldemar Kramer.
Michotte A. Thinès G. Crabbé G. (1964). Les complements amodaux des structures perceptives [Amodal completion of perceptual structures]. Louvain, Belgium: Publications Universitaires de Louvain.
Mutch K. Thompson W. (1985). Analysis of accretion and deletion at boundaries in dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7 (2), 133–138.
Ono H. Rogers B. Ohmi M. Ono M. (1988). Dynamic occlusion and motion parallax in depth perception. Perception, 17 (2), 255–266.
Peterson M. A. Gibson B. S. (1994). Must figure-ground organization precede object recognition? An assumption in peril. Psychological Science, 5 (5), 253–259.
Peterson M. A. Salvagio E. (2008). Inhibitory competition in figure-ground perception: Context and convexity. Journal of Vision, 8 (16): 4, 1–13, http://www.journalofvision.org/content/8/16/4, doi:10.1167/8.16.4.
Raudies F. Neumann H. (2010). A neural model of the temporal dynamics of figure–ground segregation in motion perception. Neural Networks, 23 (2), 160–176.
Rubin E. (1921). Visuell wahrgenommene figuren: studien in psychologischer analyse [Visually perceived figures: studies in psychological analysis]. København, Denmark: Gyldendalske Boghandel.
Stevens K. A. Brookes A. (1988). The concave cusp as a determiner of figure-ground. Perception, 17 (1), 35–42.
Struber D. Stadler M. (1999). Differences in top-down influences on the reversal rate of different categories of reversible figures. Perception, 28 (10), 1185–1196.
Thompson W. Mutch K. Berzins V. (1985). Dynamic occlusion analysis in optical flow fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7 (4), 374–383.
Wagemans J. Elder J. H. Kubovy M. Palmer S. E. Peterson M. A. Singh M. (2012). A century of gestalt psychology in visual perception: I. perceptual grouping and figure-ground organization. Psychological Bulletin, 138 (6), 1172–1217.
Yonas A. Craton L. Thompson W. (1987). Relative motion: Kinetic information for the order of depth at an edge. Attention, Perception, & Psychophysics, 41 (1), 53–59.
Footnotes
1  β1 = log{[p(rot = 1|R = x) p(rot = 0|R = x + 1)]/[p(rot = 1|R = x + 1) p(rot = 0|R = x)]}, where rot = “symmetric regions are seen as rotating,” and R = log(area ratio).
Figure 1
 
Display setup and phenomenology: (A) The displays were created by adding motion in one direction to odd regions and in the other direction to even regions in classical figure-ground displays. (B) This could yield one of two percepts depending on which one was perceived as figural. The black ones were perceived as rotating in front of a white background which was seen as sliding behind them, or vice versa.
Figure 3
 
Results for Experiment 1 indicating the proportion of trials the subjects reported seeing regions containing one of the three geometric f/g cues as rotating. For the unbiased condition, data was reported as the proportion of times the subjects reported seeing the odd regions as rotating. Error bars represent ±1 SE as computed between subjects. Conditions significantly different from 0.5 are indicated by *. The red line corresponds to chance level (0.5).
Figure 4
 
Stimuli for Experiment 2: Five manipulations of the relative area of the symmetric regions that were implemented. Values indicate the logarithm of the ratio of the width of the symmetric region (arcmin) over the width of the asymmetric region (arcmin).
Figure 5
 
Results for Experiment 2: Each data point (horizontal jitter was added to these for presentation purposes) depicts, for an individual subject, the proportion of trials on which they saw the symmetric regions as rotating. The blue line depicts the logistic regression model, with the gray shading its 95% confidence interval. The red line corresponds to chance level (0.5).