Article  |   April 2011
A Bayesian model of binocular perception of 3D mirror symmetrical polyhedra
Yunfeng Li, Tadamasa Sawada, Yun Shi, TaeKyu Kwon, Zygmunt Pizlo
Journal of Vision April 2011, Vol. 11, No. 4, Article 11. doi: https://doi.org/10.1167/11.4.11
Abstract

In our previous studies, we showed that monocular perception of 3D shapes is based on a priori constraints, such as 3D symmetry and 3D compactness. The present study addresses the nature of the perceptual mechanisms underlying binocular perception of 3D shapes. First, we demonstrate that binocular performance is systematically better than monocular performance and is close to perfect for three of the four subjects. Veridical shape perception cannot be explained by conventional binocular models, in which shape is derived from depth intervals. In our new model, we use the ordinal depth of points in a 3D shape, provided by stereoacuity, and combine it with monocular shape constraints by means of Bayesian inference. The stereoacuity threshold used by the model was estimated for each subject. This model can account for the binocular shape performance of all four subjects. It can also explain the fact that when the viewing distance increases, the binocular percept gradually reduces to the monocular one, which implies that the monocular percept of a 3D shape is a special case of the binocular percept.

Introduction
Because human eyes are about 6.5 cm apart, the two eyes see a 3D scene from slightly different positions—thus, the two retinal images of the scene are different. This difference is called binocular disparity. Suppose F is the fixation point and A is another point near F. If the distance d of F from the viewer is much greater than the depth interval Δd between A and F, binocular disparity satisfies the following formula (Howard & Rogers, 2002): 
$$\delta \approx \frac{I\,\Delta d}{d^{2}},\qquad(1)$$
where I is the interocular distance. The above equation implies that the relation between binocular disparity and relative depth is linear—the bigger the binocular disparity, the bigger the relative depth. If the interocular distance I and the fixation distance d are known, the depth interval between A and F can be computed from their binocular disparity. Once the depth intervals between pairs of points in a 3D object are computed, the visible parts of a 3D shape can, in principle, be reconstructed (see Frisby & Stone, 2010, for a very accessible review). 
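To make Equation 1 concrete, the short sketch below (our illustration, not part of the original study) inverts the formula to recover a depth interval from a measured disparity when the interocular distance and the fixation distance are known; all numbers are made up.

```python
import math

# A minimal sketch of Equation 1: delta ~ I * delta_d / d^2, inverted to
# recover the depth interval delta_d (valid only when delta_d << d).
def depth_interval_from_disparity(disparity_rad, interocular_m, distance_m):
    return disparity_rad * distance_m ** 2 / interocular_m

I = 0.065                          # interocular distance: ~6.5 cm
d = 1.0                            # fixation distance: 1 m
delta = math.radians(20 / 3600)    # a 20 arcsec disparity expressed in radians

print(depth_interval_from_disparity(delta, I, d))   # ~0.0015 m, i.e., ~1.5 mm
```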
Julesz (1971) demonstrated, using random-dot stereograms, that 3D spatial relations can be perceived based on binocular disparity only. Although Julesz's study provides a possible mechanism of human 3D shape perception, the status of binocular disparity as a significant factor underlying 3D shape perception is uncertain because real objects in our natural environment are well structured and they do not resemble random dots. Prior studies (e.g., Johnston, 1991; Norman, Todd, Perotti, & Tittle, 1996; Todd & Norman, 2003) showed that when binocular disparity is the only cue providing information about 3D spatial relations, the 3D shape percept is neither accurate nor precise. Consider Johnston's (1991) experiment, which is representative of this group of studies. She asked the subject to view a random-dot stereogram of an elliptical cylinder and to adjust its depth so that it was perceived as circular. The viewing direction was orthogonal to the axis of the cylinder. She found that when the viewing distance was greater than 1 m, subjects systematically underestimated the depth, and as a result, a cylinder that was stretched in depth, compared to a circular cylinder, was perceived as circular. The converse was true for distances less than 1 m. The systematic error was up to a factor of 2. Quite different results were reported by Frisby, Buckley, and Duke (1996). In six experiments, they demonstrated veridical binocular performance in judging lengths of real objects under natural viewing conditions. The authors concluded that inaccurate and unreliable binocular performance reported in the literature might be attributable to unnatural stimuli and viewing conditions. In the last three experiments, they directly compared monocular and binocular performance and showed that monocular performance was almost as good as binocular performance. This result clearly suggests that the subject applied some very effective monocular priors to produce nearly veridical monocular and binocular judgments. Unfortunately, the authors did not explore the nature of these priors. 
When more than one cue is present, the visual system is able to combine the cues. The issue of combination of cues (binocular disparity and texture) was explored in several studies (Frisby, Buckley, & Freeman, 1996; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003). Combination of cues received attention after Bayesian models had been adopted as a standard tool in explaining perception (Knill & Richards, 1996; Pizlo, 2001). It seems that the main motivation for using Bayesian models was the need to explain the result of combination of visual cues and priors, such as smoothness of a surface, or rigidity and symmetry of a 3D object. However, once Bayesian models have been adopted, they also proved to be useful in modeling combination of visual cues even in the case of uninformative priors (Frisby & Stone, 2010). If two cues provide information about the same perceptual feature (e.g., slant of a plane), it is possible to derive an optimal way of combining these two sources of information. Assume that, on average, each cue provides accurate information about the slant of a surface (i.e., the expected value is equal to the actual slant). Next, assume that the reliabilities of these two cues are different. Specifically, the variances of the probability density functions representing perceptual measurements are $\sigma_1^2$ and $\sigma_2^2$. It follows that an optimal way to combine these two cues is to compute a weighted average, where the weights are inversely proportional to the variances characterizing the two cues (e.g., Hillis et al., 2004). Empirical evidence provided by Hillis et al. (2004) and Knill and Saunders (2003) indicated that the visual system uses this optimal cue combination. It is important to point out, however, that if the expected values of the two cues are not accurate, then this cue combination may or may not be optimal. 
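The inverse-variance weighting described above can be written in a few lines. The sketch below is a generic illustration of this standard rule with made-up slant estimates, not code from any of the studies cited.

```python
# Optimal combination of two unbiased cues: weights inversely proportional to
# the variances of the two perceptual measurements (illustrative values only).
def combine_cues(est1, var1, est2, var2):
    w1, w2 = 1.0 / var1, 1.0 / var2
    combined = (w1 * est1 + w2 * est2) / (w1 + w2)
    combined_var = 1.0 / (w1 + w2)          # variance of the combined estimate
    return combined, combined_var

slant_disparity, slant_texture = 32.0, 38.0   # two slant estimates (deg)
var_disparity, var_texture = 4.0, 16.0        # their variances (deg^2)

print(combine_cues(slant_disparity, var_disparity, slant_texture, var_texture))
# -> (33.2, 3.2): the more reliable cue dominates, and the combined variance is
#    smaller than the variance of either cue alone.
```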
Why should a visual cue not be accurate? Consider some examples. Line length on the frontal plane is perceived accurately (on average). As a result, the subject can easily match the lengths of two line segments (except for some classical illusions, such as horizontal–vertical illusion). However, a line length in a 3D space may not be perceived accurately and the systematic errors can be large. The main difference between these two cases is that the former represents perceptual measurement (solution of a forward or direct problem), whereas the latter represents perceptual interpretation (solution of an inverse problem). Forward problems are computationally easy (well posed and well conditioned), but inverse problems are computationally difficult (ill posed and/or ill conditioned). Before an inverse problem is solved, it must be converted to a well-posed problem by imposing constraints (priors) on the family of possible solutions. For example, in the case of surface perception, smoothness of a surface is a prior used by the visual system (Marr & Poggio, 1976). Effective priors are not always available. In such cases, the resulting perceptual interpretation may be strongly biased and unreliable. If two cues are combined, a cue that has larger bias and larger variance should contribute less (if anything). This fact leads to a somewhat unexpected conclusion that using more sensory data is not always equivalent to a more accurate and reliable percept. We will come back to this issue at the end of the Model section. The problem of an optimal combination of cues that are biased has not received sufficient attention in the literature. 
There is one type of binocular judgment that is always reliable: observers can make very accurate judgments about the order of points in depth (Blakemore, 1970; Westheimer, 1979). In fact, this judgment can be done with a precision that is one order of magnitude better than the distance between receptors on the retina (Westheimer & McKee, 1980). Because of this high precision, this judgment, which is called stereoacuity, belongs to the family of hyperacuities. Stereoacuity has received less attention from students of shape perception because it was commonly assumed that this cue cannot lead to a metric percept of 3D shape. However, even if metric properties are not perceived, some useful aspects of shape (relief structure) can be reconstructed (Gårding, Porrill, Mayhew, & Frisby, 1995; Koenderink, van Doorn, & Kappers, 2006). 
These prior studies show that a human subject is quite accurate at judging the depth order of two points but rather inaccurate at reconstructing their depth interval. Can binocular stereoacuity, by itself, lead to reliable 3D metric shape perception? The answer is "no." Suppose a 3D object is uniformly stretched or compressed along the depth direction. The depth order of the points in the object will not change, even though the Euclidean shape of the object will change substantially. It follows that stereoacuity will not be able to detect this change of shape. 
Thus, a question arises about the role of binocular vision in reliable 3D (metric) shape perception in our everyday life. Surely, binocular 3D shape perception should be at least as reliable as monocular 3D shape perception. We have recently shown that monocular perception of 3D symmetrical shapes is quite reliable, at least when the slant of the symmetry plane is not too close to 0 or 90 deg (Li, Pizlo, & Steinman, 2009; Sawada, 2010). This performance of the subjects was very close to the performance of our computational model that recovers a 3D shape from a single 2D image by applying simplicity constraints, such as 3D symmetry and 3D compactness. Once we realize that 3D symmetry is essential in 3D shape perception, it becomes clear how stereoacuity could improve 3D shape recovery. Consider binocular viewing of a mirror symmetric 3D object from a non-degenerate view (i.e., when the slant of the symmetry plane is not 0 or 90 deg). If the 3D object is stretched or compressed along the depth direction, its symmetry will be destroyed. Conversely, if the 3D object is stretched or compressed along a direction parallel or orthogonal to its symmetry plane, the symmetry will be preserved, but the depth order of the object's points will change. This interaction (which does not seem to have been noticed before) between the symmetry of a 3D object and the order of the object's points along the depth direction provides a powerful tool in binocular 3D shape perception. The role of this interaction in human and model's 3D shape recovery is the subject of this paper. 
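The interaction just described can be checked numerically. The sketch below is our own illustration (not the authors' code): it builds a random set of points that is mirror symmetric about a slanted plane and shows that a stretch along the depth axis destroys the mirror pairing while leaving every depth order intact, whereas a stretch along the plane's normal preserves the pairing but flips the depth order of some pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.array([np.sin(np.radians(40)), 0.0, np.cos(np.radians(40))])  # slanted plane normal

def reflect(p):                        # mirror reflection about the plane through the origin
    return p - 2.0 * (p @ n) * n

half = rng.uniform(-1, 1, size=(6, 3))
points = np.vstack([half, [reflect(p) for p in half]])       # a mirror-symmetric point set

def still_paired(pts):                 # does the reflection still map the set onto itself?
    return all(np.min(np.linalg.norm(pts - reflect(q), axis=1)) < 1e-9 for q in pts)

stretch_z = points * np.array([1.0, 1.0, 2.0])                # stretch along the depth axis
stretch_n = points + 3.0 * (points @ n)[:, None] * n          # stretch along the plane normal

print(still_paired(points), still_paired(stretch_z), still_paired(stretch_n))
# -> True False True: the depth stretch destroys the symmetry, the stretch
#    along the symmetry-plane normal does not.

def order_flips(pts):                  # how many pairs changed their depth order?
    return np.sum(np.sign(points[:, None, 2] - points[None, :, 2])
                  != np.sign(pts[:, None, 2] - pts[None, :, 2]))

print(order_flips(stretch_z), order_flips(stretch_n))
# -> 0 for the depth stretch; typically many flips for the stretch along the
#    normal. This ordinal signal is what the model described below exploits.
```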
Note that the presence of such an interaction was implied by results of our previous binocular shape constancy experiments (Chan, Stevenson, Li, & Pizlo, 2006; Li & Pizlo, 2005, accepted for publication). In these experiments, we measured shape constancy performance for several types of stimuli that differed with respect to the number and degree of geometrical and topological constraints. In the "polygonal line" condition, binocular performance was at chance level, which means that adding the binocular disparity cue provided no benefit. When the stimuli were more "structured," there was a reliable benefit of adding binocular disparity. In particular, when objects were symmetrical and their faces were planar, the subject could easily tell whether two 3D shapes seen from different viewing orientations were the same or different, and binocular performance was always better than the monocular one. These results suggested that symmetry and planarity constraints are critical in both monocular and binocular 3D shape perception and that information about binocular disparity is used at the end of shape processing (see the model of Chan et al., 2006). The results and model presented here provide a more precise explanation of the relation between monocular shape priors and binocular disparity. Specifically, we present here a new computational model in which binocular 3D shape recovery critically depends on the symmetry constraint and on the depth order information. When depth order information is weak or absent, the binocular performance of the model reduces to monocular performance. In order to verify the psychological plausibility of the new model, we performed new experiments on human 3D shape recovery. Note that the shape constancy results that have already been published (Chan et al., 2006; Li & Pizlo, 2005, accepted for publication) cannot be directly used to test the new model because shape constancy involves two functions: (i) 3D shape recovery and (ii) 3D shape matching. It follows that a direct test of the new model requires a shape recovery experiment. 
Li et al. (2009) have recently developed a computational model that applied four simplicity constraints, namely, 3D symmetry, planarity of contours, maximum 3D compactness, and minimum surface area, to recover a 3D shape from its single 2D orthographic image. The recovery was close to subjects' percept produced by the image. This model could not only recover the artificial 3D shapes, like a polyhedron or a parallelepiped (Li, 2009a), but could also recover real 3D shapes, like animal bodies (Li & Pizlo, 2008; Pizlo, Sawada, Li, Kropatsch, & Steinman, 2010; Sawada, 2010). 
In this study, we performed two psychophysical experiments. In Experiment 1, we measured subjects' performance in binocular and monocular 3D shape recovery tasks. In Experiment 2, we measured subjects' stereoacuity. A Bayesian model that combines monocular 3D shape prior and stereoacuity information was developed and used to account for subjects' binocular 3D shape perception. We compared the model's performance to that of the subjects and showed very close similarity, which suggests that our binocular model is psychologically plausible. 
Psychophysics
Experiment 1: Monocular and binocular 3D shape recovery
Subjects
Four subjects, including three of the authors (TK, TS, and YS), participated in this experiment. All subjects had normal or corrected-to-normal vision. The fourth subject, JC, was naive about the purpose of the experiment. 
Stimuli
Random abstract 3D symmetrical polyhedra like those in Figure 1 were generated. Abstract shapes, rather than shapes of common objects, like chairs, couches, or animal bodies, were used to avoid familiarity confounds (Chan et al., 2006; Pizlo & Stevenson, 1999). The polyhedra were random, subject to the following constraints. Every polyhedron was mirror symmetrical and it had 16 vertices. The “front” part of the polyhedron was a box that was smaller than the box in the “back” and these boxes had a pair of coplanar faces. These coplanar faces were orthogonal to the symmetry plane. The height and width of the polyhedron varied over the range of 1 to 5. The width of a polyhedron was defined as the thickness along the normal of its symmetry plane and the height was defined as the thickness along the normal of the coplanar faces of the two boxes (see Figure 1b). The overall size of the generated polyhedron was set so that the maximum length along X, Y, or Z axis was 10 cm. The directions of X, Y, and Z were defined relative to the observer: Z represents the direction in depth, the X-axis is horizontal, and the Y-axis is vertical. 
Figure 1. An illustration of the polyhedra used for Experiment 1. The width of the shapes in (a) and (c) (compared to (b)) is 1/3 and 3, respectively.
The 3D stimuli were generated with OpenGL and displayed binocularly by using LCD shutter glasses (RealD Crystal Eyes 3). Two slightly different images (stereoscopic images) for the subject's left and right eyes were computed according to the subject's interocular distance. The subject viewed these images through shutter glasses that were synchronized with a CRT monitor (SUN Microsystems GDM5510) so that each eye received only the image designed for that eye. The refresh rate was 100 Hz. Thus, the image for each eye was updated at the rate of 50 Hz. The size of the computer screen was 40 cm by 30 cm and the resolution was 1280 pixels by 1024 pixels. The simulated viewing distance (i.e., the distance between the subject's eye and the center of the simulated polyhedron) and the distance between the subject and the monitor were the same and equal to 50 cm. 
Procedure
The subject viewed the images in a dark room. The subject's head was supported by a chin–forehead rest. Two symmetrical polyhedra were presented side by side and the separation between them was 13.3 cm (see Figure 2). They were shown at the eye level. A stationary reference polyhedron was shown on the left binocularly or monocularly. In the monocular viewing, when the left shutter glass was transparent, the image for the left eye was presented. When the right shutter glass was transparent, no image was presented and the screen was black. 
Figure 2. The illustration of the experimental setup. Two polyhedral shapes were presented side by side and the separation was 13.3 cm. The simulated viewing distance was 50 cm for both shapes.
A rotating test polyhedron was shown on the right. This polyhedron was always viewed monocularly. Viewing a rotating 3D polyhedron allowed many of its different views to be seen in a short amount of time. The test polyhedron was identical to the reference polyhedron except for the 3D aspect ratio, and the subject's task was to adjust the 3D aspect ratio of the test stimulus so that it matched the 3D aspect ratio of the reference stimulus. The test polyhedra formed a one-parameter family of 3D shapes that were all consistent with the 2D cyclopean image produced by the reference polyhedron. The cyclopean eye is an abstract concept referring to the information that would be available if a camera were placed at the midpoint between the two eyes. This image can be estimated by averaging the positions (or angles) of the left and right retinal images (Banks, Van Ee, & Backus, 1997; see also Ono, 1979, 1981 for historical references to Wells and to Hering). 
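As a simple illustration of the averaging mentioned above, the cyclopean position of a feature can be approximated as the midpoint of its positions in the two eyes' images; the coordinates below are made up.

```python
import numpy as np

left_xy  = np.array([[10.2, 5.0], [12.8, 7.1]])    # two vertices in the left-eye image
right_xy = np.array([[ 9.8, 5.0], [12.2, 7.1]])    # the same vertices in the right-eye image
cyclopean_xy = (left_xy + right_xy) / 2.0           # average position = cyclopean estimate
print(cyclopean_xy)                                 # [[10.   5. ] [12.5  7.1]]
```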
As explained in Appendix 1, a single 2D orthographic image of a 3D mirror symmetrical shape determines this shape up to only one unknown parameter, its aspect ratio. It follows that if the subject perceives the 3D polyhedron as symmetrical, the only uncertainty is related to its 3D aspect ratio. 1 This experiment tested how this ambiguity is resolved under monocular and binocular viewing conditions. Specifically, we examine the nature and the role of monocular shape priors and of binocular disparity (stereoacuity). 
The 3D test polyhedron was rotated in such a way that none of its 2D images was identical to the image(s) of the reference polyhedron. This was accomplished in the following way. The 3D orientation of the test polyhedron, relative to the reference polyhedron, was first changed by 45 deg around the X-axis. The resulting polyhedron was then rotated around the Y-axis (at the speed of 80 degrees/s). This guaranteed that none of the 2D images of the rotating test polyhedron were identical to the images of the reference polyhedron (i.e., the viewing directions of the reference and test polyhedra were different). This minimized the possibility that the subject used 2D features in producing the response. To further minimize the effect of 2D artifactual cues, the average size of the test polyhedron was set at 70% of that of the reference polyhedron. As a result, the subject was faced with a typical shape constancy setup, where the task is to determine the identity of 3D shapes from different 3D viewing directions. 
The subject used the mouse to adjust the aspect ratio of the test polyhedron. At the start of each trial, the aspect ratio was set randomly. There was no time limit for the adjustment. The aspect ratio of the test polyhedron was used as a measure of the perceived shape of the reference polyhedron. If the adjusted aspect ratio of the test polyhedron equals the aspect ratio of the reference polyhedron, we can conclude that the subject perceived (recovered) the 3D shape of the reference polyhedron accurately. 
The 3D orientation of the reference polyhedron randomly varied from trial to trial. This orientation was characterized by the slant of the symmetry plane. Five slants of the symmetry plane were used: 15, 30, 45, 60, and 75 degrees. Each slant was used 20 times for a total of 100 trials in each of the two viewing conditions (monocular vs. binocular). On average, each session took about 40 min. Each session was run in two blocks of 50 trials. 
Results
For each trial, we computed the dissimilarity ϖ between the test (adjusted) 3D shape and the reference 3D shape (see Appendix 2 for details of this dissimilarity measure). The dissimilarity between these two shapes is defined as the difference between the logarithms of their aspect ratios (the logarithms are base 2). The aspect ratio of a 3D shape is computed by using its thickness along two directions. As explained in Appendix 1, a single 2D orthographic image of a 3D symmetrical shape determines this shape up to only one unknown parameter, its 3D aspect ratio. More specifically, all shapes within this one-parameter family differ from one another by stretching or compressing along two orthogonal directions. One of these two directions is the direction of the normal of the symmetry plane. The other direction is parallel to the symmetry plane and it depends on the 3D orientation of the shape relative to the camera (observer's eye). So, it is natural to measure the 3D shape by the aspect ratio involving these two variable thicknesses (note that the thickness along the normal of the symmetry plane is consistent with one of the thicknesses used to randomize the dimensions of the stimuli—see Figure 1). Zero dissimilarity means that the two 3D shapes are identical. This dissimilarity measure is better than the one we used in our previous study (Li et al., 2009) because it evaluates maximal shape differences rather than shape differences along two predetermined directions. Note that if the dissimilarity ϖ is equal to x, the aspect ratio of the test 3D shape differs from that of the reference 3D shape by a factor of $2^x$ (refer to Appendix 2 for details). 
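For reference, the dissimilarity measure just described amounts to a difference of base-2 logarithms of aspect ratios, so a dissimilarity of x corresponds to aspect ratios differing by a factor of $2^x$ (see Appendix 2 of the paper for the full definition). The numbers in the sketch below are illustrative.

```python
import math

def dissimilarity(aspect_test, aspect_reference):
    # difference of base-2 logarithms of the two aspect ratios
    return math.log2(aspect_test) - math.log2(aspect_reference)

print(dissimilarity(1.5, 1.5))    #  0.0 : identical aspect ratios
print(dissimilarity(3.0, 1.5))    #  1.0 : aspect ratios differ by a factor of 2
print(dissimilarity(0.75, 1.5))   # -1.0 : test shape compressed by a factor of 2
```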
The subjects' performance as a function of slant of the reference shape is shown on the left in Figure 3. In the monocular viewing condition, the results of all four subjects are very similar. Specifically, when the slant of the symmetry plane was 45 deg, the error was close to zero. For smaller slants, the error was negative, and for larger slants, the error was positive. We call this relation the slant effect. An error of "−1" means that the aspect ratio of the test (adjusted) shape was different from the aspect ratio of the reference shape by a factor of 2, and the test shape was overall thinner along the direction normal to the symmetry plane. An error of "+1" means that the aspect ratio of the test shape was different from the aspect ratio of the reference shape by a factor of 2, and the test shape was overall thicker along the direction normal to the symmetry plane. Note that both the negative errors for small slants and the positive errors for large slants mean that the test 3D shape was compressed along the direction close to the line of sight, compared to the reference shape. The compression was not exactly along the line of sight because the symmetry of the 3D shape precludes such compressions. Overall, the errors for slants that are not too close to the degenerate views (30, 45, and 60 deg) are fairly small. This is similar to what we observed in our previous study (Li et al., 2009). This fact is important because it represents good 3D shape perception in the absence of any of the conventional depth cues such as binocular disparity, motion parallax, shading, or texture. In the monocular viewing condition, the subject was presented with a line drawing representing a single 2D perspective image of the 3D polyhedron. In the absence of depth cues, the visual system of the subject had to use simplicity constraints, such as symmetry and compactness (see the Model section). 
Figure 3. The dissimilarity between subjects' adjusted 3D shapes and the reference 3D shapes. The figures on the left (a, c, e, and g) show individual subjects' performance for the two viewing conditions as a function of slant. The figures on the right (b, d, f, and h) show the simulated performance by our model (see Model section).
Next, consider binocular performance. Two subjects (JC, YS) showed essentially perfect performance. Both the average errors and variability were close to zero for all five slants. The performance of TS was also close to perfect for three of the five slants. So, we can say that three of the four subjects have essentially veridical binocular perception of 3D symmetrical shapes. 2 This result contrasts sharply with most of the previous studies (but see Norman et al., 2006). Most of the previous studies either used completely asymmetrical 3D shapes, or when the shapes were symmetrical, they were simple and always presented from degenerate viewing directions. Clearly, when the 3D shape is complex and symmetrical and the viewing directions are not degenerate (the slant of the symmetry plane is not 0 or 90 deg), binocular performance is extremely good. 3 The binocular performance of the fourth subject (TK) was different. He did not show what we call "binocular enhancement." His binocular performance was only slightly better than his monocular performance. Can the individual differences in binocular 3D shape perception be explained by differences in the subjects' stereoacuity? Before we present results on judging the depth order of points, we will briefly discuss the results of a control experiment in which two subjects were tested in a shape recovery experiment with a larger viewing distance. Binocular information deteriorates with distance, so we expected that the errors in binocular 3D shape recovery would be larger. 
Control experiment: Larger viewing distance
Two subjects, TS and YS, who had participated in Experiment 1, were tested. The simulated 3D shapes for the stimuli were generated the same way as in Experiment 1. The simulated viewing distance and the distance between the subject and the monitor were 200 cm. As a result, the angular size of the stimuli became about 1/4 and the binocular disparity of each vertex became about 1/16 of those in Experiment 1. Other than that, the procedure and the apparatus were the same as those in Experiment 1.
Results
The graphs on the left in Figure 4 show the binocular performance of the two subjects. There was no reason to expect any substantial differences in monocular viewing when the distance changed from 50 to 200 cm. Indeed, monocular performance was very similar to that in the main experiment. Therefore, we do not show monocular curves on these graphs. Instead, we replotted the binocular curves of these two subjects from the main experiment. This allows a direct comparison of binocular performance at these two viewing distances. It can be seen that the binocular performance deteriorated somewhat. 4 This makes sense because the effectiveness of binocular information is lower at larger viewing distances. Now, the slant effect is visible in both monocular and binocular viewing. As a result, errors in monocular and binocular viewing are correlated. The correlation coefficient is 0.99 for both subjects (p < 0.01). 
Figure 4. The dissimilarity between the adjusted 3D shapes and the reference 3D shapes when the 3D shapes were viewed from the distance of 200 cm. Superimposed are graphs from the main experiment showing binocular performance at the 50-cm viewing distance. The two graphs (a and c) on the left show the subjects' performance and (b) and (d) show the performance of our model (see Model section).
Experiment 2: Stereoacuity test
Subjects
The same four subjects who participated in Experiment 1 were tested. 
Stimuli
Polyhedra like those in Experiment 1 were generated, and on each polyhedron, two points were selected for the stereoacuity test. We generated the polyhedra and selected the points using the following procedure. A random polyhedron was generated first and its 3D orientation was chosen randomly. Next, its 2D orthographic image was computed. Recall that a single 2D orthographic image of a 3D symmetrical shape determines this shape up to one free parameter, the aspect ratio of the polyhedron. When the aspect ratio changes, so does the slant of the symmetry plane, as well as the relative depths and depth orders of a number of its vertices. For some pairs of vertices, the depth order remains the same for all 3D shapes in the one-parameter family. For such pairs of vertices, the subject (and the model) would be able to judge the depth order based on a single 2D image; binocular disparity would not be needed. Because we were interested in testing the reliability of the binocular system, it was necessary to avoid trials in which monocular information could do the job. In a sense, one can say that such trials contained 2D artifactual cues. Among the remaining pairs of vertices, the relative depth changed to various degrees when the slant of the symmetry plane changed. We considered only those pairs whose relative distance in depth varied by at least 7.8 cm. There was usually more than one pair satisfying this requirement. In such cases, we used the pair that needed the smallest change of slant of symmetry plane (and smallest change of aspect ratio) as the relative depth changed between −3.9 cm and 3.9 cm. By doing this, we tried to minimize the role of 2D artifactual cues. Once the pair of vertices was selected, the 3D polyhedron was rotated around the line of sight (Z-axis) so that the two points were on the same horizontal level. The average distance between them was 10 cm (about 3 deg). 
There were 8 depth intervals between the two points, spanning the range in depth of 7.8 cm (for TS, we used a smaller range because the pilot study showed that his stereoacuity threshold was much lower than that of the others). Positive depth interval meant that the left dot was closer to the subject. The two points whose depth order was judged were indicated by placing red dots whose radius was about 0.15 cm (see Figure 5). The stereoscopic images of the polyhedra were generated by OpenGL according to the subject's interocular distance and were presented at the center of a computer screen. 
Figure 5. Stereoscopic images (crossed fusion) from the "polyhedron" condition in Experiment 2.
Procedure
The subject viewed the stereoscopic images on an LCD monitor (Samsung Sync Master 2233) in a dark room through LCD shutter glasses (NVIDIA GeForce 3D Vision Kit). The refresh rate was 100 Hz. The size of the computer screen was 47.5 cm by 29.7 cm and the resolution was 1680 pixels by 1050 pixels. The viewing distance was 200 cm. The subject's head was supported by a chin–forehead rest. The method of constant stimuli was used. In each trial, the subject was asked to judge which dot was closer. Auditory feedback was provided after an incorrect response. 
There were two conditions. In the first condition, the polyhedron was not shown. As a result, the subject saw only the two dots presented side by side. In the second condition, both the polyhedron and the dots were shown. We called the former the dot condition and the latter the polyhedron condition. There were 400 trials in a session, 50 trials per depth interval. Before collecting experimental results, each subject ran several practice sessions. 
Results
Probit analysis was performed to estimate the stereoacuity thresholds. We used the standard deviation of the psychometric function as a measure of the threshold. A small threshold means good performance. The thresholds in the two conditions are shown in Figure 6. Recall that the dot condition refers to the conventional stereoacuity test where the subject is shown two dots. The polyhedron condition refers to the stereoacuity test in which the subject judged the depth order of two vertices of a polyhedron. First, note that the thresholds vary across the subjects. In the dot condition, TS's threshold was smallest (18.45 seconds of arc) and TK's was largest (101.70 seconds of arc). This degree of individual variability is not uncommon (Ogle, 1950). 
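As an illustration of this analysis, the sketch below fits a cumulative-Gaussian psychometric function to hypothetical proportions of "left dot closer" responses and takes the fitted standard deviation as the threshold. It uses a least-squares fit as a simple stand-in for the probit analysis, and all data values are made up.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

depth_intervals = np.array([-3.9, -2.8, -1.7, -0.6, 0.6, 1.7, 2.8, 3.9])    # cm, 8 levels
p_left_closer   = np.array([0.04, 0.10, 0.24, 0.42, 0.60, 0.78, 0.92, 0.96])

def psychometric(x, mu, sigma):
    # cumulative Gaussian: probability of responding "left dot closer"
    return norm.cdf(x, loc=mu, scale=sigma)

(mu_hat, sigma_hat), _ = curve_fit(psychometric, depth_intervals, p_left_closer,
                                   p0=(0.0, 1.0))
print(sigma_hat)   # the threshold: a smaller sigma means better stereoacuity
```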
Figure 6. Stereoacuity thresholds (standard deviations of the psychometric functions) in Experiment 2. Note that the threshold of TK in the polyhedron condition was very large. The standard error (not shown in the graph) for this threshold was about 400 seconds of arc. Poor estimation, in this case, was due to the fact that we measured only a small fraction of a psychometric function, which was very shallow.
In the case of the polyhedron condition, thresholds were larger for all subjects, compared to the dot condition. This result is consistent with the results of previous studies, which showed that stereoacuity thresholds were elevated when a 3D surface was present (Norman & Todd, 1998). 
Note that there is individual variability in how much the threshold was elevated in the polyhedron condition. For TS and TK, the threshold was elevated by a factor of about 7. For the other two subjects, this elevation was smaller (a factor of 2 for JC and 1.5 for YS). The fact that thresholds are larger when the polyhedron is present suggests that the depth order judgments are affected by monocular shape priors, such as symmetry and compactness. The different magnitudes of threshold elevation across subjects are likely related to the weight the visual system of a given subject puts on the shape priors. This suggests that the same model might be able to account for the 3D shape recovery results in Experiment 1 and for the depth order judgments in this experiment. Indeed, comparison of the binocular results shown in Figure 3 with the stereoacuity thresholds in the polyhedron condition shown in Figure 6 supports this observation. The next section describes the new model of binocular shape perception. The weight of the monocular shape prior in this model will be estimated from the stereoacuity experiment, and then the model will be compared to the subjects' results in the 3D shape recovery experiment. 
Model
Suppose a 3D reference shape $\eta_0$ is viewed binocularly from a distance d. The depth order information between pairs of points, provided by binocular vision, is represented by $M_0$. We use the maximum a posteriori (MAP) probability to determine model predictions of the percept $\eta_p$: 
$$\eta_p = \arg\max_{\eta \in \theta}\, p(\eta \mid M_0),\qquad(2)$$
where θ is the one-parameter family of 3D mirror symmetrical shapes consistent with the given 2D cyclopean image. The a posteriori probability $p(\eta \mid M_0)$ is proportional to the product of $p(M_0 \mid \eta)$ and $p(\eta)$ (Kersten & Yuille, 2003): 
$$p(\eta \mid M_0) \propto p(M_0 \mid \eta)\, p(\eta),\qquad(3)$$
where $p(M_0 \mid \eta)$ is called the likelihood function, which represents the probability that a recovered 3D shape η has the same depth order of pairs of points as $\eta_0$. The function $p(\eta)$ is the prior distribution of η, and it represents the visual system's preference (bias) for the recovered 3D shapes. Below, we describe how to derive $p(M_0 \mid \eta)$ and $p(\eta)$. 
To derive the likelihood function, we first represent the depth order information of a 3D shape η by a matrix M, in which the value of the element (i, j) represents the depth order between the points i and j in this shape. Let the depth interval between points i and j be $\Delta d_{i,j}$. Zero means that the points i and j have the same depth. A negative (or positive) depth interval means that point i is closer to (or farther from) the subject than point j. Then, the matrix M can be written as follows: 
$$M(i,j) = \mathrm{sign}(\Delta d_{i,j}).\qquad(4)$$
Figure 7 shows a 3D shape and its corresponding depth order matrix. 
Figure 7. (a) A 3D shape and its 2D image. (b) The depth order matrix for the 3D shape on the left is represented by a colored array. The red patch at (i, j) means that point i is farther than point j, while the blue patch means that point i is closer than point j. The elements on the diagonal are ignored in the analysis.
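A depth order matrix of the kind shown in Figure 7 follows directly from Equation 4; the sketch below uses made-up vertex depths.

```python
import numpy as np

z = np.array([12.0, 10.5, 10.5, 14.2, 9.8])      # depths of five visible vertices (made up)
M = np.sign(z[:, None] - z[None, :])              # Equation 4: M[i, j] = sign(z_i - z_j)
print(M)
# M is skew-symmetric (M[i, j] == -M[j, i]); the diagonal is zero and is ignored,
# so only the upper triangle carries information.
```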
Let $\eta_0$ be a reference 3D shape, whose depth order matrix is $M_0$, and η be a recovered 3D shape that has the same orthographic image as $\eta_0$, and whose depth order matrix is M. Consider one pair of points (i, j) in $\eta_0$ and η first. Let the depth intervals between the points i and j in $\eta_0$ and η be $\Delta d^0_{i,j}$ and $\Delta d_{i,j}$, respectively. If $\Delta d_{i,j}$ is positive, the probability that points i and j in $\eta_0$ have the same depth order as the corresponding points in η is $p(\Delta d^0_{i,j} > 0)$: 
$$p(\Delta d^0_{i,j} > 0) = \Phi\!\left(\frac{\Delta d^0_{i,j}}{\sigma_{i,j}}\right),\qquad(5)$$
where Φ is the cumulative distribution function of the standard normal distribution and $\sigma_{i,j}$ is the standard deviation of depth order judgments, which is the stereoacuity threshold (note that the stereoacuity threshold is a function of the separation of i and j on the retina; larger separations lead to larger thresholds—see below). The stereoacuity threshold used in the likelihood function is the conventional threshold measured by showing two dots without any context. This corresponds to our "dot" condition in Experiment 2. 
If $\Delta d_{i,j}$ is negative, the probability that points i and j have the same depth order in η and $\eta_0$ is $p(\Delta d^0_{i,j} < 0)$: 
$$p(\Delta d^0_{i,j} < 0) = \Phi\!\left(\frac{-\Delta d^0_{i,j}}{\sigma_{i,j}}\right).\qquad(6)$$
Equations 5 and 6 can be expressed by a single equation: 
$$p\big(M_0(i,j) = M(i,j)\big) = \Phi\!\left(\frac{\mathrm{sign}(\Delta d_{i,j})\,\Delta d^0_{i,j}}{\sigma_{i,j}}\right)\qquad(7)$$
or 
$$p\big(M_0(i,j) = M(i,j)\big) = \Phi\!\left(\frac{M(i,j)\,\Delta d^0_{i,j}}{\sigma_{i,j}}\right).\qquad(8)$$
Assume that the depth order judgment between one pair of points is independent of the others. Then, the joint probability that all pairs of points in $\eta_0$ have the same depth order as their corresponding pairs in η is the product of the probabilities for each pair. Note that because the depth order matrix is skew-symmetric, $M(i,j) = -M(j,i)$ (refer to Equation 4 and Figure 7), the depth order information in M is redundant. Therefore, when deriving the likelihood for η, only the elements in the upper triangle of M are used: 
$$p(M_0 \mid \eta) = \prod_{i<j} \Phi\!\left(\frac{M(i,j)\,\Delta d^0_{i,j}}{\sigma_{i,j}}\right).\qquad(9)$$
In order to obtain the likelihood as a function of the recovered 3D shape, we compute $p(M_0 \mid \eta)$ for all 3D symmetrical shapes η in the one-parameter family consistent with the given 2D orthographic image of $\eta_0$ (we actually compute it for a finite number of shapes using a small step of the 3D shape aspect ratio). The red curve in Figure 8 illustrates the likelihood function for one 2D image. The X-axis represents the dissimilarity ϖ between the recovered 3D shape and the reference 3D shape (the dissimilarity is defined in Experiment 1, as well as in Appendix 2). The Y-axis represents the likelihood. 
Figure 8. The likelihood function of the Bayesian model (red), the monocular shape prior (green), and the posterior (blue). See text for more details.
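The likelihood of Equation 9 can be sketched in a few lines. The code below is our illustration, not the authors' implementation: it uses a single stereoacuity threshold for all pairs, whereas the model scales the threshold with the retinal separation of each pair (see below), and the depths and threshold value are made up.

```python
import numpy as np
from scipy.stats import norm

def depth_order_likelihood(z_candidate, delta_d0, sigma):
    """Equation 9: product over pairs i < j of Phi(M(i,j) * delta_d0(i,j) / sigma).
    z_candidate: vertex depths of a candidate shape (defines its depth order matrix M).
    delta_d0:    matrix of depth intervals of the reference shape.
    sigma:       stereoacuity threshold, in the same depth units as delta_d0."""
    M = np.sign(z_candidate[:, None] - z_candidate[None, :])
    i, j = np.triu_indices(len(z_candidate), k=1)             # upper triangle, i < j
    return np.prod(norm.cdf(M[i, j] * delta_d0[i, j] / sigma))

z_ref  = np.array([50.0, 52.5, 51.0, 54.0])                   # reference vertex depths
z_cand = np.array([50.0, 53.5, 51.5, 55.5])                   # one candidate from the family
delta_d0 = z_ref[:, None] - z_ref[None, :]
print(depth_order_likelihood(z_cand, delta_d0, sigma=0.5))
```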
Next, we describe the distribution representing the monocular shape prior $p(\eta)$ (green curve in Figure 8). The symmetry prior is used in our model as an implicit constraint (an assumption) and is included in the likelihood function. Recall that the likelihood is computed for a one-parameter family of 3D shapes that are all mirror symmetrical. Without the symmetry prior, the likelihood would have to be computed for a larger family of 3D shapes. Similarly, planarity of contours is an implicit prior included in the likelihood because all 3D shapes used to compute the likelihood function are polyhedral objects whose faces are planar. This leaves us with only the two monocular shape priors that we used in our 2009 model, namely, maximal 3D compactness and minimum surface area (Li et al., 2009). These two priors were combined into one: maximum $V/S^3$, where V and S are the volume and surface area of the object. Our shape prior is a Gaussian distribution over the dissimilarity ϖ. The mean value of this distribution is the dissimilarity between $\eta_0$ and the 3D shape that maximizes $V/S^3$. The standard deviation of this distribution is a free parameter that represents the weight of this prior, relative to the weight of the likelihood. Smaller standard deviations of the monocular prior mean that this prior will be more important in determining the percept. 
Now, we are ready to compute the a posteriori distribution and find its maximum. The a posteriori distribution (blue curve in Figure 8) is the product of the likelihood function representing stereoacuity and of the monocular shape prior representing the maximum compactness and minimum surface area constraints. Because the standard deviation of the prior is a free parameter, we tried a wide range of standard deviations and chose the one that provided the best fit to the stereoacuity threshold in the "polyhedron" condition. So, to summarize, the likelihood function was computed by using the stereoacuity threshold in the "dot" condition, and the maximum of the a posteriori function was used to estimate the 3D shape and the resulting depth order of two selected vertices. The depth order obtained from the posterior was different from the depth order implied by the likelihood function because of the bias produced by the prior. Note also that because the symmetry and planarity priors were implicitly used in the likelihood function, the precision of judging the depth order for a selected pair of points on a polyhedron, implied by the maximum of the likelihood function, will usually be better than the conventional stereoacuity threshold. The reason is that the symmetrical shape with planar faces serves as the source of redundant information. In general, however, adding the polyhedron leads to a higher stereoacuity threshold because of the bias produced by the maximum compactness and minimum surface area constraints. 
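Putting the pieces together, a minimal sketch of the MAP computation over the one-parameter family might look as follows. The grid of candidate shapes, the stand-in likelihood, and all numerical values are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def map_dissimilarity(candidate_dissims, likelihoods, prior_mean, prior_sd):
    """candidate_dissims: dissimilarity of each candidate shape to the reference.
    likelihoods: depth-order likelihood of each candidate (Equation 9).
    prior_mean:  dissimilarity of the maximum-compactness (V/S^3) shape.
    prior_sd:    weight of the monocular prior (smaller = stronger prior)."""
    prior = norm.pdf(candidate_dissims, loc=prior_mean, scale=prior_sd)
    posterior = likelihoods * prior                   # Equation 3 (up to a constant)
    return candidate_dissims[np.argmax(posterior)]    # Equation 2: the MAP candidate

# Illustrative use: a likelihood that peaks at dissimilarity 0 (the true shape)
# and a prior pulling the estimate toward -0.3 with standard deviation 0.26.
dissims = np.linspace(-1.0, 1.0, 201)
likelihoods = norm.pdf(dissims, loc=0.0, scale=0.1)
print(map_dissimilarity(dissims, likelihoods, prior_mean=-0.3, prior_sd=0.26))
# -> a value between -0.3 and 0, closer to 0 because the (sharper) likelihood
#    outweighs the prior; with a weaker stereoacuity signal the prior would win.
```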
There are two other features of the model that have to be mentioned: 
  1.  
    The stereoacuity threshold varies with the separation between tested points on the retina (Rady & Ishak, 1955). This dependence can be approximated by a linear function (Li, 2009b). So, the stereoacuity threshold measured in Experiment 2 for the dots that were separated by about 3 deg was scaled to match the angular separation of individual pairs of vertices, when the likelihood function was computed.
  2.  
    While all visible pairs of points on the polyhedron were used to compute the likelihood function in the simulation of shape recovery, not all of these pairs were used to compute the likelihood function in the simulation of depth order judgments. When all pairs of points are used in the simulation of depth order judgments, the estimated standard deviation of the prior is quite small. Recall that small standard deviation of the prior means that the effect of the prior, compared to that of the likelihood function, is large and results in relatively large errors in 3D shape recovery simulation. So, the proportion of pairs of points used to compute the likelihood function in the simulation of depth order judgments was a free parameter (the value of this parameter was the same for all subjects). Using a fraction of points in judging depth order of two points on a polyhedron makes sense. Specifically, when the subject is asked to compare the depth order of two points on the polyhedron, it is reasonable to assume that she will not pay attention to the entire polyhedron because the depth order task is spatially local. We already know that performance in this task is affected by the shape of the polyhedron, but it is possible that the subject paid attention to only some pairs of points on the polyhedron. So, the proportion of the pairs of points that subjects used in the depth order judgment represents the limits of spatial attention. We estimated this proportion to be 20%. Using only 20% of randomly selected pairs of points to compute the likelihood function in depth order experiment resulted in the standard deviation of the monocular shape prior that led to good fits of the model in the shape recovery experiment. In the shape recovery experiment, all pairs of points were used to compute the likelihood function because judgments about shape are spatially global. The block diagrams in Figures 9a and 9b illustrate, respectively, how the model was used to estimate the standard deviation of the prior and to simulate the subjects' performance in 3D shape recovery experiment. 5
Figure 9. (a) The flow diagram illustrating how the standard deviation of the monocular shape prior was estimated. (b) Flow diagram illustrating how the model simulated the subjects' performance in 3D shape recovery.
The estimated standard deviations of the prior distribution produced by fitting the model's stereoacuity threshold in the polyhedron condition to the threshold of the subjects in the same condition were: 0.26, 0.49, 0.06, and 0.06 for JC, YS, TS, and TK, respectively. Compared with TS and TK, JC and YS have larger standard deviations, which means that their binocular 3D shape percept is less affected by the monocular prior. This is consistent with the psychophysical results in our Experiment 2, where the elevation of the stereoacuity threshold in the “polyhedron” condition was relatively small for JC and YS and large for TS and TK. 
Next, the estimated standard deviations of the prior distribution were applied to simulate the subjects' 3D shape recovery results. There were no additional free parameters, in either the monocular or the binocular condition. The model was applied to the images that were used to test the individual subjects (the monocular model was the same as that presented by Li et al., 2009). The curves representing the model's performance are shown on the right in Figures 3 and 4. It can be seen that the model's performance is very similar to the subjects' performance. The only systematic (although small) difference between the model and the subjects' results is the monocular performance at the slant of 75 degrees. The average error for this slant is smaller in the subjects' performance than in the model's performance (there is a similar difference in the binocular condition). Specifically, the average error of the model is roughly a linear function of the slant of the symmetry plane, whereas the curves representing the subjects' performance are not linear, mainly because of the relatively small error at 75 deg. This difference could be related to the differences in the monocular shape priors between the model and the subjects. Explaining the nature of these (small) differences is beyond the scope of this paper. 
Next, consider the comparison of binocular performance among subjects. Although TS and TK have similar standard deviations of the monocular shape prior distribution, there is a large difference between their simulated binocular performance. Note that their psychophysical binocular performance was also quite different. The large difference in the simulated performance makes sense for the following reason. TK had a much larger stereoacuity threshold when tested in the "dot" condition (101.70 seconds of arc) compared to that of TS (18.45 seconds of arc). A larger stereoacuity threshold means a larger variance of the likelihood function. As a result, TK's visual system puts substantially more weight on the monocular prior. This implies that his binocular performance is very similar to his monocular performance (see Figure 3). Note also that even though TS has the lowest stereoacuity threshold among the four subjects (when tested in the conventional test involving two dots), his binocular performance is worse than that of YS and JC. This is because TS's binocular percept is more affected by the monocular shape prior (the estimated standard deviation of the shape prior distribution was smaller for TS compared to JC and YS). Overall, a lower stereoacuity threshold and less weight on the monocular shape prior distribution lead to better binocular performance. This fact is accounted for by our model and confirmed in our psychophysical experiment. 
One of the reviewers pointed out that our model is incomplete because it does not use the binocular metric information represented by Equation 1. The model only uses binocular ordinal information. Could the model's performance be improved by including binocular information about 3D depth intervals? Note that the binocular performance of our subjects in shape recovery of 3D symmetrical polyhedra was close to veridical and our "incomplete" model was able to account for this near-veridical performance quite well. Considering the fact that 3D intervals are almost never perceived veridically (see the Introduction section), it is not likely that adding binocular metric information about 3D intervals would actually improve the model's performance or the model's fit to the subjects' performance. However, this does not imply that the human visual system will never use this information. One way to verify this possibility is to use a conflict situation. This is what the reviewer suggested. We asked two subjects to run 3D shape recovery from a viewing distance that is different from the simulated distance. By doing this, we change the perceived 3D distances, as predicted by Equation 1. At the same time, it is easy to verify that the performance of our model will not be affected by this manipulation because (i) the one-parameter family of 3D mirror symmetric shapes corresponding to the cyclopean image does not depend on the viewing distance, and (ii) the depth order of 3D points will not change, either. 6 Will the subject's percept be affected? 
Two authors (TS and ZP) were tested in two control sessions (50 trials per session). In the first session, stereoscopic images of 3D symmetrical polyhedra were generated for a 2-m viewing distance but shown to the subjects from a 50-cm viewing distance. For this viewing condition, the perceived depth intervals, as predicted from Equation 1, were compressed by a factor of 4 compared to the simulated intervals. If the subject's percept is affected by binocular metric information, the perceived 3D shapes would be thinner, as compared to the simulated 3D shapes. Furthermore, the compression of a 3D shape along the depth direction might destroy the 3D mirror symmetry of the perceived shape. In the second session, stereoscopic images of the 3D shapes were generated for a 50-cm viewing distance and shown to the subjects from a 2-m viewing distance. For this viewing condition, the perceived depth intervals, as predicted from Equation 1, were stretched by a factor of 4 compared to the simulated intervals. If the subject's percept is affected by binocular metric information, the perceived 3D shapes would be stretched in depth, as compared to the simulated 3D shapes, and their 3D symmetry might be destroyed. The remaining aspects of the procedure and the apparatus were the same as those in Experiment 1. Results of this control experiment are shown in Figure 10. The graphs on the left show performance in the non-conflict situation, and the graphs on the right show performance in the conflict situation (TS's performance in the non-conflict situation was replotted from Figure 4). It can be seen that performance is different in these two situations and the change of performance is in the direction predicted by Equation 1. However, the magnitude of the change is much smaller than what would be predicted by Equation 1. Specifically, the maximal difference in observed shape recovery errors in the conflict situations, when the ratio of the actual to simulated viewing distance changed by a factor of 16 (from 4:1 to 1:4), is only about 2.5. So, there is some effect of binocular metric information, but this effect is rather small. Both subjects reported that on several trials in the conflict sessions the 3D shapes did not look exactly symmetric. The distortion of symmetry was expected if the binocular shape percept is affected by binocular metric information. What is interesting is that on most trials in the conflict situation the 3D shapes did look symmetric. This again suggests that the effect of binocular metric information is small and not systematic. Perhaps, in real life, this effect is observed only when the viewing direction is degenerate: parallel or orthogonal to the symmetry plane. Recall that, for such viewing directions, both the symmetry prior and the depth order information are much less effective. However, objects in our natural environment are almost never viewed from degenerate viewing directions, and if they are, the observer can usually move her head in order to view the object from another direction. Therefore, it is possible that binocular metric information is not used too often, at least when the task is to recover shapes of natural objects. 
Figure 10. The comparison of subjects' performance between conflict and non-conflict conditions. The graphs on the left show the results under non-conflict viewing condition, in which the simulated viewing distance was the same as the actual viewing distance. The graphs on the right show the results under conflict viewing condition. The simulated viewing distance was 50 cm (or 200 cm) and the actual viewing distance was 200 cm (or 50 cm). The error bars for the results in the conflict conditions are larger compared to those in the non-conflict conditions because the number of trials was smaller.
Summary and conclusions
We presented what we believe is the first study demonstrating veridical binocular perception of 3D shapes. Our results contrast sharply with those of most prior studies, which claimed that binocular perception of 3D shape and space is almost never veridical and is subject to large systematic errors (but see Norman et al., 2006, who reported good shape constancy with approximately symmetrical, natural objects). There are three differences between our experiment and the experiments of others: (i) we used symmetrical shapes; (ii) the shapes were complex in the sense that they contained more than a dozen feature points; (iii) our stimuli were presented at several 3D orientations that were not restricted to degenerate views. We formulated a completely new model of binocular 3D shape perception. The model, expressed in the formalism of Bayesian inference, accounts for the psychophysical results fairly well. The likelihood function is computed from the stereoacuity threshold (depth order judgments) combined with symmetry and planarity constraints. The prior in our model is a combination of maximal 3D compactness and minimum surface area. In this model, the binocular and monocular interpretations of a 3D shape are closely related. As a result, when binocular information is less reliable (or absent altogether), the binocular percept reduces to the monocular one. So, this is the first model that can account for both monocular and binocular perception of shape and can explain the transition from one percept to the other as the viewing distance changes. In a sense, monocular viewing of 3D shapes is, according to our model, a special (limiting) case of binocular viewing of 3D shapes (see Hillis et al., 2004, who showed a smooth transition between binocular and monocular perception of surface slant from texture). 
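To make the overall architecture of such a model concrete, the sketch below shows how a posterior over the one-parameter family of symmetric interpretations (parameterized by the angle α of Appendix A) could be formed by multiplying a likelihood with a prior and taking the maximum-a-posteriori estimate. This is a schematic illustration of ours: the Gaussian functional forms, and the idea of summarizing the binocular evidence and the monocular preference each by a single preferred angle, are illustrative assumptions, not the likelihood and prior actually used in the paper.

```python
import numpy as np

def map_recovery(alpha_grid, binocular_alpha, sigma_binocular,
                 monocular_alpha, sigma_monocular):
    """Toy Bayesian combination over the one-parameter family of symmetric
    3D interpretations, indexed by the angle alpha between the symmetry
    plane and the image plane (Appendix A). The Gaussian forms are
    illustrative assumptions, not the published model's likelihood/prior."""
    # Likelihood: agreement of each candidate alpha with the binocular
    # (depth-order) evidence; its width stands in for stereo reliability.
    log_like = -0.5 * ((alpha_grid - binocular_alpha) / sigma_binocular) ** 2
    # Prior: preference coming from the monocular shape constraints.
    log_prior = -0.5 * ((alpha_grid - monocular_alpha) / sigma_monocular) ** 2
    return alpha_grid[np.argmax(log_like + log_prior)]

alphas = np.linspace(-89.0, 89.0, 1000)   # candidate angles (deg)
# Reliable binocular evidence dominates the recovered interpretation ...
print(map_recovery(alphas, binocular_alpha=40.0, sigma_binocular=2.0,
                   monocular_alpha=55.0, sigma_monocular=15.0))
# ... while a very broad likelihood (e.g., a large viewing distance) lets
# the percept fall back to the monocularly preferred interpretation.
print(map_recovery(alphas, binocular_alpha=40.0, sigma_binocular=200.0,
                   monocular_alpha=55.0, sigma_monocular=15.0))
```

The two printed values illustrate the transition described above: as the reliability of the binocular evidence decreases, the maximum of the posterior moves from the binocularly indicated interpretation toward the monocular one.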
We close by pointing out two interesting characteristics of our binocular model. First, the likelihood function in our model combines non-metric visual information (binocular depth order) with non-metric priors, (i) maximal symmetry and (ii) planarity, and produces a very good approximation to a 3D metric shape. There is a long tradition in visual science of discussing the nature and fundamental limitations of non-metric information. Our study shows that if the visual system uses very effective shape priors, non-metric data can lead to veridical perception of a 3D metric shape, especially if the 3D shape is complex (i.e., has a large number of feature points that can be used in depth order judgments). So, somewhat paradoxically, the percept is more likely to be veridical when the 3D shape is more complex. This means that using overly simplistic stimuli may prevent the student of vision from studying the full capabilities of the visual system (see Pizlo, 2008, for a similar claim). Second, our Bayesian model of 3D shape perception may provide a new way to measure the nature and extent of visual spatial attention. Shape, by definition, is spatially global. Comparing the depths of two points, on the other hand, is a spatially local task. In our model, judging the depth order of two vertices of a 3D polyhedral shape is spatially more global than comparing the depth order of two points in total darkness, but less global than judging the 3D shape itself. We estimated that the subjects use 20% of the spatial information when they judge the depth order of two vertices of a polyhedron. This effect of spatial attention can probably explain the inconsistencies between judgments of the shape of a 3D object and judgments of the 3D orientations of the object's surfaces, when surface orientation is judged by means of a small elliptical probe (Li, 2009a). 
Appendix A
Derivation of a one-parameter family of 3D interpretations from a single 2D image 7
Suppose that the mirror-symmetric correspondences in a 2D image have been determined: for each point a, its symmetric counterpart a′ is unique. 
Under an orthographic projection, the lines connecting pairs of symmetrical points are parallel. Suppose the direction of those lines is τ. The Cartesian coordinate system is defined as follows: The image plane is the XY plane. The X-axis has the same direction as τ and the Y-axis is orthogonal to the X-axis in the image plane. The origin of the coordinate system can be set at an arbitrary point in the image plane. The Z-axis is perpendicular to the image plane and indicates the direction in depth. Figure A1 illustrates the Cartesian coordinate system we use for the recovery of a 3D shape. 
Figure A1
 
(a) The illustration of 3D shape recovery. η is a recovered 3D shape from image I. α is the angle between the symmetry plane π S and the image plane π XY . (b–d) Three recoveries from the same image. The angles (α) between the symmetry plane of the recovered 3D shapes and the image plane are −60, −45, and −30, respectively.
Using this coordinate system, recovering the 3D shape is equivalent to computing the Z value of each point in the image. We use a lower-case letter to represent the coordinates of a point in the image plane and a capital letter to represent the coordinates of a recovered point in 3D space. For any pair of corresponding image points ⟨a_i, a_i′⟩, their coordinates are (x_i, y_i) and (x_i′, y_i′). The coordinates of the recovered symmetric pair ⟨A_i, A_i′⟩ are (X_i, Y_i, Z_i) and (X_i′, Y_i′, Z_i′). From the properties of the orthographic projection, the following equations are satisfied:

$$x_i = X_i, \qquad \text{(A1)}$$
$$y_i = Y_i, \qquad \text{(A2)}$$
$$x_i' = X_i', \qquad \text{(A3)}$$
$$y_i' = Y_i'. \qquad \text{(A4)}$$

For the coordinate system defined above, the y values of a_i and a_i′ are the same. It follows that

$$y_i = y_i'. \qquad \text{(A5)}$$

From Equations A2 and A4, we obtain

$$Y_i = Y_i'. \qquad \text{(A6)}$$
As a result, the points of any symmetric pair ⟨A_i, A_i′⟩ have the same Y value. Let π_S be the symmetry plane of the recovered 3D shape and l_S be the intersection of the image plane π_XY and π_S. Then, we have the following properties:
1. A_i and A_i′ are symmetric with respect to π_S. Hence, the line A_iA_i′ is perpendicular to π_S. Because l_S lies in the plane π_S, l_S is perpendicular to A_iA_i′.
2. a_i and a_i′ are the orthographic projections of A_i and A_i′. Hence, the lines a_iA_i and a_i′A_i′ are perpendicular to the image plane π_XY. Because l_S lies in the plane π_XY, l_S is perpendicular to a_iA_i and a_i′A_i′.
3. From (1) and (2), it follows that l_S is perpendicular to the plane defined by a_i, a_i′, A_i, and A_i′. From this, it follows that l_S is perpendicular to the line a_ia_i′.
4. Because a_i and a_i′ have the same y value, the line a_ia_i′ is perpendicular to the Y-axis.
5. From (3) and (4), it follows that l_S is parallel to the Y-axis.
Without loss of generality, let l_S coincide with the Y-axis (see Figure A1). Let α be the angle between the symmetry plane π_S and the image plane π_XY; the domain of α is (−90°, 90°). Then, the normal of the symmetry plane π_S is [sin(α), 0, −cos(α)], and the plane is expressed by

$$\sin(\alpha)\,X - \cos(\alpha)\,Z = 0. \qquad \text{(A7)}$$

Since all symmetric pairs ⟨A_i, A_i′⟩ are symmetric with respect to the symmetry plane π_S, A_i and A_i′ must satisfy the following two conditions:
1. The midpoint of A_i and A_i′ is on the symmetry plane π_S. It follows that
$$0.5(X_i + X_i')\sin(\alpha) - 0.5(Z_i + Z_i')\cos(\alpha) = 0. \qquad \text{(A8)}$$
2. The line connecting A_i and A_i′ is perpendicular to the symmetry plane π_S. It follows that
$$(X_i - X_i')\cos(\alpha) + (Z_i - Z_i')\sin(\alpha) = 0. \qquad \text{(A9)}$$

From Equations A8 and A9, we obtain

$$Z_i = \frac{-\cos(2\alpha)\,X_i + X_i'}{\sin(2\alpha)}, \qquad \text{(A10)}$$
$$Z_i' = \frac{-\cos(2\alpha)\,X_i' + X_i}{\sin(2\alpha)}. \qquad \text{(A11)}$$

Note that X_i = (X_i′)′. Thus, Equation A11 can be written as

$$Z_i' = \frac{-\cos(2\alpha)\,X_i' + (X_i')'}{\sin(2\alpha)}. \qquad \text{(A12)}$$

Equations A10 and A12 have the same form. Therefore, Equation A10 can be used for all points, primed and unprimed. In Equation A10, X_i and X_i′ are obtained from the image. The only unknown is α (the angle between the symmetry plane π_S and the image plane π_XY). Therefore, the recovery is uniquely determined by α. Figure A1 illustrates three possible recovered 3D shapes corresponding to α = −60, −45, and −30 degrees. 
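To make the recovery in Equation A10 concrete, here is a minimal sketch of ours (Python/NumPy; the function and variable names are ours, not from the paper). It assumes an orthographic image whose X-axis is already aligned with the direction τ of the lines joining symmetric points, as in the coordinate system defined above:

```python
import numpy as np

def recover_depths(x, x_prime, alpha_deg):
    """Recover Z for each image point from Equation A10.

    x        : X coordinates of the image points a_i
    x_prime  : X coordinates of their symmetric partners a_i'
    alpha_deg: angle (degrees) between the symmetry plane and the image
               plane; each alpha in (-90, 90), alpha != 0, gives one
               member of the one-parameter family of interpretations.
    """
    a = np.radians(alpha_deg)
    # Z_i = (-cos(2a) * X_i + X_i') / sin(2a)   (Equation A10)
    return (-np.cos(2 * a) * np.asarray(x) + np.asarray(x_prime)) / np.sin(2 * a)

# A single symmetric pair, recovered for the three slants shown in Figure A1.
x, x_prime = np.array([1.0]), np.array([3.0])
for alpha in (-60.0, -45.0, -30.0):
    print(alpha, recover_depths(x, x_prime, alpha))
```

Applying the same function with the roles of the primed and unprimed points exchanged yields Z_i′, as noted after Equation A12.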
Appendix B
The measure of dissimilarity between two 3D shapes
Suppose η 1 and η 2 are two 3D shapes recovered from the same 2D image. The angles between their symmetry planes and the image plane are α 1 and α 2. A 1 is a point in η 1 whose corresponding point in η 2 is A 2 (see Figure B1). 
Figure B1
 
The comparison between two 3D shapes. η 1 and η 2 are two recovered 3D shapes from image I. The angles between the image plane and the symmetry planes of η 1 and η 2 are α 1 and α 2. A 1 is a point in η 1 whose corresponding point in η 2 is A 2.
From Equation A10, the Z value of the point A_1 in η_1 is

$$Z_1 = \frac{-\cos(2\alpha_1)\,X + X'}{\sin(2\alpha_1)}, \qquad \text{(B1)}$$

where X and X′ are the X values of A_1 and its symmetric counterpart. Similarly, the Z value of A_2 in η_2 is

$$Z_2 = \frac{-\cos(2\alpha_2)\,X + X'}{\sin(2\alpha_2)}. \qquad \text{(B2)}$$

Eliminating X′ from Equations B1 and B2, we obtain

$$Z_2 = \frac{\cos(2\alpha_1) - \cos(2\alpha_2)}{\sin(2\alpha_2)}\,X + \frac{\sin(2\alpha_1)}{\sin(2\alpha_2)}\,Z_1. \qquad \text{(B3)}$$

Equation B3 represents a 3D affine transformation between A_1 and A_2. It can be expressed as follows:

$$\begin{pmatrix} X \\ Y \\ Z_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \dfrac{\cos(2\alpha_1)-\cos(2\alpha_2)}{\sin(2\alpha_2)} & 0 & \dfrac{\sin(2\alpha_1)}{\sin(2\alpha_2)} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z_1 \end{pmatrix}. \qquad \text{(B4)}$$
Note that this affine transformation holds for any point in η_1 and its corresponding point in η_2. Therefore, the transformation from η_1 to η_2 is a 3D affine transformation. Let Q be the matrix representing this 3D affine transformation. The relation between η_1 and η_2 can then be written simply as η_2 = Qη_1. Q can be decomposed into a product of three simpler matrices, Q = USV′, where

$$U = \begin{pmatrix} \cos(\alpha_2) & 0 & -\sin(\alpha_2) \\ 0 & 1 & 0 \\ \sin(\alpha_2) & 0 & \cos(\alpha_2) \end{pmatrix}, \qquad \text{(B5)}$$

$$S = \begin{pmatrix} \dfrac{\cos(\alpha_1)}{\cos(\alpha_2)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \dfrac{\sin(\alpha_1)}{\sin(\alpha_2)} \end{pmatrix}, \qquad \text{(B6)}$$

$$V = \begin{pmatrix} \cos(\alpha_1) & 0 & -\sin(\alpha_1) \\ 0 & 1 & 0 \\ \sin(\alpha_1) & 0 & \cos(\alpha_1) \end{pmatrix}. \qquad \text{(B7)}$$
U and V are orthonormal matrices. They represent 3D rotation operations and they change an object's orientation but not its shape. S is a diagonal matrix and it represents a stretching transformation. It results in the change of an object's aspect ratio. So, the change from η 1 to η 2 can be considered as the following sequence of transformations: η 1 is compressed or stretched along the directions [cos(α 1) 0 sin(α 1)] and [−sin(α 1) 0 cos(α 1)] by the factors cos(α 1)/cos(α 2) and sin(α 1)/sin(α 2), respectively. 
Figure B2 illustrates the process. In Figure B2a, the angles (α) of the recovered 3D shapes η 1 and η 2 are 30 and 45 degrees. First, η 1 is compressed along the direction [−sin(30) 0 cos(30)] by a factor of sin(30)/sin(45) (see Figure B2b). Next, the resulting shape is stretched along the direction [cos(30) 0 sin(30)] by a factor of cos(30)/cos(45) (see Figure B2c). After this stretching transformation, the two shapes (transformed η 1 and η 2) are identical except for their 3D orientation. Now, if we rotate the transformed η 1 by 15 degrees around Y-axis, it will coincide with η 2
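The decomposition Q = USV′ is easy to verify numerically. The short check below is a sketch of ours (names are ours): it builds Q from Equation B4 and U, S, V from Equations B5–B7 for the angles used in Figure B2, and confirms that the product reproduces Q.

```python
import numpy as np

def rot_y(angle_deg):
    """Rotation about the Y-axis with columns (cos,0,sin), (0,1,0), (-sin,0,cos),
    matching the form of U and V in Equations B5 and B7."""
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), 0.0, -np.sin(a)],
                     [0.0,       1.0,  0.0],
                     [np.sin(a), 0.0,  np.cos(a)]])

def affine_q(a1_deg, a2_deg):
    """The affine map Q of Equation B4 taking eta_1 (angle a1) to eta_2 (angle a2)."""
    a1, a2 = np.radians(a1_deg), np.radians(a2_deg)
    q = np.eye(3)
    q[2, 0] = (np.cos(2 * a1) - np.cos(2 * a2)) / np.sin(2 * a2)
    q[2, 2] = np.sin(2 * a1) / np.sin(2 * a2)
    return q

a1, a2 = 30.0, 45.0                      # the angles used in Figure B2
u, v = rot_y(a2), rot_y(a1)
s = np.diag([np.cos(np.radians(a1)) / np.cos(np.radians(a2)),
             1.0,
             np.sin(np.radians(a1)) / np.sin(np.radians(a2))])
print(np.allclose(affine_q(a1, a2), u @ s @ v.T))   # expected: True
```

The check prints True for any non-degenerate pair of angles (both angles away from 0 and ±90 deg).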
Figure B2
 
Illustration of the 3D affine transformation between two 3D recovered objects. (a) η 1 (the bottom) and η 2 (the top) are recovered from the same image and their corresponding angles α (the angle between the symmetry plane and the image plane) are 30 and 45 degrees. The two arrows indicate the directions along which η 1 will be compressed or stretched. (b) η 1 is compressed along the normal of its symmetry plane. (c) The resulting object is stretched along the direction indicated by the other arrow.
Let m and n represent the two directions, i.e., 
$$m = \begin{pmatrix} \cos(\alpha_1) \\ 0 \\ \sin(\alpha_1) \end{pmatrix}, \qquad \text{(B8)}$$

$$n = \begin{pmatrix} -\sin(\alpha_1) \\ 0 \\ \cos(\alpha_1) \end{pmatrix}, \qquad \text{(B9)}$$

and let e_m and e_n represent the change coefficients along these two directions:

$$e_m = \frac{\cos(\alpha_1)}{\cos(\alpha_2)}, \qquad \text{(B10)}$$

$$e_n = \frac{\sin(\alpha_1)}{\sin(\alpha_2)}. \qquad \text{(B11)}$$
 
Note that n is the normal of the symmetry plane of η 1 and m is perpendicular to n and the Y-axis. 
The decomposition of Q shows that the only difference between η_1 and η_2 is their aspect ratio. Assume that the aspect ratio of η_1 is λ. Then the aspect ratio of η_2 is (e_n/e_m)λ. Therefore, the change from η_1 to η_2 corresponds to a change of aspect ratio by the factor e_n/e_m. The dissimilarity between η_1 and η_2 is therefore defined as the logarithm of the absolute value of e_n/e_m:

$$\omega(\eta_1, \eta_2) = \log_2\!\left(\left|\frac{e_n}{e_m}\right|\right). \qquad \text{(B12)}$$
 
From Equations B10, B11, and B12, we obtain 
$$\omega(\eta_1, \eta_2) = \log_2(|\tan(\alpha_1)|) - \log_2(|\tan(\alpha_2)|). \qquad \text{(B13)}$$
 
Equation B13 shows that the dissimilarity measure ω(η_1, η_2) has the following properties:
1. The range of ω(η_1, η_2) is (−∞, ∞).
2. ω(η_1, η_2) = 0 iff |α_1| = |α_2| (the slants of the symmetry planes of η_1 and η_2 are the same). It follows that the two shapes η_1 and η_2 are then identical except for a depth reversal.
3. ω(η_1, η_2) = −ω(η_2, η_1).
4. ω(η_1, η_2) = ω(η_1, η_3) + ω(η_3, η_2).
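For completeness, here is a minimal implementation of the dissimilarity measure of Equation B13 (a sketch of ours; the names are ours). The assertions exercise properties 3 and 4 above.

```python
import math

def dissimilarity(alpha1_deg, alpha2_deg):
    """Dissimilarity omega(eta_1, eta_2) of Equation B13, where alpha_i is the
    angle (degrees, in (-90, 90), excluding 0) between the symmetry plane of
    the recovered shape eta_i and the image plane."""
    def f(alpha_deg):
        return math.log2(abs(math.tan(math.radians(alpha_deg))))
    return f(alpha1_deg) - f(alpha2_deg)

# Property 3: antisymmetry; Property 4: additivity over a third interpretation.
assert abs(dissimilarity(30, 45) + dissimilarity(45, 30)) < 1e-12
assert abs(dissimilarity(30, 60)
           - (dissimilarity(30, 45) + dissimilarity(45, 60))) < 1e-12
# One unit of omega corresponds to a factor-of-2 change in aspect ratio.
print(dissimilarity(30, 45))
```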
Acknowledgments
This research was supported by the National Science Foundation, the US Department of Energy, and the Air Force Office of Scientific Research. The authors are grateful to the two anonymous reviewers for helpful comments. 
Commercial relationships: none. 
Corresponding author: Yunfeng Li. 
Address: 703 Third Street, West Lafayette, IN 47906, USA. 
Footnotes
1  We used perspective rather than orthographic images in the experiment. Note that a single 2D perspective image of a 3D mirror-symmetric shape determines this shape uniquely (Rothwell, 1995); there is no unknown parameter. This is different from the case of a 2D orthographic image, where the image determines the 3D mirror-symmetric shape up to one unknown parameter. Considering the viewing conditions that we used (the simulated size of the object and the viewing distance), the perspective images in our experiment were very similar to orthographic ones. Therefore, it is reasonable to assume that in monocular viewing the visual system had to resolve the one-parameter shape ambiguity. This assumption was confirmed by the good fits of the monocular model to the subjects' results.
2  We obtained very similar results with four other subjects (Li, 2009b). We also showed that performance is very good when a monocular observer uses motion parallax (Li, 2009b).
3  We tested two subjects (YL and ZP) in binocular shape recovery with pyramids as well as with our polyhedral objects, both viewed from a degenerate viewing direction (slant of the symmetry plane equal to 90 deg). With both types of objects, performance was worse than with non-degenerate views. At the same time, however, performance with the polyhedral objects showed a smaller systematic error than performance with the pyramids.
4  We tested two other subjects using more viewing distances (50, 100, 200, and 300 cm) and observed very similar effects (Li, 2009b).
5  One of the anonymous reviewers pointed out that our assumption of the independence of depth order judgments across pairs of vertices of the polyhedron is not likely to be correct. This makes sense because depth orders of the vertices of a symmetrical polyhedron are correlated and, thus, provide redundant information. Recall that the independence assumption was used in computing our likelihood function (Equation 9). We used this assumption primarily because of computational simplicity. Removing this assumption will change the likelihood function, but because the same function is used in simulating the results of Experiments 1 and 2, it is possible that the fits of the model to the subjects' results, as well as the estimated proportion of pairs of points used in judging depth order of vertices in Experiment 2, would remain the same.
6  More precisely, binocular disparities will change proportionally to the reciprocal of the viewing distance, and the angular separation of points in the retinal image will change the same way. These two effects will cancel each other in the likelihood function.
7  A brief version of this derivation can be found in Li et al. (2009).
References
Banks, M. S., van Ee, R., & Backus, B. T. (1997). The computation of binocular visual direction: A re-examination of Mansfield and Legge (1996). Vision Research, 37, 1605–1610.
Blakemore, C. (1970). The range and scope of binocular depth discrimination in man. The Journal of Physiology, 211, 599–622.
Chan, M. W., Stevenson, A. K., Li, Y., & Pizlo, Z. (2006). Binocular shape constancy from novel views: The role of a priori constraints. Perception & Psychophysics, 68, 1124–1139.
Frisby, J. P., Buckley, D., & Duke, P. A. (1996). Evidence for good recovery of lengths of real objects seen with natural stereo viewing. Perception, 25, 129–154.
Frisby, J. P., Buckley, D., & Freeman, J. (1996). Stereo and texture cue integration in the perception of planar and curved large real surfaces. In T. Inui & J. L. McClelland (Eds.), Attention and performance XVI (pp. 71–91). Cambridge, MA: MIT Press.
Frisby, J. P., & Stone, J. V. (2010). Seeing (2nd ed.). Cambridge, MA: MIT Press.
Gårding, J., Porrill, J., Mayhew, J. E. W., & Frisby, J. P. (1995). Stereopsis, vertical disparity and relief transformations. Vision Research, 35, 703–722.
Hillis, J. M., Watt, S. J., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: Optimal cue combination. Journal of Vision, 4(12):1, 967–992, http://www.journalofvision.org/content/4/12/1, doi:10.1167/4.12.1.
Howard, I. P., & Rogers, B. J. (2002). Seeing in depth. Toronto, ON, Canada: I. Porteous.
Johnston, E. B. (1991). Systematic distortions of shape from stereopsis. Vision Research, 31, 1351–1360.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: The University of Chicago Press.
Kersten, D., & Yuille, A. (2003). Bayesian models of object perception. Current Opinion in Neurobiology, 13, 150–158.
Knill, D. C., & Richards, W. (1996). Perception as Bayesian inference. New York: Cambridge University Press.
Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.
Koenderink, J. J., van Doorn, A. J., & Kappers, A. M. L. (2006). Pictorial relief. In M. R. M. Jenkin & L. R. Harris (Eds.), Seeing spatial form (pp. 11–33). Oxford, UK: Oxford University Press.
Li, Y. (2009a). Computational models of 3D shape perception. Dissertation Abstracts International, 71, 116A. (UMI No. AAT 3403114). Retrieved September 21, 2010, from Dissertations and Theses database.
Li, Y. (2009b). Perception of parallelepipeds: Perkins' law. Perception, 38, 1767–1781.
Li, Y., & Pizlo, Z. (2005). Monocular and binocular perception of 3D shape: The role of a priori constraints [Abstract]. Journal of Vision, 5(8):521, 521a, http://www.journalofvision.org/content/5/8/521, doi:10.1167/5.8.521.
Li, Y., & Pizlo, Z. (2008, May). Perception of 3D shapes from line drawings. Poster session presented at the Annual Meeting of the Vision Sciences Society, Naples, FL.
Li, Y., & Pizlo, Z. (accepted for publication). Depth cues vs. simplicity in 3D shape perception. Topics in Cognitive Science.
Li, Y., Pizlo, Z., & Steinman, R. M. (2009). A computational model that recovers the 3D shape of an object from a single 2D retinal representation. Vision Research, 49, 979–991.
Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283–286.
Norman, J. F., Crabtree, C. E., Norman, H. F., Moncrief, B. K., Hermann, M., & Kapley, N. (2006). Aging and the visual, haptic, and cross-modal perception of natural object shape. Perception, 35, 1383–1395.
Norman, J. F., & Todd, J. T. (1998). Stereoscopic discrimination of interval and ordinal depth relations on smooth surfaces and in empty space. Perception, 27, 257–272.
Norman, J. F., Todd, J. T., Perotti, V. J., & Tittle, J. S. (1996). The visual perception of three-dimensional length. Journal of Experimental Psychology: Human Perception and Performance, 22, 173–186.
Ogle, K. N. (1950). Researches in binocular vision. Philadelphia: W. B. Saunders.
Ono, H. (1979). Axiomatic summary and deductions from Hering's principles of visual direction. Perception & Psychophysics, 25, 473–477.
Ono, H. (1981). On Wells's (1792) law of visual direction. Perception & Psychophysics, 30, 403–406.
Pizlo, Z. (2001). Perception viewed as an inverse problem. Vision Research, 41, 3145–3161.
Pizlo, Z. (2008). 3D shape: Its unique place in visual perception. Cambridge, MA: MIT Press.
Pizlo, Z., Sawada, T., Li, Y., Kropatsch, W. G., & Steinman, R. M. (2010). New approach to the perception of 3D shape based on veridicality, complexity, symmetry and volume. Vision Research, 50, 1–11.
Pizlo, Z., & Stevenson, A. (1999). Shape constancy from novel views. Perception & Psychophysics, 61, 1299–1307.
Rady, A. A., & Ishak, I. G. H. (1955). Relative contributions of disparity and convergence to stereoscopic acuity. Journal of the Optical Society of America, 45, 530–534.
Rothwell, C. A. (1995). Object recognition through invariant indexing. Oxford, UK: Oxford University Press.
Sawada, T. (2010). Visual detection of symmetry of 3D shapes. Journal of Vision, 10(6):4, 1–22, http://www.journalofvision.org/content/10/6/4, doi:10.1167/10.6.4.
Todd, J. T., & Norman, J. F. (2003). The visual perception of 3D shape from multiple cues: Are observers capable of perceiving metric structure? Perception & Psychophysics, 65, 31–47.
Westheimer, G. (1979). Cooperative neural processes involved in stereoscopic acuity. Experimental Brain Research, 36, 585–597.
Westheimer, G., & McKee, S. P. (1980). Stereogram design for testing local stereopsis. Investigative Ophthalmology & Visual Science, 19, 802–809.