This study tested the perception of symmetry of 3D shapes from single 2D images. Experiment 1 tested discrimination between symmetric and asymmetric 3D shapes from single 2D line drawings. Experiment 2 tested discrimination between different degrees of asymmetry of 3D shapes from single 2D line drawings. Human performance in both discriminations was reliable. Based on these results, a computational model that performs the discrimination from single 2D images is presented. The model first recovers the 3D shape using four a priori constraints: 3D symmetry, maximal 3D compactness, minimum surface area, and maximal planarity of contours. The model then evaluates the degree of symmetry of the recovered 3D shape. The model provided a good fit to the subjects' data.

*a priori* constraints that determine the likelihood of the 3D interpretations of a given 2D image. We present a new computational model that explains the nature of these constraints. Performance of the model in a 3D symmetry discrimination task is very similar to the performance of human subjects.


*d*′ used in signal detection theory, and its standard error. Higher performance corresponds to higher values of *d*′; *d*′ = 0 represents chance performance and *d*′ = ∞ represents perfect performance.

*d*′ was computed for each session. The standard error was computed from the two values of *d*′.
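The sensitivity measure can be illustrated with a short sketch. It computes *d*′ = z(hit rate) − z(false-alarm rate) using Python's standard-library `statistics.NormalDist`, and the standard error of the mean of per-session values (with two sessions this reduces to |d1 − d2|/2). Which response is scored as a "hit" is our assumption; swapping the roles of symmetric and asymmetric trials leaves *d*′ unchanged. Rates of exactly 0 or 1 correspond to an infinite *d*′, so the function rejects them and a correction would have to be applied first. The rates in the usage lines are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(F) from signal detection theory."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        # z(0) and z(1) are infinite; correct the rates before calling this
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def standard_error(values: list[float]) -> float:
    """Standard error of the mean; for two values this equals |d1 - d2| / 2."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical (hit rate, false-alarm rate) for two sessions, not data from the study
sessions = [(0.80, 0.30), (0.75, 0.35)]
ds = [d_prime(h, f) for h, f in sessions]
print(round(mean(ds), 3), round(standard_error(ds), 3))
```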

*d*′ and the abscissa shows the level of distortion of the asymmetric polyhedra.

^{2}The results of the static and kinetic conditions are plotted separately. The three curves indicate the three asymmetry types: Condition-R (circles), Condition-N (triangles), and Condition-P (squares). The results were analyzed using a three-way within-subjects ANOVA: asymmetry type (R vs. N vs. P) × level of distortion used to generate the asymmetric polyhedra (L1–L4) × type of display (static vs. kinetic).

*F*(3, 46) = 249.94, *p* < 0.001) and when the presentation was kinetic (*F*(1, 46) = 24.42, *p* < 0.001). Performance was significantly higher than chance for all asymmetry types even in the static L1 condition, in which the distortion of the asymmetric shapes was smallest (R: *t*(5) = 5.10, *p* < 0.005; N: *t*(5) = 3.44, *p* < 0.05; P: *t*(5) = 3.29, *p* < 0.05). This means that the subjects could reliably discriminate between symmetric and asymmetric 3D polyhedra even when only one 2D image was provided. The main effect of asymmetry type was also significant (*F*(2, 46) = 105.18, *p* < 0.001). The interaction between asymmetry type and level of distortion was also significant (*F*(6, 46) = 5.89, *p* < 0.001), but this interaction was most likely due to a floor effect. A post hoc test (Tukey HSD) showed that the difference between Conditions-N and -P was significant in L4 (*p* < 0.005) but not in L1, L2, or L3 (*p* > 0.05). The other interactions were not significant (*p* > 0.05). Performance was best in Condition-R and worst in Condition-P, as expected. The most interesting result, however, is that the subjects performed above chance level in static Condition-P. Recall that in this condition, each image of an asymmetric polyhedron is consistent with a symmetric 3D interpretation. It follows that in every trial in this condition, the 2D image was consistent with both a symmetric and an asymmetric 3D interpretation, regardless of whether it was produced by a symmetric or an asymmetric 3D shape. Mathematically, then, the images in this condition were completely ambiguous. How could the subjects make the discrimination? It seems that the only way to make it reliably is to evaluate the likelihood of the symmetric and asymmetric 3D interpretations. The computational model described later in this paper does this by recovering the 3D shape that maximizes an objective function comprising four *a priori* constraints. Once the 3D shape is recovered, its 3D symmetry is evaluated.
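The above-chance comparisons are reported as *t* tests with 5 degrees of freedom. Presumably these are one-sample tests of *d*′ against the chance value 0; that reading is our assumption. A minimal sketch of the statistic, *t* = (mean − chance)/(s/√n) with n − 1 degrees of freedom, follows; turning it into a *p* value additionally requires the Student *t* distribution, which Python's standard library does not provide. The six values in the usage line are invented for illustration.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, chance=0.0):
    """Return (t, df) for testing whether the mean of `values` exceeds `chance`."""
    n = len(values)
    t = (mean(values) - chance) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Six hypothetical d' values (illustrative only, not the study's data)
t, df = one_sample_t([0.4, 0.7, 0.5, 0.9, 0.3, 0.6])
print(f"t({df}) = {t:.2f}")
```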

*d*′ improved by only 0.26. These results suggest that the kinetic depth effect is of secondary importance in 3D symmetry perception; pictorial information provided by a single 2D image plays the major role.

*d*′, and the abscissa shows the levels of distortion of the less asymmetric polyhedra. The three curves indicate the asymmetry types. The results were analyzed using a two-way within-subjects ANOVA.

*F*(2, 16) = 99.99, *p* < 0.001). Performance was significantly higher than chance for all asymmetry types even in the L3 condition, in which the difference was smallest (R: *t*(5) = 14.25, *p* < 0.001; N: *t*(5) = 10.07, *p* < 0.05; P: *t*(5) = 18.21, *p* < 0.05). This means that the subjects could reliably discriminate between two degrees of asymmetry of 3D shapes from single 2D images. The main effect of asymmetry type was also significant (*F*(2, 16) = 26.29, *p* < 0.001). The interaction between these two factors was not significant (*F*(4, 16) = 1.76, *p* = 0.187). Performance was best in Condition-R and worst in Condition-P. As in Experiment 1, the subjects performed above chance level in Condition-P. These results show that the human visual system can measure the degree of asymmetry of a 3D shape and use this measure to discriminate between more and less asymmetric 3D shapes. The nature of the perceptual metric for asymmetry is discussed in the next section.

*d*′ improved by only 0.19. As in Experiment 1, the kinetic depth effect did not substantially improve the performance.

$H$ represents the 3D shape, and $V_{3D}(H)$ and $S_{3D}(H)$ are the volume and the surface area of $H$ (Hildebrandt & Tromba, 1996). Maximum 3D compactness, in conjunction with the 3D symmetry constraint, gives the object its volume. Minimum surface area is defined as the minimum of the total surface area of the object, which is equivalent to the maximum of the reciprocal of the total surface area. This constraint tends to decrease the thickness of the 3D shape along the depth direction. It makes the 2D image more stable under small 3D rotations, and thus makes the recovered 3D shape more likely: a small change of the viewing direction does not change the 2D image substantially (Li et al., 2009). To make the surface area a unit-free parameter, the reciprocal of the total surface area of the 3D shape was multiplied by the area of the projection of the 3D shape onto the 2D image:

$S_{3D}(H)$ and $S_{2D}(H)$ are the surface area of the 3D shape $H$ and the area of the projection of $H$ onto the 2D image. The symmetric 3D shape that maximizes compactness($H$) + surface($H$) is chosen from the family of symmetric 3D shapes.
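To make the two measures concrete, here is a sketch for a closed, convex polyhedron given as a triangle mesh whose triangles are ordered counter-clockwise when seen from outside. It assumes that compactness is the scale-invariant ratio $V^2/S^3$ commonly used for 3D compactness (e.g., Li et al., 2009) and that surface is $S_{2D}/S_{3D}$ as described above; the exact forms of Equations 1 and 2, including any constant factors, are not reproduced here. The projected area relies on convexity: under orthographic projection onto $z = 0$, every image point is covered by exactly two faces, so $S_{2D}$ is half the sum of the absolute $z$-components of the face area vectors. The function names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mesh_measures(verts, tris):
    """Volume, 3D surface area, and orthographic image area (onto z = 0)
    of a closed convex mesh with outward (counter-clockwise) triangles."""
    volume = area = projected = 0.0
    for i, j, k in tris:
        a, b, c = verts[i], verts[j], verts[k]
        n = cross(sub(b, a), sub(c, a))        # twice the triangle's area vector
        volume += dot(a, cross(b, c)) / 6.0    # signed tetrahedron volume (divergence theorem)
        area += dot(n, n) ** 0.5 / 2.0
        projected += abs(n[2]) / 2.0           # area of the triangle's image
    return volume, area, projected / 2.0       # convex: front and back each cover the image once


def compactness(volume, area_3d):
    return volume ** 2 / area_3d ** 3          # assumed form; unchanged by uniform scaling


def surface(area_3d, area_2d):
    return area_2d / area_3d                   # reciprocal 3D area, made unit-free by the 2D area


# Usage: a unit cube viewed along the z-axis
CUBE_V = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
          (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
CUBE_T = [(0, 3, 2), (0, 2, 1), (4, 5, 6), (4, 6, 7), (0, 1, 5), (0, 5, 4),
          (3, 7, 6), (3, 6, 2), (0, 4, 7), (0, 7, 3), (1, 2, 6), (1, 6, 5)]
V, S3, S2 = mesh_measures(CUBE_V, CUBE_T)
print(V, S3, S2, compactness(V, S3), surface(S3, S2))
```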

$\alpha_a$ and $\alpha_{\mathrm{counterpart}(a)}$ are the corresponding 2D angles of contours, and $n_a$ is the number of 2D angles of the polyhedron $H$. If the recovered shape is perfectly symmetric, its two halves are identical and symmetry($H$) is zero; otherwise, it is smaller than zero. The planarity constraint enforces planarity of the contours enclosing the faces of the 3D shape (Leclerc & Fischler, 1992; see also Hong, Ma, & Yu, 2004; Sawada, Li, & Pizlo, 2010). In a planar $n_p$-gon, the sum of all interior angles is equal to $(n_p - 2)\pi$. When a convex polygon is not planar, the sum of its angles is smaller.^{3} Hence, the departure from planarity can be measured by computing the difference between the sum of the angles and $(n_p - 2)\pi$. The planarity of each face is defined as the negative of its departure from planarity, and the planarity of the 3D shape is defined here as the average planarity over all faces:

$f$ represents a polygonal face of the 3D shape, $\alpha_{f,a}$ is the $a$th internal angle of face $f$, and $n_f$ is the number of faces of the polyhedron $H$. For the other two constraints, maximum 3D compactness and minimum surface area, Equations 1 and 2 are used. The following equation is used as the overall objective function:

$E(H)$. The dimensionality of the search is $1 + 8 + n_o$: 1 for the orientation of the symmetry plane, 8 for the depth positions of the “3D symmetry line segments”, and $n_o$ for the occluded vertices, where $n_o$ is the number of occluded vertices. Note that the orientation of the symmetry plane is characterized by two parameters, slant and tilt. Tilt is estimated as the average orientation of the 2D symmetry line segments (Zabrodsky & Weinshall, 1997); hence, only slant is used in the search. All 3D symmetry line segments are restricted to be parallel to the normal of the symmetry plane if the 2D symmetry line segments are parallel to one another in the image; otherwise, the 3D symmetry line segments are restricted to be as parallel as possible. The positions of the occluded vertices are restricted to lie along the 3D symmetry lines emanating from their symmetric counterparts. The model uses a steepest descent method to find the minimum of $-E(H)$, which is equivalent to finding the maximum of $E(H)$.
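The two angle-based terms and the search can be sketched as follows. The sketch assumes that symmetry($H$) is the negative mean absolute difference between corresponding angles and that planarity($H$) is the negative mean absolute departure of each face's angle sum from $(n_p - 2)\pi$. Both assumptions match the properties stated above (zero for a perfect shape, negative otherwise) but are not necessarily identical to the paper's equations. The descent routine uses a central-difference gradient with a fixed step. The function names, step size, and stopping rule are hypothetical, and `toy_E` merely stands in for $E(H)$.

```python
import math


def symmetry(angle_pairs):
    """Negative mean |alpha_a - alpha_counterpart(a)| over corresponding angle pairs (radians)."""
    return -sum(abs(a - b) for a, b in angle_pairs) / len(angle_pairs)


def planarity(faces):
    """Negative mean departure of each face's interior-angle sum from (n_p - 2) * pi."""
    departures = [abs(sum(angles) - (len(angles) - 2) * math.pi) for angles in faces]
    return -sum(departures) / len(faces)


def steepest_descent(f, x0, step=0.1, h=1e-6, max_iter=10_000, tol=1e-12):
    """Minimize f from x0 by moving against a central-difference gradient estimate."""
    x = list(x0)
    for _ in range(max_iter):
        grad = []
        for k in range(len(x)):
            xp, xm = x.copy(), x.copy()
            xp[k] += h
            xm[k] -= h
            grad.append((f(xp) - f(xm)) / (2 * h))
        x_new = [xk - step * gk for xk, gk in zip(x, grad)]
        if sum((a - b) ** 2 for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


def toy_E(x):
    """Hypothetical stand-in for E(H), maximal at (1, -2)."""
    return -((x[0] - 1) ** 2) - (x[1] + 2) ** 2


# Maximizing E is done, as in the text, by minimizing -E.
best = steepest_descent(lambda x: -toy_E(x), [0.0, 0.0])
print(best)
```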

*d*′ was computed for each session. The criterion for the model's classification of a recovered 3D shape as “asymmetric” was chosen to minimize the sum of squared differences between the model's *d*′ and the subject's *d*′ in each experiment. Hence, there was one free parameter for 12 data points in Experiment 1 and one free parameter for 9 data points in Experiment 2.
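The one-parameter fit can be sketched as a search over candidate criteria. In this hypothetical version, the model responds “symmetric” when the symmetry score of the recovered shape is at or above the criterion, and a single criterion is shared by all sessions. Rates of 0 or 1 are pulled in by 1/(2N), a common correction that we assume because the text does not say how extreme rates were handled. The criterion is picked from a supplied grid rather than by continuous optimization. The scores and the target *d*′ below are invented.

```python
from statistics import NormalDist


def corrected_rate(count, n):
    """Proportion count/n, kept inside [1/(2n), 1 - 1/(2n)] so z stays finite."""
    return min(max(count / n, 1 / (2 * n)), 1 - 1 / (2 * n))


def model_d_prime(sym_scores, asym_scores, criterion):
    """d' of a model that answers 'symmetric' when its score >= criterion."""
    z = NormalDist().inv_cdf
    hit = corrected_rate(sum(s >= criterion for s in sym_scores), len(sym_scores))
    fa = corrected_rate(sum(s >= criterion for s in asym_scores), len(asym_scores))
    return z(hit) - z(fa)


def sse(criterion, sessions, subject_d):
    """Sum of squared differences between model and subject d' over sessions."""
    return sum((model_d_prime(sym, asym, criterion) - d) ** 2
               for (sym, asym), d in zip(sessions, subject_d))


def fit_criterion(sessions, subject_d, candidates):
    """The single free parameter: the candidate criterion with the smallest SSE."""
    return min(candidates, key=lambda c: sse(c, sessions, subject_d))


# One hypothetical session: symmetry scores for symmetric and asymmetric trials
sessions = [([-0.10, -0.05, 0.00, -0.02], [-0.50, -0.30, -0.20, -0.01])]
candidates = [-0.4, -0.15, -0.03]
best = fit_criterion(sessions, [1.0], candidates)
print(best, model_d_prime(*sessions[0], best))
```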

*d*′ and the abscissa shows the levels of distortion of the asymmetric polyhedra. The results show that the model accounts for human performance quite well. Specifically, the model's performance improves with the level of distortion; it is best with the asymmetric polyhedra in Condition-R and worst with those in Condition-P. The same trends are observed in the results of the human subjects. A similarly good fit was found for the results of Experiment 2 (bottom panel). It is worth pointing out that the estimated response bias of the model was very similar to that of the human subjects in each session. On average, when the 3D shape was symmetric, the model responded “symmetric shape” on 80% of trials, whereas the human subjects did so on 79%. The judgments of both the model and the human subjects were thus slightly biased toward “symmetric shape” responses.

$z = 0$ be the image plane, and let the $x$- and $y$-axes of the 3D Cartesian coordinate system serve as the 2D coordinate system on the image plane. Consider a 3D polyhedron that is mirror symmetric with respect to a plane $S$ in 3D space. We begin with the analysis of those symmetric pairs of vertices that are both visible. The 3D symmetry line segments connecting symmetric pairs in 3D space are parallel to one another. The 2D symmetry line segments, which are the projections of the 3D symmetry line segments onto the 2D image, are therefore also parallel to one another under orthographic projection. We set the direction of the $x$-axis so that it is parallel to the 2D symmetry line segments (this involves no loss of generality). If the polyhedron is only approximately symmetric and the 3D and 2D symmetry line segments are only approximately parallel, the $x$-axis is set to the direction that is as parallel to the 2D symmetry line segments as possible in the least-squares sense:

$\mathbf{p}'_i = [x'_i, y'_i]^t$ and $\mathbf{p}'_{\mathrm{counterpart}(i)} = [x'_{\mathrm{counterpart}(i)}, y'_{\mathrm{counterpart}(i)}]^t$ are the 2D orthographic projections of vertex $i$ and its symmetric counterpart, counterpart($i$).

$\mathbf{p}_i = [x_i, y_i]^t$ and $\mathbf{p}_{\mathrm{counterpart}(i)} = [x_{\mathrm{counterpart}(i)}, y_{\mathrm{counterpart}(i)}]^t$ are the projections of the vertices $i$ and counterpart($i$) after the correction. The corrected image is consistent with an orthographic image of a perfectly symmetric 3D shape. If the 2D symmetry line segments are all exactly parallel in the 2D image, then $\mathbf{p}'_i = \mathbf{p}_i$ and $\mathbf{p}'_{\mathrm{counterpart}(i)} = \mathbf{p}_{\mathrm{counterpart}(i)}$.
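A sketch of the image correction follows, under two assumptions of ours. First, the least-squares direction is taken as the principal direction of the 2D symmetry line segments, that is, the unit vector minimizing the summed squared perpendicular components of the segments. Second, each pair is corrected by leaving $x$ unchanged and setting both $y$-values to their mean. The second choice is consistent with the later statement in this appendix that, from Equation A2, the distortion has no $x$ component, but it is not quoted from Equation A2 itself. Output coordinates are expressed in the rotated frame whose $x$-axis is the fitted direction.

```python
import math


def fitted_axis_angle(pairs):
    """Least-squares direction (radians) of 2D symmetry line segments.

    pairs: [((x_i, y_i), (x_cp, y_cp)), ...] image points of symmetric pairs.
    Returns the angle of the unit vector u maximizing sum((d . u)^2), which is
    the same u that minimizes the summed squared perpendicular components.
    """
    sxx = syy = sxy = 0.0
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = x2 - x1, y2 - y1
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def correct_image(pairs):
    """Rotate so the fitted direction is the x-axis, then give each pair a common y."""
    phi = fitted_axis_angle(pairs)
    c, s = math.cos(phi), math.sin(phi)
    corrected = []
    for (x1, y1), (x2, y2) in pairs:
        # rotation by -phi maps the fitted direction onto the x-axis
        r1 = (c * x1 + s * y1, -s * x1 + c * y1)
        r2 = (c * x2 + s * y2, -s * x2 + c * y2)
        y_mid = 0.5 * (r1[1] + r2[1])  # assumed correction: x kept, y averaged
        corrected.append(((r1[0], y_mid), (r2[0], y_mid)))
    return corrected


# Two nearly horizontal segments whose tilts cancel
out = correct_image([((0.0, 0.0), (2.0, 0.1)), ((0.0, 1.0), (2.0, 0.9))])
print(out)
```

After the correction, every pair lies on a line parallel to the $x$-axis, which is the property the corrected image needs in order to be an orthographic image of a perfectly symmetric shape.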

*y*-axis:

$\mathbf{q}_i$ and $\mathbf{q}_{\mathrm{counterpart}(i)}$ are the reflections of $\mathbf{p}_i$ and $\mathbf{p}_{\mathrm{counterpart}(i)}$ in the virtual image. The virtual image is a valid 2D image of the same 3D shape after the 3D shape has been rotated around the $y$-axis. Let the 3D coordinates of the symmetric pair of vertices $i$ and counterpart($i$) at the orientation for which the real image was obtained be $\mathbf{v}_i = [x_i, y_i, z_i]^t$ and $\mathbf{v}_{\mathrm{counterpart}(i)} = [x_{\mathrm{counterpart}(i)}, y_i, z_{\mathrm{counterpart}(i)}]^t$. Note that the $x$- and $y$-values of $\mathbf{v}_i$ and $\mathbf{v}_{\mathrm{counterpart}(i)}$ are identical to those of $\mathbf{p}_i$ and $\mathbf{p}_{\mathrm{counterpart}(i)}$ under orthographic projection. Similarly, let the 3D coordinates of the symmetric pair of vertices $i$ and counterpart($i$) at the orientation for which the virtual image was obtained be $\mathbf{u}_i = [-x_i, y_i, z_i]^t$ and $\mathbf{u}_{\mathrm{counterpart}(i)} = [-x_{\mathrm{counterpart}(i)}, y_i, z_{\mathrm{counterpart}(i)}]^t$. Then the vertex $\mathbf{u}_{\mathrm{counterpart}(i)}$ that corresponds to $\mathbf{v}_i$ after the 3D rigid rotation can be written as follows:

$\mathbf{R}_{3D}$ is a $3 \times 3$ rotation matrix. The $\mathbf{R}_{3D}$ in Equation A4 represents the 3D rigid rotation around the $y$-axis:

$\theta$ is the angle of rotation around the $y$-axis. Note that the correspondence of points in the real and virtual images produced by the 2D reflection (i.e., $\mathbf{p}_i$ maps to $\mathbf{q}_i$ and $\mathbf{p}_{\mathrm{counterpart}(i)}$ maps to $\mathbf{q}_{\mathrm{counterpart}(i)}$) is “opposite” to the correspondence produced by the 3D rotation (i.e., $\mathbf{p}_i$ maps to $\mathbf{q}_{\mathrm{counterpart}(i)}$ and $\mathbf{p}_{\mathrm{counterpart}(i)}$ maps to $\mathbf{q}_i$) (see Figure A1).

$z_i$ can be computed:

$i$ of the symmetric 3D shape can be written as follows:

$\mathbf{v}_i$ depends on $\theta$. This means that the symmetric 3D shapes consistent with the corrected image can be represented by a one-parameter family controlled by $\theta$, the angle of rotation around the $y$-axis.
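For illustration, assume that $\mathbf{R}_{3D}$ is the standard rotation about the $y$-axis. The sign convention of the original rotation matrix is not visible here, so this is our assumption, and the resulting expressions may differ in sign from Equations A5 through A8. Substituting $\mathbf{v}_i$ and $\mathbf{u}_{\mathrm{counterpart}(i)}$ into $\mathbf{u}_{\mathrm{counterpart}(i)} = \mathbf{R}_{3D}\,\mathbf{v}_i$ and reading off the first row gives the depth of each visible vertex as a function of $\theta$:

```latex
\mathbf{R}_{3D} =
\begin{bmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{bmatrix},
\qquad
-x_{\mathrm{counterpart}(i)} = x_i\cos\theta + z_i\sin\theta
\;\Longrightarrow\;
z_i = -\frac{x_{\mathrm{counterpart}(i)} + x_i\cos\theta}{\sin\theta},
\qquad 0 < \theta < \pi,

\mathbf{v}_i =
\left[\, x_i,\; y_i,\; -\frac{x_{\mathrm{counterpart}(i)} + x_i\cos\theta}{\sin\theta} \,\right]^t .
```

The third row yields the same expression with $i$ and counterpart($i$) exchanged, so the pair is self-consistent. Only the $z$-coordinate depends on $\theta$, which is why the consistent shapes form a one-parameter family.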

$S$ of the symmetric 3D shape. The 3D symmetry line segments are parallel to the normal of the symmetry plane $S$, and the midpoints of the 3D symmetry line segments lie on $S$. The midpoint of the 3D symmetry line segment connecting the vertices $i$ and counterpart($i$) is

$S$:

$S$ is a plane whose normal is perpendicular to the $y$-axis, and the $y$-axis lies on $S$. The normal of $S$ can be written as follows:

$\sigma_s$ of $S$ is defined as the angle between $n_s$ and the $z$-axis, and it can be computed as follows:

$n_s$. Equation A12 shows that $\theta$, the parameter specifying the family of 3D shapes, is directly related to the orientation of the 3D symmetry line segments and of the symmetry plane of the 3D shape.
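These relations can be checked numerically. The sketch below assumes $\mathbf{R}_{3D}$ is the standard rotation about the $y$-axis, $[[\cos\theta, 0, \sin\theta], [0, 1, 0], [-\sin\theta, 0, \cos\theta]]$, which gives $z_i = -(x_{\mathrm{counterpart}(i)} + x_i\cos\theta)/\sin\theta$. Under that convention, the segment from counterpart($i$) to $i$ points along $[1, 0, \tan(\theta/2)]^t$, so $n_s = [\cos(\theta/2), 0, \sin(\theta/2)]^t$ and the slant of $S$ is $\pi/2 - \theta/2$. These closed forms follow from our convention and may differ in sign or parametrization from Equations A11 and A12. The helper names are hypothetical.

```python
import math


def rotate_y(v, theta):
    """Assumed convention: R_3D = [[cos, 0, sin], [0, 1, 0], [-sin, 0, cos]]."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def symmetric_pair(x_i, x_cp, y, theta):
    """3D positions of a corrected image pair for family parameter 0 < theta < pi."""
    s, c = math.sin(theta), math.cos(theta)
    v_i = (x_i, y, -(x_cp + x_i * c) / s)
    v_cp = (x_cp, y, -(x_i + x_cp * c) / s)
    return v_i, v_cp


def plane_normal(theta):
    """Unit normal n_s of S; symmetry line segments point along [1, 0, tan(theta/2)]."""
    return (math.cos(theta / 2), 0.0, math.sin(theta / 2))


def slant(normal):
    """Angle between a unit normal and the z-axis (the line of sight)."""
    return math.acos(abs(normal[2]))


theta = 1.2
v_i, v_cp = symmetric_pair(1.0, -0.5, 0.3, theta)
u_cp = rotate_y(v_i, theta)  # should equal [-x_cp, y, z_cp] of the virtual view
n_s = plane_normal(theta)
mid = [(a + b) / 2 for a, b in zip(v_i, v_cp)]
print(u_cp, slant(n_s), sum(m * n for m, n in zip(mid, n_s)))
```

The last printed value is the signed distance of the segment's midpoint from $S$, which should be zero up to rounding.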

$S$ and the $zx$-plane be $s_{zx}$. Let the height of the object be its length along the $y$-axis, the width its length along $n_s$, and the depth its length along $s_{zx}$. The $y$-axis, $n_s$, and $s_{zx}$ are mutually perpendicular. Let the width $W$ of the 3D shape be measured as the length of the longest 3D symmetry line segment. Consider the 3D symmetry line segment connecting a symmetric pair of vertices $i$ and counterpart($i$). The length $l_i(\theta)$ of this 3D symmetry line segment can be computed as follows:

$\theta$. This means that the 3D shape is proportionally stretched along the direction of the 3D symmetry line segments by the same factor, which is a function of $\theta$. From Equation A13, we have:

$W(\theta)$ is the width of the 3D shape. $W(\pi/2)$ is the width of the shape when $\theta = \pi/2$.
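As a worked example, assume that $\mathbf{R}_{3D}$ is the standard rotation about the $y$-axis. This is our convention, under which $z_i = -(x_{\mathrm{counterpart}(i)} + x_i\cos\theta)/\sin\theta$, and it may differ in sign or parametrization from Equations A13 and A14. Subtracting the depths of a symmetric pair gives $z_i - z_{\mathrm{counterpart}(i)} = (x_i - x_{\mathrm{counterpart}(i)})\tan(\theta/2)$, so every segment length and the width scale by the same factor:

```latex
l_i(\theta) = \left| x_i - x_{\mathrm{counterpart}(i)} \right| \sqrt{1 + \tan^2\frac{\theta}{2}}
            = \frac{\left| x_i - x_{\mathrm{counterpart}(i)} \right|}{\cos(\theta/2)},
\qquad
W(\theta) = W(\pi/2)\,\frac{\cos(\pi/4)}{\cos(\theta/2)}, \qquad 0 < \theta < \pi .
```

The common factor $1/\cos(\theta/2)$ is the uniform stretching along the 3D symmetry line segments described above.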

$s_{zx}$, which lies on the $zx$-plane. Note that the 3D symmetry line segments are parallel to $n_s$, and $n_s$ is perpendicular to $s_{zx}$. The depth of the 3D shape can be measured by computing the longest distance between two midpoints of the 3D symmetry line segments. The midpoints are coplanar on $S$, and $n_s$ is perpendicular to the $y$-axis. Hence, the distance $d_{ij}$ between two midpoints along $s_{zx}$ can be computed as follows:

$\mathbf{m}_i$ is the midpoint of the 3D symmetry line segment connecting $i$ and counterpart($i$), $\mathbf{m}_j$ is the midpoint of the 3D symmetry line segment connecting $j$ and counterpart($j$), and $d_{ij}(\theta)$ is the distance between $\mathbf{m}_i$ and $\mathbf{m}_j$ along $s_{zx}$. Equation A15 shows that the distances between $\mathbf{m}_i$ and $\mathbf{m}_j$ are scaled as a function of $\theta$. This means that the 3D shape is proportionally scaled along the depth axis as a function of $\theta$. From Equation A15, we have:

$D(\pi/2)$ is the depth of the 3D shape when $\theta = \pi/2$. From Equation A8, we see that the $y$-values of the vertices are independent of $\theta$. This means that the height of the 3D shape is also independent of $\theta$.

$\pi/2$) is the aspect ratio of the 3D shape for $\theta = \pi/2$. Equations A11, A12, and A17 show that $\theta$, the parameter characterizing the family, specifies the orientation of the symmetry plane, the orientation of the 3D symmetry line segments, and the aspect ratio of the 3D shape.
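The scaling of width, depth, and aspect ratio with $\theta$ can be checked numerically. The sketch assumes $\mathbf{R}_{3D}$ is the standard rotation about the $y$-axis, which gives $z_i = -(x_{\mathrm{counterpart}(i)} + x_i\cos\theta)/\sin\theta$ for each visible pair. As in the text, it takes the width as the longest 3D symmetry line segment and the depth as the largest separation of midpoints along $s_{zx}$. It defines the aspect ratio as width divided by depth; that definition is our assumption, because Equation A17 is not shown. Under these choices the aspect ratio scales as $A(\theta) = A(\pi/2)\tan(\theta/2)$. Sign and parametrization may differ from the original equations, and the image coordinates in the usage lines are invented.

```python
import math


def family_shape(pairs, theta):
    """pairs: [(x_i, x_cp, y)] from the corrected image -> list of 3D vertex pairs."""
    s, c = math.sin(theta), math.cos(theta)
    return [((xi, y, -(xc + xi * c) / s), (xc, y, -(xi + xc * c) / s))
            for xi, xc, y in pairs]


def width_depth(pairs, theta):
    """Width = longest symmetry segment; depth = largest midpoint spread along s_zx."""
    shape = family_shape(pairs, theta)
    s_zx = (math.sin(theta / 2), 0.0, -math.cos(theta / 2))  # in S and in the zx-plane
    width = max(math.dist(a, b) for a, b in shape)
    mids = [tuple((p + q) / 2 for p, q in zip(a, b)) for a, b in shape]
    along = [sum(m_k * s_k for m_k, s_k in zip(m, s_zx)) for m in mids]
    depth = max(along) - min(along)
    return width, depth


def aspect_ratio(pairs, theta):
    """Assumed definition: width / depth."""
    w, d = width_depth(pairs, theta)
    return w / d


pairs = [(1.0, -0.5, 0.0), (0.8, 0.2, 1.0), (0.3, -0.9, -0.5)]
for th in (0.6, 1.0, math.pi / 2, 2.2):
    print(th, aspect_ratio(pairs, th) / aspect_ratio(pairs, math.pi / 2), math.tan(th / 2))
```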

*z*-value of the visible vertex is obtained by computing an intersection of this face and the projection line emanating from the image of this vertex. The hidden counterpart is recovered by reflecting the visible vertex with respect to the known symmetry plane of the 3D shape.
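A sketch of the two geometric operations in this step, with hypothetical function names. The first intersects the orthographic projection line through an image point, that is, all points $(x, y, z)$ with fixed $x$ and $y$, with the plane of a face. The second mirrors a vertex across the symmetry plane. In the coordinate frame of this appendix, $S$ contains the $y$-axis, so it passes through the origin and `offset` is 0. The face plane's normal could be obtained, for example, as the cross product of two edge vectors between already recovered vertices of that face; this is our assumption.

```python
def depth_on_face(x, y, face_point, face_normal):
    """z where the projection line through image point (x, y) meets a face plane."""
    nx, ny, nz = face_normal
    if abs(nz) < 1e-12:
        raise ValueError("face plane is parallel to the line of sight")
    px, py, pz = face_point
    # plane: nx*(X - px) + ny*(Y - py) + nz*(Z - pz) = 0, solved for Z
    return pz - (nx * (x - px) + ny * (y - py)) / nz


def reflect_across_plane(point, normal, offset=0.0):
    """Mirror `point` across the plane {X : n . X = offset}; n need not be unit length."""
    nn = sum(n * n for n in normal)
    k = 2.0 * (sum(p * n for p, n in zip(point, normal)) - offset) / nn
    return tuple(p - k * n for p, n in zip(point, normal))


# Visible vertex recovered on a face through the origin with normal (1, 0, -1),
# then its hidden counterpart obtained by mirroring across a plane with normal (1, 0, 0).
z = depth_on_face(3.0, 4.0, (0.0, 0.0, 0.0), (1.0, 0.0, -1.0))
hidden = reflect_across_plane((3.0, 4.0, z), (1.0, 0.0, 0.0))
print(z, hidden)
```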

$\boldsymbol{\Delta}_{3D}$ is a 3D distortion, and $\mathbf{v}'_i$ is the position of vertex $i$ after the distortion. Let the 3D coordinates of $\boldsymbol{\Delta}_{3D}$ be $[\Delta_x, \Delta_y, \Delta_z]^t$. From Equation A2, $\boldsymbol{\Delta}_{3D} = [0, y_i - y'_i, \Delta_z]^t$, where $\Delta_z$ can be arbitrary. The magnitude of the 3D distortion, $\lVert\boldsymbol{\Delta}_{3D}\rVert$, is minimized when $\Delta_z = 0$. Hence, the minimally distorted symmetric shape that is consistent with the real 2D image can be written as follows:

$x$ and $y$ coordinates in Equation A19 is the inverse of the transformation in Equation A2.

$\alpha'_a$ and $\alpha'_{\mathrm{counterpart}(a)}$ are the projections of the corresponding 2D angles of contours of the polyhedron $H$. To measure the 2D symmetry of the image, only the visible line segments in the image were considered (Figure B1). If the 2D image of the 3D shape is perfectly symmetric, its two halves are identical and symmetry2D($H$) is zero; otherwise, it is smaller than zero.
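The angles entering symmetry2D are measured in the image itself. The following is a minimal sketch with hypothetical function names. It assumes each 2D angle is given by the image positions of its vertex and of the far ends of its two visible edges. It also assumes symmetry2D is the negative mean absolute difference between corresponding angles, which is consistent with the zero-for-perfect-symmetry property stated above but is not confirmed by the text.

```python
import math


def image_angle(vertex, end_a, end_b):
    """Angle in [0, pi] at `vertex` between image edges toward end_a and end_b."""
    ax, ay = end_a[0] - vertex[0], end_a[1] - vertex[1]
    bx, by = end_b[0] - vertex[0], end_b[1] - vertex[1]
    # atan2(|cross|, dot) is accurate for both small and near-straight angles
    return abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))


def symmetry_2d(angle_pairs):
    """Negative mean |alpha'_a - alpha'_counterpart(a)| over visible angle pairs."""
    return -sum(abs(a - b) for a, b in angle_pairs) / len(angle_pairs)


# A mirror-symmetric image pair of corners (left corner and its reflection in x = 0)
left = image_angle((-1.0, 0.0), (-1.0, 1.0), (0.0, 0.0))
right = image_angle((1.0, 0.0), (1.0, 1.0), (0.0, 0.0))
print(left, right, symmetry_2d([(left, right)]))
```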

*d*′ was computed for each session based on the measured 2D symmetry of the images used in that session. The criterion for classification based on 2D symmetry was chosen to minimize the sum of squared differences between the model's *d*′ and the subject's *d*′ in each experiment.

*d*′ and the abscissa shows levels of distortion of the asymmetric polyhedra. The model's performance shows an effect of the level of distortion similar to that observed in human performance. However, the model's overall performance is quite close to chance level in most conditions, in contrast to the subjects' performance, which was well above chance level in most conditions. Furthermore, the response bias of the model that maximized the fit to the subjects' results was very different from that of the human subjects. On average, when the 3D shape was symmetric, the model responded “symmetric shape” on only 13% of trials; that is, the model almost never responded “symmetric” when the 3D shape was symmetric. This contrasts markedly with the human responses, for which the proportion of “symmetric” responses to symmetric shapes was 79%. Such an extreme response bias would not lead to successful performance in everyday life, where most objects are symmetric and should be perceived as such. Changing the model's response bias to match the subjects' response bias would make the model's discriminability even worse (close to chance level in all conditions). All these results show that discrimination based on the 2D symmetry of images of 3D shapes cannot account for human performance in discriminating between symmetric and asymmetric 3D shapes.

$\sigma$) and tilt ($\tau$). Slant is the angle between the original line of sight and the rotated line of sight; hence, slant specifies the amount of rotation. Tilt is the angle between the $x$-axis of the original image and the projection of the rotated line of sight onto the original image; it specifies the axis around which the line of sight is rotated. Hence, the 2D image of the 3D shape after the rotation of the line of sight can be written as image($H, \sigma, \tau$), and image($H$, 0, $\tau$) represents the original image. The difference between these two images is computed here as the sum of absolute differences between the projections of 2D angles in image($H$, 0, $\tau$) and image($H, \sigma, \tau$):

$n_a$ is the number of 2D angles of the polyhedron $H$, and $\alpha'_a(0, \tau)$ and $\alpha'_a(\sigma, \tau)$ are the projections of a 2D angle of contours of $H$ in image($H$, 0, $\tau$) and image($H, \sigma, \tau$), respectively. Note that a small $\sigma$ causes a large difference between the images if image($H$, 0, $\tau$) is unstable. The stability of image($H$, 0, $\tau$) is defined here as the negative of its instability, which is the sum of the differences defined in Equation C1 computed for a small change of slant (here 1 degree) over all tilts:

$H$) and MinStability($H$) are the maximum and minimum stability of images of the polyhedron $H$ among images from 2562 different viewing orientations of $H$. The 2562 viewing orientations were derived by connecting the center of a viewing sphere with 2562 points that are almost uniformly distributed on the surface of the sphere (see Ballard & Brown, 1982, pp. 492–493).
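The number 2562 equals $10 \cdot 4^4 + 2$, the vertex count of an icosahedron whose triangular faces have been split into four, four times over. We assume, without confirmation from the text, that the near-uniform points were generated this way (a geodesic tessellation). The sketch below builds such a point set; the faces of the starting icosahedron are found as triples of mutually adjacent vertices, which avoids hard-coding a face table.

```python
import itertools
import math


def _unit(p):
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)


def icosphere_directions(levels=4):
    """Near-uniform unit vectors: icosahedron vertices plus `levels` rounds of edge splitting."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for a, b in itertools.product((-1.0, 1.0), repeat=2):
        verts += [(a, b * phi, 0.0), (0.0, a, b * phi), (b * phi, 0.0, a)]

    def d2(p, q):
        return sum((u - v) ** 2 for u, v in zip(p, q))

    # faces = triples of mutually adjacent vertices (edge length 2 before normalizing)
    faces = [t for t in itertools.combinations(range(12), 3)
             if all(abs(d2(verts[i], verts[j]) - 4.0) < 1e-9
                    for i, j in itertools.combinations(t, 2))]
    verts = [_unit(v) for v in verts]
    for _ in range(levels):
        cache = {}

        def midpoint(i, j):
            key = (min(i, j), max(i, j))
            if key not in cache:  # share each new vertex between the two faces on an edge
                verts.append(_unit(tuple((a + b) / 2 for a, b in zip(verts[i], verts[j]))))
                cache[key] = len(verts) - 1
            return cache[key]

        new_faces = []
        for i, j, k in faces:
            a, b, c = midpoint(i, j), midpoint(j, k), midpoint(k, i)
            new_faces += [(i, a, c), (a, j, b), (c, b, k), (a, b, c)]
        faces = new_faces
    return verts


directions = icosphere_directions(4)
print(len(directions))
```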

*d*′ was computed for each session based on the normalized stabilities. The criterion for symmetric versus asymmetric response of the model was chosen so as to minimize the sum of squared differences between the

*d*′ of the model and that of the subject in each experiment.
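The image-difference and stability measures of this appendix, and the normalization that feeds the *d*′ computation, can be sketched as follows. All of the following are our assumptions rather than the paper's. The polyhedron is given as 3D vertices plus (vertex, neighbour, neighbour) index triples defining its 2D angles. Rotating the line of sight by slant σ toward tilt τ is implemented as rotating the object by σ about the image-plane axis perpendicular to the tilt direction; with the default 36 evenly spaced tilts, the sign and the choice of perpendicular do not affect the total. Occlusion is ignored. The normalized stability is taken to be (Stability − MinStability)/(MaxStability − MinStability). The image difference is a plain sum, as the text describes; if Equation C1 also divides by $n_a$, only a constant factor changes.

```python
import math


def rotate(point, axis, angle):
    """Rodrigues rotation of a 3D point about a unit axis through the origin."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    x, y, z = point
    kdotp = kx * x + ky * y + kz * z
    cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x  # axis x point
    return (x * c + cx * s + kx * kdotp * (1 - c),
            y * c + cy * s + ky * kdotp * (1 - c),
            z * c + cz * s + kz * kdotp * (1 - c))


def projected_angles(verts, corners, slant, tilt):
    """Image angles after tilting the line of sight by `slant` in direction `tilt`."""
    axis = (-math.sin(tilt), math.cos(tilt), 0.0)  # in the image plane, perpendicular to tilt
    pts = [rotate(v, axis, slant)[:2] for v in verts]  # orthographic projection drops z
    out = []
    for v, a, b in corners:
        ax, ay = pts[a][0] - pts[v][0], pts[a][1] - pts[v][1]
        bx, by = pts[b][0] - pts[v][0], pts[b][1] - pts[v][1]
        out.append(abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by)))
    return out


def image_difference(verts, corners, slant, tilt):
    """Sum of |alpha'_a(0, tau) - alpha'_a(sigma, tau)| over all 2D angles."""
    before = projected_angles(verts, corners, 0.0, tilt)
    after = projected_angles(verts, corners, slant, tilt)
    return sum(abs(p - q) for p, q in zip(before, after))


def stability(verts, corners, n_tilts=36, slant=math.radians(1.0)):
    """Negative instability: image differences for a 1-degree slant summed over tilts."""
    tilts = [2 * math.pi * k / n_tilts for k in range(n_tilts)]
    return -sum(image_difference(verts, corners, slant, t) for t in tilts)


def normalized(value, lo, hi):
    """Rescale a stability into [0, 1] using min and max over all viewing orientations."""
    return (value - lo) / (hi - lo)


# A unit square facing the viewer, with its four corner angles
SQUARE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
CORNERS = [(0, 1, 3), (1, 2, 0), (2, 3, 1), (3, 0, 2)]
print(stability(SQUARE, CORNERS))
```

In a full implementation, the stability of each of the 2562 views would be computed by first rotating the polyhedron so that the corresponding viewing direction becomes the $z$-axis. Those values then supply MinStability and MaxStability for the normalization.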

*d*′ and the abscissa shows levels of distortion of the asymmetric polyhedra. The model's performance shows trends similar to the subjects' results. However, the model's overall performance was substantially poorer than human performance. Furthermore, the estimated response bias of the model was very different from that of the human subjects in each session. On average, when the 3D shape was symmetric, the model responded “symmetric shape” on 15% of trials, whereas the human subjects did so on 79%. Again, this response bias would not lead to successful performance in everyday life, where most objects are symmetric and should be perceived as such. Changing the model's response bias to match the subjects' response bias would bring the model's discriminability close to chance level. These results show that the stability of symmetric 3D interpretations of 2D images is unlikely to be the primary factor in discriminating between symmetric and asymmetric 3D shapes from single 2D images. *A priori* constraints, such as 3D compactness, seem to be critical in this task. It is important to point out, however, that the generic viewpoint model implemented here is not necessarily the only, or even the best, way to implement this approach. Therefore, this simulation experiment should not be treated as an ultimate test rejecting the generic viewpoint approach to the problem of discriminating between symmetric and asymmetric 3D shapes.

^{1}The method of generating the symmetric polyhedra was almost the same as that in Chan et al. (2006), Li et al. (2009), and Pizlo and Stevenson (1999). The only difference is that the symmetric polyhedra used in these prior studies had coplanar bottom faces, whereas those used in this study did not.