Human SFS is also known to produce large individual differences, with only qualitative agreement between observers (Koenderink, van Doorn, & Kappers, 1992; Todd et al., 1996). These findings suggest that attempts to establish a quantitative theory for SFS are futile. However, Battu, Kappers, and Koenderink (2007) and Koenderink, van Doorn, and Kappers (2001) have shown that despite inconsistencies between observers, participants nonetheless produced consistent shape estimates up to an affine transformation. That is, the 3D representations perceived by observers differ only by scaling and shearing. Mathematically, this is described as follows: suppose that
$z_1(x, y)$ and $z_2(x, y)$ are depth functions estimated by two observers, where $x$ and $y$ are coordinates in the image plane. The relationship between $z_1(x, y)$ and $z_2(x, y)$ can then be described by a multiple linear regression:

$$z_1(x, y) = a\,z_2(x, y) + bx + cy + d,$$

where the constant $a$ represents a scaling factor and $b$, $c$, and $d$ control shearing transforms of the 3D surface. Therefore, we argue that human SFS can be expressed by

$$\hat{z}(x, y) = a\,z(x, y) + bx + cy + d, \tag{1}$$

where $\hat{z}(x, y)$ is the 3D shape as reported by the observer, and
$z(x, y)$ is the provisional 3D surface computed from shading alone and is (we presume) a function of low-level processes and therefore more or less common to all observers. In other words, the 3D information computed from shading alone is not enough to recover a full representation of the 3D shape. We call this 3D information the “proto-surface.” The remaining information is normally provided by other cues. When no other cues are available, participants “make up” for the missing information by applying their “beholder's share” (Koenderink et al., 2001), resulting in large interobserver variances. Following this logic, we believe that the common proto-surface $z(x, y)$ in Equation 1 is a key to understanding human SFS.
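
The affine relationship between two observers' depth estimates described above can be illustrated with an ordinary least-squares fit. The following is a minimal sketch rather than the analysis code used in the cited studies: the function name `fit_affine_depth`, the synthetic Gaussian surface, and the chosen coefficient values are all assumptions introduced for illustration. It regresses one depth map on another together with the image coordinates and a constant term, returning the coefficients $a$, $b$, $c$, $d$ and the proportion of variance explained.

```python
import numpy as np


def fit_affine_depth(z1, z2, x, y):
    """Least-squares fit of z1 ~ a*z2 + b*x + c*y + d (hypothetical helper).

    All inputs are arrays of the same shape sampled at the same image
    locations. Returns the coefficients (a, b, c, d) and R^2.
    """
    z1, z2, x, y = (np.asarray(v, dtype=float).ravel() for v in (z1, z2, x, y))
    # Design matrix: one column per regressor, plus a column of ones for d.
    design = np.column_stack([z2, x, y, np.ones_like(z2)])
    coeffs, *_ = np.linalg.lstsq(design, z1, rcond=None)
    residual = z1 - design @ coeffs
    ss_total = np.sum((z1 - z1.mean()) ** 2)
    r_squared = 1.0 - np.sum(residual ** 2) / ss_total
    return coeffs, r_squared


# Synthetic example (assumed data, not taken from any experiment):
# observer 2 reports a Gaussian bump; observer 1 reports an affinely
# transformed version of the same surface.
x, y = np.meshgrid(np.linspace(-1.0, 1.0, 21), np.linspace(-1.0, 1.0, 21))
z2 = np.exp(-(x ** 2 + y ** 2))
z1 = 1.5 * z2 + 0.3 * x - 0.2 * y + 0.1

(a, b, c, d), r_squared = fit_affine_depth(z1, z2, x, y)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, d={d:.3f}, R^2={r_squared:.4f}")
```

With real settings data, an $R^2$ close to 1 would indicate that the two observers' surfaces agree up to an affine transformation, whereas the raw depth values themselves may correlate poorly.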