Humans have a remarkable capacity to perceive, to discriminate, and to remember faces. Our ability to recognize one another is critical to successful navigation in our social world, and faces, despite sharing the same basic features in the same basic configurations, serve as a primary source of individual recognition. Attempts to explain this ability have inspired the development of numerous empirical and methodological techniques in the fields of psychology, neuroscience, and computer science. Until recently, most experiments in face perception have used raw or manually altered photographs of faces as stimuli (e.g., Ellis, Burton, Young, & Flude, 1997; Tanaka & Sengco, 1997). Although this has allowed researchers to stay close to the phenomenon of interest, reliance on these stimuli has resulted in a number of important limitations. Photographed faces are largely uncontrolled stimuli; they are rarely matched for size, orientation, or lighting conditions. In addition, photographs do not provide a systematic way of modifying face-specific image properties, which severely limits the extent to which similarities between stimuli can be measured, controlled, or manipulated.
Valentine's (1991) proposal of a face space, in which faces are represented as points in a high-dimensional space and distances between points represent perceptual dissimilarities between the corresponding faces, provided a theoretical framework in which relationships between face stimuli could be formalized. This general framework, along with a few axiomatic assumptions, produced elegant explanations of several well-known phenomena in the face perception literature, including the distinctiveness and other-race effects. Without direct control over the actual face stimuli used in experiments, however, it has been difficult to empirically test whether the assumptions behind the general model hold true.
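To make the geometric idea concrete, the following sketch (illustrative Python, with invented coordinates and dimensions that are not part of Valentine's proposal) treats each face as a vector in face space and uses Euclidean distance as a stand-in for perceptual dissimilarity.

```python
import numpy as np

# Hypothetical face-space coordinates: each row is one face, each column
# one assumed dimension (an abstract shape or texture axis, for example).
faces = np.array([
    [0.2, -0.1, 0.4],   # face A
    [0.3,  0.0, 0.5],   # face B (close to A: perceptually similar)
    [2.1, -1.8, 0.9],   # face C (far from A and B: distinctive)
])

def dissimilarity(x, y):
    """Euclidean distance as a proxy for perceptual dissimilarity."""
    return np.linalg.norm(x - y)

print(dissimilarity(faces[0], faces[1]))  # small: similar faces
print(dissimilarity(faces[0], faces[2]))  # large: dissimilar faces
```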
For example, Valentine (1991) conjectured that typical faces occupy a dense, central region of face space, whereas distinctive faces lie in the sparser periphery of the space. This claim has been used to explain the well-documented finding that distinctive faces, being farther from one another and therefore less confusable, elicit better recognition performance than typical faces. However, the original claim regarding the spatial distributions of typical and distinctive faces cannot be tested without a method for assigning particular faces to particular points in face space. Another example involves the phenomenon of cross-race identification. Several studies have reported an asymmetric own-race bias in the recognition of faces (see Bothwell, Brigham, & Malpass, 1989; Meissner & Brigham, 2001; Sporer, 2001). In particular, Caucasian subjects often show better recognition performance with Caucasian faces than with Asian or African American faces, whereas Asian or African American subjects perform equally well with both types of faces. Researchers debate the cause of this asymmetry, with some focusing on differences in exposure to cross-race faces between the two groups (e.g., Tanaka, Kiefer, & Bukach, 2004), others highlighting the role of differences in race homogeneity (see Lindsay, Jack, & Christian, 1991), and still others implicating social factors such as status and attitudes (e.g., Barden, Maddux, Petty, & Brewer, 2004). A concrete measure of the physical variability of faces within and between race and other demographic groups would help resolve this debate and contribute to our understanding of how individual faces and face categories might be encoded.
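As a rough illustration of what such a measure could look like, the sketch below assumes faces have already been assigned coordinates in some face space and compares within-group spread to between-group separation; the groups and numbers are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented face-space coordinates for two hypothetical demographic groups.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))  # more variable group
group_b = rng.normal(loc=0.5, scale=0.6, size=(50, 10))  # more homogeneous group

def within_group_spread(faces):
    """Mean distance of each face from its group centroid."""
    centroid = faces.mean(axis=0)
    return np.mean(np.linalg.norm(faces - centroid, axis=1))

between_group_distance = np.linalg.norm(group_a.mean(axis=0) - group_b.mean(axis=0))
print(within_group_spread(group_a), within_group_spread(group_b), between_group_distance)
```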
Since Valentine's (1991) proposal of the face space model, several image-processing techniques have been developed to enable the measurement and manipulation of similarity between faces. The most popular methods have included eigenfaces (e.g., Turk & Pentland, 1991), landmark-based morphing (Benson & Perrett, 1993), and 3D reconstructions based on laser scans or photographs (e.g., Blanz & Vetter, 1999; Bruce et al., 1993). Although these methods have contributed to our understanding of face representation, they have fallen short of providing a fully reconstructive face space model that would enable the controlled generation of parametrically defined stimuli.
The eigenface method decomposes images into a set of dimensions based on variations across pixel values. Because the processing is done on raw pixel values, even slight variations in lighting conditions among the original photographs can have large effects on the eigenvalue decomposition, which can cause two faces that are perceptually similar to have vastly different eigenface representations. In addition, if face images are not precisely aligned and normalized before processing, the resulting dimensions in the eigenspace can be incoherent, and averaging two or more face images together can result in “ghost” features. For example, averaging a face with wide-set eyes and a face with narrow-set eyes will create a face with four semitransparent eyes. Because the relative locations of interior features vary substantially across faces, this correspondence problem cannot be avoided by simply centering and scaling face images. As a consequence, a large number of dimensions in the eigenface representation end up being uninformative, artificially inflating the dimensionality of the space to hundreds of dimensions (see Penev & Sirovich, 2000).
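At its core, the eigenface method is a principal component analysis of raw pixel values. The sketch below illustrates this computation on a stand-in image array (random pixels in place of real, aligned photographs); it is not any particular published implementation.

```python
import numpy as np

# Stand-in for a stack of aligned grayscale face photographs of shape
# (n_faces, height, width); real images would be loaded here.
rng = np.random.default_rng(0)
images = rng.random((100, 64, 64))

X = images.reshape(len(images), -1)        # one pixel vector per face
mean_face = X.mean(axis=0)
Xc = X - mean_face                         # center on the average face

# Principal components of the pixel data: rows of Vt are the eigenfaces.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt.T                         # each face's eigenface coordinates

# Reconstruct one face from its first k components; with poorly aligned
# images, these low-k reconstructions show the "ghost" features noted above.
k = 20
approx = (mean_face + coords[0, :k] @ Vt[:k]).reshape(64, 64)
```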
Landmark-based models provide a way to solve the correspondence problem. The method requires the manual placement of a few hundred points on identifiable face parts, such as the tip of the nose or the corners of the eyes, across a collection of face images. This spatial coding produces a high-dimensional space of landmark locations that allows for arbitrary averaging, or morphing, among the set of coded face images. However, the method does not provide a fully reconstructive parameterization; the location of landmark points alone, without accompanying color or texture information, is insufficient to reconstruct a face image. Therefore, reconstructions rely on detailed information from the original face images that is extremely high-dimensional and largely uncontrolled across images (see Beale & Keil, 1995).
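The morphing step itself amounts to a weighted average of corresponding landmark coordinates. The sketch below uses a handful of invented landmarks in place of the few hundred hand-placed points, and it ignores the color and texture information that would still be needed for full reconstruction.

```python
import numpy as np

# Hypothetical (x, y) landmark locations for two faces; in practice a few
# hundred corresponding points are placed by hand on each image.
face_a = np.array([[120.0,  95.0],   # left eye corner
                   [180.0,  96.0],   # right eye corner
                   [150.0, 140.0]])  # nose tip
face_b = np.array([[112.0, 100.0],
                   [190.0, 101.0],
                   [151.0, 150.0]])

def morph_landmarks(a, b, weight=0.5):
    """Weighted average of corresponding landmarks: the shape part of a morph."""
    return (1.0 - weight) * a + weight * b

halfway = morph_landmarks(face_a, face_b)         # 50/50 shape morph
mostly_a = morph_landmarks(face_a, face_b, 0.25)  # morph weighted toward face A
```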
Some researchers have also employed methods based on 3D laser scans as well as 3D reconstructions derived from photographs taken at multiple views (e.g., O'Toole, Vetter, Troje, & Bülthoff, 1997; Vetter, 1998). These methods involve automatic alignment of several thousand locations and textures across a collection of 3D face data and use specialized graphics software to display and manipulate the resulting images. Although this approach can produce rather realistic face reconstructions, the automatic alignment procedure, which is based on digitally derived image properties, does not guarantee true anatomical correspondence between points across different faces, again creating a correspondence problem and a large number of uninterpretable dimensions. In addition, its usefulness for face perception researchers is limited by the expensive equipment and software needed to build a database, to construct the model, and to display the 3D images.
To avoid some of these obstacles, there have been recent attempts at low-dimensional parameterizations of face space using simplified face stimuli. Synthetic faces (Wilson, Loffler, & Wilkinson, 2002) are one such method. These stimuli are computerized line drawings obtained from gray-scale face photographs by manually identifying a set of landmark points within each face and extracting local contrast information in specified regions. Synthetic faces are then reconstructed by smoothly interpolating between the landmark points, matching the contrast patterns of the original image, and placing face features in specified locations. Synthetic faces carry the main advantage of providing a relatively low-dimensional full parameterization of faces (using 37 dimensions) while preserving substantial individuating facial information. In their behavioral studies, Wilson et al. (2002) showed that synthetic faces allow for accurate matching to original photographs across various viewpoints and produce inversion effects, as originally reported with face photographs by Yin (1969). A main limitation of this method is its reliance on predefined, generic face features, such as eyes and eyebrows, to reconstruct each face. Recent research has shown that empirically derived “features” may be more useful in characterizing the perceptually salient information available in faces (see Schyns, Bonnar, & Gosselin, 2002).
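As a rough illustration of the interpolation step, the sketch below fits a periodic spline through a small set of invented head-contour samples in polar coordinates. The actual synthetic-face parameterization of Wilson et al. (2002) is considerably more detailed, so this conveys only the general idea of recovering a smooth, closed contour from a few coordinates.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Invented radii of a head contour sampled at equally spaced polar angles,
# standing in for a subset of a synthetic face's shape parameters.
angles = np.linspace(0.0, 2.0 * np.pi, 17)  # 16 samples plus the wraparound point
radii = np.array([1.00, 1.02, 1.05, 1.03, 0.98, 0.95, 0.93, 0.94, 0.96,
                  0.94, 0.93, 0.95, 0.98, 1.03, 1.05, 1.02, 1.00])

# A periodic cubic spline yields a smooth, closed contour through the samples.
spline = CubicSpline(angles, radii, bc_type="periodic")
theta = np.linspace(0.0, 2.0 * np.pi, 360)
x, y = spline(theta) * np.cos(theta), spline(theta) * np.sin(theta)
```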