Studies of face perception typically rely on features selected using an experimenter's
a priori intuitions, that is, without any meaningful model of segmentation or feature diagnosticity. Features are usually identified by manually marking or “cutting and pasting” a limited number of face images. Such methods have a number of drawbacks that often pass unnoticed, hidden in the Methods section. First, only a small number of features are considered, generally features with high contrast such as the eyes, the mouth, and the nose. Second, different intuitions for parsing faces may lead to different and potentially incommensurate results across studies. For instance, the central brow of a face may be grouped with the eyes, with the nose, or with the forehead (Brown & Perrett,
1993; Bruce et al.,
1993)—all options are plausible. Third, manual feature marking is impractical for large databases and large sets of features. One might think that this final concern may be easily addressed by appealing to automatic segmentation algorithms. Indeed, in computer vision, this task has been accomplished by methods for facial feature segmentation (Hammal, Eveno, Caplier, & Coulon,
2006; Saber & Tekalp,
1998; Yuille, Hallinan, & Cohen,
1992). Unfortunately, our first two concerns apply to these methods as well, making them equally problematic. More specifically, most automatic feature segmentation algorithms only extract a limited number of features, such as the eyes and the mouth, and feature selection is dependent on the concrete goal of the algorithm, for example, lip segmentation for automatic lip reading.
In contrast, when addressing segmentation as the foundation for human face recognition (and potentially generic object recognition), there are important theoretical advantages to an a posteriori method for segmenting objects into features, that is, one that makes no assumptions about the nature of the features up front but grounds feature identification in human performance. At least two criteria need to be considered in this respect. First, feature identification should mirror the way humans accomplish face segmentation, and second, the utility and plausibility of a segmentation scheme for recognition needs to be assessed. This twofold approach is illustrated by the research reported here. First, we develop a feature segmentation method that exhaustively parses faces into features and, in doing so, attempts to approximate human segmentation performance. Second, we examine the utility and psychological plausibility of the resulting segmental representation for facial gender recognition. As emphasized below, this latter analysis has rarely been used in evaluating segmentation algorithms.
Interestingly, our investigation of segmental structure in gender categorization enabled us to examine more thoroughly one type of cue relatively under-researched in face recognition: color. In one of the few studies addressing the role of color in face perception, Hill et al. (
1995) compared the use of color and shape information in judgments of facial gender and ethnicity. Their results indicated that color dominated gender judgments while shape dominated ethnicity judgments. However, the authors ascribed the diagnosticity of color to luminance and texture rather than to hue. This interpretation is in line with the idea that while hue may play a role in face recognition, its role is confined to low-level processes such as feature segmentation (Yip & Sinha,
2002). In other words, hue is not expected to facilitate high-level recognition. However, Tarr et al. (
2001) provided evidence for a significant role for hue in gender judgments of faces. More specifically, image analysis revealed that Caucasian male faces tend to be darker and redder than female ones—see Jablonski and Chaplin (
2000) for an extensive analysis of male and female skin luminance. Supporting this analysis, behavioral results indicated that humans take advantage of this difference when shape information is suboptimal. Tarr argued this was due to the red–green ratio in a single perceptual color channel. However, the three color channels, luminance, red–green, and blue–yellow, are likely to provide covarying information. One question therefore concerns the independent contribution of these channels. In addition, in Tarr's study, color was used to characterize faces globally by their mean luminance and red–green ratios. Here we explore whether the pattern of variation across different regions of the face may provide more fine-grained information that can serve to pull apart information provided by the three channels more reliably and further boost categorization accuracy. Following this line of reasoning, our study examines the diagnosticity of featural and configural color properties for automatic gender categorization and human gender recognition.
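To make the kind of global color characterization described above concrete, the sketch below computes per-image means in three rough opponent channels. The linear channel combinations, function names, and the NumPy-based formulation are illustrative assumptions for exposition; they are simple proxies, not the calibrated cone-opponent space or the exact measures used in the studies cited.

```python
import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (H x W x 3, floats in [0, 1]) into three crude
    opponent channels. These linear combinations are illustrative proxies,
    not a calibrated perceptual color space."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = (r + g + b) / 3.0       # luminance proxy
    rg = r - g                    # red-green opponent proxy
    by = b - (r + g) / 2.0        # blue-yellow opponent proxy
    return lum, rg, by

def global_color_stats(rgb):
    """Summarize a face image by the mean of each opponent channel,
    in the spirit of the global characterization discussed in the text."""
    lum, rg, by = opponent_channels(rgb)
    return {"mean_lum": float(lum.mean()),
            "mean_rg": float(rg.mean()),
            "mean_by": float(by.mean())}
```

Under this scheme, an image that is redder overall yields a higher `mean_rg` at identical mean luminance, which is the kind of covariation between channels that motivates asking about their independent contributions; a regional version would simply apply `global_color_stats` to each segmented feature rather than to the whole image.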
As a final note, any particular pattern of variation with regard to color or any other cue is critically dependent on the feature segmentation scheme deployed. Different ways of segmenting faces can lead to different patterns of variation, not all of which may be equally helpful for a given task. Consequently, as mentioned earlier, our approach uses segmentation to study recognition and, conversely, recognition results to assess the utility of feature segmentation. This method allows us to gain a broader perspective on how mechanisms of low-level and high-level face processing interact, as well as providing a tool for examining the role of different cues through multiple processing stages. More specifically, our study examines the utility of cue-specific featural and configural information in gender categorization. Critically, while our framework allows us to implement and test one version of configural processing, our most informative results speak more to how the visual system may select cues and identify specific facial features for a given categorization task.