Research Article  |   March 2008
The prototype effect revisited: Evidence for an abstract feature model of face recognition
Journal of Vision March 2008, Vol.8, 20. doi:10.1167/8.3.20
      Guy Wallis, Ulrike E. Siebeck, Kellie Swann, Volker Blanz, Heinrich H. Bülthoff; The prototype effect revisited: Evidence for an abstract feature model of face recognition. Journal of Vision 2008;8(3):20. doi: 10.1167/8.3.20.

      © 2016 Association for Research in Vision and Ophthalmology.

Abstract

Humans typically have a remarkable memory for faces. Nonetheless, in some cases they can be fooled. Experiments described in this paper provide new evidence for an effect in which observers falsely “recognize” a face that they have never seen before. The face is a chimera (prototype) built from parts extracted from previously viewed faces. It is known that faces of this kind can be confused with truly familiar faces, a result referred to as the prototype effect. However, recent studies have failed to find evidence for a full effect, one in which the prototype is regarded not only as familiar, but as more familiar than faces which have been seen before. This study sought to reinvestigate the effect. In a pair of experiments, evidence is reported for the full effect based on both an old/new discrimination task and a familiarity ranking task. The results are shown to be consistent with a recognition model in which faces are represented as combinations of reusable, abstract features. In a final experiment, novel predictions of the model are verified by comparing the size of the prototype effect for upright and upside-down faces. Despite the fundamentally piecewise nature of the model, an explanation is provided as to how it can also account for the sensitivity of observers to configural and holistic cues. This discussion is backed up with the use of an unsupervised network model. Overall, the paper describes how an abstract feature-based model can reconcile a range of results in the face recognition literature and, in turn, lessen currently perceived differences between the representation of faces and other objects.

Introduction
On any given day, we may interact with hundreds of different people. From as little as a momentary glance, we rapidly decide if the person is familiar or not, where we may have seen them before, if we have a name for them, if they are pleased to see us, and so on. Our ability to recognize and to analyze faces forms an essential part of everyday social interaction, and the ease with which we accomplish this recognition may suggest that it is a relatively simple task. In fact, from a purely computational standpoint, faces are not easy to recognize. They change their shape under rotation (profile, frontal view), they self-occlude (nose), they are non-rigid (smile, frown), and very similar distracters exist (other faces). Despite these inherent difficulties, most of us are consummate experts when it comes to recognizing faces. Indeed, it is only relatively recently, and under a limited range of viewing conditions, that machines have come to outperform us (O'Toole et al., 2007). One of the enduring questions in visual perception is how we achieve this level of discrimination and recognition. Central to answering this question is the issue of how faces are represented in cortex, since it is this representation that underlies recognition.
Within the face recognition literature, there has been considerable debate as to whether faces are recognized as whole entities or as the sum of their constituent parts (Carey & Diamond, 1994; Farah, Wilson, Drain, & Tanaka, 1998; Peterson & Rhodes, 2003; Schwarzer, 1997; Tanaka & Farah, 1993). Support for a holistic model of face processing comes from a number of sources. First, jumbling nameable parts (mouth, nose, eyes) leads to reductions in both recognition speed and accuracy (Farah et al., 1998; Tanaka & Farah, 1993). Second, discrimination based upon a facial part (e.g., eyes) is disrupted by the presence of other, irrelevant facial parts (e.g., nose or mouth) (Hole, 1994; Young, Hellawell, & Hay, 1987). 
As well as being sensitive to the conjunction of nameable parts, we are also sensitive to their placement within a face (Leder & Bruce, 1998; Maurer, Grand, & Mondloch, 2002). Even slight changes in the distance between the eyes or between nose and mouth, etc., can greatly affect recognition performance. This sensitivity to facial configuration (configuration effect) has led some theorists to argue that we represent faces using a code based upon facial metrics, that is, distances between landmarks such as the eye centers, tip of the nose, etc. (Leder & Bruce, 2000). This approach has also found some support in early face cell recording work (Yamane, Kaji, & Kawano, 1988). See Table 1 for a summary of these and other relevant terms.
Table 1
Summary of the effects and terms referred to in this paper.

Holistic effect: The obligatory, automatic processing of faces as a whole entity. Relates to the enhanced recognition of a nameable part viewed in the context of other parts seen during training. This occurs even if attention is focused on only one part.

Configural effect: Sensitivity to the layout and the spacing of nameable parts. Relates to the enhanced recognition performance seen with faces in which their parts appear in the original and correct facial layout.

Metrics-based representation: Explicit representation of a face in terms of the distance between landmark points such as the eye centers, the tip of the nose, the chin, the hair line, etc. Not concerned with the appearance of the parts as such, just facial dimensions.

Structural description: Recognition of each nameable part combined with explicit representation of the relative locations of each part.

Abstract features: A multi-scale, image-based representation in which the pictorial matching happens first locally and then globally. No explicit representation of layout or distances, no explicit correspondence to nameable parts.

One of the most interesting aspects of both configural and holistic processing is that they are most apparent in processing upright faces. If a face is presented upside down, recognition performance appears to be much more focused on local, featural information (Leder & Bruce, 1998; Maurer et al., 2002; Thompson, 1980). Attempts to incorporate all of this evidence into a model of face recognition have often led to hybrid, two- or three-stream systems (Maurer et al., 2002; Peterson & Rhodes, 2003). In this paper, we argue that the various findings are actually manifestations of a single underlying model based upon competitive networks of neurons selective for environment-specific abstract features. Abstract features are essentially pictorial subelements of a stimulus that display some degree of robustness to natural changes in the appearance of their preferred stimuli. This robustness may take the form of tolerance to changes in appearance of that feature caused by changes in object size, location, orientation in depth, illumination, etc. Features of this type form the basis of several working models of object recognition (Riesenhuber & Poggio, 2000; Torralba, Murphy, & Freeman, 2007; Wallis & Rolls, 1997) and match current evidence from neurophysiological studies (Tanaka, Saito, Fukada, & Moriya, 1991; Tsunoda, Yamane, Nishizaki, & Tanifuji, 2001; Rolls, 1992; Wallis & Bülthoff, 1999). In those studies and models, abstract features are usually simply referred to as features. In this paper, they attract the modifier “abstract” to avoid confusion with use of the term “feature” in the face recognition literature, where it refers to nameable facial parts and colors (Leder & Bruce, 2000).
As a first step in attempting to understand how the different results in the face recognition literature are consistent with a single model, it is worth restating what the holistic and configural results actually tell us. Holistic effects are evidence for the fact that upright faces are processed in a manner that is sensitive to multiple, integrated aspects of the face. Some have interpreted this as evidence for the “obligatory processing of all of the features of an object, even when instructions direct the observer to focus on only a single part” (Gauthier & Tarr, 2002). Processing of this type is not consistent with an abstract feature-based model that builds a representation from subregions of the face. A feature-based model does allow for some features to be more complex and more complete than others but would not allow for them to be truly complete. This discrepancy forms one of the elements of the investigations in this paper. Configural processing results reveal that spacing between nameable, simple features is important when processing upright faces. This poses less of a problem for an abstract feature model. One simple interpretation of the configural effects is that the features used to discriminate upright faces span nameable parts. This is consistent with the fact that abstract features are not limited to encoding nameable parts (like mouths, noses, eyes, etc.) and hence are likely to alter their response to a face in which the spacing between nameable parts is altered. 
On some level then, any model must contain both configural and (semi)holistic aspects, in so far as it must be sensitive to the configuration of, as well as combination of nameable parts. It should also explain why processing is (more) holistic for upright faces than for inverted ones. This paper describes how such properties naturally emerge from a self-organizing, competitive system that is nonetheless neither truly holistic nor based on facial metrics. Some of the evidence for this is drawn from the experimental work reported in this paper. 
In order to illustrate and to test this proposal, the experimental section investigates the nature of holistic and configural types of facial processing for faces in an attempt to determine just how holistic the processing really is. Despite its use of large and/or complex integrated features, the abstract feature model is still a piecemeal recognition system. One of the consequences of this is that it can be fooled into “recognizing” novel combinations of familiar facial parts in a manner that holistic and metrics-based systems should not. In order to test this, what is required is a manipulation that allows alteration of the holistic and metric form of the face without greatly altering the abstract feature content. Wallis and Bülthoff (1999) suggested that techniques used to study the facial prototype effect (Solso & McCarthy, 1981) might be suitable. The facial prototype effect refers to the discovery that exposure to parts of a predetermined “prototype” face can lead to a novel face (constructed from these parts) being regarded as familiar when it is not. To understand why holistic or metrics-based codes make different predictions to an abstract feature-based code, see Figure 1. In the figure, the two means of representing faces are compared. A truly holistic representation based on pictorial or metrics-based cues will only rarely be fooled by a facial prototype because at least some global aspects of the prototype will always be novel and hence different from the faces seen previously. An abstract feature-based model, on the other hand, would predict regular confusion of the prototype with truly familiar faces. This is because in this case, recognition is based upon a collection of analyzers that only focus on the appearance of local regions or partial features of the face. 
More importantly, a holistic or a metrics-based model might predict some confusion, but certainly not a preference for the prototype over familiar faces, as was the case in Solso and McCarthy's (1981) study. In fact, the idea that feature-based processing might underlie certain false-memory effects in face recognition has been raised before. In particular, Reinitz, Morrissey, and Demb (1994) argued that their results with line drawings in a divided attention task might reflect naive, feature-specific processes at work. Where their thinking diverges from our own is that they were referring to simple features, that is, nameable parts. They concluded that face recognition must require a structural description of where these features reside, thereby aligning their thoughts with metrics-based and structural models. The abstract feature model proposed here offers an alternative to these as well as fully holistic models. 
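The contrast between the two accounts can be made concrete with a toy sketch. It is an illustration under stated assumptions, not the model tested in this paper: the part labels, the four-face training set, and both scoring functions (`feature_score`, `holistic_score`) are hypothetical. Each training face is assumed to replace one region of the prototype with a part that appears nowhere else.

```python
from collections import Counter

REGIONS = ("eyes", "nose", "mouth", "surround")

# Hypothetical prototype: one labeled part per facial region.
prototype = {region: f"proto-{region}" for region in REGIONS}

# Hypothetical training set: each face swaps exactly one prototype
# region for a part that occurs in no other face.
training = []
for region in REGIONS:
    face = dict(prototype)
    face[region] = f"other-{region}"
    training.append(face)

# Count how often each (region, part) pair was seen during training.
exposure = Counter(
    (region, part) for face in training for region, part in face.items()
)

def feature_score(face):
    """Piecewise familiarity: summed exposure of the face's parts."""
    return sum(exposure[(region, part)] for region, part in face.items())

def holistic_score(face):
    """Whole-face familiarity: how often this exact combination was seen."""
    return sum(face == seen for seen in training)

print("prototype:", feature_score(prototype), holistic_score(prototype))
print("trained  :", feature_score(training[0]), holistic_score(training[0]))
```

Under these assumptions the piecewise score ranks the unseen prototype above every trained face (a level II pattern), whereas the whole-face score gives the prototype no familiarity at all.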
Figure 1
 
Rival theories of face representation in the context of an old/new recognition task. (A) A holistic decision process is based upon decision boundaries within a “space” of facial images and predicts a level I but not a level II prototype effect (see text). (B) A feature-based model predicts that the decision process depends on the frequency with which subregions have been seen previously, irrespective of their particular combination. The black dots represent the number of previous exposures to those facial regions. The sections highlighted in green represent the regions that go to make up the prototype face.
As for the prototype effect, many authors have realized its potential importance for understanding the processing of faces. Unfortunately, since the original study by Solso and McCarthy (1981), the literature has produced conflicting results and reports. Part of the reason for this lies in the definition of the effect. There are at least two major forms of the effect that should be distinguished: Level I, the prototype face is regarded as more familiar than novel faces. Level II, the prototype is regarded as more familiar than novel faces but is also regarded as more familiar than faces which have been seen before. Solso and McCarthy described a level II effect in their study based on photo-fit images. Since that time, larger and more detailed studies have been conducted. Several studies have investigated the robustness of the effect by varying the precise form of the discrimination task. The most recent studies have also employed picture plane morphs of photographic images, overcoming criticisms leveled at the use of schematic and photo-fit images (Ellis, 1981). Unfortunately, despite its theoretical importance, many of the recent studies have not tested for the level II effect (Bruce, Doyle, Dench, & Burton, 1991; Cabeza & Kato, 2000). Where it has been tested, researchers have failed to find any reliable advantage for the prototype face over familiar faces. In fact, they have more often found the opposite to be the case (Bruce et al., 1991; Cabeza, Bruce, Kato, & Oda, 1999). It turns out that in terms of the holistic versus abstract feature debate, it is the level II effect that is the most interesting since the two models make quite opposite predictions. 
This paper provides evidence for a robust level II prototype effect. It goes on to describe how this effect is at odds with purely holistic or metrics-based models of face recognition and readily explained by an abstract-feature model. Ultimately, the paper provides new evidence to support the idea that faces are recognized on the basis of many hundreds or thousands of neurons each selective to one of a range of abstract facial features. Through consideration of the results and current models of temporal lobe processing in macaques and humans, the paper then argues against the need for a separate face recognition system and seeks to reintegrate faces into a general theory of object representation and recognition. 
Experiment 1
Introduction
The first experiment sought to investigate the prototype effect for faces using a simple familiar/unfamiliar response task. 
Method
Participants
Twenty participants, with corrected to normal vision, were tested in Experiment 1. All were undergraduate students in the Psychology Department at the University of Queensland. This and all subsequent experiments were conducted in accordance with the university ethics guidelines and with approval of the university's Behavioural and Social Sciences Ethical Review Committee. Informed consent was obtained from all participants. The students received course credit for their participation. 
Stimuli
Participants sat 60 cm from a 24-in. Sony Trinitron monitor observing the images of a head displayed centrally and subtending an angle of approximately 8 × 6 deg. The stimuli were prepared using a subset of the 3D head models database held at the Max Planck Institute in Tübingen. The heads were originally scanned using a Cyberware 3D laser scanner that samples texture and shape information on a regular cylindrical grid, with a resolution of 0.8 degrees horizontally and 0.615 mm vertically (see Figure 2). For the purposes of this experiment, the heads of 55 female volunteers were used, and from these, sets of 35 heads were chosen at random to create the prototype test sets of the type shown in Figure 3. The choice of which head, mouth, nose, and eye region was combined to form any one prototype was made at random, although they originated from four different heads in each case. All other heads were then generated on the basis of the prototype, such that eight faces had three regions in common—familiar/unfamiliar (2) × mouth/head/eyes/nose (4). Twelve faces had two regions in common with the prototype, eight had one region in common, and six had none in common. The number of feature steps from the prototype is referred to henceforth as “distance.” A distance of 2 indicates two facial regions are altered from the original prototype image, for example, eyes and mouth. The facial regions were mixed by cutting the shape and texture information out of each 3D model and “stitching” it together along the lines of overlap. A sigmoidal weighting function was used to integrate the shape contours at the point of overlap. Much of the theory behind this head morphing technology is described elsewhere (Vetter, 1998). Each subject was exposed to two completely separate prototype head sets (each containing 35 heads). A new pair of prototype head sets was generated for each subject. 
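The composition of one face array can be checked by enumeration. The sketch below is a reading of the description above rather than the authors' stimulus code: the region names follow Figure 2, the dictionary layout is hypothetical, and the 3/3 familiar/unfamiliar split at distance 4 is an assumption inferred from the 17 heads shown during training.

```python
from itertools import combinations

REGIONS = ("eyes", "nose", "mouth", "surround")

def face_array():
    """Enumerate one prototype-centered array of 35 faces."""
    # The prototype (distance 0) is never shown during training.
    faces = [{"distance": 0, "changed": (), "familiar": False}]
    # Distances 1-3: every choice of altered regions, once familiar
    # and once unfamiliar.
    for distance in (1, 2, 3):
        for changed in combinations(REGIONS, distance):
            for familiar in (True, False):
                faces.append(
                    {"distance": distance, "changed": changed, "familiar": familiar}
                )
    # Distance 4: six faces sharing no region with the prototype.
    # Assumption: three of them are shown during training.
    for index in range(6):
        faces.append({"distance": 4, "changed": REGIONS, "familiar": index < 3})
    return faces

faces = face_array()
per_distance = {d: sum(f["distance"] == d for f in faces) for d in range(5)}
n_training = sum(f["familiar"] for f in faces)
print(per_distance, len(faces), n_training)
```

The enumeration gives 1, 8, 12, 8, and 6 faces at distances 0 to 4, which sums to the 35 heads per array stated in the text.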
Figure 2
 
(a) The stimuli were generated from a set of 3D head models that include separate texture and form information. (b) Four subregions were identified within each head, centered on the eyes, nose, mouth, and surround. Subregions selected from the heads of four individuals were morphed together to build hybrid head shapes. (c) These morphed heads were rendered in seven equally spaced orientations. During training and testing, preselected morphed heads were presented in rapid succession (300 ms per frame), providing the impression of a head rotating from one extreme profile to the other.
Figure 3
 
Example of the stimuli used in the experiments. The prototype-centered face array is divided into familiar faces seen during training and unfamiliar ones seen only during testing. Each face is an amalgam of four facial regions located around the eyes, nose, mouth, and surround. The central face is the prototype. It has a specific number of facial parts in common with the surrounding faces. This number decreases as a function of distance from the prototype, as indicated by the digits (1–4) placed on the concentric rings. A total of 35 heads were required to build each face array.
Procedure and design
The experiment was divided into two parts—a training phase and then a testing phase. During training, participants were exposed to half the images of a prototype face array (see Figure 3). Each presentation consisted of the head being displayed in seven different poses corresponding to equal orientation changes around the vertical axis ranging from left to right profile (see Figure 2). By presenting the images in rapid sequential order, the head appeared to rotate. This process was repeated until the subject had seen the head in all poses for a total of 8.4 seconds. All faces were presented twice. During testing, the same 17 heads, plus 18 more (including the prototype), were presented. They were shown in exactly the same manner and for the same duration as during the training phase. Participants were now required to indicate whether the face appeared to be “old” or “new” by means of a key press. 
Results and discussion
The experiment requires remembering a large series of faces, and the first analysis examined how well subjects performed overall. The binary nature of the familiar/unfamiliar decision task lends the data to analysis using signal detection techniques. d-prime was calculated as a function of distance from the prototype. The results appear in Figure 4. A repeated measures ANOVA was then conducted with distance from the prototype as independent variable and d-prime score as dependent variable. ANOVA revealed a significant effect of distance, F(3, 57) = 9.73, MSE = 0.595, p < 0.001, corresponding to better performance as distance from the prototype increased. Despite this trend, performance was still significantly above chance for faces differing by only one facial region from the prototype (and hence two from each other), F(1, 19) = 30.84, MSE = 0.4495, p < 0.001, indicating that subjects were able to distinguish familiar from unfamiliar faces under all four stimulus distance conditions.
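For readers unfamiliar with the measure, d-prime is the difference between the z-transformed hit rate and false-alarm rate. The following is a minimal sketch; the function name and the log-linear correction (adding 0.5 to each cell so that rates of 0 or 1 remain finite) are assumptions, since the text does not say how extreme rates were handled.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one subject at one distance:
# "old" responses to familiar faces are hits,
# "old" responses to unfamiliar faces are false alarms.
print(d_prime(hits=16, misses=4, false_alarms=4, correct_rejections=16))
```

A value of zero corresponds to chance performance, which is the baseline against which the distance 1 condition is tested above.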
Figure 4
 
d-prime results for Experiment 1. The graph records the ability of the participants to distinguish familiar from unfamiliar faces. For faces sharing many facial regions with the prototype (distance = 1 or 2), performance is worse than for faces which are very different (distance = 3 or 4). Nonetheless, performance was well above chance, suggesting that subjects were able to perform the task well at all four distances from the prototype.
Although the preceding result is encouraging, it does not speak to the major hypotheses being tested. Since the prototype was never shown during training, d-prime cannot be established for this condition. Instead, a further analysis was conducted based upon average response rates for the prototype and the familiar faces (see Figure 5). The logistic function was used to transform raw percent correct scores. The scores were then analyzed using a repeated-measures ANOVA, with distance from the prototype as independent variable. The overall effect was significant, F(4, 76) = 12.078, MSE = 4.4212, p < 0.001, and a post hoc analysis using Dunnett's test revealed that the prototype was perceived as familiar significantly more often than the familiar faces, irrespective of distance from the prototype (see Figure 5). Familiar faces were recognized as familiar around 80% of the time, irrespective of the degree of similarity to the prototype (distance = 0). For unfamiliar faces, performance increased as the number of features they shared with the prototype decreased. Crucially, the prototype was not only often mistaken for familiar, it was actually seen as familiar more often than the truly familiar faces. Hence, the prototype effect has been reproduced. In terms of the underlying hypotheses, the result is consistent with the idea that facial regions are analyzed separately and is inconsistent with a holistic model of face recognition.
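The transform applied to the percentage scores is not spelled out. One common choice, assumed here, is the log-odds (logit) of the proportion, which stretches values near 0 and 1 before they enter the ANOVA. The function name and the clipping bound `eps` are hypothetical.

```python
import math

def logit(proportion, eps=0.01):
    """Log-odds of a proportion, clipped to [eps, 1 - eps]."""
    # Clipping is an assumption: without it, scores of exactly
    # 0% or 100% would map to minus or plus infinity.
    p = min(max(proportion, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

# Example: proportions of "familiar" responses.
for p in (0.5, 0.8, 1.0):
    print(p, round(logit(p), 3))
```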
Figure 5
 
The number of times a face was said to be familiar, expressed as a percentage. The red line indicates faces that were actually unfamiliar, and the blue line those that were indeed familiar. Asterisks indicate differences between the prototype and each of the groups of familiar faces (** p < 0.01, * p < 0.05).
Experiment 2
Introduction
The second experiment sought to test the robustness of the prototype effect found in Experiment 1 by testing a new set of participants using a different recognition test based on a familiarity rating scale. 
Method
Participants
Nine participants, with corrected to normal vision, were tested in Experiment 2. All were undergraduate students in the Psychology Department at the University of Queensland. All participants received course credit for their participation. 
Stimuli
The second experiment used the same test set as Experiment 1, with each of the nine subjects being trained and tested on two unique face sets of the type shown in Figure 3.
Design and procedure
The experiment was set up as a familiarity rating task similar to that used in several earlier studies of the prototype effect (Cabeza & Kato, 2000; Solso & McCarthy, 1981). Faces were once again presented as heads rotating about the vertical axis during both training and testing. During testing, participants were required to select a key labeled 1–10 to indicate the level of subjective familiarity (1 = unfamiliar, 10 = very familiar). 
Results and discussion
The ratings made by our participants appear in Figure 6. Although ANOVA has been used to analyze similar rating data in the past (Howell, 1997; Solso & McCarthy, 1981), we chose to use the non-parametric equivalent due to the inherently non-normal distribution of the rating scores. For an overall measure of difference between the familiarity rating of the prototype relative to the truly familiar faces, a Wilcoxon paired sample analysis was performed on individual rating scores. The prototype received a significantly higher familiarity rating than the familiar faces, and further analysis revealed that the difference remained significant irrespective of distance from the prototype, Z(1, 9) = 2.31, p < 0.05. For comparison with earlier studies, a repeated measures ANOVA of familiar versus prototype ratings was also conducted. The result confirmed the main effect identified by the Wilcoxon analysis, F(1, 8) = 13.66, MSE = 0.616, p < 0.01. Thus, the effect has been successfully replicated using a different set of participants and stimuli and using a different discrimination task. Consistent with the previous experiment, unfamiliar faces are ranked as less familiar than their truly familiar counterparts at all distances from the prototype. The prototype (distance = 0) is, however, rated as the most familiar face overall, despite it never having been seen during training. Overall, this serves to confirm the robustness of the level II prototype effect.
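The paired Wilcoxon analysis can be sketched as follows. This is an illustrative implementation, not the software used in the study: zero differences are dropped, tied absolute differences receive average ranks, and the Z value comes from the normal approximation without a tie correction. The ratings in the usage example are invented for illustration only.

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, z, two-sided p) for paired samples x and y."""
    # Drop pairs with no difference, then order by absolute difference.
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    n = len(diffs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation to the null distribution of W+.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, w_minus, z, p

# Invented ratings (1-10 scale) for nine hypothetical subjects.
prototype_ratings = [8, 9, 7, 8, 9, 6, 8, 7, 9]
familiar_ratings = [6, 7, 7, 5, 8, 7, 6, 6, 6]
print(wilcoxon_signed_rank(prototype_ratings, familiar_ratings))
```

With only nine subjects an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.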
Figure 6
 
Results from the rating experiment. The red line indicates unfamiliar faces and the blue line familiar ones. The boxes record the median response for each distance, and the error bars indicate the upper and the lower quartiles. The circles indicate mean rating values. Asterisks indicate differences between the prototype and each of the groups of familiar faces (*p < 0.05).
Experiment 3
Introduction
The final experiment investigated the consequences of image inversion, that is, turning the faces upside-down. Face inversion is of interest because it is known that the discrimination of upside-down faces is more reliant on local features than the more holistic sensitivity seen for upright faces (Leder & Bruce, 1998). If the prototype effect described in Experiments 1 and 2 emerges as a consequence of feature-based recognition, one would expect a larger effect for inverted faces than for upright ones. 
Method
Participants
Ten subjects took part in the experiment. Subjects were volunteers from the undergraduate student population. They were paid for their participation and had corrected to normal vision. 
Stimuli
Four new face sets were generated for each subject. Two contained upright faces, and two contained inverted faces. 
Design and procedure
Training and testing followed the format of the first experiment. Subjects were tested on two separate occasions on either the upright or the inverted faces, with the order of testing being counter-balanced across subjects. 
Results and discussion
Data from the four conditions appear in Figure 7. Performance on the inverted faces is considerably worse than for upright faces. This is not in itself surprising and is consistent with earlier studies of face inversion (Yin, 1969). As a first step, an analysis was conducted to confirm the presence of a prototype effect both for upright and inverted faces. The prototype effect was reproduced for upright, F(1,9) = 7.796, MSE = 3.415, p < 0.05, and inverted faces, F(1,9) = 9.421, MSE = 4.808, p < 0.05. For upright faces, the effect size was large: (Cohen's) d = 1.33, but it was even larger for inverted faces: d = 1.73. 
Figure 7
 
Results from Experiment 3 reporting variation in the percentage of trials subjects judged a face to be familiar. The four lines reflect judgment as a function of distance from the prototype for familiar versus unfamiliar and upright versus inverted faces. Note that for unfamiliar faces a high score here corresponds to poor task performance.
Having confirmed the presence of a level II effect under both conditions, attention switched to the relative size of the effect. The results appear in Figure 8. In the case of the prototype, the change in performance was small. A paired t-test confirmed that this difference was not statistically significant, t(9) = 0.246, n.s. For all other distances, a large and consistent drop in performance of around 10% was apparent, and this was statistically significant in each case, t(9) = 2.31, 3.68, 7.48, 3.37, all p < 0.05. This differential reduction in performance between the prototype and the other faces indicates that the prototype effect was stronger for inverted faces than for upright ones, an assertion consistent with the increase in effect size described above. 
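The paired comparison used above can be sketched in a few lines. This is only an illustration of the statistic: the scores are hypothetical values invented for the example (they are not the data from Experiment 3), and the function name `paired_t` is likewise an assumption.

```python
import math
import statistics

def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical percent-correct scores for ten subjects (NOT the study's data).
upright = [82, 75, 90, 68, 77, 85, 71, 80, 88, 74]
inverted = [70, 66, 79, 61, 65, 77, 60, 73, 75, 62]

t, df = paired_t(upright, inverted)
print(f"t({df}) = {t:.2f}")
```

With ten subjects the test has nine degrees of freedom, matching the t(9) values reported in the text.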
Figure 8
 
Difference in overall performance (% correct) between upright and inverted faces as a function of distance from the prototype.
Model
Introduction
This paper proposes that faces, amongst other objects, are represented using pictorial features which each exhibit a degree of transform invariance (to rotations, size changes, etc.). In common with other models of recognition in inferior temporal lobe cortex, the suggestion is that the features reflect the statistics of the visual environment and that the neurons in these regions are organized into multiple layers of competitive networks (Fukushima, 1988; Riesenhuber & Poggio, 2000; Wallis & Rolls, 1997). The consequences of such a model for recognition performance are manifold. This section describes a simple competitive system and considers patterns of selectivity that emerge during exposure to an array of input images. The network's behavior is seen to parallel various behavioral phenomena described in the face recognition literature. 
The neural network model
The model is kept simple as its role is purely illustrative. Central to its design are two core elements, which it shares with all self-organizing, competitive systems: (i) a rule for synaptic adaptation—in this case based on simple Hebbian principles, and (ii) a form of global competition between neural classifiers (Hertz, Krogh, & Palmer, 1990; Wallis & Rolls, 1997). 
The network contains a total of n neurons, each producing a single output based on three inputs. The response of the nth neuron, $y_n$, is simply the product of its weight vector $w_n$ and the current input vector $x$. The components (i) and (ii) listed above are implemented as follows:

$$w_n = r\,x\,\mu_n + (1 - r)\,w_n \tag{1}$$

$$\mu_n = \frac{y_n}{\varepsilon_n\,y_{\max}}, \tag{2}$$

where $r$ is the learning rate, $y_{\max}$ is the output of the most strongly firing neuron, and $\varepsilon_n$ is the rank of the neural activation, such that the most active neuron has rank 1 and the nth most active has rank n. Dividing by $y_{\max}$ normalizes activity across the network on each stimulus presentation, which has the effect of ensuring that the amount of synaptic modification of the most strongly firing neuron is constant for each pattern presented. This normalization step also implements a degree of global competition. Dividing by $\varepsilon_n$ enhances the degree of competition. 
Note that the weight and the input vectors are always positive and constrained to unit length, causing the result to lie in the positive octant of a sphere of unit radius centered at the origin. The diagrams in Figures 9, 10, and 11 are projections of this surface onto a plane. 
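As a minimal sketch of the learning rule in Equations 1 and 2, the update for a single stimulus presentation might be written as follows. The learning rate of 0.05, the number of passes through the pattern set, the random input patterns, and all function names are assumptions made for illustration; the text does not specify them.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def train_step(weights, x, r=0.05):
    """Apply one presentation of input x to every neuron (Equations 1 and 2)."""
    # Response of each neuron: dot product of its weight vector and the input.
    y = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    y_max = max(y)
    # Rank 1 for the most active neuron, rank n for the least active.
    order = sorted(range(len(y)), key=lambda i: -y[i])
    rank = [0] * len(y)
    for position, i in enumerate(order):
        rank[i] = position + 1
    new_weights = []
    for w, y_n, eps_n in zip(weights, y, rank):
        mu = y_n / (eps_n * y_max)                       # Equation 2
        updated = [r * x_i * mu + (1 - r) * w_i          # Equation 1
                   for w_i, x_i in zip(w, x)]
        new_weights.append(normalize(updated))           # keep unit length
    return new_weights

def random_positive_unit_vector(rng, dims=3):
    """A random point in the positive octant of the unit sphere."""
    return normalize([rng.random() + 1e-9 for _ in range(dims)])

rng = random.Random(0)  # fixed seed, chosen arbitrarily
weights = [random_positive_unit_vector(rng) for _ in range(6)]
patterns = [random_positive_unit_vector(rng) for _ in range(500)]
for _ in range(20):  # number of passes is an assumption
    for x in patterns:
        weights = train_step(weights, x)
```

Because both the inputs and the initial weights are positive, every response is positive and the division by the maximum response is always defined. Renormalizing after each update is one simple way of respecting the unit-length constraint described above.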
Figure 9
 
Output of the competitive network after 1000 presentations of the 500 patterns (cyan dots) with 6 classifiers (red crosses). Black lines are the Voronoi diagram classification boundaries indicating the range over which each classifier “wins.” Classification is achieved with little sensitivity to the second, low variance dimension.
Figure 10
 
Output of the network after 1000 presentations of the 500 patterns with 30 classifiers. Note that the classifiers are now sensitive to both stimulus dimensions, and that they form more symmetric clusters.
Expertise and the holistic versus local debate
To illustrate the implications of a competitive system on recognition behavior, it is instructive to consider two phenomena in the recognition literature, namely, expertise and holistic processing. It turns out that in a competitive classifier these two issues go hand in hand. 
The action of a competitive network is to produce selectivity in its neurons such that they are active approximately equal amounts of the time (Hertz et al., 1990). Neurons can satisfy this constraint by employing a mixture of two strategies: (i) A neuron focuses in on a narrow region of the input space in which only a few, regularly seen exemplars exist. Despite its limited range of selectivity, it is activated relatively often because of the common occurrence of its preferred stimuli. (ii) Alternatively, a neuron may choose to be less selective, responding to a broad range of stimuli which individually are seldom experienced but which, as a group, are seen as often as regularly experienced patterns. 
In practice of course, neurons do not choose which strategy to employ; this simply emerges from the statistical properties of the input space and the neurons' initial selectivity. What is significant for recognition processes is that the effect of regular exposure to a particular object class, that is, the development of expertise, now becomes apparent. Patterns falling within an area of expertise are seen very often and some neurons respond by becoming highly selective for a small range of these highly familiar stimuli. Competition with other similarly tuned neurons forces these “expert” neurons to use ever more specific aspects of their favored stimuli, integrating more and more information from across the entire stimulus, resulting in neurons selective for more holistic, but nonetheless abstract features. In contrast, neurons in regions of more sporadic activity experience less crowding of the input space and so win by remaining relatively unspecialized. 
To illustrate this point, we can contrast the type of selectivity seen in the model described above. If only a few classifiers are used to represent a large set of input patterns, they will tend to be widely spaced and to seek out dimensions in the input space with the largest variance, aligning themselves with the major principal components of the input pattern distribution. In the example in Figure 9, six classifiers are trained (n = 6). The input patterns are drawn from a two-dimensional Gaussian probability distribution in which the standard deviation along the first dimension is three times that along the second. In this case, the classifiers are seen to efficiently split the space into similarly sized groups but do so without any regard for the second dimension of variability. Repeated exposure to these patterns will eventually attract more classifiers to represent the space of inputs, and at this point sensitivity to a single dimension is no longer sufficient to ensure exclusivity to a particular classifier. Figure 10 reveals the very different pattern that emerges in the case that 30 classifiers cover this region of input space. The point being made here is simply that greater specialization means greater exposure to a small set of patterns. This brings in more neurons/classifiers that produce selectivity across ever-increasing input dimensions that in turn equates to a more globally sensitive and hence holistic representation. 
Note that a similar argument can be used to explain why our relative inexperience with upside-down faces leads to them being processed in a more piecemeal, local manner than upright faces (Leder & Bruce, 1998). 
The “other-race effect”
Another result from the face recognition literature that can be understood in terms of a competitive model is the “other-race effect” (Chance, Turner, & Goldstein, 1982). This effect relates to the fact that people from one's own race are more accurately and rapidly discriminated from one another than those belonging to an unfamiliar race. This effect is interesting because it provides telling evidence for the fact that the perceptual quality of facial distinctiveness is learned. Earlier models of the effect have demonstrated how experience-dependent image dimension reduction might explain these effects. O'Toole, Deffenbacher, Abdi, and Bartlett (1991), for example, demonstrated how a system designed to approximate principal component analysis could reproduce the effect. In many ways, their model differs quite considerably from the abstract-feature model being proposed here. It uses supervised training, seeks principal components rather than data clusters, and treats the facial image holistically. Nonetheless, the work draws a clear link between experience-based dimension reduction and the other-race effect. 
To simulate the effect using the abstract feature model, the model was now run with two sets of input patterns. One contained 500 patterns distributed evenly across the two dimensions. The second had identical statistical properties but was shifted along the first input dimension and contained just 50 exemplars. The two groups represent own-race and other-race faces, respectively. As can be seen from Figure 11, the model naturally concentrates most of its classifiers on the same-race faces but also assigns a number of classifiers more broadly to the other-race faces. The broader spacing of the other-race classifiers clearly brings with it a reduction in sensitivity to small changes in appearance (corresponding to shifts along either or both of the input dimensions). 
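The setup of this simulation can be sketched as follows. Note that this sketch swaps in a plain winner-take-all update (online k-means) in a flat two-dimensional space rather than the rank-based rule of Equations 1 and 2, and that the isotropic Gaussian inputs, the shift of 6 units, the 30 classifiers, the learning rate, and the number of epochs are all assumptions chosen for illustration.

```python
import random

def nearest(centers, p):
    """Index of the classifier closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: (centers[k][0] - p[0]) ** 2 + (centers[k][1] - p[1]) ** 2)

def train(points, n_classifiers, epochs, rate, rng):
    """Winner-take-all learning: only the closest classifier moves toward each input."""
    points = list(points)  # work on a copy so the caller's list is not shuffled
    centers = [list(p) for p in rng.sample(points, n_classifiers)]
    for _ in range(epochs):
        rng.shuffle(points)
        for p in points:
            k = nearest(centers, p)
            centers[k][0] += rate * (p[0] - centers[k][0])
            centers[k][1] += rate * (p[1] - centers[k][1])
    return centers

rng = random.Random(1)  # fixed seed, chosen arbitrarily
SHIFT = 6.0             # hypothetical offset between the two face populations

# 500 "own-race" patterns and 50 "other-race" patterns with the same spread.
own = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
other = [(rng.gauss(SHIFT, 1), rng.gauss(0, 1)) for _ in range(50)]

centers = train(own + other, n_classifiers=30, epochs=20, rate=0.05, rng=rng)

# Count classifiers that settle on each side of the midpoint between the groups.
n_other = sum(1 for c in centers if c[0] > SHIFT / 2)
n_own = len(centers) - n_other
print(f"own-race classifiers: {n_own}, other-race classifiers: {n_other}")
```

Comparing the two counts, and the typical spacing between neighboring classifiers in each group, gives a rough analogue of the difference in classifier density visible in Figure 11.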
Figure 11
 
Simulation of the other-race effect. The cyan dots represent a set of 500 input patterns spread evenly across the two feature dimensions. These are the familiar, same race faces. The dark blue dots correspond to a set of other-race faces with equal variability but a different centre of mass and only 50 exemplars.
If we consider the system's reaction to a new same-race face, it is clear that it will be able to track fine differences across multiple dimensions due to the high concentration of closely bounded classifiers. In contrast, an other-race face is likely to activate more broadly tuned classifiers. As a result, differences in its appearance to previously viewed other-race faces are more likely to go undetected. 
Recognition and categorization
The purpose of this model has been to explain how many effects described in the face recognition literature emerge directly from a self-organizing, feature-based system. The role of the feature-based system is to provide a wholly object-based representation, one in which orientation, size, illumination, and other irrelevant information have been removed. As it stands, the system cannot solve explicit tasks or arrive at decisions. For that to happen, there must also exist a final decision layer that can perform arbitrary grouping of the feature-level neural responses. 
Training a system to associate these object-level descriptions with either natural categories (gender, age) or arbitrary, expert ones (vehicle manufacturer, type of tree) can all be achieved through the occurrence of natural similarities (more feature-similarity clustering) or via an external tuition signal, perhaps from a child's parent or schoolteacher. In other words, seeing the spatial or textural similarity between two male faces and two female faces allows gender identification to emerge without direct tuition. On the other hand, knowing that “a” and “A” are equivalent on some level does require tuition. 
In the case of an old/new recognition task, a final layer would be required to decide if the current pattern of firing at the abstract-feature level had occurred before, that is, whether the face was familiar or not. In order to test the ability of the abstract-feature model to support the solving of this simple behavioral task, and beyond that, to explain the prototype effect, a second layer was added to the network. This layer contained five neurons that were once again arranged to form a competitive network. They each received input from 16 abstract-feature classifiers. These classifiers were chosen to split each facial input dimension into four equally sized segments. The input dimensions corresponded to the four facial regions manipulated in the behavioral tasks (eye, mouth, nose, surround). The five output neurons were then trained to recognize 17 of 35 faces set up in accordance with the behavioral task described in Experiment 1. In order to encode the faces, each unique nose, mouth, surround, or pair of eyes was assigned a random value from 0 to 1 along its corresponding input dimension, thereby activating one of the four classifiers assigned to each dimension. The output neurons then modified their input weights in accordance with the learning rules outlined in Equations 1 and 2. 
After training had produced a stable response to the 17 faces (200 iterations), the entire set of 35 faces was shown to the network (see Figure 3). Classification of each face as either familiar or unfamiliar was then made on the basis of whether one or more of the five output neurons fired above a preset threshold level of 1.5. Old/new discrimination performance for 20 face sets appears in Figure 12, revealing a remarkable accord between the behavioral results and performance of this rudimentary classifier network. Hence, by restricting the model to looking for familiar distributions of firing in the feature layer rather than facial wholes, the network has largely solved the task correctly but nonetheless incorrectly “recognizes” the prototype almost every time and certainly more often than the truly familiar faces. 
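The input encoding and the threshold decision described above might be sketched as follows. The function names, the example feature values, and the hand-set weight vector are hypothetical; in the model itself the output weights are learned through Equations 1 and 2 rather than set by hand.

```python
def encode(face, segments=4):
    """Map four feature values in [0, 1] onto a 16-element binary vector.

    Each facial dimension (eyes, mouth, nose, surround) is split into four
    equal segments, and the value activates exactly one classifier per dimension.
    """
    vector = [0.0] * (len(face) * segments)
    for dim, value in enumerate(face):
        # A value of exactly 1.0 falls into the last segment.
        seg = min(int(value * segments), segments - 1)
        vector[dim * segments + seg] = 1.0
    return vector

def is_familiar(output_weights, face, threshold=1.5):
    """Judge a face as 'old' if any output neuron fires above the threshold."""
    x = encode(face)
    responses = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in output_weights]
    return max(responses) > threshold

# Hypothetical unit-length weight vector tuned to one particular combination
# of parts (0.5 on each of the four classifiers that combination activates).
trained_face = (0.1, 0.6, 0.3, 0.9)
neuron = [0.5 * v for v in encode(trained_face)]

same_parts = (0.15, 0.55, 0.35, 0.95)   # falls in the same four segments
one_part_changed = (0.1, 0.6, 0.3, 0.2)  # shares three of the four parts

print(is_familiar([neuron], same_parts))        # all four parts match
print(is_familiar([neuron], one_part_changed))  # only three parts match
```

Under these assumptions, a neuron tuned to a single combination of parts responds with 2.0 to a full match and 1.5 to a three-part match, so only the full match exceeds the threshold. Neurons whose weights have been averaged over several trained faces would respond most strongly to the parts those faces share, which is the situation that favors the prototype.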
Figure 12
 
Results from the simulation of the prototype effect. By comparison with the behavioral results seen in Figure 5, it is clear that a simple model of the type described in the text is able to reproduce all of the important characteristics of the effect. The red line indicates performance for unfamiliar faces and blue for familiar ones.
Discussion
Summary
The results presented in this paper reveal how two forms of an everyday facial discrimination task rely on the combination of multiple local feature analyzers rather than global information. This was true despite the high fidelity of the facial images and the opportunity afforded our participants to view the faces from multiple directions during training and testing. Feature-based processing of this type is at odds with models of face processing based on holistic cues (Bartlett & Searcy, 1993; Maurer et al., 2002; Tanaka & Farah, 1993). Instead, the results support a model in which multiple abstract feature analyzers act in concert to represent and to recognize faces (Torralba et al., 2007; Wallis & Bülthoff, 1999). 
Abstract features
At the core of the model being proposed lie neurons sensitive to specific pictorial subregions or broad shape cues such as head shape. Some features may correspond to simple nameable parts, but it seems likely that this would be the exception rather than the rule. The best source of information which we currently have as to what these features might be like comes from single cell recording and optical imaging studies in the macaque. This work has revealed cells responsive selectively to faces (or other objects), which can be effectively stimulated by subparts of a full face (or object) stimulus (Logothetis, 2000; Tanaka et al., 1991; Tsunoda et al., 2001; Wang, Tanaka, & Tanifuji, 1996; Yamane et al., 1988), consistent with a piecewise, feature-based mode of representation. However, attempts made with other neurons have often failed to produce an effective substimulus, leading some researchers to suggest that holistic representations are present too (Desimone, 1991; Logothetis, 2000; Tanaka et al., 1991). In practice, finding an effective substimulus may be an almost impossible task under normal, time limited, cellular recording conditions, especially for highly specialized neurons. A more feasible route to establishing the level of stimulus specificity is to test an array of same-category objects (e.g., other faces, in the case of face cells). Studies of this type on face cells suggest that even neurons from the most anterior parts of the temporal lobe respond to many of the faces tested (Abbott, Rolls, & Tovee, 1996; Perrett, Hietanen, Oram, & Benson, 1992; Young & Yamane, 1992). 
What is more, it seems unlikely that representations become any more specialized beyond this level since similar levels of stimulus selectivity have been reported for neurons in areas receiving input from the temporal lobe, such as the ventral striatum, frontal lobe, and amygdala (Leonard, Rolls, Wilson, & Baylis, 1985; Williams, Rolls, Leonard, & Stern, 1993; Wilson, Scalaidhe, & Goldman-Rakic, 1993). 
There are in fact good theoretical grounds for thinking that the representations should not become more holistic. Not least of these concerns one of the most serious problems facing any experience-based model, namely, our ability to accurately recognize and to discriminate novel exemplars that fall within a region of expertise. Indeed, some authors have cited this as evidence that other recognition mechanisms must exist (Wang, Obama, Yamashita, Sugihara, & Tanaka, 2005). In fact, our ability to instantly generalize to novel exemplars is actually one of the best reasons to think that recognition is performed by multiple abstract-feature classifiers. As mentioned at the end of the modeling section above, models of this type are able to recombine a large number of specialist feature analyzers to create a unique description of the new input pattern (Torralba et al., 2007), bringing with it some degree of generalization across known transformations (Hinton, McClelland, & Rumelhart, 1986; Wallis & Bülthoff, 1999), something a truly holistic representation would struggle to achieve. Note that the degree of instant generalization exhibited by such a model must be good, but need not be perfect, since we know that humans are less than perfect at generalizing across the depth rotation of novel faces (Patterson & Baddeley, 1977). Indeed, many aspects of object recognition are not truly transform invariant without large amounts of exposure to a particular stimulus (see Edelman & Bülthoff, 1992; Graf, 2006). 
The issue of expertise
One of the most topical areas of current debate in the object and face recognition literature is that of expertise. There is good evidence that some neurons in the temporal lobe demonstrate remarkable selectivity for specific stimuli, most notably for those stimuli with which the animal is particularly familiar. Face-selective neurons, in particular, dominate certain patches of the inferior part of the temporal lobe in monkey studies (Perrett et al., 1992; Tsao, Freiwald, Tootell, & Livingstone, 2006), mirroring the discovery of facial “hotspots” in human brain imaging studies (Kanwisher, McDermott, & Chun, 1997; Puce, Allison, Gore, & McCarthy, 1995; Sergent, Ohta, & McDonald, 1992). This has led to the suggestion that some regions of visual cortex are designed especially for dealing with certain classes of visual stimulus (Kanwisher et al., 1997). 
The arguments laid out in the modeling work described above are largely consistent with the alternative view expressed by Gauthier and colleagues (Gauthier, Behrmann, & Tarr, 1999; Gauthier & Logothetis, 2000) that there is no specialist region for face processing per se. Instead, they argue that faces are an example of an object category in which most of us are experts. Gauthier and colleagues accept the existence of a functionally discrete system in cortex, but one filled with “expert neurons,” which are predisposed to fine, within-category discrimination—not specifically to faces. The degree to which representations are compartmentalized across temporal cortex remains an issue of current debate (Cohen & Tong, 2001; Grill-Spector, Sayres, & Ress, 2006; Haxby et al., 2001), but if such an area exists, we would argue that it may simply represent one endpoint of the ventral stream's convergent hierarchy. At such a level, cells are privy to information from the largest expanse of the visual array, allowing them to form the greatest level of invariance to stimulus transformations and hence the greatest level of stimulus selectivity (Perrett et al., 1992; Wallis & Rolls, 1997). 
The idea that expertise corresponds to greater selectivity and hence more holistic representations is consistent with some earlier models of face recognition learning in children (Carey & Diamond, 1994) and single cell recording studies of cells selective to highly trained, non-facial stimuli (Kobatake, Tanaka, & Wang, 1998; Logothetis, 2000; Miyashita, 1993; Rolls, Baylis, Hasselmo, & Nalwa, 1989). The theory has also received direct support in a more recent study in which single cells were shown to develop highly specialized object analyzers only after prolonged exposure to specific exemplars (Baker, Behrmann, & Olson, 2002). 
Conclusion
On the basis of the behavioral and modeling work presented in this paper, we would argue that the holistic versus parts debate in the face recognition literature can be resolved by considering a hierarchical, competitive model of object representation in which neurons learn to respond to pictorial features. These features will exhibit varying degrees of selectivity, transformation tolerance, and extent as a direct result of competitive processes within the visual-processing stream. The precise response properties will be a product of an individual's level of exposure to a stimulus class. More exposure leads to greater numbers of neurons representing the stimuli with ever finer sensitivity to changes in appearance. Increasing the concentration of neural resources to a particular object class naturally produces more integrated and specialized selectivity and hence an ever more holistic representation. All of these phenomena emerge naturally from a single, self-organizing model which, we would argue, treats faces in an identical fashion to other objects. 
Acknowledgments
We are grateful to the editor, Bosco Tjan, and two anonymous reviewers for detailed comments. We also thank Thomas Vetter, who pioneered much of the head morphing technology used in this paper, and Niko Troje for scanning and preparing the head models. Thanks also go to Adrian Schwaninger, James Bartlett, and Alice O'Toole for comments and discussion. This research was supported by the Australian Research Council Grant DP0343522, Human Frontiers Program Grant RGP 03/2006 and by the Max Planck Society. 
Commercial relationships: none. 
Corresponding author: Guy Wallis. 
Email: gwallis@hms.uq.edu.au. 
Address: School of Human Movement Studies, University of Queensland, QLD 4072, Australia. 
References
Abbott, L. F. Rolls, E. T. Tovee, M. J. (1996). Representational capacity of face coding in monkeys. Cerebral Cortex, 6, 498–505.
Baker, C. I. Behrmann, M. Olson, C. R. (2002). Impact of learning on representation of parts and wholes in monkey inferotemporal cortex. Nature Neuroscience, 5, 1210–1216.
Bartlett, J. C. Searcy, J. (1993). Inversion and configuration of faces. Cognitive Psychology, 25, 281–316.
Bruce, V. Doyle, T. Dench, N. Burton, M. (1991). Remembering facial configurations. Cognition, 38, 109–144.
Cabeza, R. Bruce, V. Kato, T. Oda, M. (1999). The prototype effect in face recognition: Extension and limits. Memory & Cognition, 27, 139–151.
Cabeza, R. Kato, T. (2000). Features are also important: Contributions of featural and configural processing to face recognition. Psychological Science, 11, 429–433.
Carey, S. Diamond, R. (1994). Are faces perceived as configurations more by adults than by children? Visual Cognition, 1, 253–274.
Chance, J. E. Turner, A. L. Goldstein, A. G. (1982). Development of differential recognition for own- and other-race faces. Journal of Psychology, 112, 29–37.
Cohen, J. D. Tong, F. (2001). Neuroscience: The face of controversy. Science, 293, 2405–2407.
Desimone, R. (1991). Face-selective cells in the temporal cortex of monkeys. Journal of Cognitive Neuroscience, 3, 1–8.
Edelman, S. Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385–2400.
Ellis, H. (1981). Theoretical aspects of face recognition. In G. Davies, H. Ellis, & J. Shepherd (Eds.), Perceiving and remembering faces. London, UK: Academic Press.
Farah, M. J. Wilson, K. D. Drain, M. Tanaka, J. N. (1998). What is “special” about face perception? Psychological Review, 105, 482–498.
Fukushima, K. (1988). Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1, 119–130.
Gauthier, I. Behrmann, M. Tarr, M. J. (1999). Can face recognition really be dissociated from object recognition? Journal of Cognitive Neuroscience, 11, 349–370.
Gauthier, I. Logothetis, N. (2000). Is face recognition not so unique after all? Cognitive Neuropsychology, 17, 125–142.
Gauthier, I. Tarr, M. J. (2002). Unraveling mechanisms for expert object recognition: Bridging brain activity and behavior. Journal of Experimental Psychology: Human Perception and Performance, 28, 431–446.
Graf, M. (2006). Coordinate transformations in object recognition. Psychological Bulletin, 132, 920–945.
Grill-Spector, K. Sayres, R. Ress, D. (2006). High-resolution imaging reveals highly selective nonface clusters in the fusiform face area. Nature Neuroscience, 9, 1177–1185.
Haxby, J. V. Gobbini, M. I. Furey, M. L. Ishai, A. Schouten, J. L. Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
Hertz, J. Krogh, A. Palmer, R. (1990). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
Hinton, G. E. McClelland, J. L. Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing (Vol. 1, pp. 77–109). Cambridge, MA: MIT Press.
Hole, G. J. (1994). Configurational factors in the perception of unfamiliar faces. Perception, 23, 65–74.
Howell, D. (1997). Statistical methods for psychology. Belmont, CA: Wadsworth.
Kanwisher, N. McDermott, J. Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.
Kobatake, E. Tanaka, K. Wang, G. (1998). Effects of shape discrimination learning on the stimulus selectivity of inferotemporal cells in adult monkeys. Journal of Neurophysiology, 80, 324–330.
Leder, H. Bruce, V. (1998). Local and relational aspects of face distinctiveness. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 51, 449–473.
Leder, H. Bruce, V. (2000). When inverted faces are recognized: The role of configural information in face recognition. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 53, 513–536.
Leonard, C. M. Rolls, E. T. Wilson, F. A. Baylis, G. C. (1985). Neurons in the amygdala of the monkey with the responses selective for faces. Behavioral Brain Research, 15, 159–176.
Logothetis, N. K. (2000). Object recognition: Holistic representations in the monkey brain. Spatial Vision, 13, 165–178.
Maurer, D. Le Grand, R. Mondloch, C. J. (2002). The many faces of configural processing. Trends in Cognitive Sciences, 6, 255–260.
Miyashita, Y. (1993). Inferior temporal cortex: Where visual perception meets memory. Annual Review of Neuroscience, 16, 245–263.
O'Toole, A. Deffenbacher, K. Abdi, H. Bartlett, J. (1991). Simulating the “other-race effect” as a problem in perceptual learning. Connection Science, 3, 163–178.
O'Toole, A. J. Phillips, P. J. Jiang, F. Ayyad, J. Penard, N. Abdi, H. (2007). Face recognition algorithms surpass humans matching faces over changes in illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1642–1646.
Patterson, K. E. Baddeley, A. D. (1977). When face recognition fails. Journal of Experimental Psychology: Human Learning and Memory, 3, 406–417.
Perrett, D. I. Hietanen, J. K. Oram, M. W. Benson, P. J. (1992). Organization and functions of cells responsive to faces in the temporal cortex. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 335, 23–30.
Peterson, M. Rhodes, G. (2003). Perception of faces, objects and scenes: Analytic and holistic processes. Oxford, UK: Oxford University Press.
Puce, A. Allison, T. Gore, J. C. McCarthy, G. (1995). Face-sensitive regions in human extrastriate cortex studied by functional MRI. Journal of Neurophysiology, 74, 1192–1199.
Reinitz, M. T. Morrissey, J. Demb, J. (1994). Role of attention in face encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 161–168.
Riesenhuber, M. Poggio, T. (2000). Models of object recognition. Nature Neuroscience, 3, 1199–1204.
Rolls, E. T. (1992). Neurophysiological mechanisms underlying face processing within and beyond the temporal cortical visual areas. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 335, 11–21.
Rolls, E. T. Baylis, G. C. Hasselmo, M. E. Nalwa, V. (1989). The effect of learning on the face selective responses of neurons in the cortex in the superior temporal sulcus of the monkey. Experimental Brain Research, 76, 153–164.
Schwarzer, G. (1997). Kategorisierung von Gesichten bei Kindern und Erwachsenen: Die Rolle konzeptuellen Wissens [Development of face categorization: The role of conceptual knowledge]. Sprache und Kognition, 16, 14–30.
Sergent, J. Ohta, S. McDonald, B. (1992). Functional neuroanatomy of face and object processing: A positron emission tomography study. Brain, 115, 15–36.
Solso, R. McCarthy, J. (1981). Prototype formation of faces: A case of pseudo-memory. British Journal of Psychology, 72, 499–503.
Tanaka, J. W. Farah, M. J. (1993). Parts and wholes in face recognition. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 46, 225–245.
Tanaka, K. Saito, H. Fukada, Y. Moriya, M. (1991). Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170–189.
Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9, 483–484.
Torralba, A. Murphy, K. P. Freeman, W. T. (2007). Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 854–869.
Tsao, D. Y. Freiwald, W. A. Tootell, R. B. Livingstone, M. S. (2006). A cortical region consisting entirely of face-selective cells. Science, 311, 670–674.
Tsunoda, K. Yamane, Y. Nishizaki, M. Tanifuji, M. (2001). Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nature Neuroscience, 4, 832–838.
Vetter, T. (1998). Synthesis of novel views from a single face image. International Journal of Computer Vision, 28, 103–116.
Wallis, G. Bülthoff, H. (1999). Learning to recognize objects. Trends in Cognitive Sciences, 3, 22–31.
Wallis, G. Rolls, E. (1997). A model of invariant object recognition in the visual system. Progress in Neurobiology, 51, 167–194.
Wang, G. Obama, S. Yamashita, W. Sugihara, T. Tanaka, K. (2005). Prior experience of rotation is not required for recognizing objects seen from different angles. Nature Neuroscience, 8, 1768–1775.
Wang, G. Tanaka, K. Tanifuji, M. (1996). Optical imaging of functional organization in the monkey inferotemporal cortex. Science, 272, 1665–1668.
Williams, G. V. Rolls, E. T. Leonard, C. M. Stern, C. (1993). Neuronal responses in the ventral striatum of the behaving macaque. Behavioral Brain Research, 55, 243–252. [PubMed] [CrossRef]
Wilson, F. A. Scalaidhe, S. P. Goldman-Rakic, P. S. (1993). Dissociation of object and spatial processing domains in primate prefrontal cortex. Science, 260, 1955–1958. [PubMed] [CrossRef] [PubMed]
Yamane, S. Kaji, S. Kawano, K. (1988). What facial features activate face neurons in the inferotemporal cortex of the monkey? Experimental Brain Research, 73, 209–214. [PubMed] [CrossRef] [PubMed]
Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81, 141–145. [CrossRef]
Young, A. W. Hellawell, D. Hay, D. C. (1987). Configurational information in face perception. Perception, 16, 747–759. [PubMed] [CrossRef] [PubMed]
Young, M. P. Yamane, S. (1992). Sparse population encoding of faces in the inferotemporal cortex. Science, 256, 1327–1331. [PubMed] [CrossRef] [PubMed]
Figure 1
 
Rival theories of face representation in the context of an old/new recognition task. (A) A holistic decision process is based upon decision boundaries within a “space” of facial images and predicts a level I but not a level II prototype effect (see text). (B) A feature-based model predicts that the decision process depends on the frequency with which subregions have been seen previously, irrespective of their particular combination. The black dots represent the number of previous exposures to those facial regions. The sections highlighted in green represent the regions that go to make up the prototype face.
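The feature-based account in panel (B) can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes that familiarity is simply the sum of exposure counts for a face's four regions, and the part labels, training set, and function names are hypothetical. Under those assumptions, a never-seen prototype built from frequently seen parts can score higher than a face that was actually shown.

```python
from collections import Counter

REGIONS = ("eyes", "nose", "mouth", "surround")

def part_counts(training_faces):
    """Count how often each (region, part) pair appeared during training."""
    counts = Counter()
    for face in training_faces:
        for region, part in zip(REGIONS, face):
            counts[(region, part)] += 1
    return counts

def familiarity(face, counts):
    """Sum the exposure counts of a face's parts, ignoring their combination."""
    return sum(counts[(region, part)] for region, part in zip(REGIONS, face))

# Hypothetical training set: every face shares two parts ("P") with the
# prototype, so each prototype part is seen twice overall.
training = [
    ("P", "P", "a2", "a3"),
    ("b0", "P", "P", "b3"),
    ("c0", "c1", "P", "P"),
    ("P", "d1", "d2", "P"),
]
prototype = ("P", "P", "P", "P")     # never shown during training
novel = ("x0", "x1", "x2", "x3")     # shares no parts with any trained face

counts = part_counts(training)
scores = {
    "trained": familiarity(training[0], counts),
    "prototype": familiarity(prototype, counts),
    "novel": familiarity(novel, counts),
}
print(scores)  # {'trained': 6, 'prototype': 8, 'novel': 0}
```

Because the score ignores which parts appeared together, the prototype outranks the trained face, which corresponds to the level II effect that a purely holistic decision boundary would not predict.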
Figure 2
 
(a) The stimuli were generated from a set of 3D head models that include separate texture and form information. (b) Four subregions were identified within each head, centered on the eyes, nose, mouth, and surround. Subregions selected from the heads of four individuals were morphed together to build hybrid head shapes. (c) These morphed heads were rendered in seven equally spaced orientations. During training and testing, preselected morphed heads were presented in rapid succession (300 ms per frame), providing the impression of a head rotating from one extreme profile to the other.
Figure 3
 
Example of the stimuli used in the experiments. The prototype-centered face array is divided into familiar faces seen during training and unfamiliar ones seen only during testing. Each face is an amalgam of four facial regions located around the eyes, nose, mouth, and surround. The central face is the prototype. It has a specific number of facial parts in common with the surrounding faces. This number decreases as a function of distance from the prototype, as indicated by the digits (1–4) placed on the concentric rings. A total of 35 heads were required to build each face array.
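The ring structure described in this caption can be expressed as a count of mismatching regions. The following is a minimal sketch, assuming that a face at distance d differs from the prototype in exactly d of its four regions; the part labels and helper names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def distance_from_prototype(face, prototype):
    """Number of regions (eyes, nose, mouth, surround) whose part differs."""
    return sum(1 for part, proto_part in zip(face, prototype) if part != proto_part)

def group_by_ring(faces, prototype):
    """Group faces by their distance from the prototype (the ring index)."""
    rings = defaultdict(list)
    for face in faces:
        rings[distance_from_prototype(face, prototype)].append(face)
    return dict(rings)

# Hypothetical part labels: "p_*" parts belong to the prototype,
# "hN_*" parts come from other heads.
prototype = ("p_eyes", "p_nose", "p_mouth", "p_surround")
faces = [
    ("p_eyes", "p_nose", "p_mouth", "h1_surround"),     # shares 3 parts
    ("p_eyes", "h2_nose", "h2_mouth", "p_surround"),    # shares 2 parts
    ("h3_eyes", "h3_nose", "h3_mouth", "p_surround"),   # shares 1 part
    ("h4_eyes", "h4_nose", "h4_mouth", "h4_surround"),  # shares 0 parts
]
rings = group_by_ring(faces, prototype)
print(sorted(rings))  # [1, 2, 3, 4]
```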
Figure 4
 
d-prime results for Experiment 1. The graph records the ability of the participants to distinguish familiar from unfamiliar faces. For faces sharing many facial regions with the prototype (distance = 1 or 2), performance is worse than for faces which are very different (distance = 3 or 4). Nonetheless, performance was well above chance, suggesting that subjects were able to perform the task well at all four distances from the prototype.
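For readers unfamiliar with the measure, d-prime is the difference between the z-transformed hit rate and false-alarm rate in an old/new task. The sketch below shows the standard calculation; the 0.5 correction applied to each cell (to avoid rates of exactly 0 or 1) is an assumption made here for illustration and is not stated in the caption, and the trial counts are invented.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: keeps both rates strictly inside (0, 1).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Invented counts: chance performance gives d' = 0, better performance d' > 0.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
good = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
print(chance, good)
```

A value of zero corresponds to chance, so "well above chance" in the caption means d-prime reliably greater than zero at every distance.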
Figure 5
 
The percentage of trials on which a face was judged to be familiar. The red line indicates faces that were actually unfamiliar, and the blue line those that were indeed familiar. Asterisks indicate differences between the prototype and each of the groups of familiar faces (** p < 0.01, * p < 0.05).
Figure 6
 
Results from the rating experiment. The red line indicates unfamiliar faces and the blue line familiar ones. The boxes record the median response for each distance, and the error bars indicate the upper and the lower quartiles. The circles indicate mean rating values. Asterisks indicate differences between the prototype and each of the groups of familiar faces (*p < 0.05).
Figure 7
 
Results from Experiment 3 reporting variation in the percentage of trials subjects judged a face to be familiar. The four lines reflect judgment as a function of distance from the prototype for familiar versus unfamiliar and upright versus inverted faces. Note that for unfamiliar faces a high score here corresponds to poor task performance.
Figure 8
 
Difference in overall performance (% correct) between upright and inverted faces as a function of distance from the prototype.
Figure 9
 
Output of the competitive network after 1000 presentations of the 500 patterns (cyan dots) with 6 classifiers (red crosses). Black lines are the Voronoi diagram classification boundaries indicating the range over which each classifier “wins.” Classification is achieved with little sensitivity to the second, low variance dimension.
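The behaviour described in this caption can be approximated with a generic winner-take-all competitive learner. The sketch below is not the network used in the paper: the update rule, learning rate, initialisation, input distribution (standard deviation 1.0 on the first dimension, 0.1 on the second), and the reduced number of passes through the data are all assumptions chosen to keep the example short.

```python
import random

def train_competitive(patterns, n_units, epochs, lr=0.05, seed=0):
    """Winner-take-all learning: the closest unit moves toward each input."""
    rng = random.Random(seed)
    # Initialise each unit's weight vector on a randomly chosen pattern.
    units = [list(p) for p in rng.sample(patterns, n_units)]
    for _ in range(epochs):
        order = patterns[:]
        rng.shuffle(order)
        for x in order:
            winner = min(
                units,
                key=lambda w: (w[0] - x[0]) ** 2 + (w[1] - x[1]) ** 2,
            )
            winner[0] += lr * (x[0] - winner[0])
            winner[1] += lr * (x[1] - winner[1])
    return units

# Assumed input: 500 two-dimensional patterns, high variance on the first
# dimension and low variance on the second.
rng = random.Random(1)
patterns = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)) for _ in range(500)]
units = train_competitive(patterns, n_units=6, epochs=20)

x_spread = max(u[0] for u in units) - min(u[0] for u in units)
y_spread = max(u[1] for u in units) - min(u[1] for u in units)
print(x_spread, y_spread)
```

With only a few units, the learned centres tend to line up along the high-variance dimension, so the winning unit depends mainly on the first coordinate.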
10.1167/8.3.20.M1
Figure 10
 
Output of the network after 1000 presentations of the 500 patterns with 30 classifiers. Note that the classifiers are now sensitive to both stimulus dimensions, and that they form more symmetric clusters.
10.1167/8.3.20.M2
Figure 11
 
Simulation of the other-race effect. The cyan dots represent a set of 500 input patterns spread evenly across the two feature dimensions. These are the familiar, same race faces. The dark blue dots correspond to a set of other-race faces with equal variability but a different centre of mass and only 50 exemplars.
10.1167/8.3.20.M3
Figure 12
 
Results from the simulation of the prototype effect. By comparison with the behavioral results seen in Figure 5, it is clear that a simple model of the type described in the text is able to reproduce all of the important characteristics of the effect. The red line indicates performance for unfamiliar faces and blue for familiar ones.
Table 1
 
Summary of the effects and terms referred to in this paper.
Term: Description
Holistic effect: The obligatory, automatic processing of faces as a whole entity. Relates to the enhanced recognition of a nameable part viewed in the context of other parts seen during training. This occurs even if attention is focused on only one part.
Configural effect: Sensitivity to the layout and the spacing of nameable parts. Relates to the enhanced recognition performance seen with faces in which their parts appear in the original and correct facial layout.
Metrics-based representation: Explicit representation of a face in terms of the distance between landmark points such as the eye centers, the tip of the nose, the chin, the hair line, etc. Not concerned with the appearance of the parts as such, just facial dimensions.
Structural description: Recognition of each nameable part combined with explicit representation of the relative locations of each part.
Abstract features: A multi-scale, image-based representation in which the pictorial matching happens first locally and then globally. No explicit representation of layout or distances, no explicit correspondence to nameable parts.