Research Article | April 2009
Biological “bar codes” in human faces
Steven C. Dakin, Roger J. Watt
Journal of Vision April 2009, Vol. 9(4):2. doi:10.1167/9.4.2

Citation: Steven C. Dakin, Roger J. Watt; Biological “bar codes” in human faces. Journal of Vision 2009;9(4):2. https://doi.org/10.1167/9.4.2.
Abstract

The structure of the human face allows it to signal a wide range of useful information about a person's gender, identity, mood, etc. We show empirically that facial identity information is conveyed largely via mechanisms tuned to horizontal visual structure. Specifically, observers are substantially better at identifying faces that have been filtered to contain just horizontal information than faces filtered to any other orientation band. We then show, computationally, that horizontal structures within faces have an unusual tendency to fall into vertically co-aligned clusters compared with images of natural scenes. We call these clusters “bar codes” and propose that they have important computational properties. We propose that it is this property that makes faces “special” visual stimuli: they are able to transmit information as a reliable spatial sequence, a highly constrained one-dimensional code. We show that such structure affords computational advantages for face detection and decoding, including robustness to normal environmental image degradation, but makes faces vulnerable to certain classes of transformation that change the sequence of bars, such as spatial inversion or contrast-polarity reversal.

Introduction
Faces are uniquely important visual stimuli to social primates such as humans. A sizable body of research indicates that we are able to extract an enormous range of information from faces, including traits such as gender, race, age, and states such as emotion, direction of gaze, and physical health (Bruce, 1988). 
In recent years, substantial advances have been made in understanding the encoding of facial information using the concept of face space (Leopold, O'Toole, Vetter, & Blanz, 2001; Valentine, 1991; Wilson, Loffler, & Wilkinson, 2002). Under this view, identity is represented as a locus in a multi-dimensional space whose dimensions are independent perceptual attributes of faces. This approach proposes that encoding along each facial dimension is relative to properties of the average face. Within such a framework, facial distinctiveness is encoded as distance from the average, and caricatures are faces that have been shifted away from the average along an identity axis stretching from the average to the individual. Such approaches have proved useful for describing the effects of caricature, poor discrimination of other-race faces, and shifts in appearance following adaptation to faces.
At present, however, the dimensions used to characterize an individual within face space are typically the (x, y) positions of an arbitrary collection of (usually manually selected) image control points (e.g., the centers of the eyes, the tip of the nose, etc.) in a suitable intrinsic spatial co-ordinate framework (intrinsic meaning that distance and direction are determined by the face, not by an arbitrary camera position). However, a number of low-level effects in face recognition cannot be accounted for by this scheme. For example, neither reversing the contrast polarity of the image (photographic negation) nor illuminating the face from below changes the location of any control point, but both render faces much more difficult to recognize. Similarly, inverting a face does not change any intrinsic spatial relations between control points but renders recognition difficult.
There is strong evidence for sensitivity to certain image properties (such as orientation and spatial frequency) early in visual processing (Hubel & Wiesel, 1968). The early stages appear to detect only simple elongated features such as edges and lines, neither isolated points in an image nor arbitrarily complex structures. There is considerable evidence bearing on the issue of which spatial frequencies convey different forms of facial information (for a review, see Ruiz-Soler & Beltran, 2006). In particular, low spatial frequencies (2–8 cycles per face) are thought to support holistic/configurational properties of the face (Goffaux, Hault, Michel, Vuong, & Rossion, 2005; Goffaux & Rossion, 2006) and only crude emotional information (Schyns & Oliva, 1999), while higher spatial frequencies (8–16 cycles per face) are thought to convey identity (Costen, Parker, & Craw, 1996; Gold, Bennett, & Sekuler, 1999) and more detailed expression (Norman & Ehrlich, 1987; Schyns & Oliva, 1999). Identity and facial expression are generally thought to be processed through a pathway from V1 to the fusiform area (Haxby et al., 2001; Kanwisher, McDermott, & Chun, 1997) relying on higher spatial frequencies, with the exception of fearful facial expressions, which are thought to be supported by a direct subcortical projection to the amygdala (LeDoux, 1996) using lower spatial frequencies (Vuilleumier, Armony, Driver, & Dolan, 2003). The latter view is not uncontroversial, since (a) high spatial frequency information around the eyes may contribute to fearful expressions (Smith, Cottrell, Gosselin, & Schyns, 2005) and (b) the amygdala is responsive to a range of non-fearful emotions (Winston, O'Doherty, & Dolan, 2003).
Facial identity is largely conveyed by horizontal image structure
Figures 1a–1c show three face images: panels a and b are real faces of two celebrities, while panel c is an average of the two (generated by morphing the faces into registration and then averaging the two resulting images). We can restrict the orientation information in these images by filtering them: specifically, by Fourier transforming them and weighting the power of components with a wrapped Gaussian profile centered on one orientation with a particular orientation pass-band (here the standard deviation of the Gaussian is 20°, selected to broadly match the orientation tuning of neurons in the primary visual cortex). Figures 1d–1f show the result of allowing only horizontal information to pass, while Figures 1g–1i show the same analysis using vertical information. To put it another way, the images in Figures 1d–1f and 1g–1i give an idea of what information would be passed by banks of horizontally and vertically tuned V1 neurons, respectively, distributed across both space and spatial frequency. Notice that horizontal information conveys a great deal more information about facial identity than vertical information. This is because the internal features drive the response of horizontal filters strongly, while vertical filters are responsive only to the edges of the head, the bridge of the nose, and the centers of the eyes.
Figure 1
(a, b) Original face images and (c) their morphed average. (d–f) Horizontal and (g–i) vertical information contained in the three face images (bandwidth σ = 20°).
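To make the filtering operation concrete, here is a minimal Python/NumPy sketch of a wrapped-Gaussian orientation filter of the kind used to generate Figure 1. It is illustrative only (the function name and the handling of the mean luminance are our choices, and it weights component amplitudes rather than power, which affects only the effective bandwidth):

```python
import numpy as np

def orientation_filter(img, center_deg, sigma_deg=20.0):
    """Pass only Fourier components near one orientation, weighted by a
    wrapped Gaussian. Note: center_deg is the direction of the frequency
    vector, so horizontally oriented image structure (energy along the
    vertical frequency axis) corresponds to center_deg = 90."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]              # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]              # horizontal frequencies
    theta = np.arctan2(fy, fx)                   # direction of each component
    # Wrapped orientation difference: orientation is periodic with pi
    # because a real image's spectrum is conjugate symmetric.
    d = np.angle(np.exp(2j * (theta - np.deg2rad(center_deg)))) / 2.0
    weight = np.exp(-d**2 / (2 * np.deg2rad(sigma_deg)**2))
    weight[0, 0] = 1.0                           # keep the mean luminance
    return np.real(np.fft.ifft2(np.fft.fft2(img) * weight))

# e.g. Figures 1d-1f: orientation_filter(face, 90.0)  (horizontal structure)
#      Figures 1g-1i: orientation_filter(face, 0.0)   (vertical structure)
```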
Next we can visually assess the contribution of each orientation component to the identity of the face, in the presence of other orientation structure, simply by combining an individual horizontal or vertical component of the average with the perpendicular component from a particular face. Such “orientation-hybrid” images are illustrated in Figure 2. Figures 2a and 2b take their horizontal structure from the two famous faces and their vertical structure from the average; the identity is plain. If, on the other hand, we take the vertical structure from the famous faces and the horizontal from the average, we get Figures 2c and 2d. The identity is now much more difficult to see. This would seem to be a special property of the horizontal filter response: Figures 2e–2h show a similar analysis using filters at 45° and 135°, and now identity remains uniformly ambiguous.
Figure 2
(a–d) Orientation-hybrid images composed of (a, b) vertical information from the average face (Figure 1c) and horizontal information from the two original faces. (c, d) Same, but taking horizontal structure from the average and vertical information from the original faces. Notice that faces are recognizable only in panels a and b, indicating that identity is conveyed by horizontal structure. In support of this view, panels e–h show the same combinations using 45° and 135° filters; identity is consistently ambiguous.
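In code, the hybrid construction amounts to summing two complementary orientation bands, one from each source image. A minimal sketch, reusing the illustrative orientation_filter() defined above (the mean-luminance bookkeeping here is our choice, not the published procedure):

```python
def orientation_hybrid(face_h, face_v, sigma_deg=20.0):
    """Horizontal band of face_h plus vertical band of face_v."""
    horizontal = orientation_filter(face_h, 90.0, sigma_deg)  # horizontal structure
    vertical = orientation_filter(face_v, 0.0, sigma_deg)     # vertical structure
    # Each band keeps its source image's mean luminance, so subtract one
    # copy to avoid doubling the overall brightness.
    return horizontal + vertical - face_v.mean()

# Figures 2a, 2b: orientation_hybrid(famous_face, average_face)
# Figures 2c, 2d: orientation_hybrid(average_face, famous_face)
```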
Recognition of orientation-filtered faces
In order to examine the relative importance of different orientations for conveying information about identity, we conducted a psychophysical experiment. We used images of celebrities collected from the Internet (color insets in Figure 3), all with a 3:4 aspect ratio and gamma corrected with a value of 0.45 (the average value used for viewing images). The only constraints on the selection of an image were that it was relatively high resolution (>256 pixels in width) and that the head was oriented somewhere between a three-quarter and a frontal view (i.e., profile or near-profile views were rejected). We first determined which of 100 individual celebrity images our observers were able to identify correctly with unlimited viewing time. Ability to identify the faces ranged from 20% to 70%. A series of novel images of the celebrities each observer had previously identified was then used in the second phase of testing. Images were filtered according to one of eight interleaved conditions, with orientations running from −90° (vertical) through 0° (horizontal) to +90° in steps of 22.5° (wrapped Normal distributions with an orientation bandwidth σ = 23°). Images were normalized to have equal RMS contrast across orientation conditions. No source image of a celebrity was ever presented twice to a subject, ensuring that they could not simply recognize particular images. Subjects responded verbally and the experimenter recorded each response as correct or incorrect. Stimuli were presented for 250 ms on a linearized CRT monitor viewed at 1.0 m, so that stimuli subtended 8.8 × 6.6 deg. Seven observers, all with normal or corrected-to-normal vision, participated.
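The RMS-contrast normalization step can be written in a few lines. The sketch below is a standard construction, not the study's code, and the target contrast value is a placeholder (the experiment equated contrast across conditions rather than reporting an absolute level):

```python
import numpy as np

def set_rms_contrast(img, target_rms=0.2):
    """Rescale an image about its mean so that its RMS contrast
    (std of luminance / mean luminance) equals target_rms.
    Assumes luminance values with a nonzero mean."""
    mean = img.mean()
    rms = img.std() / mean
    return mean + (img - mean) * (target_rms / rms)
```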
Figure 3
(Top) Stimuli from the psychophysical experiment. Each panel shows a face stimulus filtered to a single orientation band (indicated in red) along with the source image (top-right color inset in each panel). (Bottom) Percentage correct identification of filtered images as a function of orientation information (solid line shows the least-squares fit of a Gaussian function).
Results are plotted in the lower section of Figure 3. Data points show the average percent correct identification of faces as a function of the orientation of the information presented. Error bars indicate 95% confidence intervals on performance (derived assuming binomial error). Subjects show a substantial advantage when presented with the horizontal information from faces (∼56% identification), which gradually declines as one moves towards vertical (∼35% identification). Note that this is likely a conservative estimate of the horizontal advantage, since we did not attempt to control for external features that are likely to provide cues to identity regardless of filter orientation.
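Such error bars can be reconstructed with a standard binomial calculation; the paper does not state which interval construction was used, so the normal-approximation version below is just one plausible choice:

```python
import numpy as np

def binomial_ci95(n_correct, n_trials):
    """95% confidence interval on proportion correct, treating each
    trial as an independent Bernoulli outcome (normal approximation)."""
    p = n_correct / n_trials
    half_width = 1.96 * np.sqrt(p * (1.0 - p) / n_trials)
    return p - half_width, p + half_width

lo, hi = binomial_ci95(28, 50)   # e.g. 28/50 correct -> approx (0.42, 0.70)
```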
Analysis: Horizontal “bar code” structure in faces
Our psychophysical results indicate that, in terms of face recognition, observers benefit most from the inclusion of information in a horizontal band. We next explored the nature of the image structure contained in this band by conducting a computational analysis of the features within filtered images of faces and comparing the results to similar analyses conducted using images of flowers (another class of objects) and natural scenes (representative of a typical visual diet). Figure 4a shows example images from each class. All images were 512 × 512 pixels. In the case of scenes (1000 images), the images were derived by down-sampling original images at a ratio of 1:2 (each pixel in the output is the sum of a non-overlapping 2 × 2 pixel area in the input); a 512 × 512 pixel portion of the result was taken at a random location. For the flowers (300 images), the image was similarly down-sampled and a 512 × 512 pixel square was then taken centered on the flower. For the faces (300 images), a face image was down-sampled so that the distance from the center of the mouth to a point midway between the two eyes, aligned with the centers of the pupils, corresponded to 128 pixels. The image was also rotated slightly so that the line joining these two points was strictly vertical. A 512 × 512 pixel image was then cropped from the result so that the midpoint between the eyes lay at the image center.
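The 1:2 down-sampling rule described above (each output pixel the sum of a non-overlapping 2 × 2 input block) is simple to express directly; a minimal sketch:

```python
import numpy as np

def downsample_2x2_sum(img):
    """1:2 down-sampling: each output pixel is the sum of a
    non-overlapping 2 x 2 block of input pixels."""
    h, w = img.shape
    h, w = h - h % 2, w - w % 2                  # trim to even dimensions
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
```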
Figure 4
Image analysis. (a) Three examples from each of the image classes. (b) The sine (odd-symmetric) filter outputs in response to the (central) face image at four spatial scales (2.5, 5, 10, and 20 cycles per face height). Superimposed are dots marking the loci of local extrema in the filter responses. (c) Histogram of relative frequency of occurrence (1 indicates a uniform distribution) of inter-feature distances (in units of filter wavelength) for horizontally filtered patterns (red: scenes; blue: flowers; yellow: faces). (d) Similar histogram of the orientation of lines connecting feature midpoints. (e) Scattergram of relative frequency (1 = uniform) of peak positions of blobs for the three classes of images examined. Note that only faces have an unusual tendency to produce short-distance, vertically arranged clusters of features.
For the analysis, the images were filtered with a set of log-Gabor filters using a fast Fourier transform in the MatLab (MathWorks Ltd.) programming environment. The filters, and hence the filtered images, were constructed so that the cosine (even-symmetric) form was carried in the real part and the sine (odd-symmetric) form in the imaginary part. This generates four phases of filter response: positive and negative, each for the cosine and sine forms. Local extrema in the filter outputs were identified by standard numerical means, and each filtered image was then represented by a set of (x, y) co-ordinate pairs grouped into four separate phases.
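The analysis itself was run in MatLab; the following NumPy sketch shows one standard way to build such a quadrature log-Gabor filter in the Fourier domain. The peak frequency and bandwidth parameters are illustrative, not the values used in the study:

```python
import numpy as np

def loggabor_horizontal(img, peak_freq=0.05, sigma_ratio=0.65,
                        theta_sigma=np.deg2rad(20)):
    """Horizontally tuned log-Gabor filtering. Returns a complex image:
    even (cosine) response in the real part, odd (sine) response in the
    imaginary part."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                           # avoid log(0) at DC
    # Log-Gaussian radial profile (zero response at DC by construction).
    radial = np.exp(-np.log(radius / peak_freq)**2 /
                    (2 * np.log(sigma_ratio)**2))
    radial[0, 0] = 0.0
    # Gaussian angular profile about the vertical frequency axis,
    # i.e. tuned to horizontally oriented image structure.
    theta = np.arctan2(fy, fx)
    d = np.angle(np.exp(2j * (theta - np.pi / 2))) / 2.0
    angular = np.exp(-d**2 / (2 * theta_sigma**2))
    # Keep a one-sided spectrum so the inverse transform is complex:
    # real part = even-symmetric response, imaginary part = odd.
    mask = (fy > 0) | ((fy == 0) & (fx > 0))
    spectrum = np.fft.fft2(img) * radial * angular * mask * 2.0
    return np.fft.ifft2(spectrum)
```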
To illustrate this process, Figure 4b shows the horizontal filter output in response to the central face image at various spatial frequencies. Inspecting the regions of high filter activity, it is immediately striking that the face generates simple, horizontally elongated features that are clustered so that their centers align along the vertical axis.
Analysis 1: Clustering
In the first analysis, we quantified the extent of this regularity by examining the relative positions of local extrema in the filter outputs. We assigned a position, orientation, and phase to each local extremum (Figure 4b); orientation is derived from the overall filter output around the extremum and is typically close to, but not necessarily identical to, the filter orientation. The spatial relation between any pair of extrema is then described as a vector (distance and direction) from one to the other, using the orientation of the first as a reference direction. The extent to which images contain “bar codes” is revealed by the statistical distribution of such vectors, specifically as a peak both at short distances and at directions at right angles to the filtered image orientation. There is reason to suppose that peaks in adjacent response phases will tend to lie close to each other simply because the responses in adjacent filter phases are correlated. To avoid this uninteresting result, we examined only the spatial relations between extrema falling in regions of the same polarity. Figures 4c–4d show histograms of the frequency with which the peak-to-peak vectors have a particular (c) distance and (d) direction. The distance data show a tendency for face images to produce peaks that are slightly closer to each other. The direction data are essentially flat for scenes and flowers but, for faces, have a pronounced peak at a direction of 90°. The same data are shown as a two-dimensional scattergram in Figure 4e: again, face images have a much higher tendency to show extrema that are locally aligned orthogonal to the underlying orientation structure. This analysis reveals what is special about face images: they generate locally parallel, vertically aligned clusters of horizontal structure. We will refer to this type of pattern as a bar code.
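The pairwise statistic can be sketched as follows (again a NumPy illustration rather than the original MatLab; for simplicity it uses the filter orientation, rather than each extremum's local orientation, as the reference direction):

```python
import numpy as np

def pair_vectors(x, y):
    """Distance and direction (deg) of the vector between every pair of
    same-polarity extrema, with coordinates in filter wavelengths.
    For a horizontal filter, directions near 90 deg mean vertically
    co-aligned peaks: the signature of a bar code."""
    dx = x[None, :] - x[:, None]
    dy = y[None, :] - y[:, None]
    iu = np.triu_indices(len(x), k=1)            # each unordered pair once
    dist = np.hypot(dx[iu], dy[iu])
    # Directions are axial (a pair has no sign), so fold into [0, 180).
    direction = np.rad2deg(np.arctan2(dy[iu], dx[iu])) % 180.0
    return dist, direction

# Relative frequency = histogram count / count expected under a uniform
# distribution, so that 1 means "no clustering" (cf. Figures 4c-4e).
```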
Analysis 2: Ubiquity of bar codes
We next conducted a simple scale-space analysis to examine how the bar code structure changes across spatial scale and to determine whether filtered face images all generate the same form of bar code. A typical scale-space analysis of a single face image is shown in Figures 5a and 5b, and that of the full face set in Figure 5c. Individual face images were prepared and filtered as above, over a range of spatial scales from 0.5 to 10 cycles per face. Note the presence of the coarse-scale bar code arising from the global configuration of the features, and of the local bar code structure at the locations of internal features. Note also the persistence of the structure, particularly around the mouth, across several octaves of scale.
Figure 5
(a, b) Scale-space diagram for a slice through the vertical midline of an individual face, with either (a) coarse-scale or (b) fine-scale information superimposed. The scale refers to the wavelength relative to the vertical distance between the midpoint of the eyes and the center of the mouth. (c) A scale-space analysis of 300 faces. Saturated colors mark regions of the diagram where all faces have the same sign of filter response; pale colors mark regions where 95% have the same sign.
For the full face-set analysis, consider one spatial scale and only the even-symmetric filter outputs. We take the filtered image at this scale from each face image, making 300 filtered images. Next we compute, for each point in the image, what proportion of the filtered images have a positive filter output at this point and what proportion have a negative output. We then take the higher proportion and record its sign and the proportion value itself. This builds a new image in which each pixel is a measure of how likely a filtered face image is to have a particular sign of filter output at that point. The values range from −1 (all images have a negative output) through 0 (either sign of filter output equally likely) to +1 (all images have a positive output). Where this image has values close to ±1, there is structure that is common to all of our face images.
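One simple realization of this measure is the mean sign of the filter output across faces, which shares the anchor points (−1, 0, +1) of the proportion-based description above (the two are related by a linear rescaling of the proportion):

```python
import numpy as np

def sign_consistency(filtered_stack):
    """filtered_stack: array of shape (n_faces, height, width) holding
    even-symmetric filter outputs, one slice per face. Returns a map in
    [-1, +1]: +1 where every face's output is positive, -1 where every
    face's output is negative, 0 where the two signs are equally common."""
    return np.sign(filtered_stack).mean(axis=0)
```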
This process was repeated at all spatial scales, leading to a 3D image whose dimensions are x- and y-image space and z-spatial scale. In order to visualize the result, we take a slice through the vertical midline of the face, showing filter response sign properties as a function of vertical (midline) face position and spatial scale. This is shown in Figure 5c. The results are clear and indicate the presence of coarse-scale bar code structure in every face tested. This structure tends to be available at just a single spatial scale, although there is a tendency for it to be more prominent at slightly finer spatial scales around the mouth region. In addition, there is a universal fine-scale structure around the mouth with a completely reliable topological connection to the relevant parts of the coarse-scale bar code. In the eye region, we found a very common, but not universal, pattern of the same type. In this case, inspection of filtered images suggests that different face images have different eye configurations (eyelids at different degrees of opening, eyebrows raised to different extents), so that our use of simple spatial averaging fails to reveal commonality of filter output at finer scales around the eyes.
Discussion
To summarize, we have presented evidence that the information contained in the horizontal orientation structure of the human face is particularly informative with respect to facial identity. We have also shown that the horizontal image structure arising from faces is special, when compared to other comparable classes of image, in that it contains bar codes: clusters of locally co-aligned parallel features. We finish by considering what computational advantage this confers. For simplicity, we split the perceptual processes involved in face recognition into two stages: face detection and face decoding. The first stage involves deciding where in an image there are faces, the second in extracting sufficient information about each face to be able to recognize identity, for example. 
The first benefit for detection is evident from Figure 4: faces are unusual in generating bar codes. It follows that the presence of a bar code in an image is a good cue to the possible presence of a face (but no more than that). Moreover, bar codes are easy to detect and have a reliable sequence. This is because the bar code arises from the physical structure of the face together with lighting from above (or at least not from below). In part, the bar code is due to the reflectance properties of the face (exposed skin on the forehead and cheeks tends to be shiny; the eyebrows separating them tend to be darker and matte; the lips tend to be dark). The bar code also arises because the normal effects of lighting are to generate bright highlights on the forehead and cheeks and shadow in the eye sockets and beneath the nose. Recently, Sadr, Jarudi, and Sinha (2003) have reported that erasing the eyebrows is more disruptive to recognition than erasing the eyes, suggesting that these features are key to recognition. Eyebrows are singularly important in driving the horizontal response: one has only to look at Figure 4b to see that the eyebrows drive a substantial proportion of the response at both high and low spatial frequencies.
A reliable sequence is important both for face detection and for face decoding. Bar codes that contain the wrong sequence of stripes can be discounted as potential faces. A reliable sequence means that individual stripes can convey useful information that can be decoded by virtue of their position in the sequence. Since the different stripes in the bar code reliably correspond to different parts of the face (independently of the shape of the face), they convey information about that part of the face. For example, Figure 4a suggests that the widest dark stripe lies over the eyes. This appears to be a sound assumption under normal viewing conditions. For example, we have inspected 150 face images and found in every one of them the same pattern of stripes around the forehead and eyes (see Figures 4a and 4b). 
Moreover, the finer scale eye stripes (Figure 4b) invariably overlap in space with the coarse-scale dark eye stripe. So, once the coarse-scale bar code has been found, it can be used directly to index into the finer scale structure. Indexing facial features by a reliable stripe sequence, rather than by some significantly more complex two-dimensional pattern-matching process, is bound to be more successful.
The potential computational benefits of face bar codes are the same as those of the commercial bar codes on merchandise, which are now almost universal: they arise fundamentally from the manner in which the pattern information in a bar code can be represented by a one-dimensional sequence or ordering. In the next section, we explore the extent to which there is evidence that human faces and human visual systems do indeed gain these benefits.
Evidence for the importance of the bar code sequence
In general, face recognition is remarkably tolerant of many ecologically valid manipulations of the face: changes in viewing distance, facial pose, etc. A strong reason for this might be that the sequence of stripes in a bar code is normally robust against environmental degradation. So, an image transformation that leaves the sequence unchanged should be tolerated by face recognition, while image transformations that alter the sequence should have strong adverse effects. Figure 6 illustrates that this appears to be so. The top row shows a variety of images of the same individual, the middle row horizontally filtered versions of the same, and the bottom row schematic bar codes of the coarse-scale information.
Figure 6
Below each image in the top row is its horizontal information, and below that a schematic “bar code” for the low spatial frequency component. (a) Both the polarity inverse and the spatial inverse of the sequence are also bar codes, but ones that are highly dissimilar to the original. (b) Bar codes are tolerant of isotropic and anisotropic distortions, which maintain the order, polarity, and the height of each bar relative to the overall height of the sequence. This confers resistance to the effects of pose change. (c) Since the component stripes are essentially binary, the effects of disrupting the luminance of the source image are minor, provided that the sign of the resulting features (above or below the mean luminance) is preserved.
Figure 6b illustrates some quite drastic spatial transformations that are not disruptive to recognition. Observers can tolerate severe horizontal and vertical spatial distortions of the face image (Sinha, Balas, Ostrovsky, & Russell, 2006) and pose change in the subject. Affine spatial distortions (such as spatial compressions) do not change the bar code sequence or the relative properties of the stripes and consequently are not significant for bar code recognition. Viewpoint changes typically arise from head rotation in the sagittal plane (i.e., forward and backward head tilt) or the transverse plane (i.e., head turn). Head turn has a similar effect to horizontal compression; head tilt has a similar effect to vertical compression. Similarly, Figure 6c shows the effect of half-toning the face and of presenting the subject under strong, variable illumination. Because these transformations have little effect on the output of horizontal filters, they leave the sequence of stripes intact. Figure 6a, however, shows the effects of spatial and contrast-polarity inversion, both transformations that are highly disruptive to human face recognition. Because the bar code is essentially a one-dimensional sequence of features, polarity and spatial inversion have the effect of producing bar codes that are sign- or order-reversed, respectively. These codes are highly (in the case of polarity inversion, maximally) dissimilar to the original code. So these are cases where the bar code sequence is disrupted and face recognition is disrupted. Interestingly, neither transformation alters the intrinsic spatial relations between any points in the face image (such as control points).
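A toy example makes the asymmetry concrete. Treating a coarse-scale bar code as a short signed sequence (the particular values below are made up), polarity reversal flips every element while spatial inversion reverses their order; an order-preserving compression changes neither:

```python
import numpy as np

code = np.array([+1, -1, +1, -1, -1, +1])    # hypothetical stripe signs

def match(a, b):
    """Fraction of positions at which two equal-length codes agree."""
    return float(np.mean(a == b))

print(match(code, code))          # 1.0  : identical sequence
print(match(code, -code))         # 0.0  : polarity reversal flips every bar
print(match(code, code[::-1]))    # 0.67 : spatial inversion scrambles order
# An order-preserving compression changes neither the signs nor the
# order of the bars, so the sequence (and the match) is unchanged.
```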
We can use the hybrid technique described above to investigate our intolerance to spatial and contrast-polarity inversion of faces. Previously, it has been shown that subjects' difficulty with recognizing contrast polarity-inverted faces stems from the phase reversal of the low spatial frequency components of the image (Hayes, Morrone, & Burr, 1986). Figure 7 (upper row) demonstrates that the disruptive effects of contrast reversal are also selective for horizontal information. Figure 7a shows an image composed of the sum of horizontal and vertical components drawn from a real face. Reversing the polarity of both components (Figure 7b) drastically reduces our ability to identify the face. Switching the polarity of just the horizontal component (Figure 7c) is equally disruptive, while reversal of the vertical component (Figure 7d) produces an image that is still easy to recognize. 
Figure 7
(a–d) Summed horizontal and vertical components of a face. Switching the contrast polarity of both components from (a) positive to (b) negative greatly disrupts recognition. Flipping the polarity of just the (c) horizontal or (d) vertical component reveals that one's percept follows the horizontal structure. (e) The Thompson (1980) illusion. Inversion disrupts both face recognition and the ability to spot when features have been inverted. (f–i) Composites composed of horizontal and vertical components of the original and feature-inverted images. In each, the two components are drawn from the two (mismatched) inset images; note how one's percept follows the horizontal component.
Spatial inversion of parts of a face can produce striking perceptual effects, as demonstrated by the well-known Thompson illusion: flipping the orientation of facial features (relative to the overall face orientation) is highly visible within upright but not within inverted faces (Thompson, 1980). Figure 7 (lower row) shows that this effect, like contrast-polarity reversal, is restricted to the horizontal information. Figure 7e shows the basic illusion. Figures 7f–7i show horizontal–vertical hybrids containing one (H or V) component from the original face and one from the feature-inverted face; the left and right insets of each panel show the source images used to generate the horizontal and vertical components, respectively. Figures 7f and 7g have normal horizontal components and vertical components drawn from a feature-inverted face; Figures 7h and 7i have feature-inverted horizontal and normal vertical components. Only the latter pairings show the effect. Inverting the vertical components of the image does not change the sequence of bar codes; inverting the horizontal components does.
Conclusions
The work presented here has shown that the horizontal component of face images is particularly informative for face recognition, and we have suggested that this is a consequence of face images eliciting locally one-dimensional clusters of horizontal filter responses that we refer to as bar codes. We propose that bar codes are resistant to variability in face images that arises from ecological transformations of the head (e.g., lighting, pose). We have shown that two ecologically unlikely transformations—spatial and polarity inversion—selectively disrupt horizontal information in faces, a finding that is consistent with such transformations being maximally disruptive to a one-dimensional bar code sequence. As far as we are aware, this is the first theory of low-level visual face processing where the disruptive effects of these two transformations emerge naturally. 
Acknowledgments
This research was funded by a project grant from the Wellcome Trust to SCD and by a Leverhulme Research Fellowship to RJW. 
Commercial relationships: none. 
Corresponding author: Steven C. Dakin. 
Email: s.dakin@ucl.ac.uk. 
Address: UCL Institute of Ophthalmology, University College London, Bath Street, London EC1V 9EL, UK. 
References
Bruce, V. (1988). Recognising faces. London: Lawrence Erlbaum.
Costen, N. P., Parker, D. M., & Craw, I. (1996). Effects of high-pass and low-pass spatial filtering on face identification. Perception & Psychophysics, 58, 602–612.
Goffaux, V., Hault, B., Michel, C., Vuong, Q. C., & Rossion, B. (2005). The respective role of low and high spatial frequencies in supporting configural and featural processing of faces. Perception, 34, 77–86.
Goffaux, V., & Rossion, B. (2006). Faces are “spatial”: Holistic face perception is supported by low spatial frequencies. Journal of Experimental Psychology: Human Perception and Performance, 32, 1023–1039.
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39, 3537–3560.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
Hayes, T., Morrone, M. C., & Burr, D. C. (1986). Recognition of positive and negative bandpass-filtered images. Perception, 15, 595–602.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195, 215–243.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.
LeDoux, J. (1996). Emotional networks and motor control: A fearful view. Progress in Brain Research, 107, 437–446.
Leopold, D. A., O'Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4, 89–94.
Norman, J., & Ehrlich, S. (1987). Spatial frequency filtering and target identification. Vision Research, 27, 87–96.
Ruiz-Soler, M., & Beltran, F. S. (2006). Face perception: An integrative review of the role of spatial frequencies. Psychological Research, 70, 273–292.
Sadr, J., Jarudi, I., & Sinha, P. (2003). The role of eyebrows in face recognition. Perception, 32, 285–293.
Schyns, P. G., & Oliva, A. (1999). Dr. Angry and Mr. Smile: When categorization flexibly modifies the perception of faces in rapid visual presentations. Cognition, 69, 243–265.
Sinha, P., Balas, B., Ostrovsky, Y., & Russell, R. (2006). Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94, 1948–1962.
Smith, M. L., Cottrell, G. W., Gosselin, F., & Schyns, P. G. (2005). Transmitting and decoding facial expressions. Psychological Science, 16, 184–189.
Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9, 483–484.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 43, 161–204.
Vuilleumier, P., Armony, J. L., Driver, J., & Dolan, R. J. (2003). Distinct spatial frequency sensitivities for processing faces and emotional expressions. Nature Neuroscience, 6, 624–631.
Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces, face cubes, and the geometry of face space. Vision Research, 42, 2909–2923.
Winston, J. S., O'Doherty, J., & Dolan, R. J. (2003). Common and distinct neural responses during direct and incidental processing of multiple facial emotions. NeuroImage, 20, 84–97.