September 2010
Volume 10, Issue 11
Research Article  |   September 2010
Identifying regions that carry the best information about global facial configurations
Author Affiliations
  • Fatos Berisha
    Cognitive, Perceptual and Brain Sciences, UCL, London, UK
    Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), UCL, London, UK; fatos.berisha@capgemini.com
  • Alan Johnston
    Cognitive, Perceptual and Brain Sciences, UCL, London, UK
    Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), UCL, London, UK; http://www.psychol.ucl.ac.uk/vision/Lab_Site/Home.html; a.johnston@ucl.ac.uk
  • Peter W. McOwan
    Department of Computer Science, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
    Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), UCL, London, UK; http://www.eecs.qmul.ac.uk/~pmco/; pmco@dcs.qmul.ac.uk
Journal of Vision September 2010, Vol.10, 27. doi:10.1167/10.11.27
      Fatos Berisha, Alan Johnston, Peter W. McOwan; Identifying regions that carry the best information about global facial configurations. Journal of Vision 2010;10(11):27. doi: 10.1167/10.11.27.

      Download citation file:


      © 2017 Association for Research in Vision and Ophthalmology.

Abstract

Regions of the face are not equally important in conveying information about configural change. The bubbles spatial occlusion technique has proved to be a good method for revealing which areas carry diagnostic facial information in different perceptual categorization tasks. Here we applied it within a performance-driven mimicry system, implemented using a computer-generated model of the face that automatically retargets the behavior of one face onto another. Our bubbles technique, which maps an occluded face into a PCA model of the same face, revealed the areas around and including the mouth and eyebrows as the most important for facial image reconstruction. Interestingly, these regions overlapped with, but were not identical to, the areas of maximum pixel-value variance. We show that a system that is indifferent to stimulus content, and that uses the correlation between vectors in face space rather than simply pixel-value correlation as its criterion, identifies the eyebrows and mouth as important regions. This implies that the importance of the eyebrows and the mouth in dynamic face perception may depend not on the information content of these features per se but on the degree to which these regions of the face provide information about the global form of the face.

Introduction
Visual information from different areas of the face does not appear to contribute equally to human observers' ability to process faces (Buchan, Pare, & Munhall, 2007). In categorization tasks such as gender recognition and expression detection, subjects were found to use different visual information from the same visual input depending on the task (Gosselin & Schyns, 2001). The bubbles method has been used to reveal the diagnostic information employed in these categorization tasks. Essentially, observers are shown stimuli whose contrast is modulated by Gaussian windows of various sizes, distributed at random across the image. A record is kept of the locations and extent of the windows that led to accurate performance, thereby identifying the locations on which discrimination performance depends. This method has since been used to identify critical regions for a great variety of categorization tasks, such as infant perceptual categorization (Humphreys, Gosselin, Schyns, & Johnson, 2006), perception of ambiguous figures (Bonnar, Gosselin, & Schyns, 2002), categorization of natural scenes (McCotter, Gosselin, Sowden, & Schyns, 2005), spatiotemporal dynamics of face recognition (Vinette, Gosselin, & Schyns, 2004), and even pigeons' visual discrimination behavior (Gibson, Wasserman, Gosselin, & Schyns, 2005). In the majority of these studies, the stimuli consisted of static images of faces, the subjects were human or animal, and the tasks were binary categorization tasks, e.g., “is the face male or female, expressive or non-expressive?”. 
Our aim was to study region-based information processing in faces for an ideal observer. We exploited a system designed to produce performance-driven, photo-realistic facial mimicry (Cowe, 2003). This system extracts an actor's facial movements and allows them to be projected automatically onto another person's face model, or avatar, without any need for marker assignment and tracking. The retargeting is achieved using a biologically plausible optic flow algorithm (Johnston, McOwan, & Benton, 1999; Johnston, McOwan, & Buxton, 1992) to determine correspondences between each frame and a reference frame and to morph the images onto that reference. The vector fields and morphed images for each frame are then subjected to Principal Component Analysis (PCA), which describes the variation in these measures over an image sequence. We can exploit this synthetic face generation system to gauge how faithfully a face driven by an occluded sequence reproduces the face driven by the original, uncovered sequence. This allows us to discover which regions of a face in motion are most important for the accurate recovery of the rest of the face, and thus which local areas of the face carry the most information about its global configuration. 
During preliminary experiments, in which we generated performance-driven self-mimicries using driver sequences with arbitrary rectangular occlusions placed over certain facial features, visual inspection made it clear that information from some areas of the face was more important than information from others, both for the performance of our face model and for the overall recovery of actions from the occluded features. Here we try to locate these areas more precisely using the more principled bubbles method. 
Methods
We recorded and digitized two subjects, each telling a joke, using a JVC GR-DVL9600 digital camera. We used joke telling to engender natural, animated movement of the faces (Hill & Johnston, 2001). The frames were downsampled from VGA resolution to 120 × 80 pixels using bicubic interpolation, and the sequences were sampled at 25 frames/s. These two sequences were the starting point for the production of the avatars and of the occluded and non-occluded drivers. 
PCA-based face model and production of mimicries
Figure 1a illustrates how these sequences were vectorized, frame by frame. Each sequence frame is represented as a concatenated one-dimensional vector. The vector contains, for each pixel, the RGB values for that pixel together with the vector that describes the local motion required to warp the target image onto a selected reference frame (Blanz & Vetter, 1999). This flow field calculation was done using an adaptation of the Multi-channel Gradient Model (McGM), an optic flow algorithm modeled on the processing of the human visual system (Johnston et al., 1999, 1992). All frames are now represented as points in a high-dimensional space (Figure 1b). This representation allows nearby linear combinations of these frames to appear as plausible poses of that face. PCA was then performed on these concatenated image vectors, extracting a compact basis set that spans the examples and generates a virtual set of controls necessary for performance-driven animation (Figure 1c). Animation can be accomplished using this method because human faces are similar enough in feature locations and movement patterns. 
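The vectorization and PCA steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names `vectorize_frame` and `pca_basis` are hypothetical, the per-pixel ordering (R, G, B, x flow, y flow) is an assumption, and the optic flow is taken as given rather than computed with the McGM. PCA is done here with a singular value decomposition of the centred data, which yields the same principal axes as an eigendecomposition of the covariance matrix.

```python
import numpy as np


def vectorize_frame(rgb: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Concatenate texture and shape into one N = h * w * 5 vector.

    rgb:  (h, w, 3) colour values of the frame
    flow: (h, w, 2) x and y displacements warping the frame onto the reference
    """
    h, w, _ = rgb.shape
    if flow.shape != (h, w, 2):
        raise ValueError("flow must have shape (h, w, 2)")
    # Per pixel: R, G, B, dx, dy (assumed ordering), flattened row by row.
    return np.concatenate([rgb, flow], axis=2).reshape(h * w * 5)


def pca_basis(frames, n_components: int):
    """Return (mu, B): the sequence mean and an (N, P) orthonormal basis."""
    X = np.stack(frames).astype(float)   # (T, N), one vectorized frame per row
    mu = X.mean(axis=0)                  # sequence mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components].T       # basis vectors stored as columns


# Usage on synthetic data: 10 frames of an 8 x 6 image.
rng = np.random.default_rng(0)
frames = [vectorize_frame(rng.random((8, 6, 3)), rng.normal(size=(8, 6, 2)))
          for _ in range(10)]
mu, B = pca_basis(frames, n_components=4)
```

Keeping only the leading columns of `B` gives the compact basis; each column can be devectorized and displayed at ±2 SD to obtain views like Figure 1c.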
Figure 1
 
Vectorization and application of PCA on facial motion sequence. (a) A sequence frame is converted into a long N-dimensional vector (N = h × w × 5, the 5 denoting the R, G, and B values and the x and y flow components) by concatenating the texture and shape information of every pixel in the image. Frames can now be thought of as points in an N-dimensional space. (b) By application of PCA, a new improved orthonormal coordinate system centered on the mean μ (or some selected reference frame) is defined, with axes chosen to point in directions of maximum variance. (c) Columns show the first 3 principal components from a vectorized sequence of facial motion, while rows show −2, 0, and +2 standard deviations from the mean.
Having found a new coordinate system representing an individual's face, any facial movement taken from a sequence of any individual can be projected onto this basis, provided it is vectorized in the same manner and centered on its own sequence mean (Figure 2). 
Figure 2
 
Block diagram of the mimicry generation process. Morph vectorization used in this process enables the combination of image warping and image blending for realistic synthesis of facial movement without blur and without losing iconic or lighting changes (Cowe, 2003).
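So, given a set of T training vectors from the face we wish to drive, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$, and a set of D driving vectors from the face we will use as a driver, $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_D$, both sets are centered on their means and put into matrices $\Phi$ and $\Psi$, s.t. $\Phi = \{\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_T\}$, where $\boldsymbol{\phi}_i = \mathbf{x}_i - \boldsymbol{\mu}_T$, and $\Psi = \{\boldsymbol{\psi}_1, \boldsymbol{\psi}_2, \ldots, \boldsymbol{\psi}_D\}$, where $\boldsymbol{\psi}_i = \mathbf{y}_i - \boldsymbol{\mu}_D$. PCA then gives a set of basis vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_P$, where $P \le T$.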
To project each N-dimensional driving vector $\boldsymbol{\psi}_i$ into the new P-dimensional subspace described by the principal component basis, we apply the basis transformation matrix $B = [\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_P]$, whose columns are the basis vectors:

$$\mathbf{c}_i = B^{\top}\boldsymbol{\psi}_i. \tag{1}$$

The j-th element of $\mathbf{c}_i$ is the weighting on the basis vector $\mathbf{b}_j$. To transform this projection back to the original space, translated to the standard origin, we apply the inverse transformation and add the training mean. Since the principal components are eigenvectors of the covariance matrix and are therefore orthonormal, $B^{\top}B = I$, which in turn implies that $B$ performs the inverse transformation, so

$$\mathbf{z}_i = B\,\mathbf{c}_i + \boldsymbol{\mu}_T. \tag{2}$$

This new vector $\mathbf{z}_i$ is finally devectorized into an image frame of the generated sequence (Cowe, 2003). Note that projecting a partial vector into the PCA space allows the whole face to be recovered. 
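Equations 1 and 2 can be illustrated with a short NumPy sketch that drives an avatar basis with another sequence. It assumes the basis vectors are stored as the columns of an N × P matrix `B`, so the projection in Eq. 1 is a multiplication by the transpose of `B` and the reconstruction in Eq. 2 a multiplication by `B`. The function name `retarget` and the synthetic data are hypothetical. Because every output entry is produced from the full basis, entries that were missing or occluded in the driver still receive values in each reconstructed vector.

```python
import numpy as np


def retarget(driver_frames, B: np.ndarray, mu_T: np.ndarray):
    """Project driver frames onto an avatar's PCA basis and reconstruct.

    driver_frames: (D, N) vectorized driver frames y_i
    B:             (N, P) orthonormal avatar basis, one basis vector per column
    mu_T:          (N,)   mean of the avatar's training frames
    Returns C (D, P), the weightings c_i, and Z (D, N), the reconstructions z_i.
    """
    Y = np.asarray(driver_frames, dtype=float)
    Psi = Y - Y.mean(axis=0)   # psi_i = y_i - mu_D: driver centred on its own mean
    C = Psi @ B                # Eq. 1 for all frames at once (row i is c_i)
    Z = C @ B.T + mu_T         # Eq. 2: weighted sum of basis vectors plus mu_T
    return C, Z


# Usage: an orthonormal 3-vector basis in a 50-dimensional space.
rng = np.random.default_rng(1)
B, _ = np.linalg.qr(rng.normal(size=(50, 3)))
mu_T = rng.normal(size=50)
A = rng.normal(size=(7, 3))              # driver motion lying inside the subspace
Y = rng.normal(size=50) + A @ B.T        # 7 driver frames around their own mean
C, Z = retarget(Y, B, mu_T)
```

Centering the driver on its own mean and adding the avatar's mean is what transfers the driver's deviations, rather than its appearance, onto the avatar.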
Application of bubbles
The bubbles method was introduced by Haig (1985) and developed further by Gosselin and Schyns (2001). Gosselin and Schyns occluded the face with masks punctured by a number of randomly located Gaussian windows, or bubbles. Across trials, the masks that revealed enough facial information for their human subjects to categorize the occluded face correctly were summed and divided by the sum of all masks, resulting in the ProportionPlane (Gosselin & Schyns, 2001). The averaged ProportionPlane is a measure of the relative importance of the image areas for the given task. Our face model, or avatar, was driven by 5000 instances of the same sequence (31 frames, 120 × 80 pixels), each occluded with a different random bubble mask (23 bubbles per mask, standard deviation: 5 pixels; Figure 3). 
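A bubble mask of the kind described above can be sketched as follows. The parameters (23 bubbles, SD of 5 pixels, 120 × 80 frames, 31 frames) come from the text; everything else is an assumption: bubble centres are drawn uniformly over the image, overlapping bubbles are combined with a pointwise maximum so the mask stays in [0, 1], the same mask is applied to every frame, occluded pixels are blended towards a uniform mid-grey background, and the frame is treated as 120 rows by 80 columns. `bubble_mask` and `apply_mask` are hypothetical names.

```python
import numpy as np


def bubble_mask(shape, n_bubbles=23, sigma=5.0, rng=None):
    """Random bubbles mask: 1 = fully visible, 0 = fully occluded.

    Each bubble is an isotropic Gaussian window (SD `sigma` pixels) centred on
    a uniformly random pixel; overlapping bubbles are merged with a pointwise
    maximum so that the mask never exceeds 1.
    """
    rng = np.random.default_rng(rng)
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape)
    centres = zip(rng.integers(0, h, n_bubbles), rng.integers(0, w, n_bubbles))
    for cy, cx in centres:
        g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, g)
    return mask


def apply_mask(frames, mask, background=0.5):
    """Occlude every frame of an (F, h, w, 3) sequence with the same mask,
    blending hidden pixels towards a uniform background value."""
    frames = np.asarray(frames, dtype=float)
    m = mask[None, :, :, None]             # broadcast over frames and channels
    return m * frames + (1.0 - m) * background


# Usage: one occluded driver sequence out of the 5000.
rng = np.random.default_rng(2)
sequence = rng.random((31, 120, 80, 3))    # stand-in for the real driver frames
mask = bubble_mask((120, 80), rng=rng)
occluded = apply_mask(sequence, mask)
```

Repeating this with fresh masks, and keeping each mask alongside the score of the reconstruction it produced, provides the data needed for the ProportionPlane.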
Figure 3
 
Applied bubble mask. This is one example of the 5000 random bubble masks applied to the moving face driving sequence. The sequences were then processed and used to drive the avatar, resulting in 5000 mimicries.
A ground-truth reconstruction was generated by driving the avatar with the non-occluded sequence. The occlusion-affected reconstructions were compared to the ground-truth reconstruction using a Pearson correlation metric. Initially, this metric was used to make image-based comparisons, correlating RGB values between the ground truth and each bubble-masked reconstruction, frame by frame and pixel by pixel. This method failed to pick up the comparatively subtle image changes that the occlusions caused in the reconstructions. It was presumably constrained by the inherent similarity of the images being compared (all faces of the same person) and by the minute variations in pixel intensity over most of the image. Consequently, the correlation values between the ground truth and the other reconstructions all lay between 0.988 and 0.99, and high correlations did not correspond well to reconstruction fidelity as judged by human observers. 
An accurate metric should be able to capture the structural information in the image signals and be sensitive to changes in that structure. With this in mind, we measured the similarity between the principal component weightings $\mathbf{c}_i$ extracted from the 5000 occluded driver sequence vectors $\boldsymbol{\psi}_i$ and those from the ground truth, again by calculating the Pearson correlation between the weightings used to produce the occluded reconstructions and those that produced the ground truth. 
These coefficients refer to particular principal components, which in turn code for specific, non-random structural changes in the faces. The correlation values obtained this way ranged from 0.16 to 0.91 (Figure 4). They corresponded very well with similarity judgments by human observers and therefore provide a good metric for our categorization task. 
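Both similarity measures reduce to a Pearson correlation between two arrays; only the inputs differ. The sketch below is a hypothetical illustration: it assumes the weightings for a whole reconstruction are stored as a (frames × P) array and that all of them are pooled into a single correlation, a detail the text does not specify. Passing reconstructed images instead of weightings gives the pixel-based comparison that proved insensitive.

```python
import numpy as np


def pearson(a, b) -> float:
    """Pearson correlation between two equally sized arrays, flattened to 1-D."""
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("inputs must contain the same number of values")
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def coefficient_similarity(C_masked, C_truth) -> float:
    """Score one bubble-masked reconstruction against the ground truth.

    C_masked, C_truth: (frames, P) PC weightings c_i from the masked and the
    unoccluded driver; all weightings are pooled into one correlation.
    """
    return pearson(C_masked, C_truth)


def pixel_similarity(frames_masked, frames_truth) -> float:
    """The image-based alternative: correlate reconstructed pixel values."""
    return pearson(frames_masked, frames_truth)
```

Evaluated once per mask, `coefficient_similarity` produces one score per masked run, i.e., one point of the kind plotted in Figure 4b.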
Figure 4
 
Self-mimicry. (a) The bubble-mask occluded driver sequence (leftmost faces, occluded) was used to drive the avatar and produce the reconstructions (middle faces). These were compared using our correlation metric to the ground-truth reconstruction (rightmost faces), which was produced by a non-occluded driver. The model recovers the facial expressions quite successfully despite the occlusions, albeit in a somewhat muted form. (b) Points denote PC coefficient similarity between ground-truth and bubble-masked reconstructions. Take, for instance, the reconstruction generated by applying random mask number 728 (seen in the graph as the lowest-lying point between mask numbers 500 and 1000). Its generating PC coefficients have very low correlation with those of the ground truth (corr = 0.16169), and visual inspection of reconstruction 728 showed hardly any reproduction of facial movement, owing to the inauspicious locations of the applied occlusion mask. (c) The histogram view of (b). Only the masks that produced reconstructions with PC coefficients highly correlated with the ground-truth coefficients (represented by darker green colors in the histogram) were classified as “good” and used to derive the proportion plane.
Results
We used the original bubbles method described in the previous section to derive a ProportionPlane for each of our two test sequences. Whereas Gosselin and Schyns used human subjects to select the correctly categorized occluded faces, here we took the masks yielding the top 10% of correlation values as our preferred masks and used them to derive the ProportionPlane. Remarkably, visual inspection of the reconstructions driven by sequences occluded with the preferred masks showed high-fidelity reproduction of facial actions despite the massive occlusions. This demonstrated that our PCA face model is robust enough to recover aspects of expressions in areas occluded in the driver sequence, although the reproduced expressions in the avatar are slightly muted (Figure 4a). 
The preferred masks (i.e., those producing weightings with the top 10% of correlation values with the ground-truth weightings) were first summed and then divided by the sum of all 5000 masks, resulting in the ProportionPlane. Figure 5 shows the emergence of the ProportionPlane as we refined our set of preferred masks (going from the best 2500, then the best 1333, 1250, and so on, down to the best 500). As we excluded more and more of the poor masks, the face map gradually revealed the facial areas most important for photo-realistic animation of our face model. This map can be seen as a measure of the relative importance of the regions of the 2D image for the task at hand. The important facial regions were the mouth and the eyebrows (Figure 6). 
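The ProportionPlane computation can be sketched as below. The top-10% selection rule and the division by the sum of all masks follow the text; the function name `proportion_plane`, the array layout, and the guard for pixels that no bubble ever reached are assumptions. Lowering `keep_fraction` from 0.5 (the best 2500 of 5000) towards 0.1 (the best 500) mimics the progressive refinement shown in Figure 5.

```python
import numpy as np


def proportion_plane(masks, scores, keep_fraction=0.10):
    """Sum of the best-scoring masks divided by the sum of all masks.

    masks:  (M, h, w) bubble masks, one per driven reconstruction
    scores: (M,) PC-coefficient correlations with the ground truth
    """
    masks = np.asarray(masks, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    best = np.argsort(scores)[::-1][:n_keep]   # indices of the highest scores
    total = masks.sum(axis=0)
    # Pixels never reached by any bubble (total == 0) are set to 0.
    return np.divide(masks[best].sum(axis=0), total,
                     out=np.zeros_like(total), where=total > 0)
```

Because the preferred masks are a subset of all masks, every value of the plane lies between 0 and 1; bright pixels are those that were visible disproportionately often in the best reconstructions.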
Figure 5
 
Face map. As we gradually selected out the bubble masks corresponding to low correlation values, brighter areas emerged from the noisy image. The images provide a rough representation of facial areas important for photo-realistic animation of our face model. These are not simply the maximum pixel-value variance areas.
Figure 6
 
Statistically significant diagnostic regions. Red areas denote the regions that attained statistical significance using our cluster test.
The regions seem to overlap with the areas of maximum pixel-value variance (the variation in intensity of each pixel over the whole sequence), but, importantly, they are not identical to them. This suggests that the method is not simply locating the parts of the face that display the most movement in our test sequences. 
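The pixel-value variance map used in this comparison can be computed per pixel over time. The sketch assumes that intensity is the mean of the R, G, and B channels; the text does not say how colour was reduced to intensity, and `pixel_variance_map` is a hypothetical name.

```python
import numpy as np


def pixel_variance_map(frames) -> np.ndarray:
    """Temporal variance of each pixel's intensity over an (F, h, w, 3) sequence."""
    intensity = np.asarray(frames, dtype=float).mean(axis=-1)  # (F, h, w)
    return intensity.var(axis=0)                               # (h, w)
```

Thresholding this map (for example, keeping its highest-variance pixels) gives a region that can be laid over the significant bubbles regions for the kind of visual comparison described above.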
To assess the statistical significance of the diagnostic regions, we used an accurate statistical test for smooth classification images (Chauvin, Worsley, Schyns, Arguin, & Gosselin, 2005). This test is based on the probability that a cluster of a given size (in pixels) above a threshold t in our Z-scored classification image has occurred by chance. The significant regions in our face map were derived with a standard cluster test from the Stat4Ci MATLAB toolbox, with p ≤ 0.05, σ = 5 pixels, and a Z-score threshold of 3.1. This threshold was low enough both to indicate significance and to be sufficiently sensitive to detect wider regions of contiguous pixels. Figure 6 displays the resulting thresholded classification images for both sequences. The areas that attained statistical significance are shown as red pixels, and the face used in the experiment is overlaid to facilitate interpretation. These are indeed the areas that the face map suggested as important for accurate reconstruction. 
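The thresholding and clustering steps can be sketched in plain NumPy. This is not Stat4Ci: the classification image is simply Z-scored against its own mean and SD, and the critical cluster size, which Stat4Ci derives from random field theory given σ, the threshold, and p, is a hypothetical input `min_cluster_size` here. Only the Z threshold of 3.1 comes from the text; 4-connectivity and all function names are assumptions.

```python
from collections import deque

import numpy as np


def zscore(plane) -> np.ndarray:
    """Z-score a classification image against its own mean and SD."""
    plane = np.asarray(plane, dtype=float)
    return (plane - plane.mean()) / plane.std()


def clusters_above(z, threshold=3.1):
    """Group pixels with z > threshold into 4-connected clusters.

    Returns a list of clusters, each a list of (row, col) pixel coordinates.
    """
    above = np.asarray(z) > threshold
    seen = np.zeros_like(above, dtype=bool)
    h, w = above.shape
    clusters = []
    for r0, c0 in zip(*np.nonzero(above)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, cluster = deque([(r0, c0)]), []
        while queue:                       # breadth-first flood fill
            r, c = queue.popleft()
            cluster.append((int(r), int(c)))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and above[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        clusters.append(cluster)
    return clusters


def significant_mask(z, threshold, min_cluster_size):
    """Boolean map keeping only clusters of at least `min_cluster_size` pixels
    (a hypothetical stand-in for the Stat4Ci critical cluster size)."""
    out = np.zeros(np.shape(z), dtype=bool)
    for cluster in clusters_above(z, threshold):
        if len(cluster) >= min_cluster_size:
            rows, cols = zip(*cluster)
            out[list(rows), list(cols)] = True
    return out
```

The boolean output corresponds to the red regions of Figure 6 only if the cluster-size criterion matches the one computed by Stat4Ci for the same σ, threshold, and p.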
To demonstrate the importance of these areas, we generated reconstructions using only the information from these statistically significant diagnostic regions to drive the avatar. As expected, the resulting reconstructions were visually of very high fidelity. The correlations between these face-map-driven reconstructions and the ground truth were very high (0.88438 and 0.89984), well within the top 1% of the correlations obtained with the original bubble masks. 
Discussion
Facial motion provides important information for the identification and recognition of faces (Hill & Johnston, 2001; Knight & Johnston, 1997; Lander, Christie, & Bruce, 1999; O'Toole, Roark, & Abdi, 2002) and facial expressions (Cunningham & Wallraven, 2009). The bubbles technique has been extended to dynamic stimuli and used as a way to identify regions of a face that are important for perceptual discrimination. However, by studying the quality of image reconstruction from a machine learning technique with no explicit knowledge of faces, we show here that regions are significant because they reflect the important sources of variation in the facial image. Thus, these areas are not necessarily important because of their functional roles (e.g., in visual speech or non-verbal communication) or because they are encoded by specialized neural modules, but because these regions carry the most information about the facial configuration, as indicated by the ease of reconstructing the face when these areas are present in a driving face. 
Gosselin and Schyns (2001) compared discrimination maps from humans and an ideal observer and found qualitative differences between the two. However, their ideal observer was an image-based winner-takes-all model, which decided which face was presented by comparing the Gaussian-windowed image against 32 equivalently Gaussian-windowed alternatives. The category (e.g., male or female) of the best match was taken as the categorical response. PCA encodes correlated changes in the face that result from physical and articulatory constraints and, in the case of facial speech, from the functional requirements of fluid verbal and non-verbal communication. The ideal observer employed here depends on the similarity of the PCA coefficients for the windowed and the complete face; thus it reflects the degree to which the visible parts of the face can recover the whole face, i.e., the degree to which local information can recover global information. The PCA-based ideal observer indicated that the important regions of the face are around the mouth and the eyebrows. This is consistent with reports about the important locations of the face for recognition in human observers (Davies, Ellis, & Shepherd, 1977; Dupuis-Roy, Fortin, Fiset, & Gosselin, 2009; Haig, 1985; Schyns, Bonnar, & Gosselin, 2002). 
Although the bubbles technique is generally considered to identify diagnostic features in the visible regions that subjects can use for discrimination, an alternative is that these regions allow access to more global information. In a bubbles experiment, Caldara et al. (2005) showed that a prosopagnosic patient, PS, uses the mouth for discrimination of familiar faces, unlike controls, who tend to use the upper part of the face around the eyes. One hypothesis they consider, among others, is that the eye region may carry more information about the global configuration of the face than other regions, and that the controls can exploit this while PS, lacking access to configural information, cannot. This interpretation would seem to be supported by our analysis. 
In this study, faces are represented as deviations from a mean facial configuration. This is consistent with growing evidence from biological vision that faces are represented in terms of their deviation from a prototype (Rhodes, Brennan, & Carey, 1987; Valentine, 1991). Leopold, O'Toole, Vetter, and Blanz (2001) described a shift in the appearance of static average faces after adaptation to inverse caricatures or “anti-faces.” Adaptation makes the average face look more like the original face used to generate the inverse caricature. More recently, Leopold, Bondar, and Giese (2006) have shown that most neurons in the inferotemporal cortex of macaque monkeys increase their firing rate along the identity axis between the mean and the face to which the cells respond, although some showed monotonic decreases in firing rate. These experiments indicate that a viable representation system for faces could be based on a mean prototype with axes of deviation radiating from this mean. The evidence provided here suggests that the representation may not need to include the whole face, as some regions are more informative about shape than others. 
Motion may have a key role to play in forming norm-based representations of the face (Johnston, in press). There would be little point in encoding a facial feature (the length of the nose, for example) if that feature were constant across faces. Similarly, there would be little point in separately encoding the length of the nose and the width of the mouth if these were perfectly correlated across faces. The importance of motion is that it provides information about which attributes change together in the face and which changes can be dissociated and therefore require the representation to be elaborated. The expressions of a single face can be thought of as an equivalence class containing all expressions that can be made by transforming that face (Johnston, 1992). Thus, motion allows face space to be subdivided into a set of equivalence classes. In this way, motion would seem to be fundamental in building representations of both moving and static faces. 
We identified significant regions by windowing them and then determining how effectively the original PCA-based description of the face could be recovered. This exploits the fact that PCA allows complete faces to be recovered from partial data. Knowledge of important regions can improve the efficiency of encoding by allowing only critical information to be encoded. Finally, we showed that facial features (mouth and eyebrows) can be distinguished from the whole face in terms of their information content by using the strategy of spatial sampling and projection into a global PCA, thus linking features and configurations. 
Acknowledgments
This work was supported by the EPSRC. 
Commercial relationships: none. 
Corresponding author: Alan Johnston. 
Email: a.johnston@ucl.ac.uk. 
Address: Cognitive, Perceptual and Brain Sciences, UCL, London, UK. 
References
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. Paper presented at the Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques.
Bonnar, L., Gosselin, F., & Schyns, P. G. (2002). Understanding Dali's slave market with the disappearing bust of Voltaire: A case study in the scale information driving perception. Perception, 31, 683–691.
Buchan, J. N., Pare, M., & Munhall, K. G. (2007). Spatial statistics of gaze fixations during dynamic face processing. Social Neuroscience, 2, 1–13.
Caldara, R., Schyns, P., Mayer, E., Smith, M. L., Gosselin, F., & Rossion, B. (2005). Does prosopagnosia take the eyes out of face representations? Evidence for a defect in representing diagnostic facial information following brain damage. Journal of Cognitive Neuroscience, 17, 1652–1666.
Chauvin, A., Worsley, K. J., Schyns, P. G., Arguin, M., & Gosselin, F. (2005). Accurate statistical tests for smooth classification images. Journal of Vision, 5(9):1, 659–667, http://www.journalofvision.org/content/5/9/1, doi:10.1167/5.9.1.
Cowe, G. A. (2003). Example-based computer-generated facial mimicry. Unpublished Ph.D. thesis, University College London, London.
Cunningham, D. W., & Wallraven, C. (2009). Dynamic information for the recognition of conversational expressions. Journal of Vision, 9(13):7, 1–17, http://www.journalofvision.org/content/9/13/7, doi:10.1167/9.13.7.
Davies, G., Ellis, H., & Shepherd, J. (1977). Cue saliency in faces as assessed by the “Photofit” technique. Perception, 6, 263–269.
Dupuis-Roy, N., Fortin, I., Fiset, D., & Gosselin, F. (2009). Uncovering gender discrimination cues in a realistic setting. Journal of Vision, 9(2):10, 11–18, http://www.journalofvision.org/content/9/2/10, doi:10.1167/9.2.10.
Gibson, B. M., Wasserman, E. A., Gosselin, F., & Schyns, P. G. (2005). Applying bubbles to localize features that control pigeons' visual discrimination behavior. Journal of Experimental Psychology: Animal Behavior Processes, 31, 376–382.
Gosselin, F., & Schyns, P. G. (2001). Bubbles: A technique to reveal the use of information in recognition tasks. Vision Research, 41, 2261–2271.
Haig, N. D. (1985). How faces differ—A new comparative technique. Perception, 14, 601–615.
Hill, H., & Johnston, A. (2001). Categorizing sex and identity from the biological motion of faces. Current Biology, 11, 880–885.
Humphreys, K., Gosselin, F., Schyns, P. G., & Johnson, M. H. (2006). Using “Bubbles” with babies: A new technique for investigating the informational basis of infant perception. Infant Behavior and Development, 29, 471–475.
Johnston, A. (1992). Object constancy in face processing: Intermediate representations and object forms. Irish Journal of Psychology, 13, 425–438.
Johnston, A. (in press). Movement: A foundation for facial representation. In C. Curio, M. Giese, & H. H. Bulthoff (Eds.), Dynamic faces: Insights from experiments and computation. Cambridge, MA: MIT Press.
Johnston, A., McOwan, P. W., & Benton, C. P. (1999). Robust velocity computation from a biologically motivated model of motion perception. Proceedings of the Royal Society of London B, 266, 509–518.
Johnston, A., McOwan, P. W., & Buxton, H. (1992). A computational model of the analysis of some first-order and second-order motion patterns by simple and complex cells. Proceedings of the Royal Society of London B, 250, 297–306.
Knight, B., & Johnston, A. (1997). The role of movement in face recognition. Visual Cognition, 4, 265–273.
Lander, K., Christie, F., & Bruce, V. (1999). The role of movement in the recognition of famous faces. Memory and Cognition, 27, 974–985.
Leopold, D. A., Bondar, I. V., & Giese, M. A. (2006). Norm-based face encoding by single neurons in the monkey inferotemporal cortex. Nature, 442, 572–575.
Leopold, D. A., O'Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4, 89–94.
McCotter, M., Gosselin, F., Sowden, P., & Schyns, P. (2005). The use of visual information in natural scenes. Visual Cognition, 12, 938–953.
O'Toole, A. J., Roark, D. A., & Abdi, H. (2002). Recognizing moving faces: A psychological and neural synthesis. Trends in Cognitive Sciences, 6, 261–266.
Rhodes, G., Brennan, S., & Carey, S. (1987). Identification and ratings of caricatures: Implications for mental representations of faces. Cognitive Psychology, 19, 473–497.
Schyns, P. G., Bonnar, L., & Gosselin, F. (2002). Show me the features! Understanding recognition from the use of visual information. Psychological Science, 13, 402–409.
Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. Quarterly Journal of Experimental Psychology A, 43, 161–204.
Vinette, C., Gosselin, F., & Schyns, P. (2004). Spatio-temporal dynamics of face recognition in a flash: It's in the eyes. Cognitive Science, 28, 289–301.
Figure 1
 
Vectorization and application of PCA to a facial motion sequence. (a) Each sequence frame is converted into a long N-dimensional vector (N = h × w × 5, where h and w are the image height and width and the 5 denotes the R, G, B, x, and y values) by concatenating the texture and shape information of every pixel in the image. Each frame can then be treated as a point in an N-dimensional space. (b) Applying PCA defines a new orthonormal coordinate system centered on the mean μ (or on a selected reference frame), with axes chosen to point in the directions of maximum variance. (c) Columns show the first three principal components of a vectorized facial motion sequence; rows show −2, 0, and +2 standard deviations from the mean.
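The vectorization and PCA steps in (a) to (c) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes frames are stored as an array of shape (T, h, w, 5) with channels ordered R, G, B, x, y, centers the PCA on the mean μ rather than on a reference frame, and uses random toy data in place of a real sequence. The helper names (`vectorize_frames`, `fit_pca`, `project`, `reconstruct`) are hypothetical.

```python
import numpy as np


def vectorize_frames(frames: np.ndarray) -> np.ndarray:
    """Flatten each (h, w, 5) frame into one row vector of length N = h * w * 5."""
    n_frames = frames.shape[0]
    return frames.reshape(n_frames, -1)


def fit_pca(X: np.ndarray, n_components: int):
    """PCA of the frame vectors (rows of X) via SVD of the mean-centred data."""
    mu = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mu, full_matrices=False)
    components = vt[:n_components]  # orthonormal rows, ordered by decreasing variance
    # Standard deviation of the frame coefficients along each component.
    sd = s[:n_components] / np.sqrt(max(X.shape[0] - 1, 1))
    return mu, components, sd


def project(X, mu, components):
    """Coefficients of each frame in the PCA coordinate system centred on mu."""
    return (X - mu) @ components.T


def reconstruct(coeffs, mu, components):
    """Map PCA coefficients back to N-dimensional frame vectors."""
    return mu + coeffs @ components


# Hypothetical toy sequence: 20 frames of a 4 x 3 image with 5 channels (R, G, B, x, y).
rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 4, 3, 5))
X = vectorize_frames(frames)
mu, pcs, sd = fit_pca(X, n_components=3)

# As in the rows of panel (c): -2, 0 and +2 SD along the first component.
variations = [reconstruct(np.array([k * sd[0], 0.0, 0.0]), mu, pcs) for k in (-2, 0, 2)]
```

Keeping every component reconstructs each frame exactly; truncating to the first few components gives the compact face space in which frames are compared.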
Figure 2
 
Block diagram of the mimicry generation process. The morph vectorization used in this process allows image warping and image blending to be combined, giving realistic synthesis of facial movement without blur and without losing iconic or lighting changes (Cowe, 2003).
Figure 3
 
Applied bubble mask. One of the 5000 random bubble masks applied to the moving-face driving sequence. The masked sequences were then processed and used to drive the avatar, yielding 5000 mimicries.
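A random bubble mask of this kind can be sketched as a sum of Gaussian apertures at random positions, clipped to [0, 1]. The sketch below is illustrative only: the number of bubbles, the aperture width `sigma`, the mid-grey fill value, and the choice of one static mask for every frame of the driving sequence are all assumptions rather than the study's settings, and `bubble_mask` and `occlude_sequence` are hypothetical helpers.

```python
import numpy as np


def bubble_mask(h, w, n_bubbles, sigma, rng):
    """Sum of Gaussian apertures at random centres, clipped to [0, 1].

    1 = fully visible, 0 = fully occluded. n_bubbles and sigma are free
    parameters here, not the values used in the study.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w))
    for y0, x0 in zip(rng.uniform(0, h, n_bubbles), rng.uniform(0, w, n_bubbles)):
        mask += np.exp(-((ys - y0) ** 2 + (xs - x0) ** 2) / (2.0 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)


def occlude_sequence(frames, mask, fill=0.5):
    """Apply one static mask to every frame of a (T, h, w, C) sequence.

    Occluded pixels blend towards a constant fill value (assumed mid-grey).
    """
    m = mask[None, :, :, None]
    return m * frames + (1.0 - m) * fill


# A handful of masks for illustration; the study used 5000.
rng = np.random.default_rng(1)
masks = [bubble_mask(64, 48, n_bubbles=10, sigma=4.0, rng=rng) for _ in range(5)]
```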
Figure 4
 
Self-mimicry. (a) The bubble-mask-occluded driver sequence (leftmost faces) was used to drive the avatar and produce the reconstructions (middle faces). These were compared, using our correlation metric, with the ground-truth reconstruction (rightmost faces), which was produced by the non-occluded driver. The model recovers the facial expressions quite successfully despite the occlusions, albeit in a somewhat muted form. (b) Each point denotes the similarity of PC coefficients between the ground-truth reconstruction and one bubble-masked reconstruction. For instance, the reconstruction generated with random mask number 728 (the lowest point between mask numbers 500 and 1000) has PC coefficients with very low correlation with those of the ground truth (corr = 0.16169); on visual inspection, reconstruction 728 reproduced hardly any facial movement, because the occlusion mask fell in unfavorable locations. (c) Histogram of the values in (b). Only masks whose reconstructions had PC coefficients highly correlated with the ground-truth coefficients (darker green in the histogram) were classified as “good” and used to derive the proportion plane.
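The scoring in (b) and the “good” classification in (c) can be sketched as a Pearson correlation between coefficient sets followed by a threshold. Two assumptions are made here: the coefficients for all frames are flattened into a single vector before correlating, and the cut-off is a free `threshold` parameter, since the exact value is not given in this caption. The function names are hypothetical.

```python
import numpy as np


def coefficient_correlation(c_truth, c_masked):
    """Pearson correlation between two sets of PC coefficients.

    Both arrays (e.g. frames x components) are flattened into single vectors
    before correlating; this pooling is an assumption.
    """
    a = np.ravel(np.asarray(c_truth, dtype=float))
    b = np.ravel(np.asarray(c_masked, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])


def classify_good_masks(correlations, threshold):
    """Flag masks whose reconstructions correlate at least `threshold` with ground truth."""
    return np.asarray(correlations, dtype=float) >= threshold


# Toy ground-truth coefficients: 2 frames x 3 components.
truth = np.array([[1.0, -0.5, 0.2], [0.8, -0.1, 0.4]])
scores = [coefficient_correlation(truth, truth),
          coefficient_correlation(truth, -truth),
          coefficient_correlation(truth, truth * 0.5 + 0.1)]
good = classify_good_masks(scores, threshold=0.9)
```

Because Pearson correlation ignores a uniform rescaling and offset of the coefficients (the third score equals 1 up to rounding), a reconstruction that is correct but muted, as described in (a), can still be scored as good under this metric.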
Figure 5
 
Face map. As bubble masks corresponding to low correlation values were progressively excluded, brighter areas emerged from the noisy image. The images give a rough representation of the facial areas that are important for photo-realistic animation of our face model. These areas do not simply coincide with the areas of maximum pixel-value variance.
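A face map of this kind can be sketched as a per-pixel proportion plane: the sum of the “good” masks divided by the sum of all masks, in the manner of the usual bubbles analysis. The sketch assumes masks are stored as an (M, h, w) array with values in [0, 1] and that pixels never revealed by any mask are set to 0; `proportion_plane` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def proportion_plane(masks, good):
    """Per-pixel ratio of summed 'good' masks to summed masks (all masks).

    Pixels never revealed by any mask are set to 0.
    """
    masks = np.asarray(masks, dtype=float)  # shape (M, h, w)
    good = np.asarray(good, dtype=bool)     # shape (M,)
    total = masks.sum(axis=0)
    good_sum = masks[good].sum(axis=0)
    return np.divide(good_sum, total, out=np.zeros_like(total), where=total > 0)


# Two toy 2 x 2 masks; only the first is classified as good.
masks = np.array([[[1.0, 0.0], [0.5, 0.0]],
                  [[1.0, 1.0], [0.5, 0.0]]])
plane = proportion_plane(masks, good=[True, False])
```

Recomputing the plane while raising the correlation threshold that defines `good` mimics the progressive exclusion of low-correlation masks described above.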
Figure 6
 
Statistically significant diagnostic regions. Red areas denote the regions that attained statistical significance using our cluster test.