Vision Sciences Society Annual Meeting Abstract  |  September 2018  |  Volume 18, Issue 10  |  Open Access
Deep Neural Network Identifies Dynamic Facial Action Units from Image Sequences
Author Affiliations
  • Tian Xu
    Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
  • Oliver Garrod
    Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
  • Chaona Chen
    Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
  • Rachael Jack
    Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
  • Philippe Schyns
    Institute of Neuroscience and Psychology, University of Glasgow, Scotland, UK
Journal of Vision September 2018, Vol.18, 606. doi:https://doi.org/10.1167/18.10.606
Citation: Tian Xu, Oliver Garrod, Chaona Chen, Rachael Jack, Philippe Schyns; Deep Neural Network Identifies Dynamic Facial Action Units from Image Sequences. Journal of Vision 2018;18(10):606. https://doi.org/10.1167/18.10.606.
Abstract

The face is equipped with a large number of independent muscles that generate observable face movements such as nose wrinkling or smiling. These individual movements are called Action Units (AUs) in the Facial Action Coding System (FACS). FACS identifies about 40 AUs, each of which has a variable amplitude. Here, we developed a two-stage deep neural network (DNN) that accurately categorizes the underlying AUs from a sequence of image frames. We first trained a 10-layer ResNet AU decoder on 800,000 independent facial images of randomly activated AU combinations. Each image was generated by a 3D animation system that rendered a combination of up to 4 randomly selected AUs (from 42 possible AUs) with variable amplitudes. The training outputs were the predicted amplitudes (range 0 to 1) of all 42 AUs. We next trained an LSTM (Long Short-Term Memory) network to aggregate the AU amplitudes predicted by the ResNet across multiple images (i.e., a sequence of 30 frames). The final output is a binary AU vector indicating which AUs are activated over the sequence. We tested our DNN on a dataset of 720 dynamic facial expression models, each consisting of 30 sequential image frames and the corresponding activated AUs. The d' of each AU's prediction shows that most AUs are predicted reliably. Moreover, comparing the representational dissimilarity matrices (RDMs) between pairs of AU vectors shows that the output similarity pattern matches that of the input. To our knowledge, the proposed two-stage DNN is the first network to treat AU prediction as a decoding problem; it predicts not only which AUs are activated but also the amplitude of activation for multiple AUs. This work will further help decode the relationship between emotions and Action Units.
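The abstract does not include an implementation, so the following is a minimal sketch of the described two-stage design in PyTorch (the framework itself is an assumption): a ResNet-style per-frame decoder that outputs 42 AU amplitudes, followed by an LSTM that aggregates a 30-frame sequence of amplitudes into a binary AU vector. Module names, layer counts, and all hyperparameters are illustrative choices, not the authors' architecture.

    # Illustrative sketch only: the abstract does not specify a framework or exact
    # layer configuration; PyTorch, the module names, and all hyperparameters here
    # are assumptions chosen to mirror the described two-stage design.
    import torch
    import torch.nn as nn

    NUM_AUS = 42        # 42 possible Action Units (per the abstract)
    SEQ_LEN = 30        # 30 frames per dynamic expression sequence

    class ResidualBlock(nn.Module):
        """Basic residual block, standing in for the 10-layer ResNet decoder."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))

    class AUDecoder(nn.Module):
        """Stage 1: per-frame decoder predicting the amplitude (0 to 1) of each AU."""
        def __init__(self):
            super().__init__()
            self.stem = nn.Sequential(nn.Conv2d(3, 32, 7, stride=2, padding=3),
                                      nn.ReLU(inplace=True))
            self.blocks = nn.Sequential(*[ResidualBlock(32) for _ in range(4)])
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.head = nn.Linear(32, NUM_AUS)

        def forward(self, frame):                   # frame: (B, 3, H, W)
            x = self.pool(self.blocks(self.stem(frame))).flatten(1)
            return torch.sigmoid(self.head(x))      # (B, 42) AU amplitudes in [0, 1]

    class AUSequenceAggregator(nn.Module):
        """Stage 2: LSTM aggregating per-frame AU amplitudes into a binary AU vector."""
        def __init__(self, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(NUM_AUS, hidden, batch_first=True)
            self.head = nn.Linear(hidden, NUM_AUS)

        def forward(self, amp_seq):                 # amp_seq: (B, 30, 42)
            _, (h, _) = self.lstm(amp_seq)
            return torch.sigmoid(self.head(h[-1]))  # (B, 42) activation probabilities

    # Usage: run the decoder on each of the 30 frames, then aggregate the sequence.
    decoder, aggregator = AUDecoder(), AUSequenceAggregator()
    frames = torch.rand(1, SEQ_LEN, 3, 112, 112)    # one synthetic 30-frame sequence
    amps = torch.stack([decoder(frames[:, t]) for t in range(SEQ_LEN)], dim=1)
    active_aus = (aggregator(amps) > 0.5).int()     # binary vector of activated AUs

In this sketch the per-frame amplitudes would be trained against the known 0-to-1 AU amplitudes of the rendered images, and the LSTM output against the binary vector of AUs activated over the sequence, mirroring the two training stages described above.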

Meeting abstract presented at VSS 2018
