August 2010
Volume 10, Issue 7
Vision Sciences Society Annual Meeting Abstract  |   August 2010
Turn that frown upside-down! Inferring facial actions from pairs of images in a neurally plausible computational model
Author Affiliations
  • Joshua Susskind
    Department of Psychology, Boston University
  • Adam Anderson
    Department of Psychology, Boston University
  • Geoffrey Hinton
    Department of Psychology, Boston University
Journal of Vision August 2010, Vol. 10, 666. https://doi.org/10.1167/10.7.666
Abstract

Most approaches to image recognition focus on inferring a categorical label or action code from a static image, ignoring dynamic aspects of appearance that may be critical to perception. Even methods that examine behavior over time, such as in a video sequence, tend to label each image frame independently, ignoring frame-to-frame dynamics. This viewpoint suggests that it is time-independent categorical information that is important, and not the patterns of actions that relate stimulus configurations across time. The current work focuses on face perception and demonstrates that important information can be extracted from pairs of images by examining how the face transforms in appearance from one image to another. Using a biologically plausible neural network model, a conditional Restricted Boltzmann Machine that performs unsupervised Hebbian learning, we show that the network can infer various facial actions from a sequence of images (e.g., transforming a frown into a smile or moving the face from one location of the image frame to another). Critically, after inferring the actions relating two face images from one individual, the network can apply the transformation to a test face from an unknown individual, without any knowledge of facial identity, expressions, or muscle movements. By visualizing the factors that encode and break down facial actions into a distributed representation, we demonstrate a kind of factorial action code that the network learns in an unsupervised manner to separate identity characteristics from rigid (affine) and non-rigid expression transformations. Models of this sort suggest that neural representations of action can factor the aspects of a face or object that remain constant, such as its identity, apart from its dynamic behavior, both of which are important aspects of perceptual inference.
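The model described above can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal mean-field version of a factored conditional RBM in the spirit of Hinton and colleagues' gated/factored models, in which hidden units gate multiplicative interactions between an input image and an output image through a set of factors. All class, method, and parameter names are illustrative, images are flattened vectors, and the real model would use stochastic binary units and train on face-image pairs rather than this toy data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FactoredGatedRBM:
    """Toy factored conditional RBM (illustrative, not the published model).
    Hidden units encode the "action" transforming input image x into
    output image y via three-way factored interactions."""

    def __init__(self, n_vis, n_hid, n_fac, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0.0, 0.01, (n_vis, n_fac))  # input -> factor
        self.Wy = rng.normal(0.0, 0.01, (n_vis, n_fac))  # output -> factor
        self.Wh = rng.normal(0.0, 0.01, (n_hid, n_fac))  # hidden -> factor
        self.bh = np.zeros(n_hid)   # hidden ("action code") biases
        self.by = np.zeros(n_vis)   # output-image biases

    def infer_actions(self, x, y):
        """p(h=1 | x, y): distributed code for the transformation x -> y."""
        f = (x @ self.Wx) * (y @ self.Wy)          # factor activities
        return sigmoid(f @ self.Wh.T + self.bh)

    def apply_actions(self, x, h):
        """p(y=1 | x, h): apply an inferred action code to a new image x,
        with no explicit knowledge of identity or muscle movements."""
        f = (x @ self.Wx) * (h @ self.Wh)
        return sigmoid(f @ self.Wy.T + self.by)

    def cd1_step(self, x, y, lr=0.1):
        """One mean-field contrastive-divergence (Hebbian-style) update
        on a batch of (x, y) image pairs."""
        h0 = self.infer_actions(x, y)              # positive phase
        y1 = self.apply_actions(x, h0)             # reconstruction
        h1 = self.infer_actions(x, y1)             # negative phase

        def stats(yv, hv):
            # Three-way correlation statistics, one matrix per weight set.
            fx, fy, fh = x @ self.Wx, yv @ self.Wy, hv @ self.Wh
            return x.T @ (fy * fh), yv.T @ (fx * fh), hv.T @ (fx * fy)

        gx0, gy0, gh0 = stats(y, h0)
        gx1, gy1, gh1 = stats(y1, h1)
        n = x.shape[0]
        self.Wx += lr * (gx0 - gx1) / n
        self.Wy += lr * (gy0 - gy1) / n
        self.Wh += lr * (gh0 - gh1) / n
        self.bh += lr * (h0 - h1).mean(axis=0)
        self.by += lr * (y - y1).mean(axis=0)
```

After training on pairs related by a transformation (e.g., images and their translated copies), `infer_actions` on one person's before/after pair yields a code that `apply_actions` can impose on a different face, mirroring the transfer result reported in the abstract.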

Susskind, J., Anderson, A., & Hinton, G. (2010). Turn that frown upside-down! Inferring facial actions from pairs of images in a neurally plausible computational model [Abstract]. Journal of Vision, 10(7):666, 666a, http://www.journalofvision.org/content/10/7/666, doi:10.1167/10.7.666.