Abstract
The recognition of dynamic facial expressions is crucial for social communication in primates. However, the underlying neural circuits remain unclear in detail, and relevant neurophysiological data are only now being collected. Different computational mechanisms, inspired by those known from the processing of dynamic body stimuli and of static faces, might account for the processing of dynamic facial expressions. We present two fundamentally different neural models for the recognition of dynamic faces that imply quite different behaviors of dynamic face-selective neurons at the single-cell level. METHODS: Both models are hierarchical neural network models that process video sequences. The lower levels of the models consist of a hierarchy of feature detectors, taken either from a physiologically inspired model of the visual pathway or from a deep neural network model (VGG16). The highest levels of the models are fundamentally different. One model uses shape-detector neurons, trained with key frames from movies. These neurons are embedded in a recurrent neural network model that makes their responses sequence-selective. Models of this type account for details of single-cell data from hand- and body-motion-selective neurons. The second model exploits a norm-referenced encoding mechanism with neurons that represent difference vectors in feature space between the neutral face and the extreme frames of the encoded expressions. The firing rate of these neurons varies continuously with expression strength. This encoding mechanism has been shown to account for the identity tuning of face-selective neurons in monkey area IT. Adding an output mechanism with differentiator neurons, which respond to changes in the output of such expression-encoding neurons, accounts for selectivity for dynamic expressions. RESULTS: Both mechanisms are suitable for distinguishing facial expressions of humans and monkeys.
They make different predictions for stimuli that morph between neutral and highly expressive movements, which can be compared to the results of ongoing recordings.
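The second mechanism can be illustrated with a minimal sketch. The tuning function used here (distance from the neutral reference multiplied by a rectified cosine with the preferred difference direction, raised to an exponent `nu`) is one common form of norm-referenced tuning and is an assumption, not necessarily the formulation used in the models above. The function names, the rectified temporal difference used as differentiator, and the three-dimensional toy feature vectors are likewise hypothetical.

```python
import numpy as np

def norm_referenced_response(features, neutral, direction, nu=1.0):
    """Response of one expression-encoding neuron (assumed tuning form).

    features:  (T, D) feature vectors, one per video frame
    neutral:   (D,) feature vector of the neutral face (the norm)
    direction: (D,) difference vector from neutral to the extreme frame
    nu:        exponent controlling the width of the direction tuning
    """
    diff = features - neutral                       # offset from the norm
    length = np.linalg.norm(diff, axis=1)           # distance from the norm
    unit_dir = direction / np.linalg.norm(direction)
    safe_length = np.where(length > 0, length, 1.0)  # avoid division by zero
    cosine = (diff @ unit_dir) / safe_length
    return length * np.maximum(cosine, 0.0) ** nu

def differentiator_response(rates):
    """Respond to frame-to-frame changes of an encoding neuron's output."""
    change = np.diff(rates, prepend=rates[0])       # first frame has no change
    return np.abs(change)

# Toy stimulus: morph from the neutral face to an extreme expression.
neutral = np.zeros(3)
extreme = np.array([3.0, 4.0, 0.0])                 # hypothetical features
levels = np.linspace(0.0, 1.0, 6)[:, None]          # morph level per frame
ramp_frames = neutral + levels * (extreme - neutral)
static_frames = np.tile(extreme, (6, 1))            # extreme frame held still

ramp_rates = norm_referenced_response(ramp_frames, neutral, extreme - neutral)
static_rates = norm_referenced_response(static_frames, neutral, extreme - neutral)

print(ramp_rates)                                   # grows with morph level
print(differentiator_response(ramp_rates))          # active while changing
print(differentiator_response(static_rates))        # silent for a static face
```

In this sketch the encoding neuron's rate grows in proportion to the morph level, while the differentiator is active only when the expression changes over time, so a static extreme face drives the encoding neuron but not the differentiator.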