Abstract
The most influential models of face perception over the past three decades have regarded the generation of an at least partially invariant representation as the necessary first step in most face processing tasks. Correspondingly, models of “high-level” face perception tasks such as personality judgment have relied on features likely to be measured or preserved in an invariant representation (e.g., physiognomy, pose). Several recent papers have proposed deep convolutional networks as a biologically plausible mechanism for creating invariant representations, and have shown that those networks, when trained on recognition tasks, are able to predict activity in anterior portions of the ventral visual stream. We have created a series of models that predict human judgments of personality traits (such as trustworthiness and dominance). In these models, the performance of “shallow” V1-like feature sets compares very favorably with that of deep convolutional feature sets. By probing the feature space of these shallow models, we show that the performance of the best models is driven by low-level features that are highly variable, even among images of a single individual. These features can be modified to change the relevant attribute rating for a given image without changing humans’ subjective perception of invariant face characteristics such as identity or physiognomy. These results call into doubt the necessity or pre-eminence of an invariant face representation in the judgment of personality traits from face images. This in turn suggests that models of the face perception system, and of personality trait judgments, that operate on invariant representations of the type efficiently generated by biologically plausible deep convolutional networks may not capture the types of features most relevant for some high-level face perception tasks.
Meeting abstract presented at VSS 2015