Abstract
Recognizing facial emotions constitutes a core component of human social interactions. Previous research has discovered several cortical sub-regions ("face patches"), primarily localized in the primate inferior temporal (IT) cortex, that respond relatively selectively to faces. However, one can build algorithms that decode face information from the responses of more heterogeneous samples of IT neurons. Here we test the link between psychophysical (behavioral) facial emotion discrimination ability and decoding models constructed from neurons with varying "face-selectivity". Adult human participants (n=12) discriminated the emotions (happy vs. fearful) in 80 images (8 identities) presented for 100 ms. We generated a hypothesis space linking IT activity and facial emotion behavior using convolutional neural network (CNN) models of primate vision, adapted to perform emotion discrimination. The estimate of "face-selectivity" for each model-IT neuron was proportional to the difference in its response to faces vs. other objects. We determined the goodness of a linking algorithm by correlating the CNN-IT-based emotional intensity predictions with the patterns of human behavioral discrimination. Interestingly, we observed that some CNNs (e.g., ResNets) generated stronger predictions when the decoding algorithms contained more "face-selective" units (100 units per decoder). However, for many other models (e.g., CORnet-S), any random sample of their IT units produced equally predictive decodes. To ask which of these hypotheses is consistent with primate visual processing, we performed large-scale recordings in the IT cortex of two macaques (~200 sites). Preliminary results show that macaque IT-based decodes incorporating more "face-selective" neurons better predict facial emotion discrimination. Notably, the strength of the predictions and their dependence on "face-selectivity" prevailed only during the early (70-110 ms) IT responses; in contrast, facial identity decodes were strongest during the late (150-200 ms) IT responses. The dependence on high "face-selectivity" and on early (putatively feedforward) IT responses that our results reveal helps discriminate among, and constrain the development of, next-generation brain models of facial emotion processing.
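For concreteness, the sketch below illustrates one plausible form of the linking analysis described above: compute a per-unit "face-selectivity" index from the difference in mean responses to faces vs. other objects, build a 100-unit decoder from the most face-selective units, and score the linking algorithm by correlating its graded emotion predictions with the per-image pattern of human discrimination. The selectivity normalization, the choice of logistic regression as the linear decoder, and all variable names are illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LogisticRegression


def face_selectivity_index(face_resp, obj_resp):
    """Per-unit selectivity: normalized difference between mean responses
    to face images and to non-face objects (one plausible form of the
    index; the abstract states only that it is proportional to this
    difference). Inputs are (n_images, n_units) response matrices."""
    mu_f = face_resp.mean(axis=0)
    mu_o = obj_resp.mean(axis=0)
    return (mu_f - mu_o) / (np.abs(mu_f) + np.abs(mu_o) + 1e-9)


def fit_emotion_decoder(resp, labels, selectivity, n_units=100):
    """Fit a happy-vs-fearful decoder on the n_units most face-selective
    units (100 units per decoder, as in the abstract). Logistic
    regression is an assumed, not stated, choice of linear decoder."""
    top = np.argsort(selectivity)[-n_units:]
    clf = LogisticRegression(max_iter=1000).fit(resp[:, top], labels)
    return clf, top


def linking_score(clf, top, test_resp, human_pattern):
    """Goodness of the linking algorithm: correlation between decoded
    emotional intensity (here, the class probability) and the human
    behavioral discrimination pattern over the same images."""
    pred = clf.predict_proba(test_resp[:, top])[:, 1]
    r, _ = pearsonr(pred, human_pattern)
    return r

# Hypothetical usage with model-IT or macaque-IT responses to the 80
# test images (labels: 0 = fearful, 1 = happy; human_pattern: per-image
# behavioral discrimination, e.g., proportion "happy" responses):
#   sel = face_selectivity_index(face_resp, obj_resp)
#   clf, top = fit_emotion_decoder(train_resp, train_labels, sel)
#   r = linking_score(clf, top, test_resp, human_pattern)
```

Repeating the same pipeline with units drawn at random (rather than the most face-selective ones), or with responses restricted to early (70-110 ms) vs. late (150-200 ms) windows, yields the comparisons reported in the abstract.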