Lauren Kogelschatz, Elan Barenholtz; Matching voice and face identity from static images. Journal of Vision 2012;12(9):1023. doi: https://doi.org/10.1167/12.9.1023.
© ARVO (1962-2015); The Authors (2016-present)
Previous research has suggested that people are unable to correctly choose which unfamiliar voice and face belong to the same person, based on a static image of the face. Here, we present evidence that people can perform this task with greater than chance accuracy. On each trial, participants saw pictures of two same-gender faces while listening to a recording of the voice belonging to one of them; participants had to choose which of the faces they thought belonged to the same person as the recorded voice. All of the stimuli were drawn from Caucasian undergraduate students (mean age: 21). In Experiment 1, the visual stimuli were frontal headshots (including the shoulders and neck) and the auditory stimuli were recordings of spoken sentences. Experiment 2 used the same auditory stimuli as Experiment 1 but used pictures in which only facial information (excluding the shoulders and neck) was visible. Experiment 3 used the same pictures as Experiment 1, but the auditory stimuli consisted of recordings of a single spoken word. In Experiments 1 and 2, people could choose, at better than chance levels, which faces and voices belonged to the same person. In Experiment 3, where less auditory information was available, performance was at chance levels. These results suggest that people can correctly infer facial characteristics from vocal characteristics, given sufficient information. The mechanism underlying this ability is not currently clear, since additional analyses found no evidence that participants were choosing correctly by matching basic physical traits such as age, weight, height and attractiveness. Previous negative findings may have been due to insufficient auditory information or too homogeneous a sample of faces and voices. With ample auditory information and a sufficiently heterogeneous sample, people can choose which face and voice go together, based on static pictures.
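Because each trial offers two candidate faces, guessing yields 50% correct, so "better than chance" means accuracy reliably above 0.5. The abstract does not say which statistical test was used; the following is only a minimal sketch of one common option, a one-sided exact binomial test. The function name `binomial_p_above_chance` and the trial counts are hypothetical, chosen for illustration, and are not data from the study.

```python
from math import comb


def binomial_p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) if every choice is a guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the binomial probabilities of observing `correct` or more hits.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical counts, for illustration only (NOT results from the study).
p_value = binomial_p_above_chance(correct=60, trials=100)
print(f"p = {p_value:.4f}")
```

A small p-value would indicate that the observed number of correct matches is unlikely under pure guessing. An analysis of real data would also need to account for repeated trials per participant, which this sketch ignores.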
Meeting abstract presented at VSS 2012