Abstract
Face swapping algorithms, popularly known as "deep fakes", generate synthetic faces whose movements are driven by an actor's face. To create a face swap, users construct training datasets containing images of the two faces being swapped. Despite the availability of public code bases, creating a compelling, convincing face swap remains an art rather than a science because of the parameter tuning involved and the unclear consequences of parameter choices. In this paper, we investigate how different dataset properties influence the uncanny, eerie feeling viewers experience when watching face swaps. In one experiment, we present participants with videos from the FaceForensics++ Deep Fake Detection dataset and ask them to score the clips on bipolar adjective pairs previously designed to measure the uncanniness of computer-generated characters and faces in three categories: humanness, eeriness, and attractiveness. We find that responses to face swapped clips are significantly more negative than responses to unmodified clips. In a second experiment, participants view face swaps generated by deepfake models trained on deficient data: low-resolution images, reduced numbers of images, deficient or mismatched expressions, and mismatched poses. We find that mismatches in resolution, expression, and pose, as well as deficient expressions, all induce a stronger negative response than an optimal training dataset. Our experiments indicate that face swapped videos are generally perceived as more uncanny than original videos, and that certain dataset properties, such as image resolution and the match in expression and pose between the two faces, intensify the effect. These insights into dataset properties can serve as guidelines for researchers and practitioners who work with face swapping to construct higher-quality datasets.
The presented methods also open up future directions for perceptual studies of face swapped videos.