Abstract
Deepfake videos pose a threat to the integrity of our digital society by fueling the spread of misinformation. It is essential to develop techniques that improve both human and machine detection of doctored videos. Here, we introduce a new deepfake detection framework that uses human supervision to estimate the locations of video artifacts and increase their detectability. Specifically, we introduce a semi-supervised artifact attention module, which combines human annotations of distortions in deepfakes with internally learned salience maps to create artifact attention maps that highlight distorted regions of videos. This module makes two contributions. First, it improves model detection of deepfakes: a deepfake detector that leverages the artifact attention module in its self-attention layers achieves 98.21% detection accuracy, averaged over four different datasets, outperforming five baseline comparisons. Our detector also generalizes better: when trained on one dataset and tested on another, we observe a 2% to 13% increase in generalization performance compared to competing approaches. Second, it allows us to generate novel "Deepfake Caricatures", video transformations in which subtle unnatural movements are exaggerated to improve human detection. Videos are passed through a dedicated distortion module that detects and amplifies artifacts by taking the difference between consecutive frames and weighting it by the artifact attention maps. This procedure magnifies artificial distortions while leaving real videos untouched. In a behavioral experiment (N = 41, two-alternative forced choice over 200 real and 200 deepfake videos), human participants correctly detected 71% of deepfakes to which the module had been applied, compared to 42% without artifact magnification.
Overall, our approach blends human and artificial supervision to yield strong deepfake detection performance and, crucially, grants humans the ability to make their own judgments about the trustworthiness of visual media.
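The distortion module described above can be illustrated with a minimal sketch: frame-to-frame differences are weighted by the artifact attention maps and added back to each frame, so regions flagged by the attention maps have their temporal artifacts exaggerated while regions with zero attention (as in real videos) are left untouched. The function name `caricature` and the amplification factor `alpha` are hypothetical; the paper's actual module is a learned component, not this hand-coded operation.

```python
import numpy as np

def caricature(frames, attention, alpha=2.0):
    """Sketch of artifact magnification.

    frames:    (T, H, W) float array of grayscale video frames.
    attention: (T, H, W) artifact attention maps in [0, 1].
    alpha:     hypothetical amplification factor.

    Each frame's difference from its predecessor is weighted by the
    attention map and added back, exaggerating motion only where the
    attention map indicates artifacts.
    """
    out = frames.copy()
    for t in range(1, len(frames)):
        diff = frames[t] - frames[t - 1]          # temporal change
        out[t] = frames[t] + alpha * attention[t] * diff
    return out
```

With all-zero attention maps (the ideal behavior on real videos) the output equals the input; where attention is high, temporal changes are scaled up by a factor of `1 + alpha`.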