September 2021
Volume 21, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2021
Human Detection of Deepfakes: A Role for Holistic Face Processing
Author Affiliations
  • Matthew Groh
    MIT Media Lab
  • Ziv Epstein
    MIT Media Lab
  • Rosalind Picard
    MIT Media Lab
  • Chaz Firestone
    Johns Hopkins University
Journal of Vision September 2021, Vol.21, 2390. doi:
Matthew Groh, Ziv Epstein, Rosalind Picard, Chaz Firestone; Human Detection of Deepfakes: A Role for Holistic Face Processing. Journal of Vision 2021;21(9):2390.

© ARVO (1962–2015); The Authors (2016–present)

Two of the most significant recent advances in artificial intelligence are (1) the ability of machines to outperform humans on many perceptual tasks, and (2) the ability of machines to synthesize highly realistic images of people, objects, and scenes. Nevertheless, here we report a surprising human advantage at the intersection of these two domains: the ability to detect Deepfakes. Deepfakes are machine-manipulated media in which one person’s face is swapped with another’s to make someone falsely appear to do or say something they did not — and it is of major theoretical and practical importance to develop methods that can tell Deepfakes from authentic media. Here, we pit the winning computer vision model from the Deepfake Detection Challenge (DFDC) against ordinary human participants in a massive online study enrolling 7,241 people. Participants saw authentic and manipulated videos and were asked either (a) to select which of two videos is a Deepfake (Experiment 1) or (b) to rate how confident they are that a single video is a Deepfake (Experiment 2). In the two-alternative forced-choice design, the average completely untrained participant outperformed the very best computer vision model. In the single-stimulus design, the average participant outperformed the model on a sample of politically salient videos but underperformed the model on a sample of DFDC holdout videos (though approximately one-fourth of participants outperformed the model on the DFDC sample). Follow-up experiments revealed that holistic face processing partly explains this human edge: when the actors’ faces were inverted, misaligned, or occluded, participants’ ability to identify Deepfakes was significantly impaired, whereas the model’s performance was unaffected by misalignment and occlusion but was impaired by inversion.
These results reveal a human advantage in identifying Deepfakes today and suggest that harnessing specialized visual processing could be a promising “defense” against machine-manipulated media.
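The two experimental designs above imply two different ways of scoring a detector. A minimal, hypothetical sketch (the scoring functions and the fakeness scores in `[0, 1]` are illustrative assumptions, not the study's actual evaluation code) of how accuracy could be computed under each design:

```python
# Hypothetical scoring sketch for the two designs described in the abstract.
# Assumes a detector outputs a "fakeness" score in [0, 1] per video;
# all names and numbers below are illustrative.

def two_afc_accuracy(pairs):
    """Two-alternative forced choice: each trial presents one authentic and
    one Deepfake video, and the detector "chooses" whichever it scores as
    more fake. pairs: list of (real_score, fake_score) tuples."""
    correct = sum(fake > real for real, fake in pairs)
    return correct / len(pairs)

def single_stimulus_accuracy(scores, labels, threshold=0.5):
    """Single-stimulus design: each video is judged in isolation.
    scores: fakeness scores; labels: 1 = Deepfake, 0 = authentic."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(two_afc_accuracy([(0.2, 0.9), (0.4, 0.3), (0.1, 0.8)]))   # 2/3 correct
print(single_stimulus_accuracy([0.9, 0.2, 0.6], [1, 0, 0]))     # 2/3 correct
```

Note that the 2AFC design needs no decision threshold (only a relative comparison per pair), which is one reason the two designs can rank humans and models differently.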

