Abstract
Deep neural networks (DNNs) now match human performance on benchmarks for categorizing natural images, and can even predict human brain activity, raising the exciting possibility that such systems meaningfully replicate aspects of human visual processing. However, a powerful reason to doubt this resemblance is that DNNs commit bizarre, unhumanlike errors. Such errors can be elicited by artificially generated “adversarial images”, which induce high-confidence misclassifications from DNNs. More alarmingly still, high-confidence misclassifications can also arise from unmodified natural images that could actually appear in the real world, as when a honeycomb-patterned umbrella is misclassified as a “chainlink fence”, or a bird casting a shadow is called a “sundial”. These errors are widely taken to reveal a fundamental disconnect between human and machine vision, because of how surprising and unpredictable they seem from a human’s perspective. But are they really so counterintuitive? Here, three experiments (N=600) ask whether naive human subjects can anticipate when and how machines will misclassify natural images. Experiment 1 showed subjects ordinary images that machines classify correctly or incorrectly, and simply asked them to guess whether the machines got them right or wrong; remarkably, ordinary people could accurately predict which images machines would misclassify (E1). Follow-up experiments showed that subjects were also sensitive to how such images would be misclassified: they successfully predicted which images would receive which incorrect labels (E2), and did so in ways that could not be explained by motivational factors or task demands, since “fake” labels failed to produce similar results (E3). That humans can intuit the (mis)perceptions of sophisticated machines has implications not only for the practical purpose of anticipating machine failures (e.g., when a human must decide whether to take the wheel from an autopilot), but also for the theoretical purpose of evaluating the representational overlap between human and machine vision.
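To make the phenomenon concrete, the following is a minimal sketch, not the authors’ code, of how a high-confidence classification is typically elicited from an off-the-shelf DNN. It assumes torchvision’s pretrained ResNet-50 and a hypothetical image file `umbrella.jpg`; any ImageNet-trained classifier and natural photograph would serve the same illustrative purpose.

```python
import torch
from torchvision import models
from PIL import Image

# Pretrained ImageNet classifier (ResNet-50 chosen here for illustration).
weights = models.ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights)
model.eval()

# The weights object supplies the matching resize/crop/normalize pipeline.
preprocess = weights.transforms()

# Hypothetical filename standing in for any unmodified natural image.
image = Image.open("umbrella.jpg").convert("RGB")

with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))
    probs = torch.softmax(logits, dim=1)
    conf, idx = probs.max(dim=1)

# A confidently wrong top-1 label here (e.g., “chainlink fence” for a
# honeycomb-patterned umbrella) is the kind of machine error the
# experiments ask naive human subjects to anticipate.
print(f"{weights.meta['categories'][idx.item()]}: {conf.item():.1%}")
```

The top-1 label and its softmax probability are all that is needed to identify the “high-confidence misclassifications” discussed above: no image modification is involved, only the network’s own judgment of an ordinary photograph.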