Abstract
Artificial neural networks (ANNs) have revolutionized multiple fields, and were initially inspired by models of biological neural networks (BNNs). A growing body of work finds similarities between behaviors and representations in ANNs and BNNs. However, despite these similarities and shared foundations, ANNs exhibit surprising properties and failures that are generally believed not to exist in biological networks. One of the most dramatic of these failures is a susceptibility to adversarial perturbations, where a nearly imperceptible perturbation added to an input can cause an ANN to behave in a dramatically different fashion, for instance mislabeling an initially correctly identified school bus as an ostrich. Is human vision also susceptible to adversarial perturbations? Past work has shown that when images with adversarial perturbations of intermediate magnitude (±32 of 256 intensity levels) are shown to humans for a short time (∼70 ms), human object classification judgments are perturbed in the same direction as those of ANNs. Other work has found that humans can identify adversarial examples with large-magnitude perturbations and extended exposure times. Here, we find that human susceptibility to adversarial examples extends beyond these settings. We display images with small-magnitude perturbations (between ±2 and ±32 out of 256 intensity levels) for an unlimited exposure time. We find that even at the smallest magnitude, ±2, these images perturb human judgments of object class in the same direction as an ANN trained to classify images. These results demonstrate that humans exhibit a peculiarity once assumed to be specific to machines, and suggest that perception in ANNs and BNNs is more similar than commonly believed.