Abstract
Convolutional neural networks (CNNs) have begun to rival human performance in identifying image classes from specific datasets. The CNN architecture roughly mimics the hierarchical visual processing of the ventral stream, and CNNs have been used successfully to explain a significant portion of neural responses in the human ventral cortex. We compared performance on an animal/non-animal categorization task based on 1) human behavior, 2) CNN scores, and 3) classification using EEG measurements. Images differed in their complexity, as indexed by their Spatial Coherence (SC) and Contrast Energy (CE), and were unknown to both the network and the human observers. Our results show a remarkable overlap between the behavior of the CNN, the EEG classification, and human behavior. All three perform best for images of medium complexity and worst for images of high complexity. Importantly, inspecting the types of errors, we observe that for images of low and medium complexity, by far the most mistakes are made on trials in which an animal is present but the CNN, human observers, and EEG classifier fail to detect it. For complex images, the proportions of false alarms and misses are much more comparable. Looking at the results from the EEG classifier, we further observe that the classification reliability of animal vs. non-animal peaks between 300 and 400 ms. These results indicate that the object recognition process of both humans (behavior and EEG) and CNNs is influenced by the complexity of an image. This overlap strengthens the idea that CNNs are accurate models of human visual perception.
Meeting abstract presented at VSS 2016