Abstract
The layers of Deep Neural Networks (DNNs) have shown some commonality with processing in the visual cortical hierarchy. It is unclear, however, whether DNNs capture other behavioral regularities of natural scenes, e.g., the representativeness of an image to its category. Humans are better at categorizing and detecting good (more representative of their category) exemplars of natural scenes than bad exemplars. Similarly, prior work has shown that good exemplars are decoded better than bad exemplars in V1 as well as in higher visual areas such as the retrosplenial cortex (RSC) and the parahippocampal place area (PPA). Here we ask whether a DNN that was not explicitly informed about the representativeness of its training set shows a similar good-exemplar advantage, and, if so, in which layers the effect appears. We used good and bad exemplars from six categories of natural scenes (beaches, city streets, forests, highways, mountains, and offices) and processed them through the pre-trained Places205-AlexNet. We asked whether this DNN could distinguish good from bad scenes, and whether 6-way category classification differed for good and bad exemplars. Classification was performed using the feature space of each layer separately. Our results show that while the lowest layer (conv1) carries insufficient information to discriminate good from bad exemplars, a higher layer (fc7, the second-highest fully connected layer) can clearly make this discrimination. Furthermore, six-way categorization was better for good than for bad exemplars in both lower and higher layers, although overall categorization accuracy increased at the higher layers. This parallels the results seen in V1, RSC, and PPA, and suggests that the DNN learns statistical regularities that distinguish good from bad exemplars without such information being explicitly encoded in the training set.
Meeting abstract presented at VSS 2017