Abstract
Convolutional neural networks (CNNs) have achieved remarkable success in visual object categorization tasks in recent years. Some have considered them to be good working models of the primate visual system, with different CNN layers corresponding to different levels of visual processing in the primate brain. However, much remains unknown about how visual information is processed within a CNN, making it more of a black box than a well-understood system. Using methods developed in human neuroscience, here we examined information processing in several CNNs with two approaches. In the first approach, we compared the representational similarities of visual object categories from the different CNN layers to those obtained from retinotopically and functionally defined human occipito-temporal regions in fMRI studies. Regardless of the specific CNN examined, when natural object categories were used, the representational similarities of the early CNN layers aligned well with those obtained from early visual areas, presumably because these layers were modeled directly after those brain regions. However, the representational similarities of the later CNN layers diverged from those of the higher ventral visual object processing regions. When artificial shape categories were used, both early and late CNN layers diverged from the corresponding human visual regions. In the second approach, we compared visual representations in CNNs and human visual processing regions in terms of their tolerance to changes in image format, position, size, and the spatial frequency content of an image. Again, we observed several differences between the CNNs and the human visual regions. Thus, despite CNNs' ability to perform visual object categorization successfully, they process visual information somewhat differently from the human brain. As such, current CNNs may not be directly suitable for modeling the details of the human visual system. Nevertheless, such brain and CNN comparisons can be useful for guiding the training of more brain-like CNNs.
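To make the first approach concrete, the sketch below illustrates the general form of a representational similarity comparison between a CNN layer and a brain region: a representational dissimilarity matrix (RDM) is built from category-averaged activation patterns on each side, and the two RDMs are compared by rank correlation. This is a minimal, illustrative sketch only; the function names, the use of correlation distance and Spearman correlation, and the synthetic random data standing in for CNN activations and fMRI responses are assumptions for illustration, not the paper's actual pipeline or data.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform

def rdm(patterns):
    """Representational dissimilarity matrix: correlation distance
    (1 - Pearson r) between every pair of category-averaged patterns.
    `patterns` has shape (n_categories, n_features)."""
    return squareform(pdist(patterns, metric="correlation"))

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the lower triangles of two RDMs,
    a common way to compare representational geometries."""
    tri = np.tril_indices_from(rdm_a, k=-1)
    rho, _ = spearmanr(rdm_a[tri], rdm_b[tri])
    return rho

# Illustrative synthetic data (stand-ins for real measurements):
# category-averaged activations from one CNN layer and one fMRI ROI.
rng = np.random.default_rng(0)
n_categories = 8
cnn_layer_patterns = rng.normal(size=(n_categories, 4096))  # e.g. a fully connected layer
fmri_roi_patterns = rng.normal(size=(n_categories, 500))    # e.g. voxels in a visual ROI

layer_rdm = rdm(cnn_layer_patterns)
roi_rdm = rdm(fmri_roi_patterns)
print(f"CNN-layer vs. ROI RDM correlation: {rdm_similarity(layer_rdm, roi_rdm):.3f}")
```

Repeating such a comparison across CNN layers and across visual ROIs yields the layer-by-region correspondence pattern summarized in the abstract.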