Gemma Roig, Anna Volokitin, Tomaso Poggio; Do Deep Neural Networks Suffer from Crowding?. Journal of Vision 2018;18(10):902. doi: 10.1167/18.10.902.
© ARVO (1962-2015); The Authors (2016-present)
Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, so-called clutter, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks (DNNs) for object recognition. We analyze both deep convolutional neural networks (DCNNs) and an extension of DCNNs, called eccentricity-dependent models, that are multi-scale and that change the receptive field size of the convolution filters with their position in the image. The latter networks have recently been proposed for modeling the feedforward path of the primate visual cortex. Our results reveal that incorporating clutter into the training images does not make the DNNs robust to clutter that was not seen during training. Also, when DNNs are trained on objects in isolation, we find that their recognition accuracy falls the closer the clutter is to the target object and the more clutter there is. We find that visual similarity between the target and the clutter also plays a role, and that pooling in early layers of the DNN leads to more crowding. Finally, we show that the eccentricity-dependent model trained on objects in isolation can recognize such target objects in clutter if the objects are near the center of the image, whereas the DCNN cannot.
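The abstract does not specify how the eccentricity-dependent model samples its input. As a rough illustration only, the sketch below shows one way a multi-scale input whose resolution falls off away from the image center could be built: centered crops of doubling extent, each average-pooled to the same size. The function name `multiscale_crops` and the parameters `num_scales` and `out_size` are hypothetical and are not taken from the authors' work.

```python
import numpy as np


def multiscale_crops(image, num_scales=3, out_size=8):
    """Illustrative only: centered crops of doubling extent, pooled to one size.

    Assumes a 2-D grayscale image and an even out_size. This is a sketch of
    the general idea, not the authors' implementation.
    """
    h, w = image.shape
    cy, cx = h // 2, w // 2
    largest_half = (out_size // 2) * (2 ** (num_scales - 1))
    if min(cy, cx) < largest_half or min(h - cy, w - cx) < largest_half:
        raise ValueError("image too small for the requested scales")

    crops = []
    for s in range(num_scales):
        step = 2 ** s                    # pooling factor grows with scale
        half = (out_size // 2) * step    # crop half-width doubles per scale
        crop = image[cy - half:cy + half, cx - half:cx + half]
        # Average-pool by `step` so every crop ends up out_size x out_size.
        pooled = crop.reshape(out_size, step, out_size, step).mean(axis=(1, 3))
        crops.append(pooled)
    return np.stack(crops)               # shape: (num_scales, out_size, out_size)
```

In this sketch the smallest crop keeps full resolution around the center, while larger crops cover more of the image at coarser resolution, so content far from the center is only seen after heavier pooling.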
Meeting abstract presented at VSS 2018