Abstract
Deep convolutional neural networks (DCNNs) have attracted considerable interest as models of human perception. Previous research showed that, unlike humans, DCNNs have no sensitivity to global object shape. We investigated whether this limitation, which involves spatial relations among parts, may be an instance of a more general insensitivity to abstract visual relations. We tested DCNNs’ learning and generalization of displays involving three relations: Same/Different, Enclosure, and More/Fewer. For Same/Different, we generated 20 shapes and used them in training images, each containing a pair of shapes that were either the same or different. ImageNet-trained DCNNs were trained to respond “same” or “different” across varied shape positions and sizes. We then tested whether learning generalized to new shape pairs. For Enclosure, each training image consisted of one closed contour and 22 open contour fragments, with a red dot placed either inside or outside the closed contour. We retrained DCNNs to report whether the dot was inside or outside the closed shape and then tested whether learning generalized to contours whose lengths differed from the training examples. For More/Fewer, we generated pairs of polygons with differing numbers of sides; one polygon was always red, while the other had a random non-red color. We trained DCNNs to judge whether the red polygon had more or fewer sides than the other polygon and then tested generalization to polygon pairs with fewer sides. Across all experiments, DCNNs achieved some degree of classification success on the given sets of training stimuli but showed no evidence that learning generalized to modestly different cases. These results suggest that the relations themselves were not learned. DCNNs appear to have crucial limitations that derive from their lack of computations involving abstraction and relational processing of the sort that is fundamental in human perception.