Seoyoung Ahn, Gregory Zelinsky, Gary Lupyan; Exploring the effects of linguistic labels on learned visual representations using convolutional neural networks. Journal of Vision 2020;20(11):612. doi: https://doi.org/10.1167/jov.20.11.612.
Deep convolutional neural networks (CNNs) trained on large numbers of images are now capable of human-level visual recognition in some domains (Rawat and Wang, 2017). Analyses of the visual representations learned by CNNs show some resemblance to human visual representations (Kriegeskorte and Douglas, 2018), suggesting that these networks may offer a good model of human object recognition. Here, we use CNNs as a model of visual recognition to explore the effects of different types of labels on the learning of visual categories. To what extent are labels necessary to distinguish between different kinds of tools, animals, and foods? One possibility is that certain visual categories are so distinct that no guidance from labels is necessary. Alternatively, labels may help, or even be necessary, for discovering certain types of categories. This question is difficult to answer in human learners because we cannot control their prior visual and semantic experience. We trained multiple CNNs on the same set of images while manipulating the labels they received: none, basic-level labels (dog, hummingbird, hammer, van), superordinate labels (mammal, bird, tool, vehicle), or various combinations. We then correlated the models' choices on a triad task (given three images, select the one that is most different) with people's choices. The performance of unsupervised models depended strongly on low-level visual differences, highlighting the importance of labels to the training process. More surprisingly, the best performance was achieved by models trained with the coarser-grained superordinate labels (vehicle, tool, etc.) rather than basic-level labels, even on triads where all three objects came from the same superordinate category or from different ones (e.g., a banana, a bee, and a screwdriver). We discuss the benefits of training with superordinate-level labels in the context of representational efficiency and generalizability.
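The abstract does not specify how a model's choice on a triad is computed; a common approach, sketched here as an assumption rather than the authors' exact method, is to embed each of the three images with the trained network and pick the item whose total similarity to the other two is lowest (an "odd-one-out" rule over pairwise cosine similarities).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_one_out(embeddings):
    """Given three embedding vectors (e.g., CNN penultimate-layer
    activations for three images), return the index of the item that is
    most different: the one with the lowest summed similarity to the
    other two."""
    totals = [
        sum(cosine(embeddings[i], embeddings[j]) for j in range(3) if j != i)
        for i in range(3)
    ]
    return int(np.argmin(totals))

# Toy example: two nearly identical vectors and one orthogonal one.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.9, 0.1, 0.0])
c = np.array([0.0, 0.0, 1.0])
print(odd_one_out([a, b, c]))  # → 2 (the orthogonal vector is the odd one out)
```

Correlating these model choices with human choices across many triads then gives the agreement measure described above; the embedding layer and similarity metric here are illustrative assumptions.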