Abstract
Semantic features are an essential part of the study of human knowledge about objects and have played an important role in understanding how semantics shapes object representations. Existing semantic feature norms have been produced by human participants and require extensive manual curation by experimenters. This process is not only time-consuming and costly; humans are also known to omit important features. Recent advances in Natural Language Processing have led to large language models that yield impressive performance across a variety of language tasks. Here we asked to what degree such models can be used to automatically produce feature norms for arbitrary object concepts. To test this, we probed GPT-3 to generate semantic features for 1,854 objects from the THINGS database, using three human-generated examples as training data. Mirroring experimental approaches that merge data across human participants, we collected 30 runs, each fine-tuned with different example sentences. For better comparability to humans, we automatically preprocessed and normalized the generated features. Compared to existing norms, GPT-3 generated a broader set of features than humans, yet showed strong representational similarities with those norms, highlighting the overall external validity of the generated features. When predicting independent human similarity judgments and superordinate object categories, GPT-3-generated features were competitive with human-generated norms. Explained variance in similarity judgments was mostly shared between human ratings and GPT-3, demonstrating that both humans and GPT-3 rely on similar conceptual information when producing object features. Together, our results demonstrate that semantic feature norms can be generated automatically using GPT-3 with quality comparable to human judgments. Further, we provide a new, broad feature norm for 1,854 objects, which may help improve our understanding of object knowledge and object representations. Finally, these results offer a new approach for comparing knowledge learned by humans to knowledge encoded in language models.