Abstract
Applying machine learning algorithms to automatically infer relationships between concepts from large-scale collections of documents, by learning vector representations known as embeddings, presents a unique opportunity to investigate at scale how human visual and semantic knowledge is organized, including how people judge fundamental relationships, such as similarity between concepts (‘How similar are a cat and a bear?’) and the features that describe them (e.g., size, furriness). However, efforts to date have shown a substantial discrepancy between algorithm predictions and human empirical judgments. Here, we introduce a novel approach to generating embeddings, motivated by the psychological theory that semantic context plays a critical role in human judgments (i.e., the topic or domain being considered in the documents, such as descriptions of the natural world vs. writings about travel and transportation). Specifically, we train state-of-the-art machine learning algorithms to generate contextually constrained embeddings from contextually relevant text corpora (subsets of Wikipedia containing tens of millions of words). We show that by incorporating insights from human cognition into the training procedure of machine learning algorithms, we can greatly improve their ability to predict empirical visual and semantic similarity judgments and feature ratings of contextually relevant concepts: our method exceeds 90% of the maximum achievable performance in predicting similarity judgments and achieves the best performance to date in predicting feature ratings (e.g., size) for concrete real-world objects (e.g., ‘bear’). Furthermore, our method outperforms models trained on billions of words, suggesting that qualitative, psychologically relevant factors may be as important as sheer data quantity when constructing training sets for machine learning investigations of cognitive phenomena. By improving the correspondence between automatically derived embeddings and empirical measurements of human judgments, the approach we describe helps advance the use of large-scale text corpora to understand the structure of human visual and semantic knowledge.
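To make the pipeline summarized above concrete, the following is a minimal sketch of training embeddings on a single context-specific corpus and correlating model-derived similarities with human similarity judgments. It is an illustration under stated assumptions, not the paper's implementation: the choice of gensim's Word2Vec as the embedding algorithm, the file name wikipedia_nature_subset.txt, the hyperparameters, and the example judgment values are all hypothetical.

```python
# Sketch: contextually constrained embeddings from a domain-restricted corpus,
# evaluated against human similarity judgments. All paths, hyperparameters,
# and ratings below are illustrative assumptions, not the paper's data.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
from scipy.stats import spearmanr

def load_corpus(path):
    """Yield tokenized sentences from a context-specific corpus, e.g., a
    subset of Wikipedia articles about the natural world."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = simple_preprocess(line)
            if tokens:
                yield tokens

# Train embeddings on the contextually relevant corpus only (hypothetical file).
sentences = list(load_corpus("wikipedia_nature_subset.txt"))
model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, workers=4)

# Compare model cosine similarities with human similarity ratings
# (made-up concept pairs and ratings, for illustration only).
human_judgments = {("cat", "bear"): 3.1, ("cat", "dog"): 4.2, ("bear", "dog"): 2.9}
pairs = [p for p in human_judgments if p[0] in model.wv and p[1] in model.wv]
model_sims = [model.wv.similarity(a, b) for a, b in pairs]
human_sims = [human_judgments[p] for p in pairs]

# Rank correlation between model predictions and empirical judgments.
rho, pval = spearmanr(model_sims, human_sims)
print(f"Spearman correlation with human judgments: rho={rho:.2f} (p={pval:.3f})")
```

In this setup, the contextual constraint enters solely through the training corpus: restricting it to documents from the relevant domain is what distinguishes the approach from training on an unrestricted, billion-word corpus.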