Abstract
Near-scale, reach-relevant environments are the interface of our manual interactions with the physical world. Recent efforts have begun to probe perceptual and neural representations of naturalistic visual experience at this scale (Josephs & Konkle, 2019; Josephs & Konkle, 2020). Here, we use a computational approach to uncover major dimensions that can parameterize these spaces and predict human similarity judgments. In a large-scale online experiment, 1.2 million odd-one-out judgments were obtained on triplets sampled from 987 images of reachable-scale spaces (hereafter “reachspaces”), drawn from 329 different categories (N = 3,112 Turkers). This yielded a partial sampling of the similarity structure among the images. We then generated a Sparse Positive Similarity Embedding (Hebart et al., 2020), a predictive model of the full similarity structure in which each image is represented as a point in a multi-dimensional space, and the dimensions are inferred under sparsity and positivity constraints. This procedure yielded a 31-dimensional embedding that predicted odd-one-out judgments with 59.8% accuracy (chance = 33%, noise ceiling = 66.3%). In a validation experiment (N = 322), we fully sampled pairwise dissimilarities among a subset of 45 images, and found that these closely correlated with the dissimilarity structure predicted by the model (r = 0.87). K-means clustering over pairwise dissimilarities derived from the embedding showed two major distinctions among reachspaces: food-related vs non-food reachspaces, and digital vs analogue reachspaces. Additionally, examination of the 31 dimensions comprising the embedding revealed interpretable attributes, e.g. “entertainment-related” (e.g. chessboards, poker tables), “navigation-related” (e.g. steering wheels, cockpits), “storage-related” (e.g. drawers, shelves), and “cluttered” (e.g. messy desks or tables).
Overall, these dimensions highlight differences in the functions, affordances, and visual appearances of different reach-relevant spaces and suggest that the similarity structure among reachspaces is related to the actions they support. These results provide a novel accounting of the representational structure of the reachable world.
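To make concrete how an embedding of this kind can predict an odd-one-out judgment, the following is a minimal sketch, not the analysis code used here. It assumes that the similarity of two images is the dot product of their non-negative embedding vectors and that the choice among the three pairs in a triplet follows a softmax over those similarities, as we understand the formulation in Hebart et al. (2020). The function names and the toy 3 x 2 embedding are hypothetical and chosen only for illustration.

```python
import numpy as np


def odd_one_out_probs(X, i, j, k):
    """Return P(odd one out = i), P(= j), P(= k) for one triplet.

    X is an (n_images, n_dims) non-negative embedding (assumed given).
    """
    # Assumed similarity measure: dot product between embedding vectors.
    s_ij = X[i] @ X[j]
    s_ik = X[i] @ X[k]
    s_jk = X[j] @ X[k]
    # The odd one out is the item left out of the most similar pair,
    # so the logit for "i is odd" is the similarity of the pair (j, k), etc.
    logits = np.array([s_jk, s_ik, s_ij])
    # Subtract the maximum for numerical stability before exponentiating.
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()


def predict_odd_one_out(X, i, j, k):
    """Return the index of the image predicted to be the odd one out."""
    probs = odd_one_out_probs(X, i, j, k)
    return (i, j, k)[int(np.argmax(probs))]


# Toy embedding (hypothetical): images 0 and 1 load on the same dimension,
# image 2 loads on a different one.
X = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])

probs = odd_one_out_probs(X, 0, 1, 2)
choice = predict_odd_one_out(X, 0, 1, 2)
print(probs, choice)
```

In this toy case, images 0 and 1 form the most similar pair, so image 2 is the predicted odd one out. Prediction accuracy of the kind reported above would correspond to the fraction of held-out triplets on which such a prediction matches the human choice.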