We collected human similarity judgments in an odd-one-out task, in which participants were shown three object images and asked to choose the one that was most different from the other two. This paradigm (Roberson, Davidoff, & Braisby, 1999; Zheng et al., 2019) is especially well suited to our goal because the objects in a triplet can be selected to differ in their level in the semantic hierarchy. We did this by varying the number of unique superordinate categories that appeared in a triplet. For example, when all three images in a triplet come from the same superordinate category (e.g., “lemon,” “orange,” “banana”), perceived similarity is compared at the basic level. However, when only two images in a triplet belong to the same superordinate category (e.g., “lemon,” “orange,” “minivan”), the semantic oddity of the third image focuses the similarity comparison at the superordinate level. Each triplet consisted of three exemplar objects from the 30 categories used for our model training. All exemplar images came from
Zheng et al. (2019), except for those of “crate,” “hammer,” “harmonica,” and “screwdriver,” which were replaced with new exemplars to improve image quality and category representativeness. A total of 4060 triplets (30 choose 3) can be generated from the 30 categories, but constraints on behavioral data collection required that we sample only a subset of these. This subset included (1) the ten triplets in which all three objects came from the same superordinate category (e.g., mammals: “orangutan,” “lion,” “gazelle”), (2) all 435 triplets in which two objects came from the same superordinate category (e.g., “orangutan,” “lion,” “minivan”), and (3) 1375 triplets in which all objects came from different superordinate categories (e.g., “orangutan,” “minivan,” “lemon”), yielding 1820 unique triplets in total. Participants were 51 Amazon Mechanical Turk workers, each of whom responded to ∼200 triplets (5%, 42%, and 52% of these triplets belonged to subsets 1, 2, and 3, respectively). After removing responses with reaction times below 500 ms, we obtained 9697 similarity judgments, with each unique triplet viewed by 5.6 workers on average.
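
The triplet typing described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The mapping `SUPERORDINATE` is hypothetical: it covers only the seven categories named as examples in the text, and the labels "fruit" and "vehicle" are assumed for illustration. The helper names `n_superordinates` and `group_triplets` are likewise our own. The sketch counts the unique superordinate categories in each triplet and also checks the two totals reported above.

```python
from itertools import combinations
from math import comb

# Hypothetical mapping from basic-level category to superordinate category.
# Only the seven categories named in the text are included; the labels
# "fruit" and "vehicle" are assumed for illustration.
SUPERORDINATE = {
    "lemon": "fruit",
    "orange": "fruit",
    "banana": "fruit",
    "orangutan": "mammal",
    "lion": "mammal",
    "gazelle": "mammal",
    "minivan": "vehicle",
}


def n_superordinates(triplet):
    """Return the number of unique superordinate categories in a triplet."""
    return len({SUPERORDINATE[name] for name in triplet})


def group_triplets(categories):
    """Group every unordered triplet by its number of unique superordinates."""
    groups = {1: [], 2: [], 3: []}
    for triplet in combinations(sorted(categories), 3):
        groups[n_superordinates(triplet)].append(triplet)
    return groups


# Arithmetic stated in the text: 30 categories give 4060 unordered triplets,
# and the three sampled subsets sum to 1820 unique triplets.
total_possible = comb(30, 3)      # 4060
total_sampled = 10 + 435 + 1375   # 1820

# Triplet types for the seven-category toy mapping (35 triplets in all).
groups = group_triplets(SUPERORDINATE)
counts = {k: len(v) for k, v in groups.items()}
print(total_possible, total_sampled, counts)
```

Under this scheme, a triplet with one unique superordinate category corresponds to a basic-level comparison, and a triplet with two corresponds to a superordinate-level comparison. Because the toy mapping contains only seven categories, its per-type counts are not comparable to those of the full 30-category stimulus set.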