All scene elements present in any of the 105 target-present and target-absent scenes were first identified to form a global set of possible scene element labels. Elements were defined as objects (e.g., pencil), groups of densely overlapping objects (e.g., pencils), and surfaces (e.g., desk, wall) within a scene. Each label from this global set was then mapped to one or more individual elements within each scene using the Computer Vision Annotation Tool (CVAT, https://github.com/opencv/cvat) (Figure 2a).
The labels corresponding to the segmented elements were used to generate surface rankings for each target in each scene. Only unique, singular labels from the segmented scenes were used in the ranking task for each scene. Repeated and plural labels were re-added during analysis and given the same weight as the corresponding unique and singular labels, respectively. Although we did not analyze the target-present scenes in the present study, we still acquired their surface rankings. To reduce participant confusion in these target-present scenes, we excluded labels that were synonyms of the target object. For the target “painting,” the following labels were excluded: drawing, drawings, picture, pictures, painting, paintings, poster, posters. For the target “drinking glass,” the following labels were excluded: glass, glasses, cup, cups, mug, mugs. For the target “garbage bin,” the following labels were excluded: trashcan, dumpster, trash bin, bin.
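The label-selection logic described above can be illustrated with a minimal Python sketch. This is not the code used in the study. The function names (`ranking_labels`, `expand_weights`), the explicit `plural_to_singular` lookup, and the rule that a plural label collapses to its singular form for the ranking task are all assumptions made for illustration. Only the synonym lists are taken from the text.

```python
# Synonym exclusions for target-present scenes, copied from the text.
EXCLUDED_SYNONYMS = {
    "painting": {"drawing", "drawings", "picture", "pictures",
                 "painting", "paintings", "poster", "posters"},
    "drinking glass": {"glass", "glasses", "cup", "cups", "mug", "mugs"},
    "garbage bin": {"trashcan", "dumpster", "trash bin", "bin"},
}


def ranking_labels(scene_labels, plural_to_singular, target=None,
                   target_present=False):
    """Return the unique, singular labels to rank for one scene.

    scene_labels: one label per segmented element (repeats allowed).
    plural_to_singular: hypothetical lookup, e.g. {"pencils": "pencil"}.
    """
    # Synonyms are excluded only in target-present scenes.
    excluded = EXCLUDED_SYNONYMS.get(target, set()) if target_present else set()
    kept = []
    for label in scene_labels:
        singular = plural_to_singular.get(label, label)
        if label in excluded or singular in excluded:
            continue
        if singular not in kept:  # drop repeats, keep first-seen order
            kept.append(singular)
    return kept


def expand_weights(scene_labels, plural_to_singular, weights):
    """Re-add repeated and plural labels, copying the weight of the
    matching unique, singular label (assumed to be a plain number)."""
    expanded = []
    for label in scene_labels:
        singular = plural_to_singular.get(label, label)
        if singular in weights:  # labels that were never ranked are skipped
            expanded.append((label, weights[singular]))
    return expanded


# Illustrative scene; the labels and weights are made up.
scene = ["desk", "pencil", "pencils", "wall", "pencil", "cup"]
plurals = {"pencils": "pencil", "cups": "cup"}

print(ranking_labels(scene, plurals))
print(ranking_labels(scene, plurals, target="drinking glass",
                     target_present=True))
print(expand_weights(scene, plurals,
                     {"desk": 1.0, "pencil": 0.2, "wall": 0.1, "cup": 0.5}))
```

In this sketch the ranking task sees each label once, while the analysis step assigns a weight to every segmented element, so a scene with many instances of one label does not lengthen the ranking list.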