Abstract
As vision scientists, we are interested in understanding how humans and other animals process visually presented objects. Selecting object stimuli and categories for probing object processing has remained challenging for at least two reasons. First, small stimulus sets may suffer from selection bias, while large-scale stimulus sets may over-represent objects common in the stimulus pool. Second, selecting objects requires deciding on the level of categorization for each object (e.g. "cod" or "fish"), which may or may not be relevant to the observer. To address these issues, we developed a large-scale database of objects in natural context, based on a list of 40,000 generally known English word lemmas (Brysbaert et al., 2014). From this list, we selected all nouns that were rated as concrete, that could be depicted, and that were not scenes. After carrying out word-sense disambiguation using WordNet, we determined the relevance of each object label to human observers. To this end, we selected one object image per label from Google and asked Amazon Mechanical Turk workers to label each object. Excluding objects that were labeled highly inconsistently or at a more general level of description (e.g. a picture of a "cod" labeled "fish") left us with a set of 1,850 object categories. We rank-ordered each category based on a combination of word frequency and concreteness. For the selection of object images, we crawled Google Images, Flickr, and ImageNet and manually screened the resulting images (Mehrer et al., 2017). From those images, we manually selected a minimum of 12 object images per category, with natural backgrounds, cropped to square size for better comparability. The complete set of more than 24,000 object images, together with the rank ordering of categories, provides a novel resource for the study of object processing.
Meeting abstract presented at VSS 2018
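The abstract states that categories were rank-ordered by a combination of word frequency and concreteness, but does not specify how the two measures were combined. The following is a minimal illustrative sketch of one way such a ranking could be computed; the file names, column names, and the summed z-score combination rule are assumptions for illustration only, not the authors' procedure.

```python
# Illustrative sketch (not the authors' code): rank candidate noun categories
# by a combination of word frequency and concreteness.
import pandas as pd

# Hypothetical inputs: word frequencies and Brysbaert et al. (2014)
# concreteness norms, both keyed by lemma (column names are assumptions).
freq = pd.read_csv("word_frequencies.csv")    # columns: Word, Lg10WF
conc = pd.read_csv("concreteness_norms.csv")  # columns: Word, Conc.M

df = freq.merge(conc, on="Word", how="inner")

# z-score each measure so both contribute on a comparable scale,
# then rank by the summed z-scores (an illustrative combination rule).
for col in ["Lg10WF", "Conc.M"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()

df["combined"] = df["Lg10WF_z"] + df["Conc.M_z"]
ranked = df.sort_values("combined", ascending=False).reset_index(drop=True)

print(ranked[["Word", "combined"]].head(10))
```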