Abstract
What are the diagnostic dimensions of human object recognition, and how do we find them? Here we use a convolutional neural network, trained for object classification, to construct a two-dimensional visual object space, which we explore using psychophysics. In particular, we extracted the two main visual dimensions that explain the most variance (“animate-inanimate” and “stubby-spiky”) and projected images onto this space. This approach provides information about visual object properties, free from semantics and perceptual biases. We then administered a foraging task (N = 73) in which subjects clicked on each instance of a target object (e.g., a toaster) while avoiding any instance of a distractor object (e.g., a television). The stimulus set included fake objects, real inanimate objects, and faces matched for visual qualities. Foraging speed correlated positively with target-distractor visual distance for fake objects and, to a lesser extent, for real inanimate objects in the inanimate-stubby and inanimate-spiky quadrants of visual object space. Foraging speed was uncorrelated with visual distance for faces, and was inversely related to visual distance for real inanimate objects within the animate part of object space, where objects had visual properties that contrast with their identity. In addition, we constructed a semantic object space for real inanimate objects to gain information about their semantic differences without the confound of visual factors. Semantic distances between object pairs explained additional variability in foraging speed in all cases. In summary, when people must discriminate between objects, visual qualities appear to be weighted highly for unknown objects lacking semantics. For real objects, some additional weight is put on semantic properties. In cases where an object’s appearance contrasts with its identity, visual properties appear to be downweighted or even negatively weighted. The described paradigm and data thereby provide new insights into the factors that underlie object recognition.
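To make the distance-based analysis concrete, the following is a minimal sketch of how target-distractor distances in a two-dimensional object space could be related to foraging speed. It is not the analysis pipeline used in this work: the coordinates, object pairs, and speed values are toy numbers invented purely for illustration, the helper names `euclidean` and `pearson` are hypothetical, and the choice of Euclidean distance and Pearson correlation is an assumption.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points in a 2-D object space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy coordinates (dimension 1, dimension 2); illustrative values only, not data.
coords = {
    "toaster": (-1.0, -0.5),
    "television": (-0.8, -0.4),
    "kettle": (-1.5, 0.2),
    "lamp": (0.5, 1.0),
}

# Hypothetical target-distractor pairs and made-up foraging speeds (arbitrary units).
pairs = [("toaster", "television"), ("toaster", "kettle"), ("toaster", "lamp")]
speeds = [0.8, 1.1, 1.9]

# Visual distance for each pair, then its correlation with foraging speed.
distances = [euclidean(coords[a], coords[b]) for a, b in pairs]
r = pearson(distances, speeds)
print(f"correlation between visual distance and foraging speed: {r:.2f}")
```

A positive `r` in such an analysis would correspond to faster foraging for visually more distinct target-distractor pairs, while a negative `r` would correspond to the inverse relationship described for objects whose appearance contrasts with their identity.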