Abstract
Convolutional neural networks (CNNs) have recently received a lot of attention in the vision sciences as candidate models of core visual object recognition. At the behavioral level, these models show near-human object classification performance, often allow for excellent prediction of object-related choices, and explain significant proportions of variance in object similarity judgments. Despite these parallels, CNNs continue to exhibit a performance gap in explaining object-based representations and behavior. Here we aimed to identify the factors that determine the similarities and differences between CNN and human object representations. Paralleling object similarity judgments in humans, we generated 20 million in-silico triplet odd-one-out choices on 22,248 natural object images, using the penultimate-layer activations of a pretrained VGG-16 model. Next, we applied a gradient-based similarity embedding technique that yielded 57 sparse, non-negative dimensions that were highly predictive of the CNN's odd-one-out choices. These dimensions were interpretable, reflecting properties of objects that are both visual (e.g. color, shape, texture) and conceptual (e.g. high-level category, value) in nature. While recent work indicated that CNNs respond to the texture of an object rather than its shape, our results reveal robust shape-related dimensions, indicating that texture bias may not be a general representational limitation. To probe the representational content of individual dimensions, we developed a dimension prediction approach that allowed us to (1) generate optimal stimuli for individual dimensions, (2) reveal the image regions driving these dimensions, and (3) causally manipulate individual image features to identify the dimensions' representational nature. Despite strong parallels between CNNs and humans, a one-to-one mapping of CNN dimensions to human representational dimensions revealed striking differences for a subset of images, revealing novel image biases that limit a CNN's generalization ability. Together, this interpretability technique offers a powerful new approach for understanding the similarities and differences between representations derived from behavior and CNNs.
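To make the in-silico triplet task concrete, the following is a minimal sketch of how odd-one-out choices could be generated from the penultimate-layer activations of a pretrained VGG-16, assuming dot-product similarity between activation vectors and standard ImageNet preprocessing; the helper names and image paths are illustrative, not the authors' released code.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained VGG-16 and truncate the classifier so that the forward
# pass returns the penultimate-layer activations (4096-d) instead of logits.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
vgg16.classifier = vgg16.classifier[:-1]

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def penultimate_features(image_paths):
    """Return an (n_images, 4096) matrix of penultimate-layer activations."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in image_paths])
    return vgg16(batch)

def odd_one_out(features):
    """Pick the odd one out of a triplet given its (3, d) feature matrix.

    Assumption: the most similar pair (largest dot product) is treated as
    belonging together; the remaining image is the odd one out.
    """
    sims = features @ features.T
    pairs = [(0, 1), (0, 2), (1, 2)]
    i, j = max(pairs, key=lambda p: sims[p[0], p[1]].item())
    return ({0, 1, 2} - {i, j}).pop()

# Illustrative usage on one triplet of (hypothetical) image files:
# choice = odd_one_out(penultimate_features(["cat.jpg", "dog.jpg", "hammer.jpg"]))
```

Repeating this choice rule over millions of sampled triplets yields the dataset to which the gradient-based sparse, non-negative embedding is subsequently fit.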