Abstract
A classic signature of high-performing neural network models of visual cortex is their strong accuracy on object classification tasks. However, recent work suggests that classification accuracy alone is an impoverished metric that fails to capture the complexity of biologically relevant visual representations. Here we sought to gain a richer understanding of the representational structure in a diverse set of deep neural networks (DNNs, N=492) by examining multiple geometric properties of their object manifolds (e.g., dimensionality, radius, between-category separation). We also examined the encoding performance of these networks for predicting scene-evoked fMRI responses in human visual cortex using the Natural Scenes Dataset. Our findings show that geometric properties of object manifolds are in some cases robust predictors of encoding performance, and they reveal the specific ways in which the representational spaces of these networks are structured. Rather than compressing images onto low-dimensional concept manifolds, the best models appear to rely on high-dimensional manifolds that are, nonetheless, well separated in representational space. This representational strategy allows models to separate categories in meaningful ways while still maintaining rich information about image-to-image variation within each category. Together, these findings suggest that the geometry of object manifolds offers a promising lens on the key characteristics of biologically relevant visual representations.