Abstract
Deep Neural Networks (DNNs) trained for object classification exhibit internal feature representations that are remarkably similar to neural representations in the primate ventral visual stream. This has led to the widespread use of encoding models of the visual cortex built from linear combinations of pre-trained DNN unit activities. However, DNNs struggle to generalize under distribution shifts, particularly when faced with out-of-distribution (OOD) samples. While DNNs excel at interpolating between training data points, they perform poorly when extrapolating beyond the bounds of the training data (e.g., Hasson et al., 2020). We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the primate ventral visual stream. Using a large-scale dataset of neuronal responses from the macaque inferior temporal cortex to over 100,000 images, we simulated OOD neural activity prediction by dividing the images into multiple training and test sets, holding out subsets of the data to introduce different OOD domain shifts. These shifts include OOD low-level image features such as contrast, hue, and size; OOD high-level features such as animate vs. inanimate, food vs. non-food, and different semantic object categories; and OOD K-means clusters in the distributed representations of ResNet features and neural data. For each feature, an OOD test set was constructed by defining a parametric value for that feature and withholding a subset of its possible values from training, reserving them for testing. Overall, models performed much worse when predicting responses to out-of-distribution images than under standard cross-validation. Prediction on an IID test set with no distribution shift yielded r^2 = 0.5, while OOD prediction ranged from r^2 = 0.48 (images with OOD contrast shift) to as low as r^2 = 0.1 (images with OOD hue). This indicates a deep problem in modern models of the visual cortex: the promise of current image-computable models remains limited to the training image distribution.
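
The split-and-evaluate procedure described above can be sketched as follows. This is a minimal illustration rather than the actual analysis pipeline: the synthetic data, the hue threshold used to define the held-out range, and the ridge penalty are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Placeholder data standing in for the real inputs (assumed shapes):
#   dnn_features : (n_images, n_units)  pre-trained DNN activations per image
#   neural_resp  : (n_images, n_sites)  trial-averaged IT responses per image
#   image_hue    : (n_images,)          a parametric low-level feature, e.g. mean hue
rng = np.random.default_rng(0)
n_images, n_units, n_sites = 1000, 512, 50
dnn_features = rng.standard_normal((n_images, n_units))
neural_resp = rng.standard_normal((n_images, n_sites))
image_hue = rng.uniform(0.0, 1.0, n_images)

# OOD split: withhold a band of the parametric feature from training entirely.
ood_mask = image_hue > 0.8                    # held-out hue range -> OOD test set
train_idx = np.flatnonzero(~ood_mask)
rng.shuffle(train_idx)
n_iid_test = len(train_idx) // 5
iid_test_idx = train_idx[:n_iid_test]         # IID test set drawn from the training range
fit_idx = train_idx[n_iid_test:]

# Linear encoding model: ridge regression from DNN unit activities to neural responses.
model = Ridge(alpha=1.0)
model.fit(dnn_features[fit_idx], neural_resp[fit_idx])

# Compare prediction accuracy within vs. outside the training distribution.
r2_iid = r2_score(neural_resp[iid_test_idx], model.predict(dnn_features[iid_test_idx]))
r2_ood = r2_score(neural_resp[ood_mask], model.predict(dnn_features[ood_mask]))
print(f"IID r^2 = {r2_iid:.3f}   OOD r^2 = {r2_ood:.3f}")
```

The same scheme applies to the other domain shifts: only the variable defining `ood_mask` changes (e.g., a semantic category label or a K-means cluster assignment in place of hue).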