Abstract
Image-recognizing deep neural networks now provide the gold standard for the modeling of primate visual cortex, predicting aggregate and individual neural profiles with striking accuracy. Their success in modeling rodent visual cortex, on the other hand, has been more muted. Recent findings (Cadena et al., 2019) have suggested that randomly initialized networks (never trained) provide about as predictive a set of features as the same networks trained on image recognition, calling into question the use of such networks for the modeling of markedly different brains. We re-examine this finding with a methodology consisting of three components: (1) the Allen Institute Brain Observatory two-photon calcium-imaging Visual Coding dataset (de Vries et al., 2018); (2) a battery of 11 ImageNet-pretrained architectures; and (3) a cross-validated nonlinear least squares regression analysis in which we iteratively build a predicted representational dissimilarity matrix from the full feature set of each model for a given neural site and compare it to the actual representational dissimilarity matrix computed on the images used by the Allen Institute. Contrary to previous findings, we find that ImageNet-pretrained models almost categorically outperform their randomly initialized counterparts, and by a large margin (Student's t(460) = 7.3, p = 1.25e-12, Cohen's d = 0.78). However, even the most performant model (SqueezeNet, with mean R² of 0.116 ± 0.075 SD) falls far short of the ceiling suggested by the split-half reliability of the neural data (mean R² of 0.58 ± 0.252 SD), suggesting there remains substantial room for innovation in the engineering of both model architectures and training tasks. More broadly, this result deepens the ongoing mystery of how exactly standard neural networks can serve as models of the rich diversity (and fiendish complexity) of biological brains at scale, even when that scale is the size of a mouse.
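To make the third component of the methodology concrete, the sketch below illustrates one plausible reading of the cross-validated RDM regression on synthetic data; it is not the authors' code. The per-feature component RDMs, the rectified weighted sum used as the nonlinearity, the 5-fold cross-validation, and all array names and shapes are illustrative assumptions.

```python
"""Illustrative sketch of a cross-validated nonlinear least squares fit of a
predicted representational dissimilarity matrix (RDM) to a neural-site RDM.
Synthetic data only; the specific nonlinearity and RDM construction are assumptions."""
import numpy as np
from scipy.optimize import least_squares
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_stimuli, n_features, n_neurons = 50, 32, 40
model_features = rng.standard_normal((n_stimuli, n_features))   # one model's features (assumed shape)
neural_responses = rng.standard_normal((n_stimuli, n_neurons))  # one neural site (assumed shape)

def rdm(X):
    """1 - Pearson correlation between all pairs of stimulus response patterns (rows)."""
    return 1.0 - np.corrcoef(X)

def upper(D):
    """Vectorize the upper triangle of an RDM (the unique pairwise dissimilarities)."""
    i, j = np.triu_indices_from(D, k=1)
    return D[i, j]

neural_rdm = upper(rdm(neural_responses))

# One component RDM per model feature: absolute pairwise differences of that feature.
feature_rdms = np.stack(
    [upper(np.abs(model_features[:, [k]] - model_features[:, [k]].T)) for k in range(n_features)],
    axis=1,
)  # shape: (n_pairs, n_features)

def residuals(w, X, y):
    # Predicted RDM as a rectified weighted sum of component RDMs (assumed nonlinearity).
    return np.maximum(X @ w, 0.0) - y

r2_scores = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(feature_rdms):
    fit = least_squares(residuals, x0=np.zeros(n_features),
                        args=(feature_rdms[train], neural_rdm[train]))
    pred = np.maximum(feature_rdms[test] @ fit.x, 0.0)
    ss_res = np.sum((neural_rdm[test] - pred) ** 2)
    ss_tot = np.sum((neural_rdm[test] - neural_rdm[test].mean()) ** 2)
    r2_scores.append(1.0 - ss_res / ss_tot)

print(f"cross-validated R^2 for this site: {np.mean(r2_scores):.3f}")
```

In this reading, the held-out R² per neural site is the quantity summarized in the abstract and compared against the split-half reliability ceiling.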