Abstract
Models and brain measurements of visual processing have grown substantially in complexity in recent years. Summarizing and comparing their high-dimensional representations requires specialized statistical methods. Here, we introduce new inference methods for evaluating models based on their predicted representational geometries, i.e., on how well they predict the distances or dissimilarities between the representations. Our inference methods are based on cross-validation and bootstrapping. We introduce a novel 2-factor bootstrap technique wrapped around a cross-validation procedure, with analytically derived adjustments for the biases induced by the 2-factor inflation of measurement noise and by the choice of cross-validation folds. We validate our new inference methods using extensive simulations. We first simulate fMRI-like data based on local averages of deep neural network activations for images sampled from ecoset. In these simulations, we have full access to the true data-generating process and can thus test a wide range of experiments. Additionally, we perform simulations based on subsampling data from large-scale calcium-imaging and fMRI experiments. These simulations are less flexible, but we can be more confident that the patterns and their variability are representative of real experimental data. In all simulations, our new methods yield accurate estimates of the variance of model evaluations and thus valid statistical tests. In contrast, uncorrected bootstrap methods substantially overestimate the variance and thus yield overly conservative tests, whereas ignoring the desired generalization to new stimuli leads to underestimated variance and thus to overly liberal tests. Similar statistical problems arise whenever bootstrap methods are used to generalize to new stimuli and new subjects and/or are combined with cross-validation. Our new methods are available as part of the open-source rsatoolbox for Python at https://github.com/rsagroup/rsatoolbox.
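A minimal sketch of how such an evaluation can be run with the toolbox is given below. The toy data, the model construction, and the entry point eval_dual_bootstrap reflect our reading of the rsatoolbox API rather than anything stated in this abstract; treat these names as assumptions and consult the toolbox documentation for the authoritative interface.

    import numpy as np
    import rsatoolbox

    # Toy data: 5 "subjects", each with responses to 20 stimuli
    # measured on 50 channels (purely random here, for illustration).
    rng = np.random.default_rng(0)
    datasets = [
        rsatoolbox.data.Dataset(rng.standard_normal((20, 50)))
        for _ in range(5)
    ]
    data_rdms = rsatoolbox.rdm.calc_rdm(datasets)  # one RDM per subject

    # A fixed model RDM; in practice this would come from a model's
    # predicted representational geometry, not from random data.
    model_rdm = rsatoolbox.rdm.calc_rdm(
        rsatoolbox.data.Dataset(rng.standard_normal((20, 50))))
    model = rsatoolbox.model.ModelFixed('example model', model_rdm)

    # 2-factor bootstrap (resampling subjects and stimuli) wrapped
    # around cross-validation, with the bias corrections applied
    # (assumed entry point; see the rsatoolbox docs):
    result = rsatoolbox.inference.eval_dual_bootstrap(model, data_rdms)
    print(result)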