Abstract
Recent work has shown that both the visual cortex and deep neural networks exhibit a power law in the covariance spectra of their representations, suggesting that optimal visual representations have a high-dimensional structure. Of particular interest is the power-law exponent, which defines how variance scales across latent dimensions. Here we extracted layer-wise activations from convolutional neural networks (CNNs) pre-trained on a variety of tasks and characterized the power-law exponent of their covariance spectra. We found that CNNs with lower power-law exponents were better models of the visual cortex. We also observed that the covariance spectra differed depending on whether spatial information was taken into account. The covariance spectra computed between convolution channels, or within a single spatial location of a convolution filter, exhibited a minimum power-law exponent of 1, similar to what has been observed in the visual cortex. However, the power-law exponent for the full tensor of layer activations could be significantly lower. To investigate further, we developed a method for initializing untrained CNNs with a power law in the covariance spectrum of their weights. We found that the power-law exponent of the weights significantly modulated the exponent of their activations, and that a lower power-law exponent improved their ability to model neural activity in the visual cortex. Interestingly, these networks continued to exhibit a minimum power-law exponent of 1 even when the power-law exponent of their weights was far smaller. In sum, our work suggests that the power-law exponent of channel covariance spectra in CNNs is a key factor underlying model-brain correspondence, and that there may be fundamental constraints on the power-law exponent of deep neural network representations.
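The central quantity throughout is the power-law exponent α of a covariance eigenspectrum, where the n-th eigenvalue decays as λ_n ∝ n^(−α). The following is a minimal illustrative sketch, not the paper's exact estimation procedure: it computes the eigenspectrum of an activation covariance matrix and fits α by least squares in log-log space. The function name `powerlaw_exponent`, the `fit_range` parameter, and the synthetic Gaussian check are all assumptions introduced here for illustration.

```python
import numpy as np

def powerlaw_exponent(activations, fit_range=None):
    """Estimate alpha in lambda_n ~ n**(-alpha) for the covariance
    eigenspectrum of an (n_samples x n_features) activation matrix,
    via a least-squares line fit in log-log coordinates."""
    X = activations - activations.mean(axis=0)
    # Feature-by-feature sample covariance and its eigenvalues,
    # sorted in descending order (the "covariance spectrum").
    cov = X.T @ X / (X.shape[0] - 1)
    lam = np.linalg.eigvalsh(cov)[::-1]
    lam = lam[lam > 0]  # drop numerically zero/negative eigenvalues
    ranks = np.arange(1, lam.size + 1)
    if fit_range is not None:
        lo, hi = fit_range  # restrict the fit to a range of ranks
        ranks, lam = ranks[lo:hi], lam[lo:hi]
    # Slope of log(lambda) vs log(rank) is -alpha.
    slope, _ = np.polyfit(np.log(ranks), np.log(lam), 1)
    return -slope

# Synthetic check (hypothetical data, not from the paper): Gaussian
# features whose variances follow a 1/n power law, so alpha_true = 1.
rng = np.random.default_rng(0)
d, n_samples, alpha_true = 200, 20000, 1.0
scales = np.arange(1, d + 1) ** (-alpha_true / 2)
X = rng.standard_normal((n_samples, d)) * scales
alpha_hat = powerlaw_exponent(X, fit_range=(0, 100))
```

With enough samples per feature, `alpha_hat` recovers the planted exponent closely; in practice one would fit over an intermediate range of ranks, since the highest and lowest ranks are the most distorted by finite sampling.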