Visual acuity is better for vertical and horizontal compared to other orientations. This cross-species phenomenon is often explained by “efficient coding,” whereby more neurons show sharper tuning for the orientations most common in natural vision. However, it is unclear whether experience alone can account for such biases. Here, we measured orientation representations in a convolutional neural network, VGG-16, trained on modified versions of ImageNet (rotated by 0°, 22.5°, or 45° counterclockwise of upright). Discriminability for each model was highest near the orientations that were most common in the network's training set. Furthermore, there was an overrepresentation of narrowly tuned units selective for the most common orientations. These effects emerged in middle layers and increased with depth in the network, though this layer-wise pattern may depend on properties of the evaluation stimuli used. Biases emerged early in training, consistent with the possibility that nonuniform representations may play a functional role in the network's task performance. Together, our results suggest that biased orientation representations can emerge through experience with a nonuniform distribution of orientations, supporting the efficient coding hypothesis.

*fft2.m*). We then multiplied the frequency-domain representation by an orientation filter and a spatial frequency filter. The orientation filter was a circular Gaussian (Von Mises) function centered at the desired orientation, with concentration parameter (k) of 35 (full width at half maximum = 11.5°). The spatial frequency filter was a bandpass filter from 0.02 to 0.25 cycles/pixel, with Gaussian-smoothed edges (smoothing SD = 0.005 cycles/pixel). After multiplying by these filters, we replaced the image's phase with random values sampled uniformly between –π and +π (to randomize the spatial phase of oriented elements in the image) and transformed the result back into the spatial domain (using *ifft2.m*). Next, we cropped the image back to its original size of 224 × 224 pixels, multiplied again by the smoothed circular mask, and converted the image into three-channel RGB format. Finally, the luminance in each color channel was normalized to have a mean equal to that channel's mean across the ImageNet training images and a standard deviation of 12 units. All image processing for the evaluation image sets was done using MATLAB R2018b (MathWorks, Natick, MA).
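The filtering-and-phase-randomization pipeline above can be sketched in NumPy. This is a simplified, single-channel sketch under stated assumptions: the function and parameter names are illustrative, the bandpass edges are hard rather than Gaussian-smoothed, and a channel mean of 127 is assumed in place of the actual ImageNet channel means.

```python
import numpy as np

def make_oriented_noise(size=224, orient_deg=45.0, k=35.0,
                        sf_lo=0.02, sf_hi=0.25, seed=0):
    """Sketch of the evaluation-image pipeline: an orientation-filtered,
    phase-randomized noise image (hard bandpass edges, single channel)."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(size)[:, None]   # vertical frequency, cycles/pixel
    fx = np.fft.fftfreq(size)[None, :]   # horizontal frequency, cycles/pixel
    sf = np.sqrt(fx**2 + fy**2)
    ang = np.arctan2(fy, fx)
    mu = np.deg2rad(orient_deg)
    # Von Mises orientation filter; doubled angle because orientation has period pi
    orient_filt = np.exp(k * (np.cos(2.0 * (ang - mu)) - 1.0))
    # bandpass spatial frequency filter (smoothed edges omitted in this sketch)
    sf_filt = ((sf >= sf_lo) & (sf <= sf_hi)).astype(float)
    amp = orient_filt * sf_filt
    # random phase, uniform in [-pi, +pi), then back to the spatial domain
    phase = rng.uniform(-np.pi, np.pi, (size, size))
    img = np.real(np.fft.ifft2(amp * np.exp(1j * phase)))
    # normalize: SD of 12 units around an assumed channel mean of 127
    img = (img - img.mean()) / img.std() * 12.0 + 127.0
    return img
```

The circular-mask and three-channel steps from the text are omitted; they would follow the same array operations.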

*gabor.m* function in MATLAB R2018b (MathWorks). Since all filtering was performed in the Fourier domain, we also used a custom modified version of the *gabor.m* function that allowed us to directly generate a frequency-domain representation of each filter (Jain & Farrokhnia, 1991). Before filtering each image, we converted it to grayscale, subtracted its background color so that the background was equal to zero, and zero-padded each image to a size of 1012 × 1012 pixels (the size needed to accommodate the lowest-frequency filter). Images were then converted into the frequency domain (using *fft2.m*) and multiplied by the filter bank. Next, we converted back to the spatial domain and cropped the image back to its original size (224 × 224 pixels). Finally, we took the magnitude of the filtered image and averaged it across all pixel positions to obtain a single value for each filter orientation and spatial frequency. Then, for each image and each spatial frequency, we converted the orientation magnitude values into an estimated probability distribution by dividing by the sum of the magnitudes across all orientations. Because this normalization was applied within each spatial frequency separately, it corrects for differences in power across spatial frequencies and facilitates combining results across them. Results were similar within each spatial frequency individually; we averaged over spatial frequency to produce the final plots (Figure 6B). This analysis was done on the training set images only, which included ∼1,300 images in each of 1,000 categories, for a total of ∼1.3 million images.
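The orientation-content measurement can be sketched with frequency-domain band filters, as a simplified stand-in for the modified *gabor.m* bank. The function name, the single spatial frequency, and the Gaussian bandwidth below are illustrative assumptions, not the paper's exact filter parameters.

```python
import numpy as np

def orientation_distribution(img, orients_deg, sf_cpp=0.1, bw=0.02):
    """Estimate an image's orientation content: filter in the Fourier
    domain, average the magnitude over pixels, normalize to a
    probability distribution across orientations."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    F = np.fft.fft2(img - img.mean())       # zero mean: background removed
    energy = []
    for od in orients_deg:
        th = np.deg2rad(od)
        # center frequency of the band filter, plus its mirror (real filter)
        u, v = sf_cpp * np.cos(th), sf_cpp * np.sin(th)
        g = (np.exp(-((fx - u)**2 + (fy - v)**2) / (2 * bw**2)) +
             np.exp(-((fx + u)**2 + (fy + v)**2) / (2 * bw**2)))
        filt = np.fft.ifft2(F * g)
        energy.append(np.abs(filt).mean())  # average magnitude over pixels
    energy = np.array(energy)
    return energy / energy.sum()            # normalize within this frequency
```

For a multi-frequency bank, this normalization would be repeated at each spatial frequency and the distributions averaged, as in the text.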

*FI*_{i}(θ) = *f*′_{i}(θ)^{2} / *v*_{i}(θ)

where *f*_{i}(θ) is the unit's measured orientation tuning curve, and *v*_{i}(θ) is the variance of the unit's responses to the specified orientation. We estimated the slope of the unit's tuning curve at θ based on the difference in its mean response (µ_{i}) to sets of images that were Δ = 4° apart (using different values of Δ did not substantially change the results).
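The single-unit computation can be sketched as follows (the function name is hypothetical, and pooling the sample variances of the two image sets is an assumption about how v(θ) is estimated):

```python
import numpy as np

def unit_fisher_information(resp_a, resp_b, delta_deg=4.0):
    """Single-unit Fisher information at the midpoint of two orientations
    delta_deg apart: FI = f'(theta)^2 / v(theta).
    resp_a, resp_b: response arrays for images at the two orientations."""
    resp_a = np.asarray(resp_a, float)
    resp_b = np.asarray(resp_b, float)
    slope = (resp_b.mean() - resp_a.mean()) / delta_deg    # finite-difference f'
    var = 0.5 * (resp_a.var(ddof=1) + resp_b.var(ddof=1))  # pooled variance v
    return slope**2 / var
```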

*FI*_{pop}(θ) = Σ_{i=1}^{nUnits} *FI*_{i}(θ)

where *nUnits* is the number of units in the layer. We computed *FI*_{pop}(θ) for θ values between 0° and 179°, in steps of 1°. When plotting FI, to aid comparison of this measure across layers with different numbers of units, we divided *FI*_{pop} by the total number of units in the layer, to capture the average FI per unit. We note that this analysis was performed across all units at each layer, without excluding units whose spatial receptive fields fell outside the circular stimulus region; these nonresponsive units contributed zero to the Fisher information sum at early layers. We note also that, due to the different numbers of units per layer, the absolute values of FI are not directly comparable across layers.

*Q*(θ) is the pooled covariance matrix, computed as:

*Q*(θ) = (*Q*(θ_{1}) + *Q*(θ_{2})) / 2

and *Q*′(θ) was the estimated derivative of the covariance matrix, obtained as:

*Q*′(θ) = (*Q*(θ_{2}) − *Q*(θ_{1})) / Δ

where θ_{1} and θ_{2} are as defined in the previous section.

*FI*(θ) = *f*′(θ)^{T} *Q*^{−1}(θ) *f*′(θ) + ½ *Tr*[*Q*′(θ) *Q*^{−1}(θ) *Q*′(θ) *Q*^{−1}(θ)]

where *Tr* denotes the trace operation, and *Q*^{−1}(θ) is the inverse of the covariance matrix. Values of θ and *f*′(θ) were as defined in the previous section.
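A minimal sketch of this multivariate Fisher information, assuming Gaussian response statistics, the pooled covariance, and the finite-difference derivatives described above (the function name and input layout are illustrative):

```python
import numpy as np

def multivariate_fi(resp1, resp2, delta_deg=4.0):
    """Multivariate FI between responses to two nearby orientations
    (rows = trials, columns = features):
    FI = f'^T Q^{-1} f' + 0.5 * Tr[(Q' Q^{-1})^2]."""
    r1 = np.asarray(resp1, float)
    r2 = np.asarray(resp2, float)
    fprime = (r2.mean(axis=0) - r1.mean(axis=0)) / delta_deg  # tuning derivative
    q1 = np.cov(r1, rowvar=False)
    q2 = np.cov(r2, rowvar=False)
    q = 0.5 * (q1 + q2)                   # pooled covariance Q(theta)
    qprime = (q2 - q1) / delta_deg        # covariance derivative Q'(theta)
    qinv = np.linalg.inv(q)
    a = qprime @ qinv
    return fprime @ qinv @ fprime + 0.5 * np.trace(a @ a)
```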

*x* (with *nUnits* columns, one per unit), implemented using Scikit-learn in Python 3.6. We then computed the above expression for Fisher information using the scores for the top N principal components (PCs), for values of N ranging from 2 to 47 (there were 48 images per orientation, so covariance matrix estimates became unstable when using more features). Values were similar for different values of N (several examples of varying N are shown in Figure 4B).
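The dimensionality-reduction step can be sketched with a NumPy stand-in for Scikit-learn's `PCA.fit_transform` (the function name is illustrative):

```python
import numpy as np

def pca_scores(x, n_components):
    """Project a [n_samples x n_features] response matrix onto its top
    principal components via SVD of the mean-centered data."""
    x = np.asarray(x, float)
    xc = x - x.mean(axis=0)                  # center each feature
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T          # scores: [n_samples x n_components]
```

The downstream Fisher information would then be computed on these score columns instead of the raw unit responses.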

° apart (as before, changing Δ did not substantially change the results). This resulted in two “clouds” of points in N-dimensional PC space, corresponding to the two orientations (see Supplementary Figure S2). We first exhaustively computed the Euclidean distances between all pairs of points in different clouds (48^{2} = 2,304 total distances). Next, we computed a *t*-statistic for these distances: the mean of all distance values divided by the standard deviation of all distance values. This measure reflects the reliability of the separation between the point clouds corresponding to different orientations. We computed this measure for several different values of N.
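The cloud-separation statistic can be sketched as follows (hypothetical function name; inputs are the two sets of PC scores):

```python
import numpy as np

def cloud_separation(points_a, points_b):
    """Reliability of separation between two point clouds: the mean of all
    pairwise between-cloud Euclidean distances divided by their SD."""
    a = np.asarray(points_a, float)        # [n_a x n_dims]
    b = np.asarray(points_b, float)        # [n_b x n_dims]
    diffs = a[:, None, :] - b[None, :, :]  # all n_a * n_b between-cloud pairs
    d = np.sqrt((diffs**2).sum(axis=-1)).ravel()
    return d.mean() / d.std()
```

With 48 points per cloud this evaluates all 48^2 = 2,304 between-cloud distances, matching the exhaustive computation in the text.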

FIB = (*FI*_{peaks} − *FI*_{baseline}) / (*FI*_{peaks} + *FI*_{baseline})

where *FI*_{peaks} is the sum of the FI values in a range ±10° around the orientations of interest (0° and 90° for FIB-0, 67.5° and 157.5° for FIB-22, and 45° and 135° for FIB-45), and *FI*_{baseline} is the sum of the FI values in a range ±10° around the orientations chosen as a baseline (22.5° and 112.5°). Since FI is necessarily positive, each of these FIB measures can take a value between +1 and –1, with positive values indicating more information near the orientations of interest relative to the baseline (peaks in FI), and negative values indicating less information near the orientations of interest relative to baseline (dips in FI). An analogous method was used to compute the bias in multivariate FI (Figure 4), as well as the bias in the multivariate *t*-statistic (Supplementary Figure S2; see the previous section, Multivariate analyses).
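The Fisher information bias can be sketched as a short function (the name is illustrative; FI is assumed to be sampled at integer orientations 0° to 179°):

```python
import numpy as np

def fib(fi, peak_orients, base_orients=(22.5, 112.5), width=10.0):
    """Fisher information bias: normalized difference between FI summed
    within +/- width deg of the peak orientations vs. the baseline."""
    fi = np.asarray(fi, float)
    thetas = np.arange(180.0)
    def band_sum(centers):
        total = 0.0
        for c in centers:
            # circular distance in orientation space (period 180 deg)
            d = np.abs((thetas - c + 90.0) % 180.0 - 90.0)
            total += fi[d <= width].sum()
        return total
    p = band_sum(peak_orients)
    b = band_sum(base_orients)
    return (p - b) / (p + b)
```

Because FI is positive, the result is bounded between −1 and +1, as stated in the text.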

*t*-tests between FIB values corresponding to each training set and the random models. Specifically, we tested the hypothesis that the primary form of bias for each training set (e.g., FIB-0 for the models trained on upright images, FIB-22 for the models trained on 22.5°-rotated images, FIB-45 for the models trained on 45°-rotated images) was significantly higher for the models trained on that image set than for the random (untrained) models. Since we generated four replicate models for each training image set, and evaluated each model on four evaluation image sets, there were 16 total FIB values at each layer for each training set. To compare the FIB values for each training set against the random models, we first calculated the “real” difference in FIB between the groups, comparing the mean of the 16 values for the trained models with the mean of the 16 values for the random models. Next, we concatenated the values for the trained and random models (32 values total) and randomly shuffled the group labels across all values 10,000 times. For each of these 10,000 shuffles, we computed the difference between groups based on the shuffled labels (“shuffled” differences). The final *p*-value was calculated as the number of iterations on which the shuffled difference exceeded the real difference, divided by the total number of iterations. The *p*-values were FDR corrected across model layers at *q* = 0.01 using SciPy (Benjamini & Yekutieli, 2001). The same procedure was used to test for differences in FIB-0 between the pretrained model and the control model (note that there was only one replicate of the pretrained model, so this test included only four data points per condition).
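The label-shuffling test can be sketched as follows (function name and the fixed seed are illustrative; the FDR-correction step is omitted):

```python
import numpy as np

def permutation_pvalue(trained, random_vals, n_iter=10000, seed=0):
    """One-tailed permutation test: how often does a label-shuffled group
    difference exceed the real difference mean(trained) - mean(random)?"""
    rng = np.random.default_rng(seed)
    trained = np.asarray(trained, float)
    random_vals = np.asarray(random_vals, float)
    real_diff = trained.mean() - random_vals.mean()
    pooled = np.concatenate([trained, random_vals])
    n = len(trained)
    count = 0
    for _ in range(n_iter):
        perm = rng.permutation(pooled)      # shuffle group labels
        if perm[:n].mean() - perm[n:].mean() > real_diff:
            count += 1
    return count / n_iter                   # fraction of exceedances
```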

*u* is a parameter that describes the center of the unit's tuning function, and *k* is a concentration parameter that is inversely related to the width of the tuning function. In this formulation, the *k* parameter modifies both the height and the width of the tuning function. To make it possible to modify the curve's height and width independently, we normalized the Von Mises function to have a height of 1 and a baseline of 0, and then added parameters for the amplitude and baseline, as follows:

*f*(θ) = baseline + amplitude × *v*_{n}(θ)

where *v*_{n}(θ) denotes the Von Mises function after normalization. This resulted in a curve with four total parameters: center (*u*), concentration parameter (*k*), amplitude, and baseline.

^{–15}, and amplitude and baseline were allowed to vary freely. To prevent any bias in the center estimates due to the edges of the allowed parameter range, we circularly shifted each curve by a random amount before fitting.
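The four-parameter fit can be sketched with SciPy's `curve_fit` (a sketch under stated assumptions: `vm_norm` and `tuning_curve` are illustrative names, a 180° orientation period is assumed, and the paper's exact optimizer, bounds, and circular-shift step are not reproduced):

```python
import numpy as np
from scipy.optimize import curve_fit

def vm_norm(theta_deg, u, k):
    """Von Mises over orientation (180-deg period), normalized to height 1
    and baseline 0. A small floor keeps k positive during optimization."""
    k = max(k, 1e-6)
    z = np.exp(k * (np.cos(np.deg2rad(2.0 * (theta_deg - u))) - 1.0))
    lo = np.exp(-2.0 * k)                 # minimum of the unnormalized curve
    return (z - lo) / (1.0 - lo)

def tuning_curve(theta_deg, u, k, amp, base):
    """Four parameters: center u, concentration k, amplitude, baseline."""
    return base + amp * vm_norm(theta_deg, u, k)
```

A fit then recovers the parameters from a measured tuning curve, e.g. `curve_fit(tuning_curve, thetas, responses, p0=...)`.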

^{2}. To assess the consistency of tuning across different versions of the evaluation image set, we used R^{2} to assess the fit between the single best-fit Von Mises function (computed using the tuning function averaged over all evaluation image sets) and each individual tuning curve (there were four individual tuning curves, each from one version of the evaluation image set). We then averaged these four R^{2} values to obtain a single value. We used a threshold of average R^{2} > 0.40 to determine which units were sufficiently well fit by the Von Mises function, and retained the parameters of those fits for further analysis.
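The goodness-of-fit measure can be sketched as the standard coefficient of determination (the function name is illustrative):

```python
import numpy as np

def r_squared(y, yhat):
    """Coefficient of determination between a measured tuning curve y
    and a fitted curve yhat: 1 - SS_res / SS_tot."""
    y = np.asarray(y, float)
    yhat = np.asarray(yhat, float)
    ss_res = np.sum((y - yhat)**2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean())**2)      # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Averaging this value over the four evaluation-set tuning curves, and thresholding at 0.40, reproduces the selection step in the text.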

*t*-test, FDR corrected at *q* = 0.01). However, there was a small increase in the FIB-0 at the later layers of the randomly initialized model, reflecting a weak cardinal bias (although at the deepest layer, the FIB-0 was still more than 5× as large for the pretrained model as for the random model). We consider this issue further in the Discussion section.

*k*). Units that were not well fit by a Von Mises function were not considered further (approximately 30% of all units; see Methods, Single-unit tuning analysis section, and Supplementary Figure S1). Figure 2B shows the distribution of fit centers for all units in four example layers of the pretrained model that were well fit by a Von Mises function. These distributions show peaks at random locations for the first layer of the network but exhibit narrow peaks around the cardinal orientations for the deeper conv4_3 and fc6 layers (but see Supplementary Figure S3 and the Discussion for consideration of whether this layer-wise increase may be stimulus-set dependent). In contrast, the randomly initialized model did not show an overrepresentation of cardinally tuned units (Supplementary Figure S1). In addition, plotting the concentration parameter for each unit against its center (Figure 2C) shows that, for the deepest three layers shown, the most narrowly tuned units (high *k*) generally have centers close to the cardinal orientations. Together, these findings indicate that middle and deep layers of the pretrained network have a large proportion of units tuned to cardinal orientations, and that many of these units are narrowly tuned.

*t*-test, FDR corrected at *q* = 0.01) (Figure 8A). In contrast, the models trained on images rotated by 22.5° and 45° showed higher values for the FIB-22 and FIB-45, respectively (Figures 8B, 8C). In models trained on images rotated by 22.5°, the FIB-22 significantly exceeded that of the random models at pool2 and all layers deeper than pool2, with the exception of conv3_3 (one-tailed nonparametric *t*-test, FDR corrected at *q* = 0.01). For the models trained on 45°-rotated images, the FIB-45 significantly exceeded that of the random models at conv3_1 and all layers deeper than conv3_1 (one-tailed nonparametric *t*-test, FDR corrected at *q* = 0.01).

*t*-statistic of distances in principal component space (Figure S2), we found similar values of bias at middle and late model layers. Thus, the extent to which biases are magnified versus merely reproduced at later model layers is not entirely clear. Future work is needed to resolve these issues and determine how low-level biases change with depth in the VGG-16 network.

*ArXiv*. Retrieved from http://arxiv.org/abs/1605.08695

*Neural Computation*, 11(1), 91–101, https://doi.org/10.1162/089976699300016827.

*Psychological Bulletin*, 78(4), 266–278, https://doi.org/10.1037/h0033117.

*Sensory Communication* (pp. 217–234). MIT Press, https://doi.org/10.7551/mitpress/9780262518420.003.0013.

*Perception*, 8(3), 247–253, https://doi.org/10.1068/p080247.

*Annals of Statistics*, 29, 1165–1188, https://doi.org/10.1214/aos/1013699998.

*Nature*, 228(5270), 477–478, https://doi.org/10.1038/228477a0.

*Proceedings of Machine Learning Research* (Vol. 81). Retrieved from http://proceedings.mlr.press/v81/buolamwini18a.html

*PLoS Biology*, 4(4), e92, https://doi.org/10.1371/journal.pbio.0040092.

*PLoS Computational Biology*, 15(4), e1006897, https://doi.org/10.1371/journal.pcbi.1006897.

*Vision Research*, 23(2), 129–133, https://doi.org/10.1016/0042-6989(83)90135-9.

*Visual Neuroscience*, 10(5), 811–825, https://doi.org/10.1017/S0952523800006052.

*Trends in Cognitive Sciences*, 23(4), 305–317, https://doi.org/10.1016/j.tics.2019.01.009.

*Proceedings of the National Academy of Sciences, USA*, 95(7), 4002–4006, https://doi.org/10.1073/pnas.95.7.4002.

*Visual Neuroscience*, 21(1), 39–51, https://doi.org/10.1017/s0952523804041045.

*2009 IEEE Conference on Computer Vision and Pattern Recognition* (pp. 248–255), https://doi.org/10.1109/CVPRW.2009.5206848.

*Vision Research*, 22(5), 531–544, https://doi.org/10.1016/0042-6989(82)90112-2.

*Advances in Neural Information Processing Systems*, 23, 658–666, http://www.nips.cc.

*Nature Neuroscience*, 14(7), 926–932, https://doi.org/10.1038/nn.2831.

*Science*, 168(3933), 869–871, https://doi.org/10.1126/science.168.3933.869.

*The Journal of Neuroscience*, 35(8), 3370–3383, https://doi.org/10.1523/JNEUROSCI.3174-14.2015.

*Pattern Recognition*, 24(12), 1167–1186, https://doi.org/10.1016/0031-3203(91)90143-S.

*Current Opinion in Neurobiology*, 55, 121–132, https://doi.org/10.1016/j.conb.2019.02.003.

*IEEE Transactions on Information Forensics and Security*, 7(6), 1789–1801, https://doi.org/10.1109/TIFS.2012.2214212.

*Journal of Neuroscience*, 31(39), 13911–13920, https://doi.org/10.1523/JNEUROSCI.2143-11.2011.

*PLoS Computational Biology*, 12(4), e1004896, https://doi.org/10.1371/journal.pcbi.1004896.

*Science*, 190(4217), 902–904, https://doi.org/10.1126/science.1188371.

*Journal of Neurophysiology*, 43(4), 1111–1132, https://doi.org/10.1152/jn.1980.43.4.1111.

*Journal of Neurophysiology*, 90(1), 204–217, https://doi.org/10.1152/jn.00954.2002.

*Science*, 186(4169), 1133–1135, https://doi.org/10.1126/science.186.4169.1133.

*eLife*, 7, https://doi.org/10.7554/eLife.38242.

*Journal of the Optical Society of America A, Optics and Image Science*, 2(2), 147–155. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3973752

*Journal of Neuroscience*, 40(12), 2538–2552, https://doi.org/10.1523/JNEUROSCI.2760-19.2020.

*International Journal of Computer Vision*, 115(3), 211–252, https://doi.org/10.1007/s11263-015-0816-y.

*Journal of Vision*, 14(2), https://doi.org/10.1167/14.2.3.

*Journal of Neurophysiology*, 71(4), 1428–1451, https://doi.org/10.1152/jn.1994.71.4.1428.

*Vision Research*, 39(23), 3960–3974, https://doi.org/10.1016/S0042-6989(99)00101-7.

*Journal of Vision*, 19(10), 34b, https://doi.org/10.1167/19.10.34b.

*Nature Neuroscience*, 18(10), 1509–1517, https://doi.org/10.1038/nn.4105.

*Vision Research*, 43(22), 2281–2289, https://doi.org/10.1016/S0042-6989(03)00360-2.

*Proceedings of the National Academy of Sciences*, 111(23), 8619–8624, https://doi.org/10.1073/pnas.1403112111.