**Primary visual cortex (V1) is the first stage of cortical image processing, and major effort in systems neuroscience is devoted to understanding how it encodes information about visual stimuli. Within V1, many neurons respond selectively to edges of a given preferred orientation: These are known as either simple or complex cells. Other neurons respond to localized center–surround image features. Still others respond selectively to certain image stimuli, but the specific features that excite them are unknown. Moreover, even for the simple and complex cells—the best-understood V1 neurons—it is challenging to predict how they will respond to natural image stimuli. Thus, there are important gaps in our understanding of how V1 encodes images. To fill these gaps, we trained deep convolutional neural networks to predict the firing rates of V1 neurons in response to natural image stimuli, and we find that the predicted firing rates are highly correlated with the neurons' measured firing rates.**

*simple* or *complex cells*, depending on how sensitive their responses are to shifts in the position of the edge. The simple and complex cells are well studied (Lehky, Sejnowski, & Desimone, 1992; David, Vinje, & Gallant, 2004; Montijn, Meijer, Lansink, & Pennartz, 2016). However, many V1 neurons are neither simple nor complex cells, and the classical models of simple and complex cells often fail to predict how those neurons will respond to naturalistic stimuli (Olshausen & Field, 2005). Thus, much of how V1 encodes visual information remains unknown. We use deep learning to address this longstanding problem.

*convolutional neural networks*, have achieved impressive success in increasingly difficult image-classification tasks (Krizhevsky, Sutskever, & Hinton, 2012; LeCun, Bengio, & Hinton, 2015). Recently, these artificial neural networks have been used to study the visual system (Yamins & DiCarlo, 2016), setting the state of the art for predicting stimulus-evoked neural activity in the retina (McIntosh, Maheswaranathan, Nayebi, Ganguli, & Baccus, 2016) and inferior temporal cortex (Yamins et al., 2014). Despite these successes, we have not yet achieved a full understanding of how V1 represents natural images.

*n*, we calculated the mean firing rate *A*_{n,i} evoked by each image *i* by averaging its firing rate across the 20 repeated presentations of that image. The firing rates were calculated over a window from 50 to 100 ms after the image was presented, to account for the signal-propagation delay from retina to V1 (Figure 1D; V1 firing rates increase dramatically at ∼50 ms after stimulus onset). We separately analyzed firing rates computed over a longer (100-ms) window, from 50 to 150 ms after stimulus onset; the results of that analysis are presented in the Discussion section.
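The rate computation described above can be sketched as follows. This is a minimal illustration; the function name and the nested `spike_times` layout are hypothetical stand-ins for the actual recording format:

```python
import numpy as np

def mean_rates(spike_times, win=(0.050, 0.100)):
    """Mean firing rate per image, averaged over repeated presentations.

    spike_times[i][r]: array of spike times (in seconds, relative to
    stimulus onset) for image i, repeat r -- a hypothetical data layout.
    Counts spikes in the 50-100 ms window and converts to spikes/s.
    """
    dur = win[1] - win[0]
    return np.array([
        np.mean([np.sum((t >= win[0]) & (t < win[1])) for t in reps]) / dur
        for reps in spike_times
    ])
```

Swapping `win=(0.050, 0.150)` reproduces the longer 100-ms analysis window mentioned above.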


*n*), where *i* is the image index, *A*_{n,i} is the measured response, and *y*_{n,i} is the network's predicted response. The neurons' losses are summed, yielding the total loss used by the optimizer. To ensure that the performance generalizes, the training data were subdivided into data used by the optimizer to train the weights (66% of the images) and another small subset (14% of the images) used to stop the training when accuracy stops improving (early stopping).
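As a sketch of this setup, the code below assumes a Poisson negative log-likelihood as the per-neuron loss (a common choice for firing-rate models; the paper's exact loss equation is not reproduced in this excerpt), summed over neurons, together with the 66%/14%/remainder data split:

```python
import numpy as np

def poisson_loss(y_pred, a_meas, eps=1e-8):
    # Assumed Poisson negative log-likelihood per neuron, dropping the
    # data-dependent log-factorial constant. y_pred and a_meas have
    # shape (n_images, n_neurons); losses are averaged over images and
    # then summed across neurons, as described in the text.
    per_term = y_pred - a_meas * np.log(y_pred + eps)
    return per_term.mean(axis=0).sum()

def split_indices(n_images, rng):
    # 66% to train the weights, 14% for early stopping, rest held out.
    idx = rng.permutation(n_images)
    n_train = int(0.66 * n_images)
    n_stop = int(0.14 * n_images)
    return (idx[:n_train], idx[n_train:n_train + n_stop],
            idx[n_train + n_stop:])
```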

_{norm} = 1.

_{max}, we followed a bootstrapping procedure (in contrast to Schoppe et al., 2016) where we generated fake data by drawing random numbers from Gaussian distributions with the same statistics as the measured neural data. For each neuron and image, we averaged over 20 of these values to obtain a simulated prediction. We then computed the correlation between these simulated predictions and the neurons' actual mean firing rates to find the maximum correlation CC_{max} possible given the variability in stimulus-evoked neural firing rates. While we acknowledge that neural firing rates are not Gaussian distributed, the CC_{max} estimate, being a second-order statistic of the neural firing rates (and their estimates via the predictor networks), is sensitive only to the first- and second-order statistics of the neural data. A Gaussian distribution captures these first- and second-order statistics while making as few assumptions as possible about the higher-order statistics in the data (i.e., it is a second-order *maximum entropy* model). As a result, our use of Gaussian distributions does not affect the reliability of our estimates of CC_{max}: Using more complex, harder-to-estimate probability distributions would yield the same result. For this reason, we are confident that our bootstrapping procedure, while slightly different from that of Schoppe et al., is comparable to their method.
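The bootstrapping procedure can be sketched as below. The function name, data layout, and number of bootstrap draws are illustrative assumptions, not details from the paper:

```python
import numpy as np

def cc_max_bootstrap(trial_rates, n_boot=1000, rng=None):
    # trial_rates: (n_images, n_repeats) single-trial firing rates for
    # one neuron. For each bootstrap draw, fake data are sampled from
    # Gaussians matching each image's mean and variance, averaged over
    # the repeats to form a simulated prediction, and correlated with
    # the measured mean rates. The average correlation estimates CC_max.
    rng = np.random.default_rng(rng)
    n_images, n_rep = trial_rates.shape
    mu = trial_rates.mean(axis=1)
    sd = trial_rates.std(axis=1, ddof=1)
    ccs = np.empty(n_boot)
    for b in range(n_boot):
        sim = rng.normal(mu[:, None], sd[:, None],
                         size=(n_images, n_rep)).mean(axis=1)
        ccs[b] = np.corrcoef(sim, mu)[0, 1]
    return ccs.mean()
```

For a highly reliable neuron (small trial-to-trial variance), this estimate approaches 1, reflecting the ceiling on any predictor's achievable correlation.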

*A*_{n,i} of neuron *n*, where *x*_{j,i} is the *j*th pixel value in image *i* and the constants *W*_{n,j} and *b* are determined from linear regression using LASSO regularization, a type of L1 (sparse) regularized linear regression. The LASSO regularization parameter was optimized on data from the same experimental session used to optimize the hyperparameters of CNN2. Then, leaving this parameter fixed, we evaluated the model using cross-validation on data from the other nine experimental sessions.
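A minimal version of this pixel-based model, using scikit-learn's `Lasso` as the L1-regularized solver (the paper's actual solver and regularization-parameter search are not specified in this excerpt):

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_pixel_model(images, rates, alpha=0.1):
    # images: (n_images, n_pixels) flattened pixel values x_{j,i};
    # rates: (n_images,) mean responses A_{n,i} of one neuron.
    # LASSO (L1-regularized) linear regression yields a sparse set of
    # pixel weights W_{n,j} and a bias b; alpha is the regularization
    # parameter that the paper tunes on a held-out session.
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(images, rates)
    return model.coef_, model.intercept_
```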


*z*_{j,i} of each feature (indexed by *j*) for each image (indexed by *i*). We then constructed a linear predictor of the neuron firing rate from the activations of the sparse-coding features, with prediction weights *W* and biases *b* according to Equation 4, where the variables *z*_{i,j} are the sparse-coding feature activations.

We fit the same form of linear predictor, with weights *W* and biases *b* according to Equation 4, where the variables *z*_{i,j} are BWT wavelet activations.

*W* and biases *b* according to Equation 4, where the variables *z*_{i,j} are VGG activations within the given layer. The five VGG layers we considered are Conv2,1; Conv2,2; Conv3,1; Conv3,2; and Conv3,3 (where Conv*a*,*b* denotes convolutional layer *b* within block *a*).
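For any of these fixed feature bases (sparse-coding features, BWT wavelets, or a VGG layer), the linear readout of Equation 4 can be fit by least squares on precomputed activations. The sketch below adds a small ridge term for numerical stability, which is an assumption for illustration, not the paper's stated fitting procedure:

```python
import numpy as np

def fit_feature_readout(z, rates, ridge=1e-3):
    # z: (n_images, n_features) activations z_{i,j} from a fixed basis;
    # rates: (n_images,) measured mean responses of one neuron.
    # Solves the normal equations for the weights W and bias b of a
    # linear readout, with a tiny ridge penalty on W (not on b) so the
    # system stays well conditioned when features are correlated.
    Z = np.column_stack([z, np.ones(len(z))])  # append bias column
    reg = ridge * np.eye(Z.shape[1])
    reg[-1, -1] = 0.0  # leave the bias unpenalized
    wb = np.linalg.solve(Z.T @ Z + reg, Z.T @ rates)
    return wb[:-1], wb[-1]  # W, b
```

The same routine serves all three feature-based baselines; only the matrix of activations `z` changes.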

*n*, where *σ*(*x*) is a nonlinear function. A parametric rectified linear unit was chosen as the nonlinearity because it outperformed a parameterized sigmoid. The parameters of the model were trained in TensorFlow using the same learning process as for the convolutional models, with early stopping as the primary form of regularization.
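A parametric rectified linear unit is the identity for positive inputs and applies a learned slope to negative inputs. A minimal NumPy version for reference (the paper's TensorFlow implementation is not shown in this excerpt):

```python
import numpy as np

def prelu(x, a):
    # Parametric ReLU: x for x > 0, a * x otherwise, where the slope
    # parameter a is learned jointly with the other model parameters.
    return np.where(x > 0, x, a * x)
```

With `a = 0` this reduces to the ordinary rectified linear function used elsewhere in the network.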

*σ*(·) is the rectified linear function, and the superscripts *ℓ* on *A*, and their reliability CC_{max}.

*A*_{i} is the cell's firing rate, indexed by *i*, over the set of *N* images (Zylberberg & DeWeese, 2013). This index has a value of 0 for neurons that fire equally to all images and a value of 1 for cells that spike in response to only one of the images.
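The equation itself is not reproduced in this excerpt; one standard index with exactly these endpoints is the lifetime-sparseness form, sketched here as an illustration (an assumption, not necessarily the paper's exact formula):

```python
import numpy as np

def selectivity_index(rates):
    # rates: (N,) mean firing rates A_i across the N images.
    # Lifetime-sparseness form (assumed): equals 0 when the cell fires
    # equally to every image, and 1 when it fires to exactly one image,
    # matching the endpoints described in the text.
    r = np.asarray(rates, dtype=float)
    n = r.size
    num = 1.0 - (r.mean() ** 2) / np.mean(r ** 2)
    return num / (1.0 - 1.0 / n)
```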

*A*_{θ} is the neuron's firing rate in response to a grating oriented at angle *θ*. The circular variance is less sensitive to noise than the more commonly used orientation-selectivity index (Mazurek, Kager, & Van Hooser, 2014). Following the results of Mazurek et al., we used thresholds of circular variance < 0.6 to define orientation-selective cells (the simple and complex cells according to Hubel & Wiesel, 1959) and circular variance > 0.75 to define non-orientation-selective cells. We omitted all other cells from these two groupings.
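Circular variance is conventionally computed from the orientation-doubled resultant vector of the tuning curve; the sketch below uses that standard form (assumed here, since the paper's equation is not shown in this excerpt):

```python
import numpy as np

def circular_variance(theta_deg, rates):
    # theta_deg: grating orientations in degrees; rates: responses
    # A_theta at each orientation. Standard form: CV = 1 - |R|, where
    # R = sum(A * exp(2i*theta)) / sum(A). The factor of 2 accounts
    # for the 180-degree periodicity of orientation.
    th = np.deg2rad(np.asarray(theta_deg, dtype=float))
    a = np.asarray(rates, dtype=float)
    R = np.sum(a * np.exp(2j * th)) / np.sum(a)
    return 1.0 - np.abs(R)
```

An untuned cell (equal response at evenly spaced orientations) gives CV = 1; a cell responding at a single orientation gives CV = 0, consistent with the thresholds quoted above.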

_{max}. Comparing the predictability of each cell's firing rates with its respective image-selectivity index (Figure 5A) and circular variance (Figure 5B), we found that the predictability depends only weakly on these characteristics. Thus, orientation selectivity and image selectivity are only minor factors in determining how well our model performs.

_{max} (Figure 5F) are both strongly related to the model's performance. Cells with a low mean firing rate are poorly predicted: as the upper bound on predictability set by the neural reliability CC_{max} decreases, the model performance decreases by far more, meaning that overall the model does far worse at predicting the activity of these neurons. Selecting only the reliable neurons (CC_{max} > 0.80) yields improved predictability.

*M* ± *SEM*), compared with 8.4 ± 0.8 spikes/s for the well-isolated single units (estimated during the 50-ms spike-counting window). Recall that neurons with higher firing rates were generally more predictable (Figure 5C). We thus attribute the higher predictability of the multiunit clusters to their higher mean firing rates.

*Journal of Machine Learning Research*, 13, 281–305.

*bioRxiv*, 201764, https://doi.org/10.1101/201764.

*Nature Neuroscience*, 18 (11), 1648–1655.

*The Journal of Neuroscience*, 24 (31), 6991–7006.

*The Journal of Physiology*, 148 (3), 574–591.

*Advances in Neural Information Processing Systems*, 25, 1097–1105.

*Proceedings of the National Academy of Sciences, USA*, 99 (13), 8974–8979.

*Nature*, 521 (7553), 436–444.

*The Journal of Neuroscience*, 12 (9), 3568–3581.

*IEEE Conference on Computer Vision and Pattern Recognition*, 5188–5196.

*Frontiers in Neural Circuits*, 8, 92.

*Advances in Neural Information Processing Systems*, 29, 1369–1377.

*Cell Reports*, 16 (9), 2486–2498.

*Nature*, 381, 607–609.

*Neural Computation*, 17 (8), 1665–1699.

*Optics Letters*, 40 (11), 2553–2556.

*Neural Networks*, 17 (5), 663–679.

*Frontiers in Computational Neuroscience*, 10, 10.

*arXiv*, 1409.1556.

*Journal of Machine Learning Research*, 15, 1929–1958.

*The Journal of Neuroscience*, 35 (44), 14829–14841.

*The Journal of Neuroscience*, 30 (6), 2102–2114.

*Neural Computation*, 20 (6), 1537–1564.

*Nature Neuroscience*, 19 (3), 356–365.

*Proceedings of the National Academy of Sciences, USA*, 111 (23), 8619–8624.

*Proceedings of the National Academy of Sciences, USA*, 113 (22), E3140–E3149.

*PLoS Computational Biology*, 9 (8), 1–10.

*PLoS Computational Biology*, 7 (10), 1–12.

^{2}In our experience, SAILnet is much faster to train than SparseNet. We used the publicly available SAILnet code out of the box (http://www.jzlab.org/sailcodes.html), without changing any parameter values except image size.