Abstract
Area V4 is an important intermediate stage of visual processing. However, it has been very difficult to characterize and model the tuning properties of V4 neurons. For example, no current models of V4 neurons can accurately predict responses to natural images. This is in stark contrast to models of V1 and MT, where responses can be predicted well. V4 neurons have large, nonlinear receptive fields, which makes it difficult to estimate their tuning properties using conventional methods: modeling V4 amounts to solving a high-dimensional nonlinear regression problem with limited data. To attack this problem effectively, we first sought to collect as much data as possible by chronically implanting electrode arrays in area V4 of two macaque monkeys. Neurons were recorded while the awake animals viewed clips of large, full-color natural movies. The chronic recordings were stable enough that neurons could often be held for several days. This allowed us to collect responses to hundreds of thousands (in some cases more than 1 million) of distinct movie frames, for hundreds of different V4 neurons. We then used several different neural network architectures to fit the data obtained from each neuron. Each network was trained on the stimulus movie and the response of a single neuron. The most successful neural network architecture that we tested was one that reflected insights from the Adelson-Bergen energy model, the scattering transform, and deep convolutional neural networks. We call this the deep convolutional energy model. This model is simple and interpretable, and it predicts V4 responses significantly better than previous models. Deep convolutional energy models fitted to V4 neurons approach the prediction performance of the best current models of V1 and MT neurons. Interpretation of the fitted models provides important insights about the representation of visual information in area V4.
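To illustrate the energy-model idea that the abstract draws on, here is a minimal sketch of a single Adelson-Bergen style energy unit. It is not the deep convolutional energy model itself, whose architecture is not specified here. The 1D filters, the parameter values, and all function names are assumptions chosen for illustration. The unit squares and sums the outputs of two filters that are 90 degrees apart in phase, so its response to a grating is nearly independent of the grating's phase, whereas a single linear filter is not.

```python
import math

def gabor_pair(half_width=32, wavelength=8.0, sigma=8.0):
    """Return an even (cosine) and an odd (sine) Gabor filter, 90 degrees apart in phase.

    All parameter values are illustrative assumptions, not taken from the abstract.
    """
    xs = range(-half_width, half_width + 1)
    envelope = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    even = [e * math.cos(2 * math.pi * x / wavelength) for e, x in zip(envelope, xs)]
    odd = [e * math.sin(2 * math.pi * x / wavelength) for e, x in zip(envelope, xs)]
    return even, odd

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def energy(stimulus, even, odd):
    """Energy response: sum of the squared outputs of the quadrature pair."""
    return dot(stimulus, even) ** 2 + dot(stimulus, odd) ** 2

def grating(phase, half_width=32, wavelength=8.0):
    """A 1D sinusoidal grating matched to the filters' preferred wavelength."""
    xs = range(-half_width, half_width + 1)
    return [math.cos(2 * math.pi * x / wavelength + phase) for x in xs]

even, odd = gabor_pair()
phases = [2 * math.pi * i / 8 for i in range(8)]

# The output of a single linear filter swings with stimulus phase ...
linear_responses = [dot(grating(p), even) for p in phases]
# ... while the energy output stays nearly constant.
energy_responses = [energy(grating(p), even, odd) for p in phases]

for p, lin, en in zip(phases, linear_responses, energy_responses):
    print(f"phase={p:5.2f}  linear={lin:8.3f}  energy={en:8.3f}")
```

The printed table shows the linear response changing sign as the phase shifts while the energy column remains almost flat. This phase invariance is the property usually attributed to energy units.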
Meeting abstract presented at VSS 2016