Abstract
Computational models of visual processing make quantitative, testable predictions, and the accuracy of these predictions can be used to gauge model quality objectively. Such quantitative validation is common at the behavioral and neural-population levels, but it is less common at the single-neuron level, partly because of the difficulty of data collection, but more importantly because the large variability among neurons requires models that are specific enough to predict the response of a single neuron, yet flexible enough to be fitted to every individual neuron in the recorded population. Such validation has been performed for V1 (see Olshausen & Field, 2005), V4 (Cadieu et al., 2007; David et al., 2006), and MT (Rust et al., 2006). Here, we test a model of spatio-temporal processing of action sequences in the primate superior temporal sulcus (STS). The higher position of the STS in the visual processing hierarchy (Felleman & Van Essen, 1991) makes this a harder task. Using computer-generated humanoid action sequences, we trained monkeys to recognize multiple actions and recorded from the temporal lobe (Singer & Sheinberg, in press). We then used computational models of the ventral (Serre et al., 2005) and dorsal (Jhuang et al., 2007) streams of visual cortex, coupled with a simple parameter-search procedure, to fit either model to more than 100 individual neurons, predicting the firing rate of each neuron in response to 64 action sequences, in 90 time bins of 10 ms each. Under leave-one-out cross-validation, both models achieved good test-set performance, comparable to previous work in V4 and MT, while control models performed significantly worse and close to chance. To the best of our knowledge, this is not only the first instance of quantitative single-neuron modeling this far downstream, but also the first instance of time-series prediction for neurons beyond V1.
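To make the fitting and evaluation protocol concrete, here is a minimal sketch of how the leave-one-out procedure over sequences could be organized, assuming a simple grid search over model parameters. The data arrays, the `model_response()` stand-in, and the parameter grid are hypothetical placeholders rather than the actual implementation; the real models (Serre et al., 2005; Jhuang et al., 2007) compute their predictions from the pixel-level action sequences.

```python
# Minimal sketch of per-neuron fitting with leave-one-out cross-validation.
# All names, shapes, and the grid-search parameterization are illustrative
# assumptions, not the authors' actual code.
import numpy as np

N_SEQ, N_BINS = 64, 90          # 64 action sequences, 90 bins of 10 ms each

rng = np.random.default_rng(0)
# Placeholder "recorded" firing rates for one neuron: (sequences, time bins).
observed = rng.poisson(5.0, size=(N_SEQ, N_BINS)).astype(float)

def model_response(seq_idx, params):
    """Hypothetical stand-in for a ventral- or dorsal-stream model's
    predicted rate for one sequence; a real model would compute its
    prediction from the video frames, not from random draws."""
    gain, offset = params
    rng_m = np.random.default_rng(seq_idx)   # deterministic per sequence
    return gain * rng_m.poisson(5.0, size=N_BINS) + offset

# A small, illustrative parameter grid for the "simple parameter search".
param_grid = [(g, o) for g in (0.5, 1.0, 2.0) for o in (0.0, 1.0)]

def fit_error(train_idx, params):
    """Mean squared error between predictions and data on the training set."""
    preds = np.array([model_response(i, params) for i in train_idx])
    return np.mean((preds - observed[train_idx]) ** 2)

# Leave one sequence out: fit parameters on 63 sequences, predict the 64th.
test_corrs = []
for held_out in range(N_SEQ):
    train_idx = [i for i in range(N_SEQ) if i != held_out]
    best = min(param_grid, key=lambda p: fit_error(train_idx, p))
    pred = model_response(held_out, best)
    test_corrs.append(np.corrcoef(pred, observed[held_out])[0, 1])

print(f"mean test-set correlation: {np.mean(test_corrs):.3f}")
```

A chance-level control could be obtained within the same loop, for example by shuffling which sequence each prediction is scored against before computing the correlations.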