Abstract
Specific regions of ventral temporal cortex (VTC) appear to be specialized for the representation of certain visual categories: for example, the visual word form area (VWFA) for words and the fusiform face area (FFA) for faces. However, a computational understanding of how these regions process visual inputs is lacking. Here we develop a fully computable model of responses in VWFA and FFA. We measured BOLD responses in these regions to a wide range of carefully controlled grayscale images while subjects performed different tasks (fixation task: judge color of a small central dot; categorization task: report perceived stimulus category; one-back task: detect image repetitions). Using cross-validation to control for overfitting, we developed a model that accurately accounts for the observed data. The first component of the model is a two-stage cascade of visual processing in which the bottom-up response in VTC (fixation task) is computed as the degree to which low-level stimulus properties match a category template. This reveals how high-level representations are constructed from simple stimulus properties. The second component of the model addresses top-down enhancement of VTC responses produced by performance of a task on the stimulus (categorization and one-back tasks). We show that the enhancement is stimulus-specific and can be modeled as a scaling of the bottom-up representation by the intraparietal sulcus (IPS). The third and final component of the model shows that the IPS response to a given stimulus reflects perceptual decision-making and can be quantitatively predicted using a drift diffusion model. Thus, the top-down scaling induced by the IPS is directly related to the behavioral goals of the subject. In sum, these results provide a unifying account of neural processing in VTC in the form of a model that addresses both bottom-up and top-down effects and quantitatively predicts VTC responses.
Meeting abstract presented at VSS 2016
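
To make the first two model components concrete, here is a minimal Python sketch. All function and parameter names are hypothetical illustrations: the abstract does not specify the authors' feature stage, how category templates are built, or the fitted form of the top-down gain. The sketch shows only the stated structure: a bottom-up VTC response computed as the match between low-level stimulus properties and a category template, scaled multiplicatively by an IPS signal when a task is performed on the stimulus.

    import numpy as np

    def local_contrast_energy(image):
        # Crude stand-in for an early visual stage: local gradient energy
        # of the grayscale image (the actual first stage of the cascade is
        # not specified in this abstract).
        gy, gx = np.gradient(image.astype(float))
        return np.sqrt(gx ** 2 + gy ** 2)

    def bottom_up_vtc(image, template):
        # Second stage: the bottom-up response is the degree to which the
        # low-level features match a fixed category template (assumed here
        # to have the same shape as the feature map), rectified so the
        # stimulus drive is non-negative.
        features = local_contrast_energy(image)
        return max(float(np.dot(features.ravel(), template.ravel())), 0.0)

    def vtc_response(image, template, ips_signal, task_engaged):
        # Top-down component: performing a task on the stimulus scales the
        # bottom-up representation by an IPS signal. Multiplicative gain is
        # an illustrative choice; `ips_signal` is a hypothetical input.
        r = bottom_up_vtc(image, template)    # fixation-task response
        if task_engaged:                      # categorization / one-back task
            r *= 1.0 + ips_signal
        return r

Under the fixation task the sketch returns the pure template match; under the categorization or one-back task the same stimulus drive is amplified in proportion to the IPS signal, which is what makes the modeled enhancement stimulus-specific.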
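
The third component ties the IPS response to perceptual decision-making via a drift diffusion model. The sketch below simulates a standard drift diffusion process and, as an illustrative linking assumption not stated in the abstract, proxies the predicted IPS response by the mean evidence-accumulation time, so that harder-to-categorize stimuli (weaker drift) yield larger responses and hence stronger top-down scaling of VTC.

    import numpy as np

    def simulate_ddm(drift, bound=1.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
        # One drift-diffusion trial: noisy evidence accumulates until it
        # reaches +bound or -bound (or max_t elapses).
        # Returns (decision_time, choice).
        rng = np.random.default_rng() if rng is None else rng
        x, t = 0.0, 0.0
        while abs(x) < bound and t < max_t:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        return t, np.sign(x)

    def predicted_ips_response(drift, n_trials=2000, seed=0):
        # Illustrative assumption: IPS BOLD scales with mean accumulation
        # time, so weak-evidence stimuli (small drift) evoke larger
        # responses and stronger top-down gain on VTC.
        rng = np.random.default_rng(seed)
        times = [simulate_ddm(drift, rng=rng)[0] for _ in range(n_trials)]
        return float(np.mean(times))

For example, predicted_ips_response(0.5) exceeds predicted_ips_response(2.0) in this sketch, capturing the intuition that ambiguous stimuli demand more evidence accumulation and so, under the stated assumption, evoke larger IPS responses.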