Abstract
Theories and models of visual categorization suggest that the brain must actively transform its representations of complex input scenes into the low-dimensional manifolds that represent the task. With its involvement in multiple face, object and scene categorizations, the occipito-ventral pathway should be pivotal to such task-dependent representational transformations. However, previous studies using full images did not allow us to observe these transformations. Here, we tracked them directly in the brain. Participants (N=10) performed four different 2-Alternative-Forced-Choice categorizations of the same 64 base images of a realistic city scene in different blocks of 1,536 trials. These images comprised varying embedded targets (8 face identities representing 2 genders x 2 expressions x 2 vehicles x 2 pedestrians). Each trial started with a fixation cross, followed by one base image presented for 150 ms, whose pixels were randomly sampled with the Bubbles procedure. We concurrently recorded participants’ categorization responses and source-localized MEG activity. First, we determined the features each participant used for behavior in each task by computing Mutual Information, MI(Pixel visibility; Correct vs. Incorrect). We also reconstructed their dynamic representations of each image pixel on each MEG source by computing MI(Pixel visibility; MEG source amplitude at time t). We then tracked the representational transformations across the occipito-ventral pathway layers. In each participant and task, we discovered that between 80 and 150 ms post-stimulus, the broad initial representation of image pixels in occipital cortex progressively transforms across ventral pathway layers into the low-dimensional task-specific feature manifolds (whose contents align with those supporting task behavior). Occipital cortex also reduces its representation of task-irrelevant pixels (until 150 ms), when the ventral pathway has identified the task-relevant feature manifolds.
Our findings offer new insights into how the occipito-ventral pathway dynamically aligns its features to those of behavior in multiple face, object and scene categorization tasks.
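To make the behavioral measure MI(Pixel visibility; Correct vs. Incorrect) concrete, the following is a minimal sketch on simulated data, not the pipeline used in the study. It assumes binary pixel visibility on each trial, a simple plug-in (histogram) estimator of mutual information in bits, and a hypothetical observer whose accuracy depends on a single "diagnostic" pixel. The function name, the number of pixels, and the accuracy levels are all illustrative assumptions.

```python
import numpy as np


def mutual_information_binary(x, y):
    """Plug-in estimate of MI (in bits) between two binary variables."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    # 2 x 2 joint histogram of (x, y), normalized to a probability table.
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    # Product of the marginals, i.e. the joint expected under independence.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    independent = px @ py
    nz = joint > 0  # skip empty cells, whose contribution is zero
    return float(np.sum(joint[nz] * np.log2(joint[nz] / independent[nz])))


# Simulated experiment (all values below are assumptions for illustration).
rng = np.random.default_rng(0)
n_trials, n_pixels = 1536, 16
diagnostic_pixel = 3  # hypothetical pixel that drives behavior

# Random sampling: each pixel is visible on about half of the trials.
visibility = rng.random((n_trials, n_pixels)) < 0.5

# The simulated observer is 90% correct when the diagnostic pixel is
# visible and at chance (50%) otherwise.
p_correct = np.where(visibility[:, diagnostic_pixel], 0.9, 0.5)
correct = rng.random(n_trials) < p_correct

# MI(pixel visibility; correct vs. incorrect), one value per pixel.
mi_per_pixel = np.array(
    [mutual_information_binary(visibility[:, p], correct) for p in range(n_pixels)]
)
print("Pixel with the highest MI:", int(np.argmax(mi_per_pixel)))
```

In this toy setting the diagnostic pixel carries far more information about response accuracy than the others, which is the logic by which random sampling can reveal the features that support behavior. The neural analogue would replace the binary accuracy variable with source amplitude at each time point, which is continuous and would need a different estimator than the one sketched here.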