Abstract
Primates rely on hierarchically organized cortical areas in the ventral visual pathway to rapidly and accurately recognize visual objects. Previous work showed that linearly weighted sums of population activity across the macaque inferior temporal (IT) cortex (the final stage of the hierarchy) are sufficient to explain object confusion patterns. These “top-readout” decoding models posit that downstream brain regions receiving input from IT drive object categorization. However, such hypothetical regions (e.g., PFC) likely receive inputs from areas all along the ventral stream hierarchy (enabling “side-readouts”). Furthermore, projections connecting early visual areas (e.g., V1) directly to the IT cortex motivate the relevance of “bypass-connections”. When do these “side-readouts” and “bypass-connections” become functionally relevant? To guide experimental designs that can answer this question, we assessed the internal representations of deep convolutional neural networks (DCNNs; currently the best models of the ventral stream). We developed novel image-level metrics that quantify how rapidly object-identity solutions (linear separability among objects) evolve across DCNN layers. This allowed us to identify images with equal categorization performance at the final DCNN layers (“top-readout”) whose solutions nevertheless evolved faster or slower along the processing hierarchy. We hypothesized that, for the faster-evolving images, primates may benefit from side-readouts or bypass-connections that allow earlier access to object-identity solutions. Consistent with this hypothesis (and falsifying “top-readout” predictions), we observed that under high behavioral demands (tasks with rapid image presentations, ~33 ms, followed by backward masking), monkeys (n=2) showed significantly higher performance for the faster-evolving images. To narrow the space of candidate brain models, we developed a battery of competing neural network architectures based on hypothetical combinations of side-readouts and bypass-connections. By directly contrasting predictions across these new models, we identified “controversial images” that produce the highest divergence in predicted behavior. Future work will test these predictions against large-scale neural recording and perturbation datasets from the ventral stream (Kar et al., 2019-2021).
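
A minimal sketch of one way such an image-level metric could be computed, assuming (purely for illustration, not as the authors' exact procedure) torchvision's AlexNet as the DCNN, cross-validated logistic-regression decoders fit at a handful of intermediate layers, and an area-under-the-curve summary of per-image decoder confidence against normalized layer depth:

import numpy as np
import torch
from torchvision.models import alexnet
from torchvision.models.feature_extraction import create_feature_extractor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical choice of readout nodes spanning the AlexNet hierarchy.
LAYERS = ["features.2", "features.5", "features.12", "classifier.2", "classifier.5"]

def layerwise_image_scores(images, labels, n_splits=5):
    """Return an (n_images, n_layers) array of held-out decoder confidences.

    images: preprocessed float tensor of shape (N, 3, 224, 224)
    labels: integer object-identity labels in 0..K-1, shape (N,)
    """
    model = alexnet(weights="IMAGENET1K_V1").eval()
    extractor = create_feature_extractor(model, return_nodes=LAYERS)
    with torch.no_grad():
        feats = extractor(images)

    scores = np.zeros((len(labels), len(LAYERS)))
    for j, name in enumerate(LAYERS):
        X = feats[name].flatten(start_dim=1).numpy()
        clf = LogisticRegression(max_iter=2000)
        # Cross-validated probability assigned to the correct object for each image.
        proba = cross_val_predict(clf, X, labels, cv=n_splits, method="predict_proba")
        scores[:, j] = proba[np.arange(len(labels)), labels]
    return scores

def emergence_index(scores):
    """Area under the per-image confidence-vs-depth curve; higher values mean the
    object-identity solution becomes linearly separable earlier in the hierarchy."""
    depth = np.linspace(0.0, 1.0, scores.shape[1])  # normalized layer depth
    return np.trapz(scores, depth, axis=1)

Images with matched final-layer performance but different emergence indices would then correspond to the faster- and slower-evolving images described above.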
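Likewise, the abstract does not detail how the “controversial images” are selected; one simple sketch, assuming each candidate architecture yields a per-image predicted behavioral score (e.g., predicted accuracy) and taking the largest pairwise disagreement per image as the divergence measure (both of which are assumptions for illustration):

import numpy as np

def rank_controversial(pred_scores):
    """pred_scores: array of shape (n_models, n_images) holding each model's
    predicted behavioral score per image. Returns (ranking, divergence), where
    ranking orders images from most to least controversial."""
    divergence = pred_scores.max(axis=0) - pred_scores.min(axis=0)
    return np.argsort(-divergence), divergence

# Hypothetical usage with predictions from four candidate architectures:
rng = np.random.default_rng(0)
preds = rng.uniform(0.4, 1.0, size=(4, 1000))
order, div = rank_controversial(preds)
top_controversial = order[:50]  # images the candidate models disagree about most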