Abstract
Recent goal-driven deep neural network (DNN) models of higher ventral visual cortex have leveraged the rich behavioral task of object recognition to impose powerful top-down constraints on network parameters. DNNs optimized to solve multi-way object categorization in challenging real-world images have been shown to provide state-of-the-art predictions of neural responses in visual areas throughout the primate ventral pathway. Here, we show that such models can be improved by using a combination of multiple behaviorally realistic tasks as network optimization targets. Specifically, we optimized a DNN to simultaneously solve high-level tasks, including object categorization and scene classification, as well as intermediate visual tasks, including depth estimation, normal map estimation, and semantic segmentation. Task optimization was synergistic: at a given number of training examples, performance on each task was higher under combined training than for models trained on that task alone. Moreover, the model trained on the combined tasks better fit the response patterns of neurons in both cortical areas V4 and IT. These results suggest that identifying a richer and more ecologically relevant variety of visual behaviors as network "goals" may lead to substantially improved understanding of the neural computations in the visual system.
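As a rough illustration of what optimizing one network on several tasks at once can look like, the sketch below combines per-task losses into a single training objective. The abstract does not say how the tasks were combined, so the weighted sum, the equal weights, the choice of loss per task, and all names here (`TASK_WEIGHTS`, `combined_loss`, and the helper functions) are assumptions made for illustration only.

```python
import math

# Task names follow the abstract; the equal weights are an illustrative
# assumption, not values reported by the authors.
TASK_WEIGHTS = {
    "object_categorization": 1.0,
    "scene_classification": 1.0,
    "depth_estimation": 1.0,
    "normal_map_estimation": 1.0,
    "semantic_segmentation": 1.0,
}

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class (one example)."""
    return -math.log(probs[label])

def mean_squared_error(pred, target):
    """Mean squared error between predicted and target values (one example)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def combined_loss(task_losses, weights=TASK_WEIGHTS):
    """Weighted sum of whichever per-task losses are present in this batch.

    In a shared-trunk network, each task head would produce one entry of
    `task_losses`; minimizing the sum pushes gradients from every task
    through the shared parameters.
    """
    return sum(weights[name] * loss for name, loss in task_losses.items())

# Toy example: one classification loss and one regression loss.
losses = {
    "object_categorization": cross_entropy([0.7, 0.2, 0.1], label=0),
    "depth_estimation": mean_squared_error([1.0, 2.0], [1.5, 2.5]),
}
total = combined_loss(losses)
```

In practice the relative weights matter, since tasks with larger loss scales can dominate the shared parameters; equal weighting is only the simplest starting point.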
Meeting abstract presented at VSS 2018