Abstract
In most models of perception that have been proposed, visual tasks are treated independently (Maximilian et al., 2000; Ma et al., 2011). However, the human visual system is an interconnected hierarchical network in which many neurons in the visual cortex are shared across the processing of various visual features, which are then used by downstream processes. Inspired by this characteristic of the human visual system, we propose a novel way to simplify models so that they generalize across different visual tasks: sharing the feature encoder. We tested this framework based on recent psychophysical findings showing that localization and perceived size in human observers are highly correlated (Wang et al., 2020). In this study, we used a convolutional neural network to model the localization and size perception tasks simultaneously. The localization task was to report the location of briefly presented noise patches, and the size perception task was to discriminate whether an arc shown on the screen was shorter or longer than the average length of all seen arcs. Unlike traditional multi-task neural networks, in which the inputs are the same across tasks, our model accommodates different types of visual stimuli. The model is composed of one shared feature encoder, one regressor for localization, and one classifier for size discrimination. During training, the encoder and the regressor are first trained on the localization task; the classifier is then fine-tuned on the size discrimination task. Surprisingly, even though the encoder was never trained on the size discrimination task and the visual stimuli for the two tasks were visually distinct, our model exceeded human performance on both tasks. This approach offers a possible way to simplify multi-task computational models through shared features and provides insight into the joint modeling of visual processes.
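To make the architecture and two-stage training procedure described above concrete, the following is a minimal illustrative sketch in PyTorch; it is not the authors' implementation, and the layer sizes, stimulus dimensions, and variable names are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Convolutional feature encoder shared by both tasks (hypothetical architecture)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, out_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

encoder = SharedEncoder()
localizer = nn.Linear(128, 2)        # regressor: (x, y) location of the noise patch
size_classifier = nn.Linear(128, 2)  # classifier: arc shorter vs. longer than average

# Hypothetical stand-ins for the two stimulus sets (random tensors for illustration).
noise_patches, patch_locations = torch.rand(8, 1, 64, 64), torch.rand(8, 2)
arcs, arc_labels = torch.rand(8, 1, 64, 64), torch.randint(0, 2, (8,))

# Stage 1: train the encoder and the regressor on the localization task.
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(localizer.parameters()), lr=1e-3)
for _ in range(100):
    opt1.zero_grad()
    loss = nn.functional.mse_loss(localizer(encoder(noise_patches)), patch_locations)
    loss.backward()
    opt1.step()

# Stage 2: keep the encoder fixed (it is never trained on size discrimination);
# fine-tune only the size-discrimination classifier on the arc stimuli.
for p in encoder.parameters():
    p.requires_grad = False
opt2 = torch.optim.Adam(size_classifier.parameters(), lr=1e-3)
for _ in range(100):
    opt2.zero_grad()
    loss = nn.functional.cross_entropy(size_classifier(encoder(arcs)), arc_labels)
    loss.backward()
    opt2.step()
```

Note that the two stages consume different stimulus types (noise patches for localization, arcs for size discrimination), reflecting the point that the shared encoder must tolerate distinct inputs across tasks.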