Abstract
The visual brain is often conceptualized as a predictive system. Under this view, visual inputs are constantly matched against internal models of what the world should look like, with greater similarity between input and internal model leading to more efficient processing. Given our many prior expectations about the structure of everyday environments, such predictions should be particularly potent in natural scene perception. In a series of experiments, we asked whether perceptual performance is explained by how well scenes match participants' personal internal models of natural scene categories. Participants completed drawing tasks in which they sketched their most typical versions of kitchens and lounges, which we used as descriptors of their internal models. These drawings were then converted into 3D renders. Using these renders in a scene categorization task, we observed better categorization for renders based on participants' own drawings than for renders based on other participants' drawings or on arbitrary scene photographs. Further, using a deep neural network (DNN) trained on scene categorization, we investigated whether graded similarity to participants' own drawings predicted categorization performance. We found that behavioral categorization was better when the DNN's response to a scene was more similar to its response to the participant's typical scene of the same category, and more dissimilar to its response to the typical scene of the other category. This effect was observed specifically at late DNN layers, suggesting that perceptual efficiency is determined by high-level visual similarity to the internal model. Together, these results show that perception operates more efficiently in environments that adhere to our internal models of the world. They further highlight that progress in understanding natural vision requires accounting for idiosyncrasies in personal priors.
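
The DNN-based similarity analysis summarized above can be illustrated with a minimal sketch. The abstract does not specify the network, layer, or similarity measure used in the study, so the sketch below assumes an ImageNet-pretrained ResNet-50 from torchvision as a stand-in for the scene-categorization DNN, a late layer (the global average pool), and cosine similarity between activations; all file names and parameters are hypothetical.

```python
# Illustrative sketch (not the authors' pipeline): a graded similarity index
# between a scene image and renders of a participant's "typical" scenes,
# computed from late-layer DNN activations.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing (an assumption; the study's preprocessing is not given).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

def layer_activation(img_path: str, layer: torch.nn.Module) -> torch.Tensor:
    """Return the flattened activation of `layer` for one image."""
    captured = {}
    def hook(_module, _inputs, output):
        captured["act"] = output.detach().flatten(start_dim=1)
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
        model(x)
    handle.remove()
    return captured["act"]

def similarity_index(scene: str, typical_same: str, typical_other: str,
                     layer: torch.nn.Module) -> float:
    """Cosine similarity of the scene to the same-category typical render,
    minus its similarity to the other-category typical render. Higher values
    mean the scene is closer to the participant's own internal model of that
    category, relative to the competing category."""
    a = layer_activation(scene, layer)
    s_same = F.cosine_similarity(a, layer_activation(typical_same, layer)).item()
    s_other = F.cosine_similarity(a, layer_activation(typical_other, layer)).item()
    return s_same - s_other

# Hypothetical usage: a kitchen photograph compared against one participant's
# typical-kitchen and typical-lounge renders, read out at a late layer.
# idx = similarity_index("kitchen_scene.jpg",
#                        "participant01_typical_kitchen.png",
#                        "participant01_typical_lounge.png",
#                        layer=model.avgpool)
```

Under the abstract's account, such an index computed at late layers should correlate positively with behavioral categorization performance, whereas indices from early layers should not.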