Abstract
The Bayesian brain hypothesis (Knill & Pouget, 2004) suggests that the visual cortex infers properties of a visual scene by computing conditional probabilities of underlying causes given a particular image. Within this framework, top-down spatial and feature-based attention have been modeled as feedback of prior probabilities over locations and features, respectively (Rao, 2004; Chikkerur et al., 2010). The present study investigated whether, given an image and top-down priors, the posterior probability of a feature map could be used to guide simulated eye movements in a computational model performing a visual search task. A two-layer hierarchical generative model (Lee et al., 2009) was trained on images of handwritten digits (MNIST; LeCun et al., 1998) using unsupervised learning. In this model, activity in the first layer represents the conditional probability of features (e.g., oriented lines) given the image, while feedback from the second layer represents the prior over first-layer features. To simulate eye movements, the model selected locations within the image based on the maximum posterior probability of the first-layer feature map given the image and second-layer feedback. The model was tested on a visual search task in which a target digit (approximately 20×20 pixels) was placed among non-digit distractors in a 60×60 pixel search array. On each trial, the model selected the most probable location of a single target digit placed randomly in the search array. To quantify accuracy, a trial was scored as correct if the selected location fell within 10 pixels of the target center. Across 10,000 trials, the model correctly located the target on 70% of trials (chance: 11%). These results suggest that a visual system conforming to the Bayesian brain hypothesis could accurately perform visual search by using Bayes’ rule to combine bottom-up sensory information with top-down priors.
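For illustration, the sketch below shows the selection rule described in the abstract in a minimal form: a bottom-up likelihood map is combined with a top-down prior via Bayes’ rule, and the simulated fixation goes to the maximum-posterior location. This is not the authors’ implementation; the likelihood map, prior map, and target position are random or hypothetical stand-ins, and only the array size (60×60) and the 10-pixel scoring criterion are taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 60, 60                      # search-array size from the abstract
likelihood = rng.random((H, W))    # stand-in for bottom-up p(image | feature at location)
prior = rng.random((H, W))         # stand-in for top-down prior p(feature at location)
prior /= prior.sum()

# Posterior over locations: p(location | image) proportional to likelihood * prior
posterior = likelihood * prior
posterior /= posterior.sum()

# Simulated "eye movement": fixate the maximum-posterior location
y, x = np.unravel_index(np.argmax(posterior), posterior.shape)
print(f"selected fixation: ({y}, {x})")

# Scoring rule from the abstract: correct if within 10 pixels of the target center
target_center = np.array([30, 30])  # hypothetical target position for this example
correct = np.linalg.norm([y, x] - target_center) <= 10
print("correct trial" if correct else "incorrect trial")
```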