Abstract
Biologically inspired deep computational models like the Invariant Visual Search Network (IVSN; Zhang et al., 2018) can effectively search for a single target in an image. We modified the IVSN model to perform hybrid search, i.e., searching a visual display for any one of several targets held in memory. In this case, the features from multiple targets must be used to guide search. This raises the question: what strategies best model human behavior? On each trial, the model was given a search image containing eight grayscale objects. Either 1, 2, or 4 targets were held in memory. The search image included a single target in each trial. The model generated a separate priority map from the search image for each target held in memory. These priority maps were either combined or selected at random with replacement (randomly cycled), and the resulting map guided the next fixation. The model generated fixations until the target was found or all locations were visited. Each fixated location was inhibited so that the model never revisited a location. We compared model performance to data from a psychophysics experiment in which human observers performed the same task with the same stimuli. Using human target-present data, we compared humans and the model on the number of fixations needed to find the target and on the sequence of locations visited during a trial. Averaging model priority maps across targets in memory produced a poor fit to human performance. Cycling priority maps to guide each successive fixation produced performance similar to that of humans, as if observers were prioritizing a different member of the memory set on each fixation. Combining priority maps by taking the maximum across maps at each location also produced performance similar to that of humans. These computational strategies reveal possible mechanisms used by humans during hybrid search when multiple target templates are held in memory.
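The three strategies described above (averaging, taking the maximum, and random cycling) can be illustrated with a minimal sketch. This is not the IVSN implementation: the priority values are made-up numbers standing in for the maps the model computes from the search image, each map is reduced to one value per object location, and the names `combine` and `search` and the strategy labels are hypothetical. The sketch also assumes that each fixation goes to the highest-priority location not yet visited, which the abstract does not state explicitly.

```python
import random

def combine(maps, strategy, rng):
    """Collapse per-target priority maps into one priority value per location."""
    n_locations = len(maps[0])
    if strategy == "average":
        return [sum(m[i] for m in maps) / len(maps) for i in range(n_locations)]
    if strategy == "max":
        return [max(m[i] for m in maps) for i in range(n_locations)]
    if strategy == "cycle":
        # Pick one target's map at random, with replacement, for this fixation.
        return list(rng.choice(maps))
    raise ValueError(f"unknown strategy: {strategy}")

def search(maps, target_location, strategy, seed=0):
    """Return the sequence of fixated locations, ending when the target is found."""
    rng = random.Random(seed)
    n_locations = len(maps[0])
    visited = set()
    scanpath = []
    while len(visited) < n_locations:
        priority = combine(maps, strategy, rng)
        # Inhibition of return: already-fixated locations are excluded.
        candidates = [i for i in range(n_locations) if i not in visited]
        # Assumption: fixate the highest-priority remaining location.
        fixation = max(candidates, key=lambda i: priority[i])
        scanpath.append(fixation)
        visited.add(fixation)
        if fixation == target_location:
            break
    return scanpath

# Illustrative values only: two targets in memory, eight object locations.
example_maps = [
    [0.1, 0.9, 0.2, 0.3, 0.1, 0.2, 0.4, 0.1],  # priority map for target A
    [0.8, 0.1, 0.3, 0.2, 0.7, 0.1, 0.2, 0.3],  # priority map for target B
]
print(search(example_maps, target_location=4, strategy="max"))    # [1, 0, 4]
print(search(example_maps, target_location=4, strategy="cycle"))  # depends on seed
```

The length of the returned scanpath corresponds to the number of fixations needed to find the target, and the scanpath itself corresponds to the sequence of locations visited, the two measures on which the model and human observers were compared.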
Acknowledgement: NSF; NIH-NEI EY017001; Harvard Mind, Brain & Behavior Faculty Award to Kreiman & Wolfe