Abstract
HMAX/Neocognitron models of visual cortex use learned hierarchical (sparse) representations to describe visual scenes. These models have reported state-of-the-art accuracy on whole-image labeling tasks using natural still imagery (Serre et al., PNAS, 2007). Generalizations of these models (e.g., Brumby et al., AIPR, 2009) allow localized detection of objects within a scene. Itti and Koch (Nature Reviews Neuroscience, 2001) have proposed non-task-specific models of visual attention (“saliency maps”), which have been compared to human and animal data using eye-tracking systems. Chikkerur et al. (Vision Research, 2010) used eye-tracking to compare human visual fixations during object detection tasks in still images (finding pedestrians and vehicles in urban scenes) against the predictions of an extension of an HMAX model that adds a model of attention in parietal cortex.
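As a rough illustration of this model family (not our PANN implementation, whose details follow below), a minimal sketch of the first two HMAX stages, S1 simple-cell Gabor filtering followed by C1 local max pooling, might look like the following; the layer names follow the HMAX literature, and all parameter values here are illustrative assumptions.

```python
# Minimal sketch of the first two HMAX layers: S1 (rectified Gabor
# responses) and C1 (local max pooling). Parameter values are illustrative.
import numpy as np
from scipy.ndimage import convolve, maximum_filter

def gabor(size, wavelength, theta, sigma, gamma=0.5):
    """Return a 2-D Gabor kernel at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    g *= np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()  # zero-mean, like a simple-cell receptive field

def s1_c1(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4),
          size=11, wavelength=5.6, sigma=4.5, pool=8):
    """S1: rectified Gabor filter responses; C1: local max over position."""
    s1 = np.stack([np.abs(convolve(image, gabor(size, wavelength, t, sigma)))
                   for t in thetas])
    c1 = maximum_filter(s1, size=(1, pool, pool))[:, ::pool, ::pool]
    return c1  # shape: (n_orientations, H // pool, W // pool)
```

Deeper S2/C2 layers alternate template matching against learned (sparse) dictionaries with further pooling, yielding the hierarchical representations these models use for labeling and detection.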
Here, we describe new work comparing human eye-tracking data for object detection in natural video sequences to task-specific saliency maps generated by a sparse, hierarchical model of the ventral pathway of visual cortex called PANN (Petascale Artificial Neural Network), our high-performance implementation of an HMAX/Neocognitron-type model. We explore specific object detection tasks, including vehicle detection in aerial video from a low-flying aircraft, for which we collect eye-tracking data from several human subjects. We train our model using hand-marked training data on a few frames, and compare our results to eye-tracking data over an independent set of test video sequences. We also compare our task-specific saliency maps to non-task-specific saliency maps (Itti et al., PAMI, 1998; Harel et al., NIPS, 2006). We conclude that activity in model IT cortex projects back to a spatial distribution that correlates well with the task-specific visual attention recorded by the eye tracker throughout the video sequences.
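One common way to quantify agreement between a model saliency map and recorded fixations is the normalized scanpath saliency (NSS): the mean z-scored saliency value at fixated pixels, averaged over frames. The sketch below is a hypothetical illustration of that metric, not necessarily the scoring protocol used in this work; the `maps` and `fix_per_frame` names in the usage comment are assumed placeholders.

```python
# Hypothetical sketch: scoring a per-frame saliency map against
# eye-tracking fixations with normalized scanpath saliency (NSS).
import numpy as np

def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels for one frame.

    saliency  : 2-D array (model saliency map for the frame)
    fixations : iterable of (row, col) fixation coordinates
    """
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return float(np.mean([z[r, c] for r, c in fixations]))

# Usage over a test sequence, assuming `maps` and `fix_per_frame` are
# aligned per-frame lists (skipping frames with no fixations):
# scores = [nss(m, f) for m, f in zip(maps, fix_per_frame) if f]
```

NSS values above zero indicate that the model assigns above-average saliency to fixated locations; the same per-frame scoring applies whether the saliency map is task-specific (PANN) or non-task-specific (Itti et al.; Harel et al.).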
This work was supported by Department of Energy LDRD funding.