Abstract
Introduction: Humans show performance benefits in target detection when they view recurring background configurations that are predictive of target locations (Chun and Jiang, 1998). This performance benefit is known as contextual cueing. We train a simple image-computable neural network on the contextual cueing task and analyze individual neurons in the network to understand how they extract and integrate target and background-configuration information.

Methods: We trained a CNN on 2240 noisy images, each containing a target tilted to the left or right. Each image had sixteen locations holding one target, six distractors (drawn from four possible orientations), and nine empty locations. Half the images had repeated configurations predictive of target location but not of orientation (a stimulus-design sketch appears below). The CNN consisted of three strided convolution layers, followed by a dense layer and an output layer with two neurons, and was trained to classify target orientation (see the architecture sketch below). We evaluated human (n=6) and CNN performance on the same images at different noise levels and benchmarked both against an approximate Bayesian ideal observer (~BIO). We extracted each neuron's responses to these images and computed ROCs for target detection at each location (see the analysis sketch below).

Results: The CNN showed facilitation for old configurations relative to new ones, comparable to humans and the ~BIO. The convolution layers contained separate neurons for background detection and for target detection at each location, whereas the dense layer contained neurons tuned jointly to the target and the background, as well as neurons tuned to the target at multiple locations. Dense-layer neurons tuned to the target showed a change in AUC when an old background was present versus a new one; convolution-layer neurons did not.

Conclusion: We show that a simple CNN can exhibit target-detection performance benefits from global context similar to those of humans, and we analyze how the network hierarchy increasingly integrates information across target, background, and locations.
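A minimal sketch of the stimulus design, assuming a flat index over the sixteen locations; the helper name make_trial, the random seed, and the encoding of configurations are illustrative assumptions, not details from the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)
N_LOCATIONS = 16

def make_trial(repeated_config=None):
    """Return (target_location, target_tilt, distractor_layout).

    Passing back a previously generated (target_location, layout) pair
    reuses it as an "old" configuration: it predicts where the target
    is, but never which way it tilts.
    """
    if repeated_config is None:
        locs = rng.permutation(N_LOCATIONS)
        target_loc = int(locs[0])                     # one target location
        distractor_locs = locs[1:7]                   # six distractor locations
        distractor_oris = rng.integers(0, 4, size=6)  # four possible orientations
        layout = list(zip(distractor_locs.tolist(), distractor_oris.tolist()))
    else:
        target_loc, layout = repeated_config
    tilt = rng.choice(["left", "right"])  # tilt is independent of configuration
    return target_loc, tilt, layout       # remaining nine locations stay empty
```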
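A minimal PyTorch sketch of the network, assuming a 64x64 single-channel input, stride-2 convolutions, and hypothetical channel counts and kernel sizes; the abstract specifies only the layer types and the two-neuron output:

```python
import torch
import torch.nn as nn

class ContextCueingCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Three strided convolution layers (stride 2 assumed)
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),  # dense layer
            nn.Linear(128, 2),                      # two outputs: left / right tilt
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ContextCueingCNN()
logits = model(torch.randn(1, 1, 64, 64))  # -> shape (1, 2)
```

Under this reading, training uses only the left/right tilt labels, so any benefit from old configurations must emerge as a side effect of the orientation objective, since configuration never predicts tilt.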
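The per-neuron ROC analysis could be sketched as follows: for each neuron and each location, compare responses on trials where the target occupies that location against all other trials, and summarize with AUC. The array names responses (trials x neurons) and target_locs are assumptions; roc_auc_score is from scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def location_aucs(responses, target_locs, n_locations=16):
    """AUC for target detection at each location, for every neuron."""
    n_trials, n_neurons = responses.shape
    aucs = np.full((n_neurons, n_locations), np.nan)
    for loc in range(n_locations):
        labels = (target_locs == loc).astype(int)  # 1 = target at this location
        if labels.min() == labels.max():
            continue                               # ROC needs both classes
        for j in range(n_neurons):
            aucs[j, loc] = roc_auc_score(labels, responses[:, j])
    return aucs  # AUC far from 0.5 -> neuron carries target-location signal
```

Computing these AUCs separately for trials with old versus new backgrounds would then quantify the dense-layer AUC change reported in the Results.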