Abstract
Masking is a key experimental tool for precisely controlling the visibility of visual stimuli. We explore a new technique for masking images, in particular natural scenes: a natural scene is flashed briefly (e.g., 10 ms) and immediately followed by an equally brief flash of a negative version of the same scene. Stimulus and mask are inversely related like a photograph and its negative; the mask is obtained by subtracting the stimulus from the maximum palette entry in each color channel. The technique is related to experiments showing that when two differently colored lights are flashed in quick succession, people report a blend of both stimuli, as if the process underlying perception integrates over them (Efron, 1967). If the visual system averaged perfectly over stimulus and mask, subjects would report seeing a uniform gray patch. However, in some areas of the image subjects reported deviations from a uniform gray percept. We show (1) that these areas are systematically related to the predictions of computational approaches to saliency (e.g., Itti & Koch, 2001), such that our masking technique masks everything but the most salient regions of the image; (2) that this effect cannot be achieved with standard masking techniques or with no masking; and (3) that the effect is weakened but not abolished in a dichoptic version of the experiment.
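As a minimal sketch of how such a negative mask can be constructed (assuming 8-bit RGB images; the helper name negative_mask is illustrative, not part of the study), one might write:

    import numpy as np

    def negative_mask(stimulus: np.ndarray, max_value: int = 255) -> np.ndarray:
        """Return the photographic negative of an 8-bit RGB stimulus.

        Each channel is subtracted from the maximum palette entry
        (255 for 8-bit images), so a pixel of value v maps to 255 - v.
        """
        return (max_value - stimulus.astype(np.int32)).astype(np.uint8)

Averaging a frame of the stimulus with a frame of this mask yields a uniform frame at max_value / 2, i.e., the mid-gray that perfect temporal integration would predict.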
These results have implications for the neuronal coding of saliency and for computational approaches to saliency. In particular, we show that a simple thresholded wavelet transform, which computes local contrast intensities whose positive and negative values correspond to the two contrast polarities (ON- and OFF-center cells), corresponds well to subjects' percepts. Given the similarity between percepts and saliency maps, this may provide an alternative, less costly tool for computational approaches to saliency.
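The kind of computation we have in mind can be sketched as a single-level Haar decomposition with a threshold on the signed detail coefficients (the function name, the choice of the Haar wavelet, and the threshold value are illustrative assumptions, not the exact model used in the study):

    import numpy as np

    def haar_local_contrast(gray: np.ndarray, threshold: float = 0.1) -> np.ndarray:
        """Single-level 2-D Haar decomposition of a grayscale image in [0, 1].

        Returns a signed local-contrast map whose positive and negative values
        correspond to the two contrast polarities (ON- and OFF-center responses);
        coefficients with magnitude below `threshold` are set to zero.
        """
        # Crop to even dimensions so 2x2 blocks tile the image exactly.
        h, w = gray.shape[0] // 2 * 2, gray.shape[1] // 2 * 2
        g = gray[:h, :w].astype(np.float64)

        # Corners of each 2x2 block.
        a, b = g[0::2, 0::2], g[0::2, 1::2]
        c, d = g[1::2, 0::2], g[1::2, 1::2]

        # Haar detail coefficients: horizontal, vertical, and diagonal contrast.
        horiz = (a + b - c - d) / 2.0
        vert  = (a - b + c - d) / 2.0
        diag  = (a - b - c + d) / 2.0

        # Keep the signed coefficient with the largest magnitude at each location.
        stacked = np.stack([horiz, vert, diag])
        idx = np.abs(stacked).argmax(axis=0)
        contrast = np.take_along_axis(stacked, idx[None], axis=0)[0]

        # Threshold: only sufficiently strong local contrasts survive.
        return np.where(np.abs(contrast) >= threshold, contrast, 0.0)

Under this sketch, the coefficients that survive thresholding mark the image regions where a percept deviating from uniform gray would be predicted, which can then be compared against a conventional saliency map.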
ONR, NGA, Alexander von Humboldt Foundation.