Abstract
Deep convolutional neural networks (DNNs) are currently very popular, drawing interest for their high performance on object classification tasks. They are also being examined for purported parallels between their hierarchical features and those found in biological visual systems (e.g. Yamins et al., 2014). Human vision has been studied extensively in psychophysics using simple grating stimuli, and many experimental results can be accommodated within a model in which linear filters are followed by point-wise non-linearities as well as non-linear interactions between filters (Goris et al., 2013). However, two of the most striking failures of current spatial vision models are their inability to account for the contrast-modulation experiments of Henning et al. (1975) and the plaid-masking experiments of Derrington and Henning (1989). Googlenet and Alexnet are two DNNs that perform well on object recognition. We ran contrast-modulated and plaid-masking stimuli through both networks and extracted the layer activations. Since these networks are fully deterministic, we designed an optimal linear decoder around the assumption of late, zero-mean additive noise, with the variance of the noise calibrated to match human performance in contrast detection experiments. Unlike human observers, neither Alexnet nor Googlenet showed any trace of masking by contrast-modulated gratings in any of its layers. Worse still, adding the contrast-modulated mask strongly facilitated detection. With plaid masks, Googlenet again showed strong facilitation. Alexnet, on the other hand, showed plaid-masking effects at least qualitatively similar to those found in human observers. However, this was true only for the last layers, the "object" layers, not the early layers. Strong claims that DNNs mirror the human visual system appear premature.
Not only do the DNNs fail to show the masking effects found in human observers, but different DNNs also respond very differently to simple stimuli.
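The decoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: the late noise is taken to be independent Gaussian noise with equal standard deviation sigma on every unit, the task is taken to be two-alternative forced choice, and threshold is taken to be 75% correct. Under those assumptions the ideal linear decoder has a sensitivity d' equal to the Euclidean distance between the two noise-free activation vectors divided by sigma. All function names here are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def dprime(act_signal, act_blank, sigma):
    """d' of the ideal linear decoder, ASSUMING i.i.d. Gaussian noise
    with standard deviation sigma added to each unit's activation."""
    diff = np.asarray(act_signal, dtype=float).ravel() \
        - np.asarray(act_blank, dtype=float).ravel()
    return float(np.linalg.norm(diff) / sigma)


def prop_correct_2afc(d):
    """Proportion correct in a 2AFC task (assumed task) for sensitivity d."""
    return NormalDist().cdf(d / math.sqrt(2.0))


def calibrate_sigma(act_signal, act_blank, target_pc=0.75):
    """Noise s.d. at which the decoder reaches target_pc on a reference
    stimulus pair, e.g. a grating at human detection threshold
    (the 75% criterion is an assumption)."""
    d_target = math.sqrt(2.0) * NormalDist().inv_cdf(target_pc)
    diff = np.asarray(act_signal, dtype=float).ravel() \
        - np.asarray(act_blank, dtype=float).ravel()
    return float(np.linalg.norm(diff) / d_target)


# Toy activations standing in for one network layer (made-up numbers).
blank = np.zeros(4)
target = np.array([3.0, 4.0, 0.0, 0.0])

sigma = calibrate_sigma(target, blank)
pc = prop_correct_2afc(dprime(target, blank, sigma))
```

Once sigma is fixed from a reference detection condition, the same two functions can be applied to activations for target-plus-mask versus mask-alone stimuli. In this sketch, masking would appear as a lower proportion correct than in the unmasked case and facilitation as a higher one.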
Meeting abstract presented at VSS 2016