Philip Kellman, Nicholas Baker, Gennady Erlikhman, Hongjing Lu; Classification Images Reveal that Deep Learning Networks Fail to Perceive Illusory Contours. Journal of Vision 2017;17(10):569. doi: 10.1167/17.10.569.
© ARVO (1962-2015); The Authors (2016-present)
Background: Deep learning networks show impressive object recognition performance, and the filters they learn show correlates with neural activity in human visual areas. However, deep learning networks can also be easily fooled by adversarial examples that do not affect human recognition. We used the classification image method developed in psychophysics to probe whether a deep learning model employs the same features as humans in perceiving real and illusory contours. Method: We adapted a standard convolutional neural network (CNN) pre-trained on the ImageNet dataset of 1.2 million natural images. The network was trained to perform shape discrimination in the "fat/thin" task (Ringach & Shapley, 1996) by replacing its last decision layer with a perceptron. The perceptron took the 4096 activations of the units in the CNN's penultimate layer as input and was trained on 12,000 examples of the stimuli to classify shapes as 'fat' or 'thin'. After training, we tested the network with real and illusory contour stimuli contaminated with Gaussian luminance noise. The noise fields were then combined according to the network's shape discrimination decisions to compute classification images. Results: Networks coupled with the new decision layer discriminated between fat and thin shapes with high accuracy (98.56%). For real contours, classification images showed behavioral receptive fields consistent with human classification images. However, in displays with gaps, where humans perceive illusory contours, the classification images from the CNN failed to reveal behavioral receptive field activity along the illusory contours. Conclusions: Deep learning networks trained on natural images can be readily adapted, by introducing a new decision layer, to discriminate between psychophysical stimuli with an extremely high degree of accuracy. However, deep learning networks do not appear to perceive illusory contours from corner inducing elements, a process readily and automatically performed in the human visual system (Gold et al., 2000).
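The step of combining noise fields by the observer's decisions can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not give the exact formula, so the sketch assumes the simplest variant (mean noise on 'fat' responses minus mean noise on 'thin' responses), and it replaces the CNN with a hypothetical linear observer that weights a single column of pixels. The names `classification_image` and `simulate`, the image size, and the trial count are all illustrative assumptions.

```python
import numpy as np


def classification_image(noise_fields, responses):
    """Mean noise on 'fat' trials minus mean noise on 'thin' trials.

    noise_fields: array of shape (n_trials, height, width)
    responses:    boolean array of shape (n_trials,), True for 'fat'
    """
    noise_fields = np.asarray(noise_fields, dtype=float)
    responses = np.asarray(responses, dtype=bool)
    if responses.all() or not responses.any():
        raise ValueError("need at least one trial of each response")
    fat_mean = noise_fields[responses].mean(axis=0)
    thin_mean = noise_fields[~responses].mean(axis=0)
    return fat_mean - thin_mean


def simulate(n_trials=5000, size=16, noise_sd=1.0, seed=0):
    """Run a hypothetical linear observer on pure Gaussian noise fields."""
    rng = np.random.default_rng(seed)
    # Assumed observer template: only one vertical column of pixels matters.
    template = np.zeros((size, size))
    template[:, size // 2] = 1.0
    noise = rng.normal(0.0, noise_sd, size=(n_trials, size, size))
    # The observer responds 'fat' when the template-weighted noise is positive.
    decision_variable = (noise * template).sum(axis=(1, 2))
    responses = decision_variable > 0
    return template, classification_image(noise, responses)


template, ci = simulate()
```

In this toy setting the recovered image `ci` is large only at the pixels the simulated observer actually uses. The same logic underlies the finding reported above: pixels along an illusory contour would appear in the classification image only if the observer's decisions depended on them.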
Meeting abstract presented at VSS 2017