Abstract
Convolutional neural networks (CNNs) have attracted considerable attention for their remarkable performance at a variety of cognitive tasks, including visual object recognition. This has led to the proposal that deep learning networks may provide a biologically plausible model of human visual processing (e.g., Yamins et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014). Here, we investigated whether these networks are robust to noisy viewing conditions and, if not, whether CNNs are disrupted by visual noise in a manner that resembles human performance. To this end, we systematically compared the performance of humans and machines across a range of signal-to-noise ratios by presenting object images in varying levels of Gaussian pixel noise or Fourier phase-scrambled noise. Human performance proved far more robust to noise than that of state-of-the-art CNNs (AlexNet, VGG, and GoogLeNet). Moreover, the CNNs were more severely impaired by Gaussian noise, whereas humans had greater difficulty with spatially structured Fourier noise, implying that these CNNs process noisy objects in a qualitatively different manner. Next, we asked whether CNNs can acquire greater robustness by being trained with noisy object images. Noise-trained CNNs showed major improvements and successfully generalized to novel noisy images, demonstrating that noise invariance can be achieved by feedforward neural architectures through supervised learning. A layer-specific network analysis revealed that the middle and upper layers changed the most under this training regime, acquiring representations that were more robust to visual noise. Finally, we evaluated the patterns of error responses made by CNNs and humans by comparing their confusion matrices. After training with noisy images, CNNs made patterns of errors that more strongly resembled those made by humans for objects in high levels of noise. Taken together, these results suggest that CNNs provide a promising model for gaining insight into the robustness of human object recognition.
Meeting abstract presented at VSS 2018
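To make the two noise manipulations described in the abstract concrete, the sketch below shows one plausible way to corrupt an object image with additive Gaussian pixel noise or with Fourier phase-scrambled noise at a chosen signal proportion. This is a minimal NumPy illustration, not the authors' stimulus-generation code: the function names, the linear signal/noise mixing rule, and the noise parameters are assumptions introduced here for clarity.

```python
import numpy as np

def add_gaussian_noise(image, signal_prop, rng=None):
    """Mix a grayscale image (values in [0, 1]) with Gaussian pixel noise.

    signal_prop sets the signal level: 1.0 returns the original image,
    0.0 returns pure noise. The linear mixing rule and noise parameters
    are illustrative assumptions, not the authors' exact procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.5, scale=0.15, size=image.shape)
    noisy = signal_prop * image + (1.0 - signal_prop) * noise
    return np.clip(noisy, 0.0, 1.0)

def add_phase_scrambled_noise(image, signal_prop, rng=None):
    """Mix an image with spatially structured noise that keeps the image's
    Fourier amplitude spectrum but randomizes its phase.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fft2(image)
    random_phase = rng.uniform(-np.pi, np.pi, size=image.shape)
    # Recombine the original amplitude spectrum with random phases; taking
    # the real part discards the small imaginary residue this introduces.
    scrambled = np.real(np.fft.ifft2(np.abs(spectrum) * np.exp(1j * random_phase)))
    # Rescale the scrambled noise to the [0, 1] intensity range before mixing.
    scrambled = (scrambled - scrambled.min()) / (scrambled.max() - scrambled.min())
    noisy = signal_prop * image + (1.0 - signal_prop) * scrambled
    return np.clip(noisy, 0.0, 1.0)

# Example: corrupt a placeholder 224 x 224 "image" at 30% signal strength.
img = np.random.default_rng(0).random((224, 224))
gaussian_version = add_gaussian_noise(img, signal_prop=0.3)
fourier_version = add_phase_scrambled_noise(img, signal_prop=0.3)
```

In this sketch, sweeping signal_prop from high to low values would produce the kind of graded signal-to-noise manipulation the abstract describes for both humans and CNNs; the phase-scrambled variant yields noise whose spatial frequency content matches the image, which is why it can be the more disruptive condition for human observers.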