Abstract
We can visually recognize a wide variety of materials, such as fabric, glass, and metal. A promising approach to elucidating the complex visual mechanisms underlying material perception is to construct and analyze artificial neural networks that recognize materials as humans do. We previously showed (Liu et al., VSS 2016) that a convolutional neural network (CNN) could recognize the material categories of the Flickr Material Database (FMD; Sharan et al., 2013) as accurately as humans. That network was derived from a CNN model for object recognition (Simonyan & Zisserman, 2015) and fine-tuned for material recognition with FMD. To compare humans and CNNs in a broader context, the present study investigated their tolerance to image perturbation. Specifically, we distorted the FMD material images by adding Gaussian noise at several levels of standard deviation, and had human participants and a CNN classify the images into ten material categories. The CNN had been fine-tuned for material categorization with noiseless FMD images only (Liu et al., VSS 2016). Categorization accuracy dropped more steeply with increasing noise level for the CNN than for humans, implying that the CNN is less tolerant of perturbation than humans are. We then examined whether the noise tolerance of the CNN could be improved by cascading a de-noising network in front of the original material-recognition CNN. Accuracy improved substantially, reaching the level of human performance, with a de-noising network that learned to match the features represented by the subsequent CNN between the original and noisy FMD images (Johnson et al., 2016), whereas it improved only slightly with another type of de-noising network that lacks such a CNN feature-matching mechanism (Isola et al., 2016). Although behavioral similarity does not imply computational similarity, these findings provide novel insights into the human visual mechanisms of material recognition.
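The noise manipulation and the feature-matching idea described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' code: the noise levels in `NOISE_SDS`, the clipping to [0, 1], the tiny one-dimensional "image", and `toy_features` (a stand-in for a layer of the material-recognition CNN) are all assumptions. Here the de-noiser is simply the identity, so the loss only measures how far the noise moves the features; a trained de-noiser of the feature-matching type would be optimized to drive this loss down.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical noise levels (standard deviations on a 0-1 pixel scale);
# the abstract does not report the values actually used.
NOISE_SDS = [0.0, 0.05, 0.1, 0.2, 0.4]


def add_gaussian_noise(pixels: Sequence[float], sd: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation `sd` to each pixel,
    then clip to [0, 1] (clipping is an assumption, not stated in the source)."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sd))) for p in pixels]


def feature_matching_loss(
    features: Callable[[Sequence[float]], List[float]],
    clean: Sequence[float],
    denoised: Sequence[float],
) -> float:
    """Mean squared distance between the feature vectors of the clean image and
    the de-noised image, in the spirit of the feature-matching (perceptual) loss
    of Johnson et al. (2016). `features` stands in for a fixed layer of the
    downstream recognition CNN."""
    f_clean = features(clean)
    f_denoised = features(denoised)
    return sum((a - b) ** 2 for a, b in zip(f_clean, f_denoised)) / len(f_clean)


def toy_features(pixels: Sequence[float]) -> List[float]:
    """Hypothetical stand-in feature extractor: ReLU of adjacent-pixel differences."""
    return [max(0.0, pixels[i + 1] - pixels[i]) for i in range(len(pixels) - 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [i / 15 for i in range(16)]  # a tiny 16-pixel intensity ramp
    for sd in NOISE_SDS:
        noisy = add_gaussian_noise(image, sd, rng)
        # Identity "de-noiser": compare features of the noisy image with the clean one.
        loss = feature_matching_loss(toy_features, image, noisy)
        print(f"sd={sd:.2f}  feature loss={loss:.4f}")
```

By contrast, a de-noiser trained only with losses on the images themselves would never consult the recognition network's features, which is the distinction the abstract draws between the two de-noising approaches.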
Meeting abstract presented at VSS 2018