Abstract
We study the problem of recognizing the material of an object from a single image. We are interested in how representations learned by convolutional neural networks (CNNs) for object recognition can be transferred to material recognition. We propose a transfer learning procedure using a 16-layer CNN model. First, the model is trained for object recognition using a dataset of object images (ImageNet). Then, it is fine-tuned for material recognition using a material dataset created in the authors' lab, which is an extended version of the FMD dataset (Sharan et al., IJCV, 2013). This dataset (named EFMD) consists of 10,000 images belonging to the same 10 classes as FMD. Finally, while keeping the 1st through 9th layers fixed, only the 10th through 16th layers of the model are further fine-tuned using a training subset of FMD. The model obtained by this procedure achieves a prediction accuracy of 83%±0.99 on FMD, which is close to the reported human performance of 84.9%. In addition, we analyze the effect of representations of local and non-local surface properties on performance by evaluating models on deformed versions of FMD images. First, we use two datasets, t15 and t30, consisting of images deformed "locally" using textures synthesized from 15x15 and 30x30 patches, respectively. Then, we use two datasets of images deformed to emphasize non-local surface properties using bilateral and high-pass filters, called bf and hf, respectively. The proposed procedure achieves 65.3%±1.13, 69.8%±0.82, 69.2%±1.00, and 64.7%±1.90 on t15, t30, hf, and bf, respectively, while human performance on the same datasets is reported as 38.7%, 46.9%, 64.8%, and 65.3%. This result can be attributed to the locality of the representations learned by CNNs, which is implied by their local convolutions. Overall, our results suggest that object representations can be transferred for learning material representations in CNNs, modeling local surface properties and recognizing materials.
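The two-stage fine-tuning procedure described above can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes a Keras/TensorFlow VGG-16 backbone, uses a simple pooled classification head in place of the unspecified fully connected layers, and the loaders load_efmd()/load_fmd_train() and all hyperparameters are hypothetical placeholders, since the abstract does not specify them.

```python
# Hedged sketch of the two-stage fine-tuning procedure (assumptions noted above).
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 10  # FMD and EFMD share the same 10 material classes

# Stage 0: start from a 16-layer CNN pretrained for object recognition (ImageNet).
# include_top=False drops the original fc layers; a pooled dense head is a placeholder.
backbone = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3), pooling="avg")
head = layers.Dense(NUM_CLASSES, activation="softmax")(backbone.output)
model = models.Model(backbone.input, head)

# Stage 1: fine-tune the whole network for material recognition on EFMD.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
x_efmd, y_efmd = load_efmd()          # hypothetical loader for the 10,000-image EFMD set
model.fit(x_efmd, y_efmd, epochs=10, batch_size=32)

# Stage 2: freeze the lower layers (1st-9th, i.e. the first 9 conv layers) and
# fine-tune only the upper layers on the FMD training subset.
conv_layers = [l for l in model.layers if isinstance(l, layers.Conv2D)]
for l in conv_layers[:9]:
    l.trainable = False
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
x_fmd, y_fmd = load_fmd_train()       # hypothetical loader for the FMD training split
model.fit(x_fmd, y_fmd, epochs=10, batch_size=32)
```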
Meeting abstract presented at VSS 2016