Open Access
Vision Sciences Society Annual Meeting Abstract | September 2016
Learning Deep Representations of Objects and Materials for Material Recognition
Author Affiliations
  • Xing Liu
    Graduate School of Information Sciences, Tohoku University
  • Mete Ozay
    Graduate School of Information Sciences, Tohoku University
  • Yan Zhang
    Graduate School of Information Sciences, Tohoku University
  • Takayuki Okatani
    Graduate School of Information Sciences, Tohoku University
Journal of Vision September 2016, Vol. 16, 177. https://doi.org/10.1167/16.12.177

Xing Liu, Mete Ozay, Yan Zhang, Takayuki Okatani; Learning Deep Representations of Objects and Materials for Material Recognition. Journal of Vision 2016;16(12):177. https://doi.org/10.1167/16.12.177.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

We study the problem of recognizing the material of an object from a single image. We are interested in how representations learned by convolutional neural networks (CNNs) for object recognition can be transferred to material recognition. We propose a three-step transfer-learning procedure using a 16-layer CNN model. First, the model is trained for object recognition on a dataset of object images (ImageNet). Then, it is fine-tuned for material recognition on a material dataset created in the authors' lab, an extended version of the FMD dataset (Sharan et al., IJCV, 2013); this dataset, named EFMD, consists of 10,000 images belonging to the same 10 classes as FMD. Finally, with the 1st to 9th layers fixed, only the 10th to 16th layers of the model are further fine-tuned on a training subset of FMD. The model obtained by this procedure achieves a prediction accuracy of 83%±0.99 on FMD, close to the reported human performance of 84.9%. In addition, we analyze the effect of representations of local and non-local surface properties on performance by evaluating the models on deformed versions of the FMD images. We first use two datasets, t15 and t30, consisting of images deformed "locally" using synthesized textures with 15x15 and 30x30 patches, respectively. We then use two datasets of images deformed to emphasize non-local surface properties using bilateral and high-pass filters, called bf and hf, respectively. The proposed procedure achieves accuracies of 65.3%±1.13, 69.8%±0.82, 69.2%±1.00, and 64.7%±1.90 on t15, t30, hf, and bf, respectively, whereas human performance on these datasets is reported as 38.7%, 46.9%, 64.8%, and 65.3%. This result, particularly the large advantage over humans on the locally deformed images, can be attributed to the locality of the representations learned by CNNs, which follows from their local convolutions. Overall, our results suggest that object representations can be transferred to learn material representations in CNNs that model local surface properties and recognize materials.
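
The layer-wise freezing in the final fine-tuning step can be illustrated with a short sketch. This is a minimal illustration only, assuming a torchvision VGG-16 in PyTorch as a stand-in for the 16-layer CNN; the framework, the layer indexing, the 10-way head, and the optimizer settings are assumptions, not the authors' reported setup.

```python
# Minimal sketch (assumed PyTorch/torchvision, not the authors' code):
# freeze the 1st-9th weight layers of a 16-layer CNN and fine-tune only
# the 10th-16th for 10-class material recognition (FMD/EFMD classes).
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the 1000-way ImageNet classifier with a 10-way material head.
model.classifier[6] = nn.Linear(4096, 10)

# VGG-16 has 16 weight layers: 13 convolutional + 3 fully connected.
weight_layers = [m for m in model.modules()
                 if isinstance(m, (nn.Conv2d, nn.Linear))]
assert len(weight_layers) == 16

# Fix layers 1-9; leave layers 10-16 trainable.
for layer in weight_layers[:9]:
    for p in layer.parameters():
        p.requires_grad = False

# Optimize only the unfrozen parameters (hyperparameters are assumed).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9)
```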

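For the non-local deformations (bf and hf), a rough sketch of the filtering named in the abstract is given below. The abstract does not specify filter parameters, so the OpenCV calls, the kernel sizes, and the subtract-a-blur construction of the high-pass image are all assumptions, not the authors' exact recipe.

```python
# Rough sketch of the bf/hf deformations (assumed OpenCV recipe; the
# abstract names the filters but not their parameters).
import cv2

img = cv2.imread("fmd_sample.jpg")  # hypothetical FMD image path

# bf: bilateral filtering smooths fine texture while preserving strong
# edges, emphasizing non-local surface structure.
bf = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

# hf: one common high-pass construction subtracts a Gaussian blur and
# re-centers around mid-gray (the authors' filter may differ).
low = cv2.GaussianBlur(img, (21, 21), 0)
hf = cv2.addWeighted(img, 1.0, low, -1.0, 128)
```
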
Meeting abstract presented at VSS 2016
