September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Understanding Information Processing Mechanisms for Estimating Material Properties of Cloth in Deep Neural Networks
Author Affiliations & Notes
  • Wenyan Bi
    Department of Computer Science, American University
  • Gaurav Kumar
    Samsung R&D Institute India-Bangalore
  • Hendrikje Nienborg
    University of Tuebingen, Werner Reichardt Centre for Integrative Neuroscience
  • Bei Xiao
    Department of Computer Science, American University
Journal of Vision September 2019, Vol.19, 297c. doi:
      © ARVO (1962-2015); The Authors (2016-present)

Two decades of research on material perception show that humans can estimate material properties in diverse scenes. However, a computational framework that explains these perceptual data is still lacking. Deep neural networks trained on large datasets have succeeded in object recognition and classification. Will they succeed in estimating the material properties of objects in dynamic scenes? One challenge is that clear-cut labels for material properties are difficult to obtain. In addition, there is no simple relationship between perceived material attributes and physical measurements. Here we use physics-based cloth simulations as our database and train neural networks with a variety of architectures to understand the computational mechanisms underlying the estimation of the mechanical properties of cloth. We hypothesize that the network architecture that best matches human performance is likely to resemble the one adopted by the human visual system. The dataset contains 1,764 animations of moving cloth rendered with 7 bending-stiffness values, 6 mass values, and 6 textures in 3 dynamic scenes. First, we fine-tune a pre-trained ResNet (a CNN) to extract features from static frames at maximum deformation, using the ground-truth simulation parameters as labels. Second, we connect a long short-term memory network (LSTM) in series with the CNN and train on the videos with the same labels. Third, we apply multidimensional scaling (MDS) to obtain perceptual clusters of the stimuli and use these clusters as labels to train the same CNN + LSTM architecture on the videos. Finally, we build a two-stage network in which the first stage performs unsupervised learning to cluster the data and generate labels, and the second stage uses those labels to learn a discriminative model that classifies the videos. In the first stage, each video is represented by Fisher vectors computed from its dense trajectories. We find that the two-stage network outperforms the other architectures and produces results closest to human perception.
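The third model derives its training labels from an MDS embedding of the stimuli. The abstract does not specify which MDS variant was used; as a minimal sketch, classical (Torgerson) MDS is assumed here, which recovers a low-dimensional configuration from a pairwise-dissimilarity matrix alone:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an
    n x n pairwise-distance matrix D so that Euclidean distances are
    (approximately) preserved."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # keep the top-k components
    scale = np.sqrt(np.maximum(w[idx], 0))
    return V[:, idx] * scale              # n x k embedding

# Toy check: distances among 4 planar points are reproduced exactly.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, k=2)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
assert np.allclose(D, D_hat, atol=1e-6)
```

In the actual experiment the distance matrix would come from perceptual dissimilarity judgments of the cloth videos, and the clusters found in the embedding serve as training labels in place of the physical simulation parameters.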
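The two-stage pipeline can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: random Gaussian features replace the Fisher vectors of dense trajectories, plain k-means stands in for the unspecified unsupervised stage, and a nearest-centroid rule stands in for the discriminative video network of the second stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for Fisher-vector descriptors of the cloth videos: three
# latent material classes, 40 videos each, 16-D features.
true_centers = rng.normal(size=(3, 16)) * 5.0
X = np.vstack([c + rng.normal(size=(40, 16)) for c in true_centers])

# Stage 1: unsupervised clustering (plain k-means here) yields pseudo-labels.
def kmeans(X, k, iters=50):
    C = X[:: len(X) // k][:k].copy()      # simple deterministic init
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, C

pseudo_labels, centroids = kmeans(X, k=3)

# Stage 2: a discriminative model is trained on the stage-1 pseudo-labels;
# a nearest-centroid classifier plays that role in this sketch.
def predict(x, centroids):
    return int(((x - centroids) ** 2).sum(-1).argmin())

# Each block of 40 videos should receive one consistent pseudo-label,
# and the stage-2 classifier should agree with the stage-1 labels.
for i in range(3):
    assert len(set(pseudo_labels[i * 40 : (i + 1) * 40].tolist())) == 1
preds = np.array([predict(x, centroids) for x in X])
assert (preds == pseudo_labels).all()
```

The design point the sketch illustrates is that no ground-truth physical parameters enter the pipeline: the labels the classifier learns from are discovered from the video descriptors themselves.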

