Abstract
Material classification is the task of distinguishing materials such as fabric, wood, and ceramic. Fine-grained classification aims to distinguish subcategories of a material (e.g., satin fabric versus wool fabric). Fine-grained recognition relies on identifying specific visual attributes (e.g., satin is glossy while wool is not) rather than contextual cues (e.g., both satin and wool are used as clothing). In paintings, artists carefully place visual cues such as highlights on fabrics like silk or satin; we therefore hypothesize that learning a visual recognition model from paintings can be beneficial. In this study, we explored the representations learned by neural networks for the task of distinguishing silk/satin from cotton/wool. We trained separate models on paintings from the Materials In Paintings (MIP) dataset and on photos from Flickr, and extracted evidence heatmaps that indicate which cues each model uses in unseen test images. We conducted a study with 57 quality-controlled participants on MTurk to determine which model uses cues that humans prefer. Overall, we found that the model trained on paintings uses cues that humans prefer more. Moreover, both models use equally preferred cues when tested on photos of silk/satin. This is noteworthy because the painting model was never explicitly trained to classify photos of silk/satin, and conventional wisdom suggests it should fail to extract cues comparable to those used by the photo model in this setting. The painting model also generalizes better across domains with respect to prediction accuracy. While this study is limited to two fine-grained classes of fabric, our results show that the perceptual depictions in paintings can be useful for guiding deep neural networks.