Abstract
Humans readily distinguish between matte and glossy surfaces, but the underlying visual computations remain poorly understood. Here, we took a data-driven approach to identifying gloss cues by learning from large numbers of images. We created 75,000 scenes with varying objects, light probes and viewpoints, each rendered once with a high-gloss (mirror-like) material and once with a low-gloss (near-matte) material. Texture patterns on the low-gloss surfaces ensured that contrast and saturation alone could not distinguish the two classes. Ten participants classified 300 such images as either glossy or matte, achieving 79-93% correct. We then trained a series of classifiers to distinguish the two materials: linear classifiers trained on simple intensity and colour statistics and on Portilla-Simoncelli texture statistics, as well as convolutional neural networks (CNNs) with different architectures. For randomly selected renderings, all classifiers correlated very well with humans. However, this was likely due to the high accuracy of both the human observers and the model classifiers. We therefore tested observers on further images until we had assembled a 'difficult' test set, spanning the range from images that were consistently judged correctly to images that were consistently judged incorrectly. For all classifiers, correlations with human judgments dropped precipitously for the difficult set. Using Bayesian hyper-parameter search, we then identified CNN architectures that, when trained on the standard renderings, also correlated most strongly with humans on the difficult test set (not used for training). Finally, we trained Generative Adversarial Networks of varying depths to create images of the two material categories. We showed these images to human observers to arrive at a network depth that is sufficient to imitate the features that humans use to distinguish between high-gloss and low-gloss textured materials. There are striking similarities between these networks and the CNNs that could predict human judgments.
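The remark that high correlations on random renderings may simply reflect high accuracy can be illustrated with a small simulation. The sketch below is not the analysis used in this work: the observers, the accuracy value of 0.9, the image count, and the helper names `pearson` and `simulate_responses` are all hypothetical, and the errors of the two simulated observers are assumed to be independent. It shows that two observers who share nothing except being mostly correct still correlate strongly across a balanced image set, and that the correlation vanishes once the true class is held fixed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_responses(labels, accuracy, rng):
    """Hypothetical observer: answers correctly with probability `accuracy`.

    Responses are 1 ("glossy") or 0 ("matte"). Errors are drawn
    independently per image, so they share no structure with any
    other simulated observer.
    """
    return [lab if rng.random() < accuracy else 1 - lab for lab in labels]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Balanced set: half glossy (1), half matte (0). Values are illustrative.
labels = [i % 2 for i in range(2000)]

# Assumption: "human" and "model" are both 90% correct, with unrelated errors.
human = simulate_responses(labels, 0.9, rng)
model = simulate_responses(labels, 0.9, rng)

# Across all images, agreement is driven by the shared ground truth.
r_all = pearson(human, model)

# Within the glossy class only, the ground truth is constant, so any
# remaining correlation would have to come from shared error patterns.
glossy = [i for i, lab in enumerate(labels) if lab == 1]
r_within = pearson([human[i] for i in glossy], [model[i] for i in glossy])

print(f"correlation across all images: {r_all:.2f}")
print(f"correlation within glossy images: {r_within:.2f}")
```

Under these assumptions, the expected correlation across all images is (2p - 1)^2 = 0.64 for p = 0.9, while the expected within-class correlation is zero. A test set on which observers disagree with the ground truth as often as they agree removes this shared signal, which is one way to read the motivation for the 'difficult' set described above.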
Meeting abstract presented at VSS 2018