Abstract
Humans effortlessly judge surface gloss, even though the image is formed by a complex interplay between reflectance, object shape, lighting environment and viewpoint. Understanding the full picture of gloss perception requires comprehensive measurements under diverse conditions resembling scene variations in the real world. Here we conducted a large-scale online experiment using 3,888 test images, generated by a factorial combination of 36 objects, 36 illuminations, and 3 viewpoints per environment. Test objects were assigned random body color and specular reflectance. Each of 297 participants was presented with 84 test images, each paired with a reference image of a bumpy object under a different illumination. Participants adjusted the specular reflectance of the reference object until the test and reference objects had the same apparent glossiness. Results showed that human judgments and ground-truth specular reflectance often strongly disagreed. Objects with mid to high specular reflectance varied widely in perceived glossiness, whereas objects with low specular reflectance consistently yielded low perceived glossiness. Simple image statistics computed from test objects only partially explained human behavior. Yet, importantly, the patterns of errors were highly consistent across participants (average correlation coefficient >0.80). Furthermore, participants tended to assign higher glossiness to objects with body colors lying on the red-green axis, presumably because natural lighting environments typically produce yellow-blue specular colors, making the specular reflection perceptually separable from the body color. Although it is widely believed that observers have strong gloss constancy, our results show that humans exhibit systematic errors in gloss judgments once diverse scene factors are considered. Going forward, our datasets are well suited for training deep neural networks to reproduce human-like gloss labels, because human settings and ground truth are well decorrelated. Networks trained to capture human judgments may learn unique internal representations that give insight into human gloss computations.
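
The stimulus count follows directly from the factorial design described above. The short sketch below enumerates that design and checks the arithmetic. The integer indices and the names `build_conditions` and `total_settings` are illustrative assumptions, not the authors' code, and the per-image average assumes trials were spread evenly across images, which the abstract does not state.

```python
from itertools import product

# Design sizes taken from the abstract.
N_OBJECTS = 36
N_ILLUMINATIONS = 36
N_VIEWPOINTS = 3          # viewpoints per lighting environment
N_PARTICIPANTS = 297
TRIALS_PER_PARTICIPANT = 84


def build_conditions():
    """Enumerate every (object, illumination, viewpoint) index triple.

    Hypothetical helper: the indices stand in for the actual objects,
    environments, and camera positions used in the study.
    """
    return list(product(range(N_OBJECTS),
                        range(N_ILLUMINATIONS),
                        range(N_VIEWPOINTS)))


conditions = build_conditions()
total_settings = N_PARTICIPANTS * TRIALS_PER_PARTICIPANT

print(len(conditions))   # 36 * 36 * 3 = 3888 test images
print(total_settings)    # 297 * 84 = 24948 settings in total

# Assumption: trials were distributed evenly over the test images.
print(round(total_settings / len(conditions), 2))  # about 6.42 settings per image
```

Under the even-distribution assumption, each test image would receive roughly six settings from different participants.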