September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
Unsupervised Neural Networks Learn Idiosyncrasies of Human Gloss Perception
Author Affiliations & Notes
  • Katherine R Storrs
    Justus-Liebig University Giessen
  • Roland W. Fleming
    Justus-Liebig University Giessen
Journal of Vision September 2019, Vol.19, 213. doi:
  • Views
  • Share
  • Tools
    • Alerts
      This feature is available to authenticated users only.
      Sign In or Create an Account ×
    • Get Citation

      Katherine R Storrs, Roland W. Fleming; Unsupervised Neural Networks Learn Idiosyncrasies of Human Gloss Perception. Journal of Vision 2019;19(10):213.

      Download citation file:

      © ARVO (1962-2015); The Authors (2016-present)

  • Supplements

We suggest that characteristic errors in gloss perception may result from the way we learn vision more generally, through unsupervised objectives such as efficiently encoding and predicting proximal image data. We tested this idea using unsupervised deep learning. We rendered 10,000 images of bumpy surfaces of random albedos, bump heights, and illuminations, half with high-gloss and half with low-gloss reflectance. We trained an unsupervised six-layer convolutional PixelVAE network to generate new images based on this dataset. This network learns to predict pixel values given the previous ones, via a highly compressed 20-dimensional latent representation, and receives no information about gloss, shape or lighting during training. After training, multidimensional scaling revealed that the latent space substantially disentangles high and low gloss images. Indeed, a linear classifier could classify novel images with 97% accuracy. Samples generated by the network look like realistic surfaces, and their apparent glossiness can be modulated by moving systematically through the latent space. We generated 30 such images, and asked 11 observers to rate glossiness. Ratings correlated moderately well with gloss predictions from the model (r = 0.64). To humans, perceived gloss can be modulated by lighting and surface shape. When the model was shown sequences varying only in bump height, it frequently exhibited non-constant gloss values, most commonly, like humans, seeing flatter surfaces as less glossy. We tested whether these failures of constancy in the model matched human perception for specific stimulus sequences. Nine observers viewed triads of images from five sequences in a Maximum Likelihood Difference Scaling experiment. Gloss values within the unsupervised model better predicted human perceptual gloss scales than did pixelwise differences between images (p = 0.042). Unsupervised machine learning may hold the key to fully explicit, image-computable models of how we learn meaningful perceptual dimensions from natural data.


This PDF is available to Subscribers Only

Sign in or purchase a subscription to access this content. ×

You must be signed into an individual account to use this feature.