August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Unsupervised learning can predict properties of non-rigid mirror objects in ambiguous conditions
Author Affiliations
  • Omer F. Yildiran
    École normale supérieure - PSL, Paris, France
    Justus Liebig University, Giessen, Germany
  • Katherine R. Storrs
    University of Auckland, Auckland, New Zealand
  • Roland W Fleming
    Justus Liebig University, Giessen, Germany
  • Katja Doerschner
    Justus Liebig University, Giessen, Germany
Journal of Vision August 2023, Vol.23, 5026. doi:

      Omer F. Yildiran, Katherine R. Storrs, Roland W Fleming, Katja Doerschner; Unsupervised learning can predict properties of non-rigid mirror objects in ambiguous conditions. Journal of Vision 2023;23(9):5026.


      © ARVO (1962-2015); The Authors (2016-present)


Although ‘glossiness’ is an optical property of materials and ‘softness’ a mechanical one, there is an intriguing perceptual connection between the two: both specular reflections and shape deformations produce distinctive motion patterns. Observers are generally excellent at judging the properties of moving surfaces. Under certain circumstances, however, reflections and deformations can actually be confused: rigidly transforming mirrors can appear non-rigid, and deforming matte-textured objects occasionally appear somewhat shiny. Here, we investigated whether similar broad successes and specific confusions also arise in an unsupervised recurrent neural network (PixelRNN) trained to predict video sequences. We previously found that such networks reproduce several key gloss perception phenomena, including the ‘sticky-reflection’ effect (Doerschner et al., 2011, Current Biology), in which reflections that move with the surface (instead of sliding across it) look like matte texture markings. We generated 8,000 20-frame movies of objects with diverse appearances and motions by varying shape, glossiness, soft-body properties (including rigid objects), illumination environment, texture map, and rotation direction and speed. After training, the PixelRNN could synthesize accurate video predictions up to 10 frames into the future. A t-SNE visualization of how new stimuli were embedded in the network’s internal representation revealed that the network clearly clustered and disentangled the different object types, as confirmed by linear logistic-regression classification of the stimuli according to their surface glossiness and deformability. Within just the first three frames, classification accuracy for softness rose dramatically, from chance to 99%.
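The abstract names the decoding method only as "logistic linear classification", so the following is a minimal sketch, under loud assumptions, of what such a linear readout (a "probe") on an embedding does and how its held-out accuracy can climb from chance as class information accumulates over frames. It is not the authors' pipeline: the embeddings are synthetic Gaussian toy vectors rather than PixelRNN activations, and the helper names (`fit_logistic_probe`, `toy_embeddings`), the 4-dimensional embedding, the per-frame signal strengths, and the learning rate are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_probe(X, y, lr=0.5, epochs=100):
    """Fit a linear logistic-regression readout p = sigmoid(w.x + b) by batch gradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def probe_accuracy(w, b, X, y):
    # Predict class 1 whenever the logit is positive.
    preds = [int(sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) for xi in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)


def toy_embeddings(n, signal, dim, rng):
    """HYPOTHETICAL stand-in for network embeddings: label 1 = 'soft', 0 = 'rigid'.
    The class shifts dimension 0 by +/- signal; all other dimensions are pure noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if label else -signal
        X.append(x)
        y.append(label)
    return X, y


rng = random.Random(0)
# HYPOTHETICAL per-frame separability, growing as more of the motion has been seen.
frame_signal = [0.0, 0.5, 1.0, 3.0]
accuracies = []
for signal in frame_signal:
    X_train, y_train = toy_embeddings(200, signal, 4, rng)
    X_test, y_test = toy_embeddings(400, signal, 4, rng)
    w, b = fit_logistic_probe(X_train, y_train)
    accuracies.append(probe_accuracy(w, b, X_test, y_test))

for frame, acc in enumerate(accuracies, start=1):
    print(f"frame {frame}: held-out accuracy = {acc:.2f}")
```

Evaluating on a separate held-out set matters here: with no class signal (the first toy frame) the probe stays near 50%, so any rise above chance reflects information in the representation rather than overfitting.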
When participants compared the glossiness and rigidity of pairs of stimuli from the test set, their judgements correlated with predictions derived from the stimuli’s embedding in the network’s representation. These findings demonstrate that unsupervised predictive learning can disentangle the softness and glossiness of objects, much as humans do, without any explicit training on distal object properties.
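The abstract does not specify how pairwise predictions were derived from the embedding, so here is a minimal sketch of one plausible scheme, stated purely as an assumption: score each stimulus by its decision value along a linear probe's axis, take the score difference for each pair, and correlate those differences with the proportion of trials on which observers chose the first stimulus. The probe weights, embeddings, stimulus names, and "human" proportions below are all hypothetical toy values, not data from the study.

```python
import math


def decision_value(w, b, x):
    """Score of embedding x along a linear probe's axis (w, b); larger = more 'glossy'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def pearson_r(a, b):
    # Pearson correlation between two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# HYPOTHETICAL inputs: a glossiness probe (w, b) and three 2-D stimulus embeddings.
w, b = [1.0, -0.5], 0.0
embeddings = {"s1": [2.0, 0.0], "s2": [0.5, 1.0], "s3": [-1.0, 0.5]}
# HYPOTHETICAL pairwise data: fraction of trials on which the first stimulus was judged glossier.
human = {("s1", "s2"): 0.85, ("s1", "s3"): 0.95, ("s2", "s3"): 0.60}

pairs = list(human)
model_diff = [
    decision_value(w, b, embeddings[p]) - decision_value(w, b, embeddings[q])
    for p, q in pairs
]
human_pref = [human[pair] for pair in pairs]
r = pearson_r(model_diff, human_pref)
print(f"Pearson r = {r:.3f}")
```

With only three toy pairs the resulting number carries no meaning; the sketch only shows the shape of the comparison. Because choice proportions saturate near 0 and 1, a rank correlation (e.g. Spearman) would be an equally reasonable choice.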

